Inside InfluxDB 3.0: Exploring InfluxDB’s Scalable and Decoupled Architecture

InfluxData recently unveiled the system architecture for InfluxDB 3.0, its newest time-series database. The architecture contains four major components, responsible for data ingestion, querying, compaction, and garbage collection, and two main storage types. It also supports running the database on-premises and natively on major cloud providers.

InfluxDB 3.0 Architecture (Source)

A central pillar of InfluxDB's architecture is the decoupled design of its main components. These components do not communicate with each other directly. All communication goes through the Catalog and Object Storage, and elements like the ingesters and queriers are unaware of the existence of the compactors and garbage collectors. As a result, the components can be scaled and extended independently and deployed in various ways.

The data ingestion component (in blue) handles writing data into the database. Users write data to the Ingest Router, which shards the data to one of the Ingesters, so the number of ingesters can be scaled with the data workload.
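As a rough illustration of the routing step, the sketch below maps a write to one ingester with a stable hash. The article does not say how the Ingest Router actually shards data, so the shard key format, the hashing scheme, and the `pick_ingester` helper are all assumptions made for this example.

```python
import hashlib


def pick_ingester(shard_key: str, ingesters: list[str]) -> str:
    """Map a shard key to one ingester using a stable hash.

    Hypothetical scheme: the real Ingest Router may shard differently.
    """
    if not ingesters:
        raise ValueError("at least one ingester is required")
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and wrap it
    # around the number of available ingesters.
    index = int.from_bytes(digest[:8], "big") % len(ingesters)
    return ingesters[index]


# Assumed shard key: "<database>/<table>".
ingesters = ["ingester-0", "ingester-1", "ingester-2"]
target = pick_ingester("mydb/cpu", ingesters)
```

With plain modulo hashing, adding an ingester remaps many keys; a production router would likely use something more stable, but that detail is outside what the article describes.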

Each ingester identifies tables, validates the data schema, partitions the data by day on the "time" column, deduplicates it, and persists it as a Parquet file. The ingester also records the newly created file in the Catalog, signalling to other components that new data has arrived. InfluxDB optimizes the write path to keep write latency minimal, on the order of milliseconds.
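The partitioning and deduplication steps can be sketched in a few lines. This is a simplified model, not the ingester's implementation: it assumes rows are dictionaries with a "time" field in epoch seconds and a "tags" mapping, and it assumes two rows are duplicates when they share the same tags and timestamp, with the later write winning. Schema validation and Parquet output are left out.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_and_dedupe(rows):
    """Group rows by the UTC day of their "time" value and drop duplicates.

    Assumption: rows with identical tags and timestamp are duplicates,
    and the row seen last replaces earlier ones.
    """
    partitions = defaultdict(dict)
    for row in rows:
        day = datetime.fromtimestamp(row["time"], tz=timezone.utc).strftime("%Y-%m-%d")
        key = (tuple(sorted(row["tags"].items())), row["time"])
        partitions[day][key] = row  # later rows overwrite earlier ones
    return {
        day: sorted(by_key.values(), key=lambda r: r["time"])
        for day, by_key in partitions.items()
    }


rows = [
    {"time": 1700000000, "tags": {"host": "a"}, "value": 1.0},
    {"time": 1700000000, "tags": {"host": "a"}, "value": 2.0},  # duplicate key
    {"time": 1700000000 + 86400, "tags": {"host": "a"}, "value": 3.0},
]
daily = partition_and_dedupe(rows)
```

Each key of `daily` corresponds to one day partition, which the ingester would then persist as its own file.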

The data querying component (in green) processes user queries in SQL or InfluxQL. Users send queries to the Query Router, which forwards them to a Querier. The Querier reads the needed data, builds a plan for the query, executes it, and returns the result.

Queriers can scale depending on the query workload. They perform tasks such as caching metadata, reading and caching data, communicating with ingesters for not-yet-persisted data, and building and executing an optimal query plan.
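A minimal sketch of that read path follows: persisted files are read once and cached, rows that ingesters have not yet persisted are fetched on every query, and the combined rows are filtered by time. The `Querier` class and its three injected callables are hypothetical stand-ins for the Catalog lookup, Object Storage reads, and the ingester request; query planning and cache invalidation are omitted.

```python
class Querier:
    """Toy querier combining cached persisted data with unpersisted rows."""

    def __init__(self, list_files, read_file, fetch_unpersisted):
        self._list_files = list_files                # table -> list of file paths
        self._read_file = read_file                  # path -> list of row dicts
        self._fetch_unpersisted = fetch_unpersisted  # table -> list of row dicts
        self._file_cache = {}

    def _cached_read(self, path):
        # Read each persisted file at most once.
        if path not in self._file_cache:
            self._file_cache[path] = self._read_file(path)
        return self._file_cache[path]

    def query(self, table, start, end):
        """Return rows of `table` with start <= time < end, ordered by time."""
        rows = []
        for path in self._list_files(table):
            rows.extend(self._cached_read(path))
        # Data still held by ingesters is never cached here, because it changes.
        rows.extend(self._fetch_unpersisted(table))
        return sorted(
            (r for r in rows if start <= r["time"] < end),
            key=lambda r: r["time"],
        )
```

Injecting the three callables keeps the sketch independent of any particular storage or network layer.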

Data compaction (in red) addresses the challenge of having many small files in Object Storage, which could hinder query performance. Compactors run as background jobs that read newly ingested files and compact them into fewer, larger, non-overlapping files. This process enhances query performance by significantly reducing I/O at query time.
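The core idea, merging files whose time ranges overlap into fewer non-overlapping ones, can be sketched as an interval merge. The file representation below (dictionaries with `min_time`, `max_time`, and `rows`) is an assumption for illustration; the article does not describe the compactor's actual algorithm, which would also need to consider file sizes.

```python
def compact(files):
    """Merge files with overlapping time ranges into non-overlapping files.

    Each file is a dict with "min_time", "max_time", and "rows"
    (a list of dicts carrying a "time" field). Inputs are not modified.
    """
    merged = []
    for f in sorted(files, key=lambda f: f["min_time"]):
        if merged and f["min_time"] <= merged[-1]["max_time"]:
            # Overlaps the previous output file: fold it in.
            last = merged[-1]
            last["max_time"] = max(last["max_time"], f["max_time"])
            last["rows"] = sorted(last["rows"] + f["rows"], key=lambda r: r["time"])
        else:
            # Starts after the previous output file ends: begin a new one.
            merged.append({
                "min_time": f["min_time"],
                "max_time": f["max_time"],
                "rows": list(f["rows"]),
            })
    return merged


small_files = [
    {"min_time": 0, "max_time": 10, "rows": [{"time": 0}, {"time": 10}]},
    {"min_time": 5, "max_time": 15, "rows": [{"time": 5}, {"time": 15}]},
    {"min_time": 20, "max_time": 30, "rows": [{"time": 20}, {"time": 30}]},
]
compacted = compact(small_files)
```

Because the output ranges never overlap, a query for a given time window only has to open the files whose range intersects that window.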

The number of compactors can be scaled based on the compacting workload, considering factors such as the number of tables with new data files, the size of the files, and the number of existing files the new files overlap with.

The garbage collection component (in pink) manages data retention and space reclamation within the database. It operates through background jobs that schedule both soft and hard data deletions.
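One common way to split deletion into two phases is sketched below. The article only states that soft and hard deletions are scheduled, so the semantics here are assumptions: a soft delete marks a file record as deleted once its data falls outside the retention period, and a hard delete removes the object itself after a grace period. `FileRecord`, both functions, and the dictionary used as an object store are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileRecord:
    path: str
    max_time: float                     # newest timestamp stored in the file
    deleted_at: Optional[float] = None  # set when the file is soft deleted


def soft_delete_expired(catalog, now, retention_seconds):
    """Mark files whose data is entirely older than the retention period."""
    for rec in catalog:
        if rec.deleted_at is None and rec.max_time < now - retention_seconds:
            rec.deleted_at = now


def hard_delete(catalog, object_store, now, grace_seconds):
    """Remove soft-deleted files from storage once the grace period has passed.

    Returns the catalog records that remain.
    """
    kept = []
    for rec in catalog:
        if rec.deleted_at is not None and now - rec.deleted_at >= grace_seconds:
            object_store.pop(rec.path, None)  # reclaim the space
        else:
            kept.append(rec)
    return kept
```

Separating the two phases means a soft-deleted file can still be recovered, or finish serving an in-flight query, before its space is reclaimed.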

The Catalog holds metadata about databases, tables, columns, and files, and InfluxDB uses a Postgres-compatible database to manage it. Object Storage, on the other hand, contains only Parquet files and can be backed by a local disk, Amazon S3, Azure Blob Storage, or Google Cloud Storage.
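To make the Catalog's coordinating role concrete, the sketch below uses an in-memory SQLite database as a stand-in for the Postgres-compatible store. The `parquet_file` table, its columns, and both helper functions are invented for this example and are not InfluxDB's schema. One function plays the ingester registering a new file; the other plays a compactor discovering work, without either knowing about the other.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog with a hypothetical file table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE parquet_file (
            id          INTEGER PRIMARY KEY,
            table_name  TEXT    NOT NULL,
            object_path TEXT    NOT NULL,
            min_time    INTEGER NOT NULL,
            max_time    INTEGER NOT NULL,
            compacted   INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def register_file(conn, table_name, object_path, min_time, max_time):
    """Ingester side: record a newly persisted file."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO parquet_file (table_name, object_path, min_time, max_time) "
            "VALUES (?, ?, ?, ?)",
            (table_name, object_path, min_time, max_time),
        )


def files_needing_compaction(conn, table_name):
    """Compactor side: find files of a table that are not yet compacted."""
    cur = conn.execute(
        "SELECT object_path FROM parquet_file "
        "WHERE table_name = ? AND compacted = 0 ORDER BY min_time",
        (table_name,),
    )
    return [row[0] for row in cur.fetchall()]


catalog = open_catalog()
register_file(catalog, "cpu", "cpu/2023-11-14/a.parquet", 0, 10)
pending = files_needing_compaction(catalog, "cpu")
```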

About the Author

Eran Stiller
