uForwarder: Uber’s Scalable Kafka Consumer Proxy for Efficient Event-Driven Microservices

Uber Engineering announced the open-source release of uForwarder, a push-based consumer proxy for Apache Kafka designed to improve scalability, efficiency, and operational control for high-throughput event streaming across distributed microservices. uForwarder sits between Kafka and consumer services, replacing direct consumer client implementations with a gRPC-driven push interface. The system is intended to simplify consumer logic, centralize offset management, isolate workloads, and provide built-in delay processing for event queues.

The need for uForwarder arose from Uber's internal Kafka deployment, which supports over 1,000 downstream consumer services and handles trillions of messages and multiple petabytes of data per day. Standard Kafka consumer groups presented limitations at this scale, including partition management complexity, inconsistent language support across services, and operational overhead. Direct consumer clients required each service to implement offset handling, retry logic, and delay mechanisms, which increased the likelihood of head-of-line blocking and inefficiency in resource utilization.

High-level consumer proxy architecture (Source: Uber Blog Post)

The previous internal consumer proxy exposed four main challenges. Sequential partition processing could stall when a message failed delivery due to payload size limits or invalid service instances, creating head-of-line blocking. Running thousands of proxy servers to support consumer services proved inefficient, as hardware resources were consumed unevenly. Consumer services often implemented bespoke delay semantics, increasing service complexity. Isolation of workloads across production and non-production environments or across regional zones required either topic proliferation or complicated load-balancing configurations.

uForwarder introduces context-aware routing to improve workload isolation and delivery precision. Kafka message headers propagate routing metadata into downstream gRPC calls, allowing infrastructure-level decisions instead of application filtering. Load balancers deliver events only to matching consumer instances based on region, tenant, or environment, reducing unnecessary traffic and simplifying consumer logic.

Context-aware routing (Source: Uber Blog Post)
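
As a rough illustration of the routing idea described above, the following sketch selects consumer instances whose labels match the routing metadata carried in a message's headers. It is not uForwarder's implementation: the `ConsumerInstance` type, the `matching_instances` function, the label names, and the addresses are all hypothetical, and headers are modeled as plain strings for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class ConsumerInstance:
    """Hypothetical consumer endpoint described by routing labels."""
    address: str
    labels: dict = field(default_factory=dict)


def matching_instances(headers, instances,
                       routing_keys=("region", "tenant", "environment")):
    """Return the instances whose labels agree with every routing header present.

    Headers that are absent from the message place no constraint on routing.
    """
    wanted = {key: headers[key] for key in routing_keys if key in headers}
    return [
        inst for inst in instances
        if all(inst.labels.get(key) == value for key, value in wanted.items())
    ]


# Illustrative data only; addresses and label values are made up.
instances = [
    ConsumerInstance("10.0.0.1:8080", {"region": "us-east", "environment": "production"}),
    ConsumerInstance("10.0.0.2:8080", {"region": "us-east", "environment": "staging"}),
    ConsumerInstance("10.0.1.1:8080", {"region": "eu-west", "environment": "production"}),
]
headers = {"region": "us-east", "environment": "production"}
selected = matching_instances(headers, instances)
```

Because the match happens before delivery, a staging instance never receives production traffic and does not need its own filtering code.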

The out-of-order commit tracker strengthens offset management by preventing partition stalls. It monitors commit progress independently and detects stuck offsets based on configured thresholds. Problematic messages are redirected to a dead letter queue while the commit pointer advances, avoiding head-of-line blocking and maintaining consistent throughput across partitions.
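
A minimal sketch of this idea, under stated assumptions, is shown below. The class name, the `max_stuck_gap` threshold (expressed here as an offset gap), and the in-memory dead-letter list are all illustrative choices; the article does not specify how uForwarder defines its thresholds or stores its state.

```python
class OutOfOrderCommitTracker:
    """Hypothetical tracker: accepts acks in any order, commits contiguously."""

    def __init__(self, start_offset, max_stuck_gap):
        self.committed = start_offset    # next offset still waiting to be committed
        self.acked = set()               # offsets acked ahead of the commit pointer
        self.max_stuck_gap = max_stuck_gap
        self.dead_letters = []           # offsets given up on (stand-in for a DLQ)

    def ack(self, offset):
        """Record an acknowledgement and return the updated commit pointer."""
        if offset >= self.committed:
            self.acked.add(offset)
        self._advance()
        return self.committed

    def _advance(self):
        while True:
            if self.committed in self.acked:
                # Normal case: the head offset was acked, so move forward.
                self.acked.remove(self.committed)
                self.committed += 1
            elif self.acked and max(self.acked) - self.committed >= self.max_stuck_gap:
                # Assumed stuck-offset rule: later offsets are far ahead of the head,
                # so send the head to the dead-letter list and skip past it.
                self.dead_letters.append(self.committed)
                self.committed += 1
            else:
                return


tracker = OutOfOrderCommitTracker(start_offset=0, max_stuck_gap=3)
tracker.ack(1)
tracker.ack(2)
pointer = tracker.ack(3)   # offset 0 never acked; the gap now reaches the threshold
```

In this sketch a single unacknowledged message delays the commit pointer only until the gap threshold is reached, after which the partition keeps moving.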

The consumer auto rebalancer continuously evaluates CPU usage, memory pressure, and throughput across worker instances. Based on real-time metrics, it redistributes partitions to balance load efficiently. It scales up quickly to reduce lag during traffic spikes and scales down gradually to prevent instability, improving overall resource utilization and performance consistency.
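
The asymmetric scaling behavior can be sketched as follows. This is only an illustration of "scale up fast, scale down slowly": the function name, the throughput-only capacity model, and the `max_step_down` parameter are assumptions, and the real rebalancer also weighs CPU and memory and moves partitions between workers.

```python
def next_worker_count(current, total_rate, capacity_per_worker, max_step_down=1):
    """Return the worker count for the next evaluation round.

    total_rate and capacity_per_worker are integers in messages per second.
    Scale-up jumps straight to the desired count; scale-down removes at most
    max_step_down workers per round.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    desired = max(1, -(-total_rate // capacity_per_worker))
    if desired > current:
        return desired                                  # react to spikes immediately
    return max(desired, current - max_step_down)        # shrink gradually


spike = next_worker_count(current=4, total_rate=9000, capacity_per_worker=1000)
cooldown = next_worker_count(current=spike, total_rate=2000, capacity_per_worker=1000)
```

Limiting the step size on the way down trades some idle capacity for stability, since a brief dip in traffic does not trigger a large release of workers that would have to be reacquired moments later.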

DelayProcessManager enables partition-level pause and resume control for finer-grained backpressure handling. Instead of halting an entire consumer, only blocked partitions are buffered when dependencies are unavailable or rate-limited. Other partitions continue processing normally, preserving throughput and reducing global slowdowns while simplifying delay handling within services.

Delay processing in Consumer Proxy worker fetcher thread (Source: Uber Blog Post)
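
The partition-level pause and resume behavior can be approximated with the sketch below. `PartitionPauser` and its methods are hypothetical names chosen for illustration and do not reflect the actual DelayProcessManager interface; buffering is done in memory with no size limit, which a real system would need to bound.

```python
from collections import defaultdict, deque


class PartitionPauser:
    """Hypothetical per-partition pause/resume with buffering."""

    def __init__(self):
        self.paused = set()
        self.buffers = defaultdict(deque)

    def pause(self, partition):
        self.paused.add(partition)

    def resume(self, partition):
        """Unpause the partition and return its buffered messages in arrival order."""
        self.paused.discard(partition)
        return list(self.buffers.pop(partition, ()))

    def dispatch(self, partition, message):
        """Return the message if it may be delivered now; otherwise buffer it."""
        if partition in self.paused:
            self.buffers[partition].append(message)
            return None
        return message


pauser = PartitionPauser()
pauser.pause(0)                       # dependency for partition 0 is unavailable
held = pauser.dispatch(0, "a")        # buffered, not delivered
passed = pauser.dispatch(1, "b")      # partition 1 is unaffected
pauser.dispatch(0, "c")
drained = pauser.resume(0)            # buffered messages come back in order
```

Only partition 0 waits in this example; partition 1 keeps flowing, which is the property the article attributes to partition-level control.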

Uber reports that uForwarder has become the dominant Kafka consumer option internally and is now available as an open-source project on GitHub. The architecture enables improved workload isolation, reduced consumer lag, and more efficient hardware utilization, while simplifying consumer logic in microservices. The team is expanding queue capacity and addressing lag by rewinding offsets to the latest position while using a side consumer to process delayed data. Native Protobuf support is also being added to allow services to receive structured messages directly.
