Batch Processing Content on InfoQ
-
From Camera to Cloud: Netflix’s Scalable Media Processing Pipeline
Netflix has detailed a cloud-based system for scaling camera file processing across global film and TV workflows. The pipeline handles ingest, validation, metadata extraction, and media transformation at scale using the FilmLight API and distributed compute. It standardizes workflows across editorial, VFX, and color pipelines, improving consistency and reducing manual handling across productions.
-
30+ Updates per Second per Account: Uber Scales Ledger Processing with Batching
Uber introduced a high-throughput financial ledger processing system designed to handle hot account write contention at scale. Using 250ms batching, Redis coordination, and optimistic atomic updates, the system supports 30+ updates per second per account while preserving consistency and auditability, reducing multi-hour processing pipelines to minutes in its distributed accounting infrastructure.
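The batching idea in this summary can be illustrated with a minimal sketch. It assumes updates arrive as `(timestamp_ms, account_id, amount)` tuples and that the store exposes a version-checked write. The `Ledger` class, `batch_updates`, and `apply_batches` are hypothetical names for illustration, not Uber's implementation, and the Redis coordination layer is left out entirely.

```python
from collections import defaultdict
from dataclasses import dataclass

BATCH_WINDOW_MS = 250  # window size quoted in the summary


@dataclass
class Account:
    balance: int = 0
    version: int = 0


class Ledger:
    """Hypothetical in-memory store with a version-checked (optimistic) write."""

    def __init__(self):
        self.accounts = defaultdict(Account)
        self.writes = 0

    def compare_and_set(self, account_id, expected_version, new_balance):
        acct = self.accounts[account_id]
        if acct.version != expected_version:
            return False  # another writer got there first
        acct.balance = new_balance
        acct.version += 1
        self.writes += 1
        return True


def batch_updates(updates):
    """Sum amounts per (account, 250 ms window) so each window needs one write."""
    batches = defaultdict(int)
    for timestamp_ms, account_id, amount in updates:
        window = timestamp_ms // BATCH_WINDOW_MS
        batches[(account_id, window)] += amount
    return batches


def apply_batches(ledger, batches, max_retries=3):
    """Apply each batch with an optimistic write, retrying on version conflict."""
    for (account_id, _window), delta in sorted(batches.items()):
        for _ in range(max_retries):
            acct = ledger.accounts[account_id]
            if ledger.compare_and_set(account_id, acct.version, acct.balance + delta):
                break
        else:
            raise RuntimeError(f"could not update {account_id}")


updates = [(0, "a", 5), (100, "a", 7), (240, "a", -2), (260, "a", 1), (10, "b", 3)]
ledger = Ledger()
apply_batches(ledger, batch_updates(updates))
```

In this sketch, five incoming updates collapse into three writes, which is how batching relieves contention on a hot account.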
-
Lyft Scales Global Localization Using AI and Human-in-the-Loop Review
Lyft has implemented an AI-driven localization system to accelerate translations of its app and web content. Using a dual-path pipeline with large language models and human review, the system processes most content in minutes, improves international release speed, ensures brand consistency, and handles complex cases like regional idioms and legal messaging efficiently.
-
Pinterest Reduces Spark OOM Failures by 96% through Auto Memory Retries
Pinterest Engineering cut Apache Spark out-of-memory failures by 96% using improved observability, configuration tuning, and automatic memory retries. Staged rollout, dashboards, and proactive memory adjustments stabilized data pipelines, reduced manual intervention, and lowered operational overhead across tens of thousands of daily jobs.
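A rough sketch of what an automatic memory retry could look like follows. It assumes a job is a callable that takes a memory size in GB and raises on out-of-memory. The `OutOfMemory` exception, `run_with_memory_retries`, and the doubling policy are all assumptions for illustration, not Pinterest's actual mechanism or a Spark API.

```python
class OutOfMemory(Exception):
    """Hypothetical signal that a job ran out of memory."""


def run_with_memory_retries(job, initial_gb, growth=2, max_gb=64, max_attempts=4):
    """Run job(memory_gb); on OOM, retry with more memory up to max_gb.

    Returns (result, memory_gb_used, attempts).
    """
    memory_gb = initial_gb
    for attempt in range(1, max_attempts + 1):
        try:
            return job(memory_gb), memory_gb, attempt
        except OutOfMemory:
            next_gb = min(memory_gb * growth, max_gb)
            # Give up when attempts are exhausted or memory cannot grow further.
            if attempt == max_attempts or next_gb == memory_gb:
                raise
            memory_gb = next_gb


def sample_job(memory_gb):
    # Stand-in for a real job: pretend it needs at least 10 GB.
    if memory_gb < 10:
        raise OutOfMemory(f"{memory_gb} GB is not enough")
    return "ok"


outcome = run_with_memory_retries(sample_job, initial_gb=4)
```

Capping both the attempt count and the memory ceiling keeps a job that is broken for other reasons from consuming ever-larger executors.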
-
Karrot Improves Conversion Rates by 70% with New Scalable Feature Platform on AWS
Karrot replaced its legacy recommendation system with a scalable architecture that leverages various AWS services. The company sought to address challenges related to tight coupling, limited scalability, and poor reliability in its previous solution, opting instead for a distributed, event-driven architecture built on top of scalable cloud services.
-
Uber Completes Massive Kubernetes Migration for Microservices and Large-Scale Compute Workloads
Uber has successfully completed a large Kubernetes migration, transitioning its entire compute platform from Apache Mesos to Kubernetes across multiple data centers and cloud environments.
-
Scaling Uber’s Batch Data Platform: a Journey to the Cloud with Data Mesh Principles
Some months ago, Uber began migrating its batch data analytics and machine learning platform to Google Cloud Platform (GCP). In a recent post on its engineering blog, Uber provided additional details on the migration, which incorporated key data mesh principles.
-
JobRunr Introduces Version 7.0 with Built-in Support for Virtual Threads
JobRunr v7 now defaults to virtual threads for applications running on JDK 21, optimizing concurrency for I/O-bound tasks and allowing more jobs to run simultaneously. The release maintains compatibility with Java 8 and supports GraalVM native mode. RedisStorageProvider and ElasticSearchStorageProvider are slated for removal in future releases, and the MongoDB driver has been upgraded.
-
AWS Batch Introduces Multi-Container Jobs for Large-Scale Simulations
AWS recently announced support for multi-container jobs in AWS Batch through the management console. The new feature simplifies running simulations, particularly for testing complex systems such as those used in autonomous vehicles and robotics.
-
Cadence 1.0: Uber Releases Its Scalable Workflow Orchestration Platform
Uber released a major version of its workflow orchestration platform named Cadence after six years in development. Uber and other companies use Cadence to build stateful services at scale using native programming languages.
-
Pfizer Uses Serverless Architecture on AWS to Scale Processing of Digital Biomarkers
Pfizer upgraded the serverless architecture for processing digital biomarker data at scale to make it more flexible and configurable. They created a framework that uses a file processing pipeline built with AWS Step Functions and other serverless services, as well as a custom Python package for data ingestion and processing.
-
Cloudflare Previews Globally Distributed Queues without Egress Fees
Cloudflare recently announced the private beta of Cloudflare Queues, a message queuing service that allows applications to send and receive messages using Cloudflare Workers. The new service provides at-least-once message delivery, supports batching of messages, and does not charge bandwidth egress fees.
-
Google Cloud Introduces Batch, a Service for Scheduling Batch Jobs
Google Cloud recently announced the preview of Batch, a managed service to run batch jobs at scale. The new service supports the latest T2A Arm-based instances and Spot VMs for large batch jobs utilizing task parallelization.
-
AWS Introduces Batch Support for AWS Fargate
During the first week of its annual re:Invent conference, AWS introduced the ability to specify AWS Fargate as a compute resource for AWS Batch jobs. With AWS Batch support for AWS Fargate, customers can run jobs on serverless compute resources that are fully managed from job submission to completion.
-
Dynein – an Asynchronous Background Job Service from Airbnb
Airbnb moves time-consuming, resource-intensive tasks to asynchronous background jobs to improve scalability. Because job scheduling has become such an important component, the company built Dynein, a distributed delayed job queueing service and scheduler. In a blog post, Andy Fang from Airbnb describes the background and the challenges of designing and building the service.