Performance Content on InfoQ
-
Pinterest Engineers Eliminate CPU Zombies to Resolve Production Bottlenecks
Pinterest identified and resolved CPU starvation issues that affected machine learning training jobs on its Kubernetes-based platform, PinCompute. The engineers traced the problem to an unused Amazon ECS agent, which caused memory cgroup leaks. By disabling the agent, they stabilised performance. This case illustrates the importance of understanding system defaults for effective troubleshooting.
-
Legare Kerrison and Cedric Clyburn on LLM Performance and Evaluations
Effectively measuring the performance of applications that use Large Language Models (LLMs) is critical to the adoption of AI technologies in organizations. Legare Kerrison and Cedric Clyburn from the Red Hat team recently spoke at the Arc of AI 2026 conference about practical methods to evaluate and optimize LLM inference.
-
Google’s TurboQuant Compression May Support Faster Inference, Same Accuracy on Less Capable Hardware
Google Research unveiled TurboQuant, a novel quantization algorithm that compresses large language models’ Key-Value caches by up to 6x. With 3.5-bit compression, near-zero accuracy loss, and no retraining needed, it allows developers to run massive context windows on significantly more modest hardware than previously required. Early community benchmarks confirm significant efficiency gains.
-
Cloudflare and ETH Zurich Outline Approaches for AI-Driven Cache Optimization
Cloudflare and ETH Zurich highlight how AI-driven crawler traffic challenges traditional caching in CDNs and databases. They propose AI-aware strategies including separate cache tiers, adaptive algorithms, and pay-per-crawl models to balance performance for human users and AI services while maintaining cache efficiency and system stability.
-
Cloudflare Launches Dynamic Workers Open Beta: Isolate-Based Sandboxing for AI Agent Code Execution
Cloudflare has released Dynamic Worker Loader into open beta, offering V8 isolate-based sandboxing for AI-generated code execution. The company claims isolates start in milliseconds and use only megabytes of memory, making them roughly 100x faster and up to 100x more memory-efficient than containers. The feature builds on Cloudflare's Code Mode approach.
-
Discord Engineers Add Distributed Tracing to Elixir's Actor Model without Performance Penalty
Discord engineering detailed how they added distributed tracing to Elixir's actor model. Their custom Transport library wraps messages with trace context and uses dynamic sampling to handle million-user fanouts. CPU optimizations included skipping unsampled traces and filtering context before deserialization, recovering 10+ percentage points of overhead.
-
Cloudflare Introduces Local Uploads for R2 to Cut Cross-Region Write Latency by 75%
Cloudflare has recently introduced Local Uploads for R2 in open beta. The new feature optimizes write performance for globally distributed users without changing bucket location, reducing cross-region write latency by a reported 75%.
-
Rspack Releases Version 1.7: Final 1.x Update before 2.0 Transition
Rspack 1.7 has launched, enhancing performance and plugin compatibility as the project prepares for a major version transition. Key features include improved SWC plugin compatibility, native asset importing as bytes, and default lazy compilation for dynamic modules. With reported performance gains of up to 80%, Rspack offers a faster, Rust-based alternative to webpack while maintaining API compatibility.
-
VoidZero Announces Oxfmt Alpha with Rust-Powered Performance and Prettier Compatibility
VoidZero has unveiled Oxfmt, a Rust-based code formatter that it reports is over 30x faster than Prettier for JavaScript and TypeScript projects. Oxfmt is compatible with existing Prettier configurations and targets developer needs for efficiency and style consistency. VoidZero promises seamless migration, enhanced capabilities, and a commitment to community-driven improvements.
-
Prisma 7: Rust-Free Architecture and Performance Gains
Prisma ORM 7.0 moves the TypeScript-first ORM to a Rust-free architecture, with reported gains of 3x faster queries and 90% smaller bundles, along with an improved developer experience. The release adds dynamic configurations and streamlined artifact management while continuing to support major databases, with a focus on performance and type safety for Node.js projects.
-
AWS Introduces Fifth-Generation Graviton Processor with M9g Instances
AWS recently announced the new Graviton5 processor and the preview of the first EC2 instances running on it, the general-purpose M9g instances. According to the cloud provider, the latest chip delivers up to 25% higher performance than Graviton4, introduces the Nitro Isolation Engine, and provides a larger L3 cache, improving latency, memory bandwidth, and network throughput.
-
Google Boosts ART Compile Times by 18% without Compromising Code Quality
Google's Android Runtime (ART) team has achieved an 18% reduction in compile times for Android code without compromising code quality or increasing peak memory usage, delivering significant performance improvements for both just-in-time (JIT) and ahead-of-time (AOT) compilation.
-
Benchmarking beyond the Application Layer: How Uber Evaluates Infrastructure Changes and Cloud SKUs
Uber’s Ceilometer framework automates infrastructure performance benchmarking beyond applications. It standardizes testing across servers, workloads, and cloud SKUs, helping teams validate changes, identify regressions, and optimize resources. Future plans include AI integration, anomaly detection, and continuous validation.
-
Nuxt Introduces Native Request Cancellation and Async Handler Extraction for Performance Gains
Nuxt 4.2 elevates the developer experience with native abort control for data fetching, improved error handling, and experimental TypeScript support. With a 39% reduction in bundle sizes and a streamlined app directory, this release enhances performance and project organization, positioning Nuxt as a leading choice for full-stack web applications built on Vue.js.
-
Netflix Migrates to Amazon Aurora: 75% Performance Boost and 28% Cost Reduction
Netflix consolidated its relational databases onto Amazon Aurora, cutting costs by 28% and boosting performance by up to 75%. The move from self-managed PostgreSQL reduced operational toil, improving latency for critical apps. This mirrors migrations by Samsung and Panasonic, though benchmarks suggest alternatives like Timescale may suit specific workloads better.