DevOps Content on InfoQ

Articles

DevOps

Beyond the Padlock: Why Certificate Transparency is Reshaping Internet Trust

Certificate Transparency (CT) creates public, append-only logs of every TLS certificate issued, enabling detection of rogue or mistaken certificates. This article explores how CT has transformed internet PKI by moving from reliance on certificate authority trustworthiness to providing verifiable transparency that major browsers now require.

Karthiek Maralla
on Sep 08, 2025
DevOps

How Causal Reasoning Addresses the Limitations of LLMs in Observability

Large language models excel at converting observability telemetry into clear summaries but struggle with accurate root cause analysis in distributed systems. LLMs often hallucinate explanations and confuse symptoms with causes. This article explains how causal reasoning models with Bayesian inference can offer more reliable incident diagnosis.

Dhairya Dalal
on Sep 02, 2025
Cloud

Ransomware-Resilient Storage: the New Frontline Defense in a High-Stakes Cyber Battle

Cybersecurity has evolved, with ransomware now primarily targeting data storage and backups. To combat this, modern defense strategies focus on making storage systems more resilient. Key tactics include using immutable storage that prevents data from being altered or deleted, employing AI-powered detection, and implementing air-gapping to create isolated, tamper-proof recovery points.

Arjun Mullick
on Aug 25, 2025
Cloud

Zero-Downtime Critical Cloud Infrastructure Upgrades at Scale

Engineers can avoid common pitfalls in large-scale infrastructure upgrades by studying others' experiences. The article provides lessons learned from big firms like eBay and Snowflake, offering solutions for legacy systems, performance validation, and rollback planning. It emphasizes systematic preparation and clear communication to handle challenges and ensure zero-downtime upgrades at scale.

Kiran Bhat
on Aug 18, 2025
Architecture & Design

One Network: Cloud-Agnostic Service and Policy-Oriented Network Architecture

Unifying software infrastructure shortens development time and makes large, distributed systems easier to control through clear rules. In this QCon SF 2024 presentation, Anna Berenberg shares lessons learned and achievements from building One Network, covering complex infrastructure layers, open-source integration, and uniform policy enforcement for improved reliability and security.

Anna Berenberg
on Aug 12, 2025
Cloud

Sandbox as a Service: Building an Automated AWS Sandbox Framework

This article outlines an automated AWS Sandbox Framework to provide secure, cost-controlled environments for innovation. It leverages AWS services like Control Tower and open-source tools to automate provisioning, enforce security policies, manage resource lifecycles, and optimize costs through automated cleanup and governance.

Gaurav Mittal
on Aug 11, 2025
Cloud

Backend FinOps: Engineering Cost-Efficient Microservices in the Cloud

Backend FinOps integrates financial discipline into microservices, crucial for cutting cloud costs. Challenges such as resource fragmentation and cold starts underscore the need for intelligent design, effective language choice, robust tagging, and automation. Implementing FinOps via IaC, CI/CD checks, and dynamic autoscaling (e.g., Karpenter) ensures sustained efficiency.

Vivek Arora
on Aug 06, 2025
DevOps

Ceph RBD Turns 15: a Story of Open Source Creation

Fifteen years ago, Ceph RBD began as a community-driven idea that grew into essential infrastructure powering today's cloud platforms. This insider story from Yehuda Sadeh-Weinraub reveals how two developers started a distributed storage project that now supports OpenStack and Kubernetes through transparent, collaborative development.

Yehuda Sadeh-Weinraub
on Jul 07, 2025
DevOps

Why Is My Docker Image So Big? A Deep Dive with ‘dive’ to Find the Bloat

AI images typically bloat from massive library installations and base OS components, with large Docker images slowing AI development and increasing costs. Chirag Agrawal demonstrates how to diagnose bloat using Docker's history and the interactive 'dive' tool to examine each layer in detail. The article shows how effective diagnosis leads to targeted optimizations.

Chirag Agrawal
on Jun 30, 2025
Cloud

Engineering Principles for Building a Successful Cloud-Prem Solution

Discover how Cloud-Prem solutions combine cloud efficiency with on-premise control, meeting data sovereignty and compliance demands while optimizing operational costs and enhancing customer security.

Satyam Dhar
on Jun 26, 2025
DevOps

Analyzing Apache Kafka Stretch Clusters: WAN Disruptions, Failure Scenarios, and DR Strategies

This article analyzes the dynamics of Apache Kafka stretch clusters, assessing WAN disruptions and failure scenarios and outlining Disaster Recovery (DR) strategies. It focuses on maintaining high availability and data integrity across multi-region deployments and on protecting vital services against service level agreement violations.

Srikanth Daggumalli Nishchai Jayanna Manjula
on Jun 20, 2025
Cloud

We Took Developers out of the Portal: How APIOps and IaC Reshaped Our API Strategy

This article describes how legacy API management was transformed into an APIOps framework built on Infrastructure as Code (IaC). It covers automating API lifecycles, enhancing security, and improving developer productivity through CI/CD integration, with consistency across environments that enables rapid deployment and innovation.

Balakrishna Sudabathula
on Jun 12, 2025
