Operations Management Content on InfoQ
-
Platform Engineering, DevOps, and Cognitive Load: a Summary of Community Discussions
According to Charity Majors, CTO at Honeycomb, operations engineering is moving toward platform engineering. Majors observes that platform teams tend to work higher up the stack than operations, DevOps, and SRE teams do. This shift lets organizations concentrate their limited development resources on their core product, where those resources drive the most business value.
-
How AI Supports IT Operators to Resolve Issues Faster and Keep Systems Running
AIOps equips IT teams with algorithms that use historical data to speed up evaluation and remediation and to surface actionable insights, without the need to solicit feedback directly from users. AI can help IT operators work smarter, resolve issues faster, and keep systems up and running to deliver a good end-user experience.
-
AWS Launches a New Console Home Page to Manage Cloud Resources
AWS recently launched a new version of the AWS Management Console home page. From the home page, customers can access each service console, and it offers a single place to find the information they need to perform their AWS-related tasks.
-
NGINX Controller Application Delivery Modules Improve Health Checks and Caching Configurations
NGINX has released new versions of its NGINX Controller Application Delivery Module, a control plane solution for NGINX Plus load balancers. The new features include enhanced workload health checks, improvements to caching configuration, and instance groups.
-
Linkerd Showcases Rust in Cloud-Native Infrastructure
Linkerd recently became a graduated project in the CNCF. One of the most interesting aspects that differentiates Linkerd from other service mesh products is its Rust-based Linkerd2-proxy. Rust has made Linkerd significantly faster and lighter than other service mesh solutions.
-
Ambassador Developer Control Plane Integrates Common Kubernetes Full Lifecycle Tooling
Ambassador Labs announced the release of its Developer Control Plane (DCP). The DCP brings together tooling that supports the full development and operations lifecycle of Kubernetes-based services, including popular Cloud Native Computing Foundation (CNCF) tools such as Argo, Telepresence, and Envoy Proxy.
-
Cloudflare Improves Automated Terraform Generation Tool
Cloudflare recently released an updated version of its cf-terraforming tool, which streamlines generating Terraform HCL from existing Cloudflare resources. The new release simplifies the generation process and introduces changes to better future-proof the tool.
-
Consul-Terraform-Sync Enables Automation of Common Networking Tasks
HashiCorp has moved Consul-Terraform-Sync (CTS) into general availability. CTS lets users define tasks as Terraform modules that run as services are added to or removed from Consul. CTS is part of a solution called Network Infrastructure Automation (NIA), which focuses on automating day-two network tasks such as updating load balancer pools or firewall policies.
-
AWS Publishes Best Practices Guide for Operational Dashboards
AWS recently added its best practices for building dashboards for operational visibility to the Amazon Builders' Library. The document describes in detail the different types of dashboards used at Amazon and discusses the design best practices used to create them.
-
Microsoft Introduces the Azure Well-Architected Framework
In a recent blog post, Microsoft introduced the Azure Well-Architected Framework, which provides customers with a set of Azure architecture best practices to help them build and deliver well-designed solutions.
-
Improving Incident Management through Role Assignments and Game Days
John Arundel, principal consultant at Bitfield Consulting, shared his thoughts on how to ensure incidents are handled smoothly and quickly. He suggests assigning specific roles to each team member responding to the incident. Red team versus blue team exercises can also be leveraged to ensure the team is prepared to respond accurately and quickly.
-
Failure Modes and Building Resilient Systems: Adrian Cockcroft at QCon SF
At the recent QCon San Francisco event, Adrian Cockcroft shared his thoughts on how to produce resilient systems that operate successfully despite the presence of failures. He also described what he considers good cloud resilience patterns for building with a continuous resilience mindset.
-
DataOps and Operations-Centric Data Architecture
Eric Estabrooks from DataKitchen spoke at the Data Architecture Summit 2019 about how DevOps tasks should be managed in data architecture. DataOps is a collaborative data management practice that is emerging as an area of interest in the industry.
-
OpsRamp Releases Improved Alert Correlation and Better Insights into Event Management Models
OpsRamp, a SaaS platform for datacenter operations management, announced its Fall 2019 release, which includes a number of enhancements to its intelligent event management and correlation machine learning models. The release also adds multi-cloud infrastructure monitoring, synthetic monitoring, and a custom integration framework.
-
Bringing VMware Environments to Azure, Microsoft and VMware Establish Partnership
At the recent Dell Technologies World conference, Microsoft and VMware announced an expanded partnership that enables certified VMware cloud infrastructure to run in Microsoft Azure. This first-party Microsoft capability is made possible through a solution from CloudSimple, a VMware-certified partner, and is officially called Azure VMware Solution by CloudSimple.