DevOps and Cloud InfoQ Trends Report - July 2021

Key Takeaways

  • Hybrid cloud options have evolved beyond the traditional definition, and have expanded to enable the functionality of cloud services to run outside of the cloud. Services such as Azure Arc and Google Anthos allow for a much more seamless "hybrid" experience for developers and operators.
  • In the emerging "no-copy data-sharing" approach it is not necessary to move or replicate the data to be able to access it from different services. We believe that the recently announced "Delta Sharing" open standard will contribute to the upward trajectory of no-copy data sharing.
  • We believe that there has been limited progress on undoing the confusion between continuous integration (CI) and continuous delivery (CD) tooling and practices. Both GitOps and site reliability engineering (SRE) practices are increasingly being adopted.
  • Observability practices and tooling continue to mature. The logging and metrics domains of the three pillars of observability are relatively well-adopted, but the tracing pillar remains less so. There are a number of encouraging advancements in this space, especially with the more widespread adoption of OpenTelemetry.
  • We are following innovative developments in FinOps (real-time information flow for cost analysis, aimed at finance teams), with public cloud vendors such as Microsoft and AWS at the forefront.
  • Increasingly popular practices, such as "Policy as Code", as promoted by Open Policy Agent (OPA), and remote access management tooling, e.g. HashiCorp's Boundary, are pushing forward identity as code and privacy as code.
  • We have seen the Team Topologies book become the de facto reference for arranging teams within an organization to enable effective software delivery. There is also an increasing focus on post-incident "blameless postmortems" becoming more akin to "healthy retrospectives", from which the entire organisation can learn.

This article summarizes how we currently see the "cloud computing and DevOps" space, which focuses on fundamental infrastructure and operational patterns, the realization of patterns in technology frameworks, and the design processes and skills that a software architect or engineer must cultivate.

Both InfoQ and the QCon conference series focus on topics that we believe fall into the "innovator, early adopter, and early majority stages" of the diffusion of technology, as defined in Geoffrey Moore’s book "Crossing the Chasm." What we try to do is identify ideas that fit into what Moore referred to as the early market, where "the customer base is made up of technology enthusiasts and visionaries who are looking to get ahead of either an opportunity or a looming problem." We are also looking for ideas that are likely to "cross the chasm" to broader adoption. It is perhaps worth saying, in this context, that a technology's exact position on the adoption curve can vary. For example, microservices are widely adopted amongst Bay Area companies but maybe less widely adopted and perhaps less appropriate elsewhere.

In this edition of the cloud computing and DevOps trend report, we believe that hybrid cloud approaches have evolved to become more "cloud native". In late 2019, all three prominent public cloud vendors brought new hybrid cloud products to the market, and over the last two years they have continued to invest heavily in them: Google with Anthos, Microsoft with the Azure Arc and Stack offerings, and AWS with Outposts and, more recently, Amazon ECS Anywhere. For enterprises, it is not only about bringing workloads to the cloud but also about running them on-premises, in both places, or on multiple clouds. Centrally managing the infrastructure for these workloads with a service like Arc or Anthos therefore delivers value. Furthermore, these products allow enterprises to extend their platform.

There has been increasing adoption (and technological evolution) in the space of "edge cloud" and "edge computing", and so we believe this topic should move to the early adopter stage of our graph. There is a fair amount of traction here from specific vendor tools, such as Cloudflare Workers, Fastly’s Compute@Edge, and Amazon CloudFront’s Lambda@Edge.

The participants of this report have also identified an emerging trend named "no copy data sharing." This can be seen in data management services such as Snowflake, which do not copy or move data, yet enable users to share data at its source. Another example is the Azure Synapse service, which supports no-copy data sharing from Azure Cosmos DB via Azure Synapse Link. The recently announced Delta Sharing open standard is also contributing to the upward trajectory of the no-copy data sharing tendency.

Observability continues to be a popular topic within DevOps and SRE. While most organizations have begun to implement some form of observability stack, as Holly Cummins notes, the term is overloaded and therefore should be broken down into its various components. Ideas such as centralized log aggregation are currently commonplace in most organizations; however, logs make up only one of the three pillars of observability.
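To make the "three pillars" distinction concrete, here is a minimal sketch in plain Python. It is an illustration only: the in-memory `LOGS`, `METRICS`, and `SPANS` sinks and the `traced` helper are hypothetical names invented for this example, not the API of OpenTelemetry or any other observability product. The point it tries to show is that one operation can emit a log line, a metric, and a trace span, and that a shared trace ID is what lets the three be correlated afterwards.

```python
import time
import uuid
from collections import Counter
from contextlib import contextmanager

# Hypothetical in-memory sinks standing in for real log, metric, and trace backends.
LOGS = []
METRICS = Counter()
SPANS = []


@contextmanager
def traced(operation, trace_id=None):
    """Record one span and emit a correlated log line and metric for it."""
    trace_id = trace_id or uuid.uuid4().hex
    start = time.perf_counter()
    status = "ok"
    try:
        yield trace_id
    except Exception:
        status = "error"
        raise
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        # Traces: which operation ran, as part of which request, and for how long.
        SPANS.append({"trace_id": trace_id, "name": operation,
                      "duration_ms": duration_ms, "status": status})
        # Metrics: a cheap aggregate counter with no per-request detail.
        METRICS[(operation, status)] += 1
        # Logs: a discrete event, tagged with the trace ID for correlation.
        LOGS.append(f"trace_id={trace_id} op={operation} status={status}")


def handle_checkout():
    # The outer trace ID is passed to the nested call so that both spans
    # can later be stitched into a single request timeline.
    with traced("checkout") as trace_id:
        with traced("charge_card", trace_id=trace_id):
            pass
    return trace_id


trace_id = handle_checkout()
```

Centralized log aggregation covers only the `LOGS` list in this sketch; the counters and the spans are separate signals that need their own collection and storage, which is one reason adoption of the three pillars tends to be uneven.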

The increasingly popular OpenTelemetry project provides a consistent framework for capturing not just logs, but also traces and metrics. The consistency provided by adopting a single framework helps with capturing data across hybrid and heterogeneous environments and also monitoring tooling. The use of service level objectives (SLOs) as a tool to communicate the desired outcome of monitoring and observability is also gaining popularity as seen with the first ever SLOConf earlier this year.
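As a rough illustration of how an SLO communicates a desired outcome, the sketch below turns an availability target into an error budget. The function name `error_budget` and the request counts are assumptions made for this example; they do not come from SLOConf or from any particular observability platform. The target is passed as a string so that `Fraction` can represent it exactly and avoid floating-point rounding.

```python
from fractions import Fraction


def error_budget(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, remaining_failures, budget_consumed).

    slo_target is a decimal string such as "0.999" (99.9% of requests succeed).
    """
    target = Fraction(slo_target)
    if not 0 < target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The error budget is the share of requests the SLO permits to fail.
    allowed = (1 - target) * total_requests
    remaining = allowed - failed_requests
    consumed = Fraction(failed_requests) / allowed
    return allowed, remaining, consumed


# Illustrative numbers: a 99.9% target over one million requests, 250 failures so far.
allowed, remaining, consumed = error_budget("0.999", 1_000_000, 250)
print(f"allowed={allowed} remaining={remaining} consumed={float(consumed):.0%}")
```

With these example numbers the budget is 1,000 failed requests, of which a quarter has been used. A team could use the remaining budget as a shared signal for deciding whether to ship riskier changes or to prioritise reliability work.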

"DevOps for Data" has seen increasing adoption over the past year with the rise of both MLOps and DataOps. MLOps focuses on using DevOps-style practices (such as CI and CD) to implement continuous training for machine learning models. Open source tooling and commercial services exist to help in this area, such as Kubeflow for deploying ML models on Kubernetes, and Amazon SageMaker Model Monitor for automating the monitoring of ML models. DataOps looks to shorten the cycle time of data analytics by applying concepts similar to those DevOps teams use to reduce their own cycle times.

In the people and organisational space of DevOps we have seen the Team Topologies book, from Matthew Skelton and Manuel Pais, become the de facto reference for arranging teams within an organization to enable effective software delivery. Team Topologies describes four fundamental team types and three team interaction patterns, and dives into the responsibility boundaries of teams and how teams can communicate or interact with other teams.

There is also an increasing focus on post-incident "blameless postmortems" becoming more akin to "healthy retrospectives", from which the entire organisation can learn. Key leaders in the computing domain of resilience engineering and the "Learning From Incidents" community have been influential in driving this discussion.

For context, here is what the topic graph looked like for the first half of 2019. The 2021 version is at the top of the article.

The following is a lightly edited copy of the corresponding internal chat log between several InfoQ cloud computing DevOps topic editors and InfoQ contributors, which provides more context for our recommended positioning on the adoption graph.

Lena Hall - Director, Large-Scale Systems @ Microsoft, who contributed the recent 'Can We Trust the Cloud Not to Fail?' and 'Evolution of Azure Synapse: Apache Spark 3.0, GPU Acceleration, Delta Lake, Dataverse Support' articles:

Moving to Early Majority

  • GitOps (especially with GitHub Actions gaining traction)

Moving to early adopters

  • DataOps is gaining a lot of traction (especially data governance, data lineage, data quality, data catalogue tooling)

New topics in early innovators

  • Innovators: Policy as Code (checking in/managing access policies close to where the components source lives)
  • Innovators: Cloud-Native Hybrid Approaches (e.g., Azure Arc, Google Anthos)
  • Innovators: Cross-Cloud Uniform Infra Automation / Ops (e.g., Crossplane, Open Policy Agent)
  • Innovators: Data Mesh (decentralized data ownership, federated data governance, product thinking, self-serve data infrastructure)
  • Innovators/Early Adopters: No Copy Data Sharing (Snowflake, Google BigQuery, Azure Synapse, Presto/Starburst, etc.)
  • DevSecOps (however, it's probably not new, as you already had shift-left in the last report)

Furthermore, Hall discussed the evolution of the hybrid cloud model with InfoQ:

Cloud-Native Hybrid Approaches (e.g., Azure Arc, Google Anthos). In its traditional understanding, a hybrid cloud means using on-premises infrastructure for parts of the company's workloads and using the public cloud for other parts of it, where relevant. For example, a hybrid cloud strategy could mean running some applications in the cloud (e.g., serverless processing) and some applications on local infrastructure (e.g., business ERP system). The hybrid data approach could look as simple as storing backups of data in the cloud, or using more advanced cloud gateways and precisely-tuned data sync, caching, or movement patterns.

In recent months and years, hybrid options have evolved beyond the traditional definition. They have expanded to enable the functionality of cloud services to run outside of the cloud, allowing for a much more seamless and smooth experience. We can think of the new cloud-native hybrid options as an extension of cloud services to our on-premises or multi-cloud environments. Azure Arc and Google Anthos are perfect examples. Azure Arc allows the running of Azure SQL Managed Instance and Azure Database for PostgreSQL Hyperscale in hybrid and multi-cloud environments. It also provides consistent management of applications on-premises and in the cloud with Azure App Service, Functions, Logic Apps, and more. Another fantastic example is Google Anthos powering BigQuery Omni, enabling its query engine to be deployed and managed on multiple clouds.

It's important to note that these new cloud-native hybrid approaches like Azure Arc and Google Anthos are different from more standard hybrid cloud approaches, such as Azure Stack or AWS Outposts. To understand the difference more clearly, consider that both Azure Arc and Azure Stack offer hybrid solutions (and can be used together). Azure Stack is a hardware approach to running an Azure environment on-premises. On the other hand, Azure Arc is a software approach that treats on-premises and multi-cloud resources, consisting of virtual or physical servers and Kubernetes clusters, as something natively managed and accessible by Azure Resource Manager. Similarly, we can make a distinction between Google Anthos/Azure Arc and AWS Outposts. Even though AWS Outposts extends the AWS cloud platform to on-premises, it has specific hardware requirements and only works with hardware devices designed and supported by AWS.

Hall also shared an interesting insight into the emerging trend of "no copy data sharing":

No Copy Data Sharing (Snowflake, Google BigQuery, Azure Synapse, Presto/Starburst, etc.). There are quite a few approaches to working with data when the same data needs to be accessed or shared between different services or different environments. There might be a variety of scenarios when it comes to data sharing.

One scenario could be storing the same data (or parts of the same data) in several services running in the same cloud. Often, the compute and storage of such services aren't independent. To work with the data from another service, we'd have to copy or move it.

In the emerging no-copy data-sharing approach, and with separation of compute and storage, it's not necessary to move or replicate the data to be able to access it from different services. As a result, it enables better scalability, cost-efficiency, and direct access to data. As an example, this can be enabled by features such as Azure Synapse Link for Azure Cosmos DB, BigQuery external data sources, or Snowflake data sharing.
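The idea can be sketched with a toy model in Python. Everything here is hypothetical and for illustration only: `StorageLayer`, `grant`, and `share` are invented names, and real services such as Snowflake, BigQuery, or Azure Synapse Link implement sharing very differently and at a very different scale. The sketch only shows the principle: the dataset is stored once, and a consumer receives a read-only view of it, not a replica.

```python
from types import MappingProxyType


class StorageLayer:
    """Hypothetical storage layer in which each dataset is stored exactly once."""

    def __init__(self):
        self._datasets = {}
        self._grants = set()

    def write(self, name, rows):
        # Ingestion is the only point at which data is materialised.
        self._datasets[name] = dict(rows)

    def put(self, name, key, row):
        self._datasets[name][key] = row

    def grant(self, name, consumer):
        self._grants.add((name, consumer))

    def share(self, name, consumer):
        if (name, consumer) not in self._grants:
            raise PermissionError(f"{consumer} has no grant on {name}")
        # A read-only proxy over the stored data: nothing is copied or moved.
        return MappingProxyType(self._datasets[name])


storage = StorageLayer()
storage.write("orders", {1: ("widget", 3), 2: ("gadget", 1)})
storage.grant("orders", "analytics")

# The consuming service runs its own compute directly against the shared data.
view = storage.share("orders", "analytics")
total_units = sum(quantity for _, quantity in view.values())

# A later write by the data owner is visible through the same view,
# so there is no copy to keep in sync.
storage.put("orders", 3, ("gizmo", 5))
```

Because the consumer holds a view and not a replica, access can be revoked or audited in one place, and storage costs do not grow with the number of consumers.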

The recently announced Delta Sharing is an open standard that contributes to the upward trajectory of no-copy data sharing. According to its website, "Delta Sharing is the industry's first open protocol for secure data sharing, making it simple to share data with other organizations regardless of which computing platforms they use."

Holly Cummins, an innovation leader in IBM Corporate Strategy who has spent several years as a consultant in the IBM Garage, and who contributed the "Cloud-Native Is about Culture, Not Containers" talk and article to InfoQ:

Wow, some things in tech change so fast, and some hardly change at all. Most of the introduction of the 2019 DevOps and Cloud InfoQ Trends Report feels like it could have been written now.

I don't think there's been any progress on undoing the confusion between continuous integration (CI) and continuous delivery (CD) tooling and CI/CD practices, and I don't think there's been progress on adopting CI and CD practices.

One thing which seems to be missing in the discussion is observability. Like "AIOps", it's overloaded and buzz-wordy, but it feels a bit like how "cloud-native" was in 2018. So if your product is anywhere remotely in this space, you have to throw in the word 'observability.' Then other people will point out that it's not really observability because observability means something distinct, and then there's an extensive discussion. :)

In the chart itself, breaking it down into things like "centralized log aggregation" seems wise because then it avoids the argument, but it would be worth mentioning in the discussion. Another subcategory to include might be OpenTelemetry, in the innovator section.

I feel like service meshes have slipped out of the conversation a bit.

FinOps should be in the innovator section. A related topic is cloud cost optimization. There's an exciting conversation started by James Governor from the RedMonk team, "Shifting cost optimisation left: Spotify Backstage Cost Insights", where he makes a distinction between FinOps (real-time information flow for cost analysis, aimed at the finance team) and "shifting cloud cost optimization left" (aimed at the engineering team).

Sustainability accounting is another innovator category. Again, it's very early - I can't think of many products with enough maturity in this area - but the conversations are happening, and I think it's coming.

I think DevOps for data and ML at the edge have moved to early adopters. DevOps for Data has two subcategories, MLOps and DataOps.

I'd say site reliability engineering (SRE) is now the early majority. I'm seeing a lot of banks adopting it.

I see a lot of conversation around GitOps, but I'm not sure that's translating into implementation in prod, or at least not enough to push it to the early majority. Likewise, I feel like momentum around ChatOps has slowed, and it hasn't pushed into Early Majority.

Renato Losio, Cloud architect, remote work enthusiast, and speaker:

  • DataOps / DevOps for Data --> from Innovator to Early adopter
  • Edge Computing / ML at the edge --> almost all now with 5G, now at least early adopter. The latest incidents from Fastly actually suggest that there might be a more extensive use than expected, not only on CDN for Edge Computing. A new trend on Local Zones on cloud providers, too, not sure if better as a separate topic.
  • SRE --> from Early adopter to Early majority
  • Cloud FaaS --> from Early adopter to Early majority
  • Immutable Infrastructure --> Late Majority maybe?
  • Serverless databases (both RDBMS and not) --> Early adopter
  • I see a reference to InfoSec, but I would maybe make it more Cloud-First Security / DevSecOps (Early Adopter)
  • Ledger databases (early adopter?)
  • Cloud infrastructure and services on-premises (Microsoft Azure Stack, AWS Outposts) --> Early adopters.

Jared Ruckle, Cloud Editor @ InfoQ; Product Marketing @HashiCorp:

Changes in the Innovator bucket

  • Edge computing moves to Early Adopter
  • There is a fair amount of traction here from specific vendor tools (Cloudflare Workers) and, more generally, cloud vendors and interest in enterprises (particularly retailers).

Changes in the Early Adopter bucket

  • Site Reliability Engineering --> Early Majority
  • Shift left on security/InfoSec --> Early Majority
  • ChatOps --> Early Majority
  • SRE practices are becoming more popular as the number of critical apps moves to cloud-native; the criticality of these apps is forcing an operational change.
  • It seems like there's always a significant security event to cause folks to be more aggressive with their security approach. Spectre/Meltdown, SolarWinds, ransomware ... so "shift left" is going mainstream from what I have observed.
  • Slack/MS Teams and the pandemic have accelerated ChatOps among engineering teams.

Changes in the Early Majority bucket

  • SDN --> Late Majority
  • Centralized log aggregation --> Late Majority
  • Self-service platforms --> Late Majority

Aditya Kulkarni, Blogger, Reader, and Iteration Manager at Tenerity:

Moving to Early Majority

  • Transformational leadership
  • SRE
  • ChatOps
  • GitOps

Moving to early adopters

  • Software-Defined Delivery

New topics in early innovators

  • Application/software performance engineering in CI/CD
  • Cloud observability
  • FinOps (Cloud Spend Optimization)

Steef-Jan Wiggers, Technical Integration Architect at HSO and Microsoft Azure MVP:

Moving to Early Majority

  • Cloud: FaaS and BaaS - Functions and backend-as-a-service (serverless) are becoming more common in projects, given the adoption by developers and the investments by vendors. More languages are supported, and vendors are also adding extensions for observability, such as the recent GA of AWS Lambda Extensions (including a large ecosystem of partners offering extensions).
  • GitOps - GitHub Actions is getting more attention and adoption on cloud platforms.

New topics in early innovators

  • Cloud-Native Hybrid Approaches (e.g., Azure Arc, Google Anthos) - Microsoft, Google, and AWS brought new services for enterprises to support hybrid scenarios. Microsoft offers Azure Arc and brings various other services as Arc-enabled - topics we at InfoQ cover pretty frequently. Similarly, Google offers Anthos for hybrid cloud computing and is also covered by InfoQ. And lastly, AWS has Outposts and recently brought Amazon ECS Anywhere to general availability.

Daniel Bryant, Director of DevRel @ Ambassador Labs | InfoQ News Manager | QCon PC:

Innovators

  • +1 to Lena’s suggestion of Cross-Cloud Uniform Infra Automation. An increasing number of folks I’m chatting to are investigating Crossplane
  • AI/MLOps move to EA
  • Worth-driven architecture/ops -- this can be rebranded to "FinOps", as Holly mentioned
  • Declarative Infra verification -- I think this could be converted into "Policy as Code" and remain Innovator
  • Edge computing move to EA -- this has picked up traction

Early Adopters

  • This section generally looks quite accurate still!
  • GitOps/DiffOps - rename to GitOps and move to EM
  • Add: Team Topologies

Early Majority

  • Container Orchestration - move to LM
  • Remove: Minimalistic OS for containers -- this would be considered quite niche now
  • [Enterprise] DevOps toolchain -- move to LM
  • CI best practices - move to LM
  • Add: Observability

Late Majority

  • Maybe add "Monitoring and logging", as everywhere does this, right (right? :) )?

Helen Beal, DevOps speaker, writer, and strategic advisor:

1)

  • CI best practice into the late majority?
  • Blameless postmortems = healthy retros?
  • Feature flags, blue/green should also mention canary test/release/deploy or be bundled as limited blast radius?
  • Chaos engineering into early majority? - 2021 - yes
  • DiffOps gone away?
  • SRE to early majority? - 2021 yes
  • ChatOps to late majority? - 2021 - yes
  • Continuous testing to early or late majority? - 2021 to late majority
  • DevOps for Data = Data Ops to early adopters?

2)

  • Value Stream Management - early adopters (may replace DevOps dashboards?) - 2021 definitely think this
  • Heritage Reliability Engineering to Innovators?
  • DDD and or HDD to innovators?
  • Additional 2021:
    • I would consider adding teal organizations or replacing transformational leadership with distributed authority/team autonomy
    • Team topologies somewhere in there?
    • AIOps to early adopters
    • Observability

Rupert Field, Delivery Lead @Duffle-bag Consulting and DevOps Editor @InfoQ:

  • Infrastructure as Code: early majority
  • Many people haven't implemented this yet (huge migration task) -> reality is that in greenfield projects, it's very likely, but its tech debt holds back legacy
  • SRE: plausible early majority
    • I'm sure this is on the radar of the early majority - but I doubt being executed well at this level
  • Immutable infrastructure: early majority maybe, not the late majority
    • scope-wise, though ... this really depends on whether we are talking across the business or just one greenfield project
  • Feature flags & blue / green = early majority (no change)
  • Chaos engineering: early adopter
  • ChatOps: early majority possibly
  • Continuous testing: early majority
  • Pipelines as Code -> early majority (can't use GitHub Actions without IaC)
  • VSM: early adopter for comprehensive tooling; possibly early majority for more ad hoc solutions
    • over 40% of respondents claim they are using VSM tooling in Forrester's latest survey, but only around 8% are using one integrated platform, and clearly, the survey is biased towards those who are already interested / engaged

Shaaron A Alvares, Editor at InfoQ for DevOps, Culture & Methods | Agile Coach:

Is Continuous Verification contained in "Shift left on Sec/InfoSec"?

Add to Innovators:

  • Sociotechnical architecture
  • DevOps value stream management platforms
  • Developer Velocity (or it can be contained in DevEx?)

Add to Innovators or to Early Adopters:

  • Low Code No Code (LCNC)
  • Hybrid Cloud

Move:

  • AIOps and MLOps move to Early Adopters

Matt Campbell, Lead Editor, DevOps | Engineering Director @ D2L:

Observability practices and tooling continue to mature. I think we should be looking at splitting the topic of observability into some newer sub-topics such as OpenTelemetry and service level objectives (SLOs). As noted by others, the logging portion of the three pillars of observability is a relatively well-adopted piece. Monitoring of metrics is probably close to the same level of adoption at this point so I'd say EM for that. The tracing pillar of observability is still a less adopted portion and there are new advancements recently, especially with the more widespread adoption of OpenTelemetry.

SLOs as a tool for communicating outcomes and goals have started to see a resurgence with the recent SLOConf leading the way. I expect to see the various observability platforms start to add SLO-creation and tracking tooling soon. I would put this in late innovator/early EA.

Security continues to be a hot topic as more and more high-profile attacks make the news. While infrastructure as code is probably within EM or later, tooling and techniques to automate IaC vulnerability scanning are not as well utilized (maybe EA). Newer approaches such as Policy as Code (as promoted by Open Policy Agent (OPA)) and remote access management tooling (such as HashiCorp's Boundary) are pushing forward identity as code and privacy as code, so these are probably within the innovator space. These trends are a continuation of the "shift left for security" approach or the newer DevSecOps ideas. With the continued push for early security practices I think we will see more tools and processes introduced in the upcoming year.
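A minimal sketch of the policy-as-code idea follows, written in plain Python. Note that this is not how Open Policy Agent works in practice (OPA policies are written in its own Rego language), and the rule fields, role names, and `is_allowed` function are all hypothetical, chosen only for this example. What it aims to show is that access rules become data that can be checked in, reviewed, and tested next to the component they protect.

```python
# Hypothetical access policy kept as plain data, so it can be versioned,
# reviewed, and tested like any other source file.
POLICY = [
    {"effect": "allow", "role": "sre", "action": "deploy", "environment": "*"},
    {"effect": "allow", "role": "developer", "action": "deploy", "environment": "staging"},
    {"effect": "deny", "role": "*", "action": "delete", "environment": "production"},
]

FIELDS = ("role", "action", "environment")


def _matches(rule, request):
    # A rule field matches when it is the wildcard or equals the request value.
    return all(rule[field] in ("*", request[field]) for field in FIELDS)


def is_allowed(request, policy=POLICY):
    """Evaluate a request: an explicit deny wins, and no match means deny."""
    effects = [rule["effect"] for rule in policy if _matches(rule, request)]
    return "deny" not in effects and "allow" in effects


# Policy tests can run in the same CI pipeline as the application's own tests.
checks = {
    "developer to staging": is_allowed(
        {"role": "developer", "action": "deploy", "environment": "staging"}),
    "developer to production": is_allowed(
        {"role": "developer", "action": "deploy", "environment": "production"}),
    "sre to production": is_allowed(
        {"role": "sre", "action": "deploy", "environment": "production"}),
}
```

Denying by default and letting an explicit deny override any allow is a common convention for this kind of evaluator; it is a design choice of this sketch, not a claim about any specific tool.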

The InfoQ editorial team is built by recruiting and training expert practitioners to write news items and articles, and to analyze current and future trends. Apply to become an editor or to contribute articles and get involved with the conversation.

About the Authors

Lena Hall - Director of Engineering: Big Data at Microsoft. She is leading a team and technical strategy for product improvement efforts across Big Data services at Microsoft. Lena is the driver behind engineering initiatives and strategies to advance, facilitate and push forward further acceleration of cloud services. Lena has more than 10 years of experience in solution architecture and software engineering with a focus on distributed cloud programming, real-time system design, highly scalable and performant systems, big data analysis, data science, functional programming, and machine learning. Previously, she was a Senior Software Engineer at Microsoft Research.

Holly Cummins - Innovation Leader, Corporate Strategy SPEED @IBM, who spent several years as a consultant in the IBM Garage. As part of the Garage, she delivers technology-enabled innovation to clients across various industries, from banking to catering to retail to NGOs. Holly is an Oracle Java Champion, IBM Q Ambassador, and JavaOne Rock Star. She co-authored Manning's Enterprise OSGi in Action.

Renato Losio - Cloud architect, remote work enthusiast, speaker, and Cloud Editor @InfoQ. Renato has many years of experience as a software engineer, tech lead and cloud services specialist in Italy, UK, Portugal and Germany. He lives in Berlin and works remotely as principal cloud architect for Funambol. Location-based services and relational databases are his main working interests. He is an AWS Data Hero.

Jared Ruckle - Cloud Editor @ InfoQ; Product Marketing @HashiCorp. Jared has over 20 years of experience in product marketing and product management. He has worked at numerous IaaS, PaaS, and SaaS companies, including VMware, Pivotal, and CenturyLink. Currently, Jared is director of product marketing at HashiCorp.


Aditya Kulkarni - Blogger, Reader, Iteration Manager at Tenerity, and DevOps Editor @InfoQ. Starting from the developer role, Aditya has evolved into the management domain. Having worked with organizations on their journey of agility, Aditya has kept in touch with the technical side of things.

Steef-Jan Wiggers - Technical Integration Architect at HSO, Microsoft Azure MVP and Cloud Lead Editor @InfoQ. His current technical expertise focuses on integration platform implementations, Azure DevOps, and Azure Platform Solution Architectures. Steef-Jan is a board member of the Dutch Azure User Group, a regular speaker at conferences and user groups, and writes for InfoQ and Serverless Notes. Furthermore, Microsoft has recognized him as a Microsoft Azure MVP for the past eleven years.

Daniel Bryant - DevRel @ Ambassador Labs | InfoQ News Manager | QCon PC. His current technical expertise focuses on ‘DevOps’ tooling, cloud/container platforms and microservice implementations. Daniel is a leader within the London Java Community (LJC), contributes to several open source projects, writes for well-known technical websites such as InfoQ, O'Reilly, and DZone, and regularly presents at international conferences such as QCon, JavaOne, and Devoxx.

Helen Beal - DevOps speaker, writer, strategic advisor, and InfoQ DevOps Editor. Her focus is on helping organisations optimise the flow from idea to value realisation through behavioural, interaction based and technology improvements.

Rupert Field - Delivery Lead @Duffle-bag Consulting and DevOps Editor @InfoQ. He loves learning about technology and helping people harness it to fix problems. His experience includes defining, designing and delivering technology team strategy. This includes delivering coaching, training, operating model design and business transformation.

Shaaron A Alvares - Editor at InfoQ for DevOps, Culture & Methods | Agile Coach. She is Certified Agile Leadership, Certified Agile Coach from the International Consortium for Agile, and Agile Certified Practitioner, with global work experience in technology and organizational transformation. She introduced lean agile product and software development practices within various global Fortune 500 companies in Europe, such as BNP-Paribas, NYSE-Euronext, and ALCOA Inc., and has led significant lean agile and DevOps practice adoptions and transformations at Amazon.com, Expedia, Microsoft, and T-Mobile. Shaaron published her MPhil and PhD theses with the French National Center for Scientific Research (CNRS).

Matt Campbell - Lead Editor, DevOps | Engineering Director @ D2L, an education technology company, responsible for their Infrastructure and Cloud platform teams. His area of focus is DevOps and SRE and implementing these at enterprise scale. He also instructs programming courses with Conestoga College.

 


This article is a summary of the AI, ML, and Data Engineering InfoQ Trends 2022 podcast and highlights the key trends and technologies in the areas of AI, ML, and Data Engineering.

In this annual report, the InfoQ editors discuss the current state of AI, ML, and data engineering and what emerging trends you as a software engineer, architect, or data scientist should watch. We curate our discussions into a technology adoption curve with supporting commentary to help you understand how things are evolving.

In this year’s podcast, the InfoQ editorial team was joined by an external panelist, Dr. Einat Orr, co-creator of the open source project LakeFS and co-founder and CEO at Treeverse, as well as a speaker at the recent QCon London conference.

The following sections in the article summarize some of these trends and where different technologies fall in the technology adoption curve.

The Rise of Natural Language Understanding and Generation

We see Natural Language Understanding (NLU) and Natural Language Generation (NLG) technologies as early adopters. The InfoQ team has published about recent developments in this area including Baidu’s Enhanced Language RepresentatioN with Informative Entities (ERNIE), Meta AI’s SIDE, as well as Tel-Aviv University’s Standardized CompaRison Over Long Language Sequences (SCROLLS). 

We have also published several NLP-related developments such as Google Research team’s Pathways Language Model (PaLM), EleutherAI’s GPT-NeoX-20B, Meta’s Anticipative Video Transformer (AVT), and BigScience Research Workshop’s T0 series of NLP models.

Deep Learning: Moving to Early Majority

Last year, as we saw more companies using deep learning algorithms, we moved deep learning from the innovator to the early adopter category. Since last year, deep learning solutions and technologies have been widely used in organizations, so we are moving it from early adopter to early majority category.

There were several publications on this topic as podcasts (Codeless Deep Learning and Visual Programming), articles (Institutional Incremental Learning based Deep Learning Systems, Loosely Coupled Deep Learning Serving, and Accelerating Deep Learning with Apache Spark and NVIDIA GPUs) as well as news items including BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) from BigScience research workshop, Google AI’s deep learning language model called Minerva and OpenAI’s open-source framework called Video PreTraining (VPT).

Vision Language Models

Interesting developments in image-processing AI models include DeepMind’s Flamingo, an 80B-parameter vision-language model (VLM) that combines separately pre-trained vision and language models and answers users’ questions about input images and videos.

Google’s Brain team has announced Imagen, a text-to-image AI model that can generate photorealistic images of a scene given a textual description.

Another interesting technology, digital assistants, is also now in the early majority category.

Streaming Data Analytics: IoT and Real-Time Data Ingestion

Streaming-first architectures and streaming data analytics have seen increasing adoption across companies, especially in IoT and other real-time data ingestion and processing applications.

Sid Anand’s presentation on building & operating high-fidelity data streams and Ricardo Ferreira’s talk on building value from data in-motion by transitioning from batch data processing to stream-based data processing are excellent examples of how stream-based data processing is a must-have in strategic data architectures. Also, Chris Riccomini, in his article The Future of Data Engineering, discussed the important role stream processing plays in overall data engineering programs.

Chip Huyen spoke at last year’s QCon Plus online conference on Streaming-First Infrastructure for Real-Time ML and highlighted the advantages of a streaming-first infrastructure for real-time and continual machine learning, the benefits of real-time ML, and the challenges of implementing real-time ML.
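As a rough illustration of what a streaming-first approach can mean in practice, the sketch below updates a feature's mean and variance one event at a time (Welford's online algorithm) instead of recomputing them over a stored batch. This is not taken from the talk; the `RunningStats` class and the sample events are hypothetical names and data chosen for the example.

```python
class RunningStats:
    """Hypothetical online feature: mean and variance updated per event."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        # Welford's update: constant time and memory per incoming event.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance of all events seen so far.
        return self._m2 / self.count if self.count else 0.0


# Simulated event stream (assumed values, for illustration only).
stats = RunningStats()
for event_value in [2.0, 4.0, 6.0]:
    stats.update(event_value)
```

Because each update touches only three numbers, the feature is always current without a batch job rereading historical data.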

As a reflection of this trend, streaming data analytics and technologies such as Spark Streaming have been moved to late majority. The same goes for Data Lake as a Service, which gained further adoption last year with products like Snowflake.

AI/ML Infrastructure: Building for Scale

Highly scalable, resilient, distributed, secure, and performant infrastructure can make or break the AI/ML strategy in an organization. Without a good infrastructure as the foundation, no AI/ML program can be successful in the long term. 

At this year’s GTC conference, NVIDIA announced their next-generation processors for AI computing, the H100 GPU and the Grace CPU Superchip.

Resource Negotiators like YARN and container orchestration technologies like Kubernetes are also now in the late majority category. Kubernetes has become the de facto standard for cloud platforms and multi-cloud computing is gaining attention in deploying applications to the cloud. Technologies like Kubernetes can be the enablers for automating the complete lifecycle of AI/ML data pipelines including the production deployments and post-production support of the models.

We also have a few new entrants in the Innovators category. These include cloud-agnostic computing for AI, Knowledge Graphs, AI pair programmers (like GitHub Copilot), and Synthetic Data Generation.

Knowledge Graphs continue to leave a large footprint in the enterprise data management landscape with real-world applications for different use cases including data governance.

ML-Powered Coding Assistants: GitHub Copilot

GitHub Copilot, announced last year, is now ready for prime time. Copilot is an AI-powered service that helps developers write new code by analyzing existing code as well as comments. It improves developer productivity by generating basic functions so that developers do not have to write them from scratch. Copilot is the first of many solutions expected to support AI-based pair programming and to automate most of the steps in the software development lifecycle.

Nikita Povarov, in the article AI for Software Developers: a Future or a New Reality, wrote about the role of AI developer tools. AI developers may attempt to use algorithms to augment programmers’ work and make them more productive; in the software development context, we’re clearly seeing AI both performing human tasks and augmenting programmers’ work.

Synthetic Data Generation: Protecting User Privacy

On the data engineering side, synthetic data generation is another area that’s been gaining a lot of attention and interest since last year. Synthetic data generation tools help to create safe, synthetic versions of the business data while protecting customer privacy.

With technologies like SageMaker Ground Truth from AWS, users can now create labeled synthetic data. Ground Truth is a data labeling service that can produce millions of automatically labeled synthetic images.

Data quality is critical for AI/ML applications throughout the lifecycle of those apps. Dr. Einat Orr spoke at the QCon London conference on Data Versioning at Scale and discussed the importance of data quality and versioning of large data sets. Version control of the data allows us to reproduce a set of results, gives better lineage between the input and output data sets of a process or a model, and provides the relevant information for auditing.
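The reproducibility, lineage, and auditing points above can be sketched in a few lines. The example below is a minimal illustration, not the LakeFS API or the approach described in the talk: it derives a version ID from a content hash of the data and records which input version produced which output version. The names `dataset_version` and `VersionLog`, and the sample records, are hypothetical.

```python
import hashlib
import json


def dataset_version(records):
    """Return a deterministic version ID for a list of JSON-serializable records."""
    # Canonical serialization so the same data always hashes to the same ID.
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class VersionLog:
    """Hypothetical in-memory audit log linking input versions to output versions."""

    def __init__(self):
        self.entries = []

    def record_run(self, process_name, input_records, output_records):
        entry = {
            "process": process_name,
            "input_version": dataset_version(input_records),
            "output_version": dataset_version(output_records),
        }
        self.entries.append(entry)
        return entry

    def inputs_for(self, output_version):
        """Lineage lookup: which input versions produced this output version?"""
        return [e["input_version"] for e in self.entries
                if e["output_version"] == output_version]


# Assumed sample data, for illustration only.
raw = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
cleaned = [r for r in raw if r["amount"] > 15]

log = VersionLog()
run = log.record_run("filter_small_amounts", raw, cleaned)
```

Given an output version, `log.inputs_for(...)` returns the input versions that produced it, which is the kind of information a reproducibility check or an audit would need. A production system would version data in object storage rather than hashing in-memory lists.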

At the same conference, Ismaël Mejía talked about how to apply open source APIs and open standards to more recent data management methodologies around operations, data sharing, and data products, which enable us to create and maintain resilient and reliable data architectures.

In another article, Building End-to-End Field Level Lineage for Modern Data Systems, the authors discuss data lineage as a critical component of the data pipeline root cause and impact analysis workflow. To better understand the relationship between source and destination objects in the data warehouse, data teams can use field-level lineage. Automating lineage creation and abstracting metadata down to the field level cuts down on the time and resources required to conduct root cause analysis.
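To make the root cause and impact analysis idea concrete, here is a minimal sketch that models field-level lineage as a directed graph and walks it in either direction. It is an illustration under assumed names, not the implementation from the article: the `FieldLineage` class and the field names are hypothetical.

```python
from collections import defaultdict, deque


class FieldLineage:
    """Hypothetical field-level lineage graph: edges point from source to target."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add_edge(self, source_field, target_field):
        self.downstream[source_field].add(target_field)
        self.upstream[target_field].add(source_field)

    def _walk(self, start, edges):
        # Breadth-first traversal collecting every field reachable from start.
        seen, queue = set(), deque([start])
        while queue:
            field = queue.popleft()
            for nxt in edges.get(field, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impacted_by(self, field):
        """Impact analysis: all fields derived, directly or not, from this one."""
        return self._walk(field, self.downstream)

    def root_causes_of(self, field):
        """Root cause analysis: all fields this one depends on."""
        return self._walk(field, self.upstream)


# Assumed warehouse fields, for illustration only.
lineage = FieldLineage()
lineage.add_edge("raw.orders.amount", "staging.orders.amount_usd")
lineage.add_edge("raw.fx.rate", "staging.orders.amount_usd")
lineage.add_edge("staging.orders.amount_usd", "marts.revenue.total_usd")
```

If `marts.revenue.total_usd` looks wrong, `lineage.root_causes_of("marts.revenue.total_usd")` narrows the investigation to the three fields that feed it, rather than whole tables.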

The early adopter category also includes new entries: Robotics, Virtual Reality and related technologies (VR/AR/MR/XR), and MLOps.

MLOps: Combining ML and DevOps Practices

MLOps has been getting a lot of attention in companies as a way to bring to machine learning the same discipline and best practices that DevOps offers in the software development space.

Francesca Lazzeri, in her QCon Plus conference talk, spoke about MLOps as the most important piece in the enterprise AI puzzle. She discussed how MLOps empowers data scientists and app developers to bring machine learning models to production. MLOps enables you to track, version, audit, certify, and reuse every asset in your machine learning lifecycle, and provides orchestration services to streamline managing this lifecycle.

MLOps is really about bringing together people, processes, and platforms to automate machine learning-infused software delivery and also provide continuous value to our users.

She also wrote about what you should know before deploying ML applications in production. Key takeaways include using open source technologies for model training, deployment, and fairness, and automating the end-to-end ML lifecycle with machine learning pipelines.

Monte Zweben talked about Unified MLOps to bring together core components like Feature Stores and model deployment.

Other key trends discussed in the podcast are:

  • In AI/ML applications, the transformer is still the architecture of choice.
  • ML models continue to get bigger, supporting billions of parameters (GPT-3, EleutherAI’s GPT-J and GPT-Neo, Meta’s OPT model).
  • Open source image-text data sets for training models like CLIP or DALL-E are enabling data democratization, giving people the power to take advantage of these models and datasets.
  • Future robotics and virtual reality applications are going to be mostly implemented in the metaverse.
  • AI/ML compute tasks will benefit from the infrastructure and cloud computing innovations like multi-cloud and cloud-agnostic computing.

For more information, check out the 2022 AI, ML, and Data Engineering podcast recording and transcript as well as the AI, ML & Data Engineering content on InfoQ.

 
