QCon San Francisco 2023 Day 1: Architectures, Data Engineering, Infra Languages, Staff+ Skills

Day One of the 17th annual QCon San Francisco conference was held on October 2nd, 2023, at the Hyatt Regency San Francisco in San Francisco, California. This five-day event, consisting of three days of presentations and two days of workshops, is organized by C4Media, a software media company focused on unbiased content and information in the enterprise development community and creators of InfoQ and QCon. It included a keynote address by Suhail Patel and presentations from these four tracks:

  • Architectures You've Always Wondered About
    • Hosted by Wes Reisz, technical principal at Thoughtworks, creator/co-host of The InfoQ Podcast and QCon San Francisco 2023 program committee chair
    • Offers attendees the chance to learn what it takes to operate modern, high-scale systems from the engineers and leaders who build them.
  • Modern Data Engineering & Architectures
    • Hosted by Sid Anand, chief architect at Datazoom, committer/PMC Apache Airflow
    • Offers attendees the chance to learn some fundamental, powerful, yet versatile building blocks and their core engineering principles that developers can leverage to build a simple yet efficient and scalable data architecture.
  • Languages of Infra: Beyond YAML
    • Hosted by Justin Cormack, CTO at Docker
    • Offers attendees the opportunity to explore a variety of tools that move away from YAML with talks from practitioners and those who have brought new tools and processes into creation because they have a strong vision beyond the status quo
  • Staff+ Engineering Skills
    • Hosted by Krys Flores, staff engineer at Carta
    • Offers attendees the chance to learn what it takes to be an effective and successful staff+ engineer

There were also two sponsored-solutions tracks.

Dio Synodinos, president of C4Media, kicked off the Day One activities by welcoming the attendees, discussing human progress through technology and the InfoQ core values, and highlighting that the speakers at QCon conferences reflect those values.

Wesley Reisz reaffirmed the InfoQ core values; discussed how the editorial tracks and track hosts are selected; explained the concept of the unconference sessions, which are facilitated, bottom-up and self-directed discussions among experts; and highlighted the sponsored tracks.

Pia von Beren, QCon product manager and diversity lead at C4Media, introduced the new QCon features, namely attendee lightning talks, 1:1s and the Women & Allies in Tech Breakfast, and described the conference breaks, where attendees can network, as the "Hallway Track."

The aforementioned track leads for Day One introduced themselves and described the presentations in their respective tracks.

Haley Tucker, principal software engineer for platform engineering at Netflix and QCon San Francisco 2023 program committee member, introduced the keynote speaker, Suhail Patel.

Keynote Address: From Mainframes to Microservices - the Journey of Building and Running Software

Suhail Patel, staff engineer at Monzo, presented his keynote address entitled From Mainframes to Microservices - the Journey of Building and Running Software. On his opening slide, which Patel stated was also his conclusion, he asked why the following was true:

Many of our systems are built in the era of commodity computing. Our demands have surpassed the realms of commodity hardware and we're in a world where only a few big players can satisfy our needs.

After showing a behind-the-scenes view of the required microservices for an application in which a Monzo customer uses their debit card, he provided a retrospective of the platforms and software patterns that made both mainframes and microservices so popular, such as: the oldest software system in continuous use by the IRS; the latest IBM mainframe; and warehouse-scale computing, an example of how Amazon implements their Prime Day.

In a more humorous moment, Patel displayed a slide entitled "Only Murders in the Building," which contained an October 2015 tweet by Honest Update:

We replaced our monolith with microservices so that every outage could be more like a murder mystery.

Patel referenced The Tail at Scale, published in February 2013, which argues that software techniques that tolerate latency variability are vital to building responsive large-scale web services.
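One technique of this kind described in The Tail at Scale is the hedged request: send a request to one replica and, if no reply arrives within a short delay, send a second copy to another replica and use whichever answers first. The following is a minimal sketch of that idea, not code from the keynote. The replicas are simulated with sleeps, and the names call_replica, hedged_request and hedge_after are illustrative assumptions.

```python
import asyncio


async def call_replica(name: str, latency: float) -> str:
    """Simulated backend call; `latency` stands in for network plus service time."""
    await asyncio.sleep(latency)
    return f"response from {name}"


async def hedged_request(primary, backup, hedge_after: float) -> str:
    """Start `primary`; if it has not finished within `hedge_after` seconds,
    also start `backup` and return whichever completes first."""
    first = asyncio.ensure_future(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        # The primary answered in time, so the backup is never sent.
        return first.result()

    second = asyncio.ensure_future(backup())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the slower request
    return done.pop().result()


def demo() -> str:
    # Assumed scenario: the primary replica is slow (0.5 s), the backup is healthy (0.01 s).
    return asyncio.run(
        hedged_request(
            lambda: call_replica("replica-a", 0.5),
            lambda: call_replica("replica-b", 0.01),
            hedge_after=0.05,
        )
    )
```

The trade-off in this sketch is a small amount of extra load (the occasional duplicate request) in exchange for a shorter latency tail, which is why the hedge delay is usually chosen so that only the slowest requests trigger a backup.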

Despite the advances in CPUs and networks, "The free lunch is over," Patel said, referring to a March 2005 technical article by Herb Sutter, software architect at Microsoft and chair of the ISO C++ Standards Committee, that discussed the slowing down of Moore's Law and how the drastic increases in CPU clock speed were coming to an end. Sutter maintained:

No matter how fast processors get, software consistently finds new ways to eat up the extra speed. Make a CPU ten times as fast, and software will usually find ten times as much to do (or, in some cases, will feel at liberty to do it ten times less efficiently).

Patel discussed the massive reduction in the cost and complexity of getting large-scale software running on the web, and how that trend might not continue forever, especially in an era of specialized offerings such as edge computing and custom data stores that cannot be individually hosted.

Patel then introduced solutions to help developers in this area. These include: io_uring, an asynchronous interface to the Linux kernel that can potentially benefit networking; programming languages such as Java (an "old language, new tricks," as Patel characterized it, given its recent JDK 21 release), Rust and Zig; and simdjson, a library that uses commonly available SIMD instructions and micro-parallel algorithms to parse JSON more efficiently.

Patel then showed the CNCF Cloud Native Interactive Landscape, to highlight the number of technologies that are available for cloud-native application development.

The foundations in programming languages, software architecture, virtual machines and containers and even stateful systems have influenced how developers build and run software at scale today.

Highlighted Presentations: Pulumi, Pipelined Relational Query Language, Apache Hudi, Kubernetes without YAML

Pulumi Adventures: How Python Empowered My Infrastructure Beyond YAML was presented by Adora Nwodo, founder at NexaScale, senior software engineer and author of "Cloud Engineering for Beginners." Nwodo was a full-stack developer until she discovered Pulumi, an open-source Infrastructure as Code (IaC) platform, and how it interacts with languages such as Python to offer a familiar landscape for engineers who are interested in IaC.

Nwodo maintained that manual configurations "don't cut it anymore" because: deployments are slower; developers rely on documentation; there is a larger risk of errors; and a manual effort is required for rollbacks. Managing resources has become more complex as cloud innovation has rapidly grown over the past few years. As a result, Nwodo switched to Pulumi as an alternative to ARM templates. This greatly impacted her workflow, and she could more easily manage infrastructure while writing code.

IaC can solve these problems because configurations are specified in code, infrastructure deployments can be automated, developers can test, version and, if necessary, roll back changes, and transferable skills from programming can be utilized.

Using the pulumi command, Nwodo demonstrated how to create, build and execute a Pulumi application. Developers can reference a Pulumi example on GitHub.

PRQL: A Simple, Powerful, Pipelined SQL Replacement was presented by Aljaž Mur Eržen, compiler developer at EdgeDB and PRQL maintainer. Before his formal introduction to Pipelined Relational Query Language (PRQL), Eržen presented a brief history of SQL that included acknowledgements to: Edgar F. Codd and his 1970 paper, "A Relational Model of Data for Large Shared Data Banks," which described a new way of structuring data using ideas from set theory; and Donald D. Chamberlin and Raymond F. Boyce who developed SEQUEL, later renamed SQL for Structured Query Language.

Eržen then discussed the flaws of SQL, demonstrating with examples that, despite the human-friendly syntax, the order in which the clauses of a traditional SQL statement must be written isn't all that natural. Also, when an alias is provided in a SQL statement, such as SELECT title AS title_alias, name resolution can be confusing because of the rules on when to reference the table name or the alias.

In his quest to provide an alternative to SQL, Eržen wanted to design a new language for relations that is more natural, as demonstrated with a simple PRQL example. Requirements for this new design were: reading from top to bottom; easy exploration; lazy evaluation; and easier extraction of variables and functions. His data model for this new design included: basic data types (int, bool, etc.); tuples as described by tuple calculus; arrays; declarations; functions; transforms; and grouping. A relation can be defined as an array of tuples, and a transform can be defined as a function on relations. PRQL provides a set of 12 transforms that include names familiar to SQL developers, such as: std.from, std.aggregate, std.join and std.sort.
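The data model described here, with a relation as an array of tuples and a transform as a function on relations, can be illustrated outside of PRQL. The sketch below uses plain Python rather than PRQL syntax; the helpers filter_rows, derive, sort_by and pipeline, and the albums data, are hypothetical names chosen for illustration and are not PRQL's std transforms.

```python
from functools import reduce
from typing import Callable

Row = dict        # a tuple with named fields
Relation = list   # a relation: an array of such tuples
Transform = Callable[[Relation], Relation]


def filter_rows(predicate) -> Transform:
    """Keep only the rows for which `predicate` is true."""
    return lambda rel: [row for row in rel if predicate(row)]


def derive(name: str, fn) -> Transform:
    """Add a computed column called `name` to every row."""
    return lambda rel: [{**row, name: fn(row)} for row in rel]


def sort_by(key: str) -> Transform:
    """Order rows by the value of column `key`, ascending."""
    return lambda rel: sorted(rel, key=lambda row: row[key])


def pipeline(source: Relation, *steps: Transform) -> Relation:
    """Apply each transform in the order written, top to bottom."""
    return reduce(lambda rel, step: step(rel), steps, source)


albums = [
    {"title": "B-sides", "price": 12, "sold": 3},
    {"title": "Debut", "price": 10, "sold": 40},
    {"title": "Live", "price": 15, "sold": 7},
]

result = pipeline(
    albums,
    filter_rows(lambda r: r["sold"] > 5),
    derive("revenue", lambda r: r["price"] * r["sold"]),
    sort_by("revenue"),
)
```

Because every step takes a relation and returns a relation, steps can be added, removed or reordered one line at a time, which is the "read from top to bottom" and "easy exploration" property listed in the requirements. This eager sketch does not model lazy evaluation.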

Eržen mentioned other organizations that have provided an alternative to SQL, namely: EdgeDB with their motto that "we can do better than SQL;" LINQ, a pipelined language for the .NET framework; FunSQL.jl, a Julia library for compositional construction of SQL queries; Malloy, a modern open-source language for analyzing, transforming, and modeling data; and Ecto SQL, an Object-Relational Mapping (ORM) library for Elixir.

Eržen then introduced PRQL, a simple pipelined language that follows the aforementioned design principles, initially released in March 2022. It is fully open source, licensed under the Apache License 2.0, and, as Eržen emphasized, will never be monetized.

The prqlc command is the PRQL compiler, which targets the SQL databases PostgreSQL, SQLite, DuckDB, MySQL and ClickHouse. Its bindings support C, Python, JS, Java, .NET and PHP. The compilation flow can be described as PRQL → Pipeline Language → Relational Query → SQL. PRQL is written in Rust, and extensions for VS Code are available. Developers can learn more about PRQL at this GitHub repository.

Incremental Data Processing with Apache Hudi was presented by Saketh Chintapalli, software engineer at Uber, and Bhavani Sudha Saktheeswaran, distributed systems engineer at Onehouse and Apache Hudi PMC. Saktheeswaran kicked off the presentation with a discussion of the evolution of data infrastructure by comparing on-premise data warehouses (traditional business integration/reporting) and data lakes (search/social). In general, relative to data warehouses, data lakes are open-source and cheaper to scale. She then provided a graphical representation of a typical Lakehouse architecture.

Saktheeswaran then introduced Apache Hudi, a transactional data lake platform, and how the platform interacts with data streams, databases, cloud storage, meta stores and various analytics tools.

Chintapalli introduced incremental data processing, which combines the two modern processing models: batch and stream data processing. After comparing the two data processing models, he concluded that batch processes are slow and inefficient due to: slow batch ingestion; rewrites of entire tables with overlaps; no smart way to recompute Extract, Transform and Load (ETL); and late-arriving data, which can be a nightmare.
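The difference between rewriting a whole table and processing only what changed can be sketched in a few lines. The example below is an in-memory illustration only and does not use Hudi's API: the Event class, the commit_seq field and the checkpoint are assumptions that stand in for a table's commit timeline and an incremental query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    trip_id: str
    amount: float
    commit_seq: int  # order in which the record reached storage, not when the trip happened


def full_recompute(events: list[Event]) -> dict[str, float]:
    """Batch style: rescan every event and rebuild the whole table."""
    table: dict[str, float] = {}
    for e in sorted(events, key=lambda e: e.commit_seq):
        table[e.trip_id] = e.amount
    return table


def incremental_update(
    table: dict[str, float], events: list[Event], checkpoint: int
) -> tuple[int, int]:
    """Incremental style: take only events committed after `checkpoint` and
    upsert them into the existing table. Returns (rows_read, new_checkpoint).
    The list filter stands in for an incremental read; a real system would
    not scan the full history to find the new records."""
    new = [e for e in events if e.commit_seq > checkpoint]
    for e in sorted(new, key=lambda e: e.commit_seq):
        table[e.trip_id] = e.amount  # upsert: insert or overwrite by key
    new_checkpoint = max((e.commit_seq for e in new), default=checkpoint)
    return len(new), new_checkpoint


# Initial load of two trips.
log = [Event("t1", 10.0, 1), Event("t2", 20.0, 2)]
table = full_recompute(log)
checkpoint = 2

# Later: a late-arriving correction for trip t1 and a brand-new trip t3.
log += [Event("t1", 12.5, 3), Event("t3", 5.0, 4)]
rows_read, checkpoint = incremental_update(table, log, checkpoint)
```

In this sketch the late correction to t1 is absorbed by an upsert of two records, and the result matches what a full recompute of all four records would produce.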

Chintapalli then provided two case studies. In Case 1, Driver/Courier Earnings, he demonstrated the challenges with late-arriving data (relative to a 90-day window) and compared a traditional ETL load strategy with an improved incremental ETL load strategy. Case 2, Menu Updates for Uber Eats Merchants, described the challenges with modeled datasets for Uber Eats and the frequency of daily menu data changes.

Saktheeswaran demonstrated how Hudi unlocks incremental data processing for fast-changing data by introducing its various features along with the Hudi Table Types, Hudi Query Types and how it optimizes for large-scale updates.

Hudi 0.14.0 is expected to be released very soon, and Hudi 1.0.0 is planned to support non-blocking concurrency control. The Hudi Community consists of: five cloud providers with Hudi pre-installed; a diverse set of PMC and committers; and a rich community of participants.

Kubernetes without YAML was presented by David Flanagan, Kubernetes whisperer. Flanagan kicked off his presentation by engaging the audience with a Slido poll containing three questions:

  • What resources do you need to deploy an application to Kubernetes in production?
  • What tools have you used to deploy to Kubernetes?
  • Are you happy with the developer experience of deploying to Kubernetes?

After discussing the results of the poll, Flanagan demonstrated some of the challenges of deploying an application to Kubernetes by playing a small portion of a video from his Klustered Teams series with Red Hat and Talos Systems in which the participants encountered a permission denied error upon executing the kubectl get nodes command and spent a significant amount of time trying to fix it.

Flanagan described the resources needed for a typical Kubernetes deployment, namely: a service; a configuration map; a secret; a Horizontal Pod Autoscaler (HPA); a Pod Disruption Budget (PDB); a pod monitor; and a network policy, all of which require roughly 120 lines of YAML. That count didn't include a number of other resources that could be utilized.

Flanagan then listed the attributes and associated tools that developers should require, namely: the Don't Repeat Yourself (DRY) principle; being shareable; being composable; documentation; and being testable.

Tools in the DRY attribute included: YAML Anchors to handle repeated sections in a YAML file; and Kustomize, a template-free native configuration management tool.

Tools in the Shareable attribute included: Helm, a package manager for Kubernetes. Flanagan maintained that the main problem with Helm is the composition of the values.yaml file, but despite this issue, Helm can still be useful.

Tools in the Composable attribute included the aforementioned Kustomize, which is essentially a copy and paste.

Tools in the Testable attribute included: Rego, the declarative policy language used by Open Policy Agent (OPA); and Common Expression Language (CEL), an expression language from Google that implements common semantics for expression evaluation to improve interoperability among different applications. Flanagan maintained that it can be difficult to work with Rego.

Flanagan then stated that, for all of these attributes, developer experience is still missing. In 2020, he was quoted as saying:

A good developer experience is one where a developer can be successful with intuitive decisions, rather than informed decisions.

There are a number of developer experience tools available, namely: cdk8s; Pulumi; CUE/Timoni; Terraform; and Go. Pulumi is a good tool, Flanagan said, but he doesn't recommend it for Kubernetes because it is based on the Terraform model. As recently as five years ago, most developers were primarily using Go, but that has evolved to languages such as Java, Rust and Zig.

Flanagan focused on cdk8s, an open-source software development framework for defining Kubernetes applications and reusable abstractions. It supports familiar programming languages, including Go, JavaScript, TypeScript and Python. cdk8s applications can synthesize into standard Kubernetes manifests, which can be applied to any Kubernetes cluster.
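The general idea of a reusable abstraction that synthesizes into standard manifests can be shown without any framework. The sketch below is plain Python using only the standard library and is not the cdk8s API: web_service and synth are hypothetical helpers, and the image name is a placeholder. It emits JSON, which Kubernetes tooling accepts alongside YAML.

```python
import json


def web_service(name: str, image: str, port: int, replicas: int = 2) -> list[dict]:
    """Hypothetical reusable abstraction: one call yields a Deployment and a
    Service that are guaranteed to share the same labels and port."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {"selector": labels, "ports": [{"port": port, "targetPort": port}]},
    }
    return [deployment, service]


def synth(manifests: list[dict]) -> str:
    """Render the manifests as a single Kubernetes `List` object in JSON."""
    return json.dumps(
        {"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2
    )


manifests = web_service("checkout", "example.com/checkout:1.0", port=8080)
output = synth(manifests)
```

Because the manifests are ordinary data structures before they are rendered, they can be checked with regular unit tests, for example asserting that the Service selector matches the pod labels, which touches on the testable and composable attributes listed earlier.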

He then provided a live coding demo using TypeScript to create a deployment along with options for continuous improvement within the source code.

Flanagan concluded his presentation with best practices that included: build internal pattern libraries, i.e., stop reinventing the wheel; share publicly with other developers; policies and tests; and hook into the existing tools as necessary.
