
Using Serverless WebSockets to Enable Real-Time Messaging

Dec 01, 2022


Key Takeaways

Real-time data delivery is behind the new live and collaborative features that differentiate competitive products from static user experiences.
Standardized over a decade ago, WebSocket technology has since matured into one of the key technologies powering the modern, real-time web to provide live and collaborative user experiences.
As organizations adopt event-driven architecture to support real-time data, they frequently look to build their own solution. On the face of it, building and maintaining infrastructure to serve real-time data over WebSocket connections may not sound hard; the complexity comes from running a global service at scale.
Adopting a serverless WebSocket solution from a cloud provider can help avoid the common issues of building a homegrown solution for internet scale, such as the cost of ownership, reliability, availability, and performance.
Real-time messaging platforms extend the serverless WebSocket solution to organizations that prefer to focus on their products’ core features rather than on building a real-time infrastructure.

The rise of real-time data delivery

Our everyday digital experiences are in the midst of a revolution: real-time data.

Users now expect a page within an app or browser to update parts of itself without needing to refresh it—for example, an app that shows live sports scores or a web page that tracks an expected delivery on a map.

We have all become accustomed to immediate digital experiences, and we take it for granted that our apps and web pages offer smooth interactive services without lag. Organizations that include seamless live updates can reap the rewards of higher engagement and more time-on-page, with potential repeat visits and business. Without seamless live updates, an experience appears dated next to a competitor’s and risks losing market share.

This article reviews some of the most common live-user experiences with examples, discusses event-driven architectures to support real-time updates, and introduces common technology choices.

Estimates suggest that 30% of all global data consumed by 2025 will result from information exchange in real time, as 150 billion devices are predicted to be connected and create real-time data. As companies look for operational advantages over their competitors, they are turning to live experiences. In a recent survey, IT leaders told IDC that investing in technology to achieve real-time decision-making is a top priority.

What is real time?

The average human blink time is 100ms, and the average reaction time is approximately 250ms. So anything that happens in under 250ms is perceived as happening in “real time” or “live.”

Other domains have stricter requirements: a “hard real-time” operating system kernel promises a latency of less than 5ms, and hardcore multiplayer gaming needs sub-50ms latency. In this article, however, we are talking about the more general real time that matches the human interpretation of “instantaneous.”

Common real-time user experiences

Live experiences

Many people will be familiar with live experiences where real-time data is delivered in one direction, from source to user, in response to a change in content or data.

Good examples include:

Online shopping sites that show the number of items remaining in stock or the number of other users who have them in their basket
Immediate updates within a banking app to reflect account activity
Sports and news updates delivered to an app or web page

Live experience example: Reddit

Reddit is one of the world’s 20 most visited websites. Its recent introduction of real-time updates indicates that they are strategically important in the competition with other online communities, such as Facebook. Reddit app users can now see other Redditors’ activity as it happens, such as animations and indicators when Redditors are reading the same post or typing a response, and other live features.

Shared live experiences

Shared live experiences take place when real-time data is exchanged bidirectionally. A typical example is typed chat messaging. Other uses include polls, quizzes, and Q&A sessions, for example, on Twitch streams, as participants interact with their live-stream host.

Shared live experience example: Mentimeter

Mentimeter can be used to complement a live stream and add polls or quizzes. For example, the audience could be asked, “What’s your favorite Mandalorian character?” The participants follow a short URL to vote online, and their responses are visualized in real time and shown on the live stream.

Collaborative experiences

Collaborative experiences are similar to shared live experiences but go further, allowing many users to edit shared state and data simultaneously in real time. A typical example is the collaborative productivity tools we are familiar with from remote working, such as Figma, Google Docs, or Miro.

Collaborative experience example: Figma

Figma is a collaborative design tool for multiple users to work on a file simultaneously. It’s said to be a “watershed moment in design collaboration” and a game-changer in the uptake of design software.

Users can share a link to a Figma file, so there is no need to download or send files, nor edit offline and merge changes. This is now relatively standard for collaboration on documents, spreadsheets, and presentations, but it’s a new way of working with complex design software.

Figma, which Adobe intends to acquire for a record-breaking $20Bn, makes an interesting case study for many organizations, whether they are incumbents or challengers to them. The future is in shared live and collaborative experiences, and products without real-time features need to find a way to introduce them.

Usually, when we talk about real time, we think about speed and latency, but the essence of a real-time interaction lies with the architecture.

Event-driven architecture for real-time solutions

For a successful real-time solution, you need to consider an event-driven approach for asynchronous communication. In an event-driven model, an event indicates a change that may prompt interested clients (event consumers) to do something, such as update the UI. Events are passed to consumers as messages via an event channel, and the event producer publishes events that reflect real-time changes to data or state.

A typical pattern in event-driven systems is “publish and subscribe.” When a state change occurs, an event producer (publisher) sends event messages, which event subscribers consume, invoking their own business logic in response.
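To make the roles concrete, here is a minimal in-process sketch of publish and subscribe. The class and method names (`EventChannelBroker`, `subscribe`, `publish`) are hypothetical and do not correspond to any particular product's API; a real broker would sit on a network and deliver messages asynchronously. The point it shows is that the publisher only names a channel and never needs to know who, or how many, subscribers will react.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical alias: a subscriber is any callable that accepts one event message.
Handler = Callable[[dict], None]


class EventChannelBroker:
    """Toy in-process broker: publishers and subscribers share only channel names."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        """Register a handler to be invoked for every event on `channel`."""
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, event: dict) -> int:
        """Deliver the event to every current subscriber; return how many received it."""
        handlers = list(self._subscribers.get(channel, []))
        for handler in handlers:
            handler(event)  # each consumer runs its own business logic
        return len(handlers)


if __name__ == "__main__":
    broker = EventChannelBroker()
    scoreboard: List[dict] = []
    broker.subscribe("match-scores", scoreboard.append)        # e.g. update the UI
    broker.subscribe("match-scores", lambda e: print("notify:", e))  # e.g. push a notification
    broker.publish("match-scores", {"home": 1, "away": 0})
```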

Protocols for real-time updates

In an event-driven architecture, the event consumer needs to be able to receive updates asynchronously. Protocols to choose from include:

HTTP long polling: The server holds a client connection open to deliver a response when new data becomes available or the connection timeout threshold is reached.
WebSocket: Provides two-way, full-duplex communication channels over a persistent TCP connection, with much lower overhead than half-duplex alternatives such as HTTP long polling.
MQTT: The go-to protocol for streaming data between devices with limited CPU power and/or battery life, such as IoT devices.
SSE (Server-Sent Events): An open, lightweight, subscribe-only protocol for event-driven data streams from server to client.
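The long-polling entry is easiest to see from the server's side: the request is parked until either data arrives or a timeout fires, and the client then immediately polls again. The sketch below models just that holding behavior with an `asyncio.Queue`; it is not an HTTP server, and `long_poll` and `demo` are hypothetical names for illustration.

```python
import asyncio
from typing import List, Optional


async def long_poll(updates: "asyncio.Queue[str]", timeout_s: float) -> Optional[str]:
    """Hold the 'request' open until an update arrives or the timeout elapses.

    Returns the update, or None to model an empty response; a real client
    would immediately send a new poll request after either outcome.
    """
    try:
        return await asyncio.wait_for(updates.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


async def demo() -> List[Optional[str]]:
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    results: List[Optional[str]] = []
    # Nothing has been published yet, so this poll times out empty-handed.
    results.append(await long_poll(queue, timeout_s=0.05))
    # Publish an update shortly after the next poll starts; the held poll returns it.
    asyncio.get_running_loop().call_later(0.01, queue.put_nowait, "score: 1-0")
    results.append(await long_poll(queue, timeout_s=1.0))
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [None, 'score: 1-0']
```

Every poll cycle costs a full HTTP request and response, which is the overhead that a persistent WebSocket connection avoids.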

WebSocket is arguably the most widely used protocol to power live user experiences. Standardized in 2011 by RFC 6455, WebSocket is a thin transport layer built on top of a device’s TCP/IP stack.
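A WebSocket connection begins life as an ordinary HTTP/1.1 request carrying `Upgrade: websocket` and a random `Sec-WebSocket-Key` header. The server answers `101 Switching Protocols` with a `Sec-WebSocket-Accept` header derived from that key, as specified in RFC 6455. The sketch below computes that value and checks it against the sample key given in the RFC; the function name is our own.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 (section 1.3) for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def sec_websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a server returns in its 101 response."""
    # SHA-1 over the client's key concatenated with the GUID, then base64-encoded.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key from RFC 6455, section 1.3.
    print(sec_websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this exchange, the same TCP connection stops carrying HTTP and carries WebSocket frames in both directions.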

The emergence of WebSocket marked a turning point for web development. Designed for an event-driven architecture and optimized for minimum overhead and low latency, a WebSocket connection enables bidirectional, full-duplex communication between client and server over a persistent, single-socket connection. The intent is to provide what is essentially an as-close-to-raw-as-possible TCP communication layer.
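The “minimum overhead” claim is visible in the wire format. The sketch below encodes a single, final, unmasked text frame as a server would send it, following the base framing rules of RFC 6455 (section 5.2). It deliberately leaves out masking (required on client-to-server frames, which adds a 4-byte key), fragmentation, and control frames, so treat it as an illustration rather than a complete encoder.

```python
import struct


def encode_text_frame(message: str) -> bytes:
    """Encode one unmasked, final text frame (server-to-client) per RFC 6455 section 5.2."""
    payload = message.encode("utf-8")
    header = bytearray([0x80 | 0x1])  # FIN bit set, opcode 0x1 = text
    length = len(payload)
    if length < 126:
        header.append(length)  # 7-bit length; mask bit stays 0 because servers do not mask
    elif length < 2**16:
        header.append(126)
        header += struct.pack("!H", length)  # 16-bit extended length, network byte order
    else:
        header.append(127)
        header += struct.pack("!Q", length)  # 64-bit extended length, network byte order
    return bytes(header) + payload


if __name__ == "__main__":
    print(encode_text_frame("Hello"))  # b'\x81\x05Hello'
```

A short update therefore costs two bytes of framing, compared with the hundreds of bytes of headers that typically accompany each HTTP request and response in a polling design.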

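Compared with polling a REST API, WebSocket connections are more efficient: once a connection is open, each update travels as a small frame rather than a full HTTP request and response, which helps them scale better. The WebSocket protocol is also push-based, so connected clients receive updates as soon as events occur.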

WebSockets for event-driven systems

There are several paths to integrate WebSocket capabilities into a tech stack.

The first option is to build a WebSocket-based messaging solution from scratch and tailor it according to preference. For example, DAZN used the WebSocket protocol to engineer a custom solution for broadcasting messages to millions of users.

Another option is to use open-source technologies as the backbone of a WebSocket-based messaging layer. Socket.IO is a framework that provides capabilities on top of raw WebSockets, such as fallback support, automatic reconnections, and pub/sub messaging (rooms). A common approach is to combine Socket.IO with Redis Pub/Sub to run multiple Socket.IO instances in different processes or servers and pass events between nodes.

However, there will still be some limitations to overcome, such as a lack of message ordering, limited native security, and a single-region design, which make it challenging to use Socket.IO for a production-ready system at scale.
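The role Redis Pub/Sub plays in the Socket.IO pattern described above can be modeled in memory. In the sketch below, each hypothetical `Node` holds only its own clients, so a broadcast to a room must go through a shared bus that every node listens to. `SharedBus` stands in for Redis; this is not the Socket.IO Redis adapter's API, just the shape of the idea.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class SharedBus:
    """Stand-in for Redis Pub/Sub: forwards every published event to all attached nodes."""

    def __init__(self) -> None:
        self.nodes: List["Node"] = []

    def publish(self, room: str, event: str) -> None:
        for node in self.nodes:
            node.deliver_locally(room, event)


class Node:
    """One hypothetical server process holding its own WebSocket clients."""

    def __init__(self, name: str, bus: SharedBus) -> None:
        self.name = name
        self.bus = bus
        self.rooms: DefaultDict[str, List[str]] = defaultdict(list)  # room -> local client ids
        self.inbox: Dict[str, List[str]] = {}  # client id -> events "sent" on its socket
        bus.nodes.append(self)

    def join(self, client_id: str, room: str) -> None:
        self.rooms[room].append(client_id)
        self.inbox.setdefault(client_id, [])

    def broadcast(self, room: str, event: str) -> None:
        # Publish via the bus, not just to local clients, so members on other nodes see it.
        self.bus.publish(room, event)

    def deliver_locally(self, room: str, event: str) -> None:
        for client_id in self.rooms.get(room, []):
            self.inbox[client_id].append(event)


if __name__ == "__main__":
    bus = SharedBus()
    node_a, node_b = Node("a", bus), Node("b", bus)
    node_a.join("alice", "match-42")
    node_b.join("bob", "match-42")
    node_a.broadcast("match-42", "goal!")
    print(node_b.inbox["bob"])  # ['goal!']
```

One caveat the in-memory model hides: Redis Pub/Sub is fire-and-forget, so a node that is disconnected from Redis when an event is published never receives it. That is one reason ordering and delivery guarantees are hard to get from this setup alone.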

Challenges to building a WebSocket solution for real-time data

Both of the approaches described above come with engineering challenges that can affect project costs and delivery deadlines. At the simplest level, building a WebSocket solution looks like the obvious way to add the ability to receive real-time updates. However, feature creep often means that a basic live experience seeds additional requirements for shared live experiences and collaborative features.

Building and maintaining a proprietary WebSocket solution to support the real-time needs of these experiences can be challenging. The underlying infrastructure must be stable and dependable, and it takes experienced engineers to build and maintain it. A development team may find itself focused more on the real-time aspects than on the features that augment the core product, while facing engineering challenges in scalability and elasticity, latency, fault tolerance, and data integrity and connection management.

Scale and elasticity

Scaling a homegrown solution to handle millions of concurrent WebSocket connections dependably is a complex, time-consuming undertaking that requires dedicated engineering resources and significant infrastructure costs.

Horizontal scaling comes with a more complex architecture, load balancing, routing, and increased infrastructure and maintenance costs, to name just a few challenges.

To successfully handle WebSocket connections at an unpredictable scale, there is also a requirement for elasticity to add more servers automatically so the system has sufficient capacity to deal with potential usage spikes.
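At its core, elasticity is a sizing calculation that runs continuously. The sketch below estimates how many servers are needed for the current connection count plus a safety margin, and turns that into a scale-out or scale-in decision. The per-server capacity and the 30% headroom are hypothetical inputs; real capacity per server varies widely with message rates, payload sizes, and hardware.

```python
def servers_needed(concurrent_connections: int, connections_per_server: int,
                   headroom_pct: int = 30) -> int:
    """Servers required to hold the current load plus a safety margin for spikes."""
    if connections_per_server <= 0:
        raise ValueError("connections_per_server must be positive")
    # Integer arithmetic in percent avoids floating-point rounding at the boundary.
    target = concurrent_connections * (100 + headroom_pct)
    capacity = connections_per_server * 100
    return max(1, -(-target // capacity))  # ceiling division, never below one server


def scaling_action(current_servers: int, needed: int) -> str:
    """Translate the sizing result into a decision for a hypothetical autoscaler."""
    if needed > current_servers:
        return f"scale out by {needed - current_servers}"
    if needed < current_servers:
        return f"scale in by {current_servers - needed}"
    return "hold"


if __name__ == "__main__":
    # Hypothetical: 1M connections, 50k per server, 30% headroom -> 26 servers.
    need = servers_needed(1_000_000, 50_000)
    print(need, scaling_action(20, need))  # 26 scale out by 6
```

Scaling in is the harder direction for WebSockets: removing a server drops every persistent connection it holds, so those clients must reconnect elsewhere without losing messages.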

Latency

Network latency is a critical factor in large-scale distributed systems. Latency deteriorates with distance, so to keep network latency low, it’s advisable to keep data as close to the users as possible via managed data centers and edge acceleration points. A good user experience also needs any variance in latency to be minimized.
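Variance matters because an average can look healthy while users feel the spikes. The sketch below summarizes a set of round-trip samples by median, 99th percentile, and standard deviation using Python's `statistics` module; `latency_summary` is a hypothetical helper, not part of any monitoring product.

```python
import statistics
from typing import Dict, List


def latency_summary(samples_ms: List[float]) -> Dict[str, float]:
    """Summarize round-trip samples: typical latency (p50), tail (p99), and spread (stdev)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points; cuts[98] is about p99
    return {
        "p50": statistics.median(samples_ms),
        "p99": cuts[98],
        "stdev": statistics.stdev(samples_ms),
    }


if __name__ == "__main__":
    steady = [50.0] * 100
    jittery = [20.0, 80.0] * 50  # same median as `steady`, very different experience
    print(latency_summary(steady))
    print(latency_summary(jittery))
```

Both hypothetical series have a 50ms median, but the second swings between 20ms and 80ms, which is the kind of jitter the paragraph above says a good experience must minimize.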

Fault tolerance

To make a system fault-tolerant, it must be redundant against instance failure and even data center failure. This implies distributing the infrastructure across multiple availability zones in the same region at the very least, and potentially across multiple regions. This challenge involves significant engineering and DevOps efforts and infrastructure-related costs.
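Redundancy on the server side only helps if clients can find the surviving capacity. A common client-side counterpart is to retry across endpoints in different regions with capped exponential backoff and random jitter, so that a failure does not trigger a synchronized reconnection storm. The endpoint names and the 0.5s base and 30s cap below are hypothetical.

```python
import random
from typing import Iterator, List, Optional, Tuple


def reconnect_plan(
    endpoints: List[str],
    base_s: float = 0.5,
    cap_s: float = 30.0,
    rng: Optional[random.Random] = None,
) -> Iterator[Tuple[str, float]]:
    """Yield (endpoint, delay_seconds) for successive reconnection attempts.

    Attempts rotate through regional endpoints; the delay ceiling doubles each
    attempt up to cap_s, and the actual delay is drawn uniformly below it
    ("full jitter") so that clients dropped together do not retry together.
    """
    if not endpoints:
        raise ValueError("need at least one endpoint")
    rng = rng or random.Random()
    attempt = 0
    while True:
        endpoint = endpoints[attempt % len(endpoints)]
        # Also cap the exponent so 2 ** attempt never grows without bound.
        ceiling = min(cap_s, base_s * 2 ** min(attempt, 16))
        yield endpoint, rng.uniform(0.0, ceiling)
        attempt += 1


if __name__ == "__main__":
    plan = reconnect_plan(["eu-west.example.invalid", "us-east.example.invalid"])
    for _ in range(4):
        print(next(plan))
```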

Data integrity and connection management

An event-driven architecture relies on an exact sequence of event messages where none are lost or misordered.

A user’s connection can drop if their power fails or a system problem occurs on the network. When the user reconnects, events need to be available from the point they disconnected. Missed messages need to be delivered without duplicating those already processed. The whole experience needs to be completely seamless.

There are some genuinely complex engineering problems to solve to guarantee the data integrity needed for ordering and exactly-once semantics.
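One common way to approach these guarantees is to give each message on a channel a monotonically increasing serial number, retain recent history on the server, and have the client track the last serial it applied. On reconnect, the client asks for everything after that serial, and it drops anything it has already seen. The sketch below shows that idea with hypothetical `ChannelLog` and `Subscriber` classes; a production system would also need to persist the client's position, bound the retained history, and handle history that has expired.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChannelLog:
    """Server side (hypothetical): retains every message on a channel with a serial number."""

    messages: List[str] = field(default_factory=list)

    def publish(self, data: str) -> int:
        self.messages.append(data)
        return len(self.messages)  # serials start at 1 and increase by 1

    def since(self, serial: int) -> List[Tuple[int, str]]:
        """Return every message after `serial`, oldest first, as (serial, data) pairs."""
        return list(enumerate(self.messages[serial:], start=serial + 1))


@dataclass
class Subscriber:
    """Client side (hypothetical): applies each serial once, strictly in order."""

    last_serial: int = 0
    applied: List[str] = field(default_factory=list)

    def on_message(self, serial: int, data: str) -> None:
        if serial <= self.last_serial:
            return  # already processed (e.g. redelivered around a reconnect): drop it
        if serial != self.last_serial + 1:
            raise RuntimeError(f"gap: expected {self.last_serial + 1}, got {serial}")
        self.applied.append(data)
        self.last_serial = serial

    def resume(self, log: ChannelLog) -> None:
        """After reconnecting, replay only what was missed since last_serial."""
        for serial, data in log.since(self.last_serial):
            self.on_message(serial, data)
```

Redelivery plus deduplication by serial gives in-order, exactly-once processing from the client's point of view, even though the transport itself may deliver some messages more than once.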

The do-it-yourself dilemma

Some organizations try to ship early, leaving these difficult issues for later. But getting to market quickly and capturing early success might be self-defeating if the product cannot satisfy the demand it creates.

An alternative is to design for scale early and have a sustainable architecture for future growth. But such an approach can delay reaching the market, which competitors can seize upon. An additional common issue is that the original design embeds significant constraints into the product before there has been enough market feedback to show the direction in which it will evolve.

The advantage of serverless WebSockets

A practical solution is to offload the complexity of building a business-critical real-time platform to a specialized cloud service. A fully managed serverless WebSocket solution offers the infrastructure for event-driven messaging; it renders the underlying infrastructure a commodity. Clients use the provider service to send/receive low-latency messages and focus on building business logic to handle real-time updates.

There are several benefits from combining WebSocket technology with a serverless model:

No infrastructure to maintain: Building proprietary WebSocket infrastructure is time-consuming and resource-heavy. However, a serverless WebSocket provider offloads the burden of managing demanding real-time infrastructure.
Reduced operational costs: Most serverless WebSocket providers offer a pay-for-use pricing model. This is more cost effective than renting or purchasing a fixed amount of server capacity in advance, which generally involves significant periods of underuse or idle time.
Scalability and availability: Serverless WebSocket architectures are scalable by design. Apps with high and fluctuating demand need infrastructure that automatically scales up and down to handle an unpredictable and quickly changing number of concurrent WebSocket connections.
Reduced latency: The application is not hosted on an origin server in the serverless model. This means that, depending on the serverless WebSocket infrastructure provider, serverless applications run closer to end users in multiple regions and edge locations worldwide, which improves performance and reduces latency.
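The cost argument in the pay-for-use point above is simple arithmetic, sketched below. Provisioned capacity must cover the busiest hour for every hour, while usage-based pricing tracks what is consumed. All prices, the per-server capacity, and the shape of the hypothetical day are invented for illustration and are not any vendor's pricing.

```python
from typing import List


def provisioned_cost(hourly_connections: List[int], per_server_capacity: int,
                     server_hour_price: float) -> float:
    """Fixed capacity: enough servers for the busiest hour, paid for every hour."""
    servers = -(-max(hourly_connections) // per_server_capacity)  # ceiling division
    return servers * server_hour_price * len(hourly_connections)


def pay_per_use_cost(hourly_connections: List[int], price_per_connection_hour: float) -> float:
    """Usage-based: pay only for the connection-hours actually consumed."""
    return sum(hourly_connections) * price_per_connection_hour


if __name__ == "__main__":
    # Hypothetical day: 22 quiet hours, then a 2-hour live event.
    day = [5_000] * 22 + [100_000] * 2
    print(provisioned_cost(day, per_server_capacity=20_000, server_hour_price=0.50))  # 60.0
    print(pay_per_use_cost(day, price_per_connection_hour=0.00002))
```

With these made-up numbers, the fixed fleet sits mostly idle for 22 hours. The comparison depends entirely on real prices and on how spiky the load is, so it is worth running with your own figures.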

Serverless WebSocket solutions

Cloud vendors offer serverless WebSocket solutions such as AWS AppSync and AWS API Gateway, Cloudflare Workers, Google Cloud Run, and Azure Web PubSub. However, these solutions do not provide a complete end-to-end solution to handle the most common scenarios, as exemplified by DAZN’s review of AWS AppSync and AWS API Gateway.

Platforms like Ably and Pusher are positioned to help organizations solve the challenges of an event-driven architecture with serverless WebSockets, and they add extra features to solve common pain points.

For example, there is rarely a one-size-fits-all protocol, as different protocols serve some purposes better than others. Ably offers multiple protocols such as WebSocket, MQTT, SSE, and raw HTTP, and also extends beyond just the raw protocols to add features such as device presence, stream history, channel rewind, and handling for abrupt disconnections.

Adopting a serverless WebSocket platform from a vendor that offers an end-to-end solution has multiple benefits: the vendor handles the challenges of high-scale real-time data distribution, while engineering teams focus on core product innovation without having to provision and maintain real-time infrastructure.

Summary

Live experiences are underpinned by real-time, event-driven APIs to meet the demands of modern end-users.

A real-time system is predicated upon consistently low latencies, data integrity (ordering and guaranteed delivery), fault tolerance, availability, and scalability. Not every organization is geared to handle the complexity of building dependable and uninterrupted experiences.

A new breed of PaaS that offers serverless WebSockets can facilitate the process of architecting, building, delivering, and maintaining a solution that satisfies users and keeps a product competitive without the high cost of ownership of a homegrown solution.

About the Author

Matthew O’Riordan
