Mitigating Geopolitical Risks with Local-First Software and atproto

50:16

Summary

Martin Kleppmann discusses the urgent need for technological sovereignty in modern infrastructure. Exploring the shifting landscape of global tech dependencies, he shares how engineering leaders can leverage multi-cloud architecture, de facto API standardization, the AT Protocol, and local-first development paradigms to reclaim user agency and build highly resilient systems.

Bio

Martin Kleppmann is an Associate Professor at the University of Cambridge, working on decentralised systems and cryptographic protocols. He is the author of the best-selling O’Reilly book "Designing Data-Intensive Applications".

About the conference

Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Martin Kleppmann: The world has changed and unfortunately not for the better. You might think I'm talking about AI. I'm not talking about AI. I'm talking about Greenland. This is a picture of the city of Nuuk in Greenland. This is a graph showing the market share of various cloud providers in the European market. You can see that although the market size, shown by these blue bars, has grown massively over the last decade, the market share of European-based cloud providers has actually gone down and is hovering at around 15%. In contrast, AWS, Azure, and Google Cloud together have about 70% market share of the European market. The largest European cloud providers have about 2% market share each. In other words, Europe is completely dependent on U.S. cloud services. What has this got to do with Greenland? You might remember that in the ancient past of two months ago, U.S. president, Donald Trump, made some strong statements about wanting Greenland to be part of the U.S.

Of course, fortunately now those particular tensions have died down and now we're worrying about Iran instead. There's no knowing if these tensions might come back again, and they could potentially escalate. What then? Because if those tensions were to escalate, it's not too hard to imagine there being some trade sanctions and potentially Europe could find itself suddenly with very little warning locked out of U.S. cloud services. What then? In case you think I'm just being hysterical, here's The Economist, which I would argue is a fairly reputable publication, and they wrote the following. "A year ago, it would have been absurd to worry about European access to American technology. As Donald Trump recklessly exploits the transatlantic alliance over Greenland, an executive order limiting American AI firms' business in Europe no longer seems unthinkable. Some European restrictions on American technology, including the computing clouds where AIs reside, are also plausible".

This is about AI, but you can substitute the word cloud in there and it would be equally true. If you still don't believe me, it's happened before, albeit at a much smaller scale. Last year, Trump imposed sanctions on the International Criminal Court in The Hague because they had issued an arrest warrant for the Israeli Prime Minister for alleged war crimes in Gaza. I'm not going to comment on the politics of this. You're welcome to have your own opinions on who is right here. The point is that very shortly after these sanctions were imposed, the chief prosecutor of the International Criminal Court, Karim Khan, suddenly found himself without access to his email account, which was hosted by Microsoft, and his bank accounts had also been frozen. AP reported that, "Microsoft had cancelled Khan's email address".

Microsoft hotly denies that it actually suspended services to the ICC, but Microsoft did say that they had been in contact with the ICC, "throughout the process that resulted in the disconnection of its sanctioned official from Microsoft services". They say they didn't suspend services, but they said they disconnected, and they refused to clarify what the word disconnected actually means in this context. I don't know what to make of any of that. You're, again, welcome to make up your own mind. In any case, the ICC then very quickly migrated off Microsoft services in the following months. It does seem like there is a risk of getting locked out of U.S. cloud services if the U.S. government doesn't like you.

Now you might think, but all the big cloud providers have regions in Europe, so why don't we just deploy all of our stuff to a European region, and then we should be fine. Unfortunately, it seems like this is not the case either. Microsoft has said that if the U.S. were to issue a legal request to Microsoft for the data of an EU citizen hosted in the EU, then Microsoft would comply regardless of what EU law says. In other words, U.S. law takes precedence, presumably because Microsoft is headquartered in the U.S., and whether the data is physically stored in the EU seems to make no difference. It also seems to make no difference whether the target is an individual or a company or a government. This particular statement here applies to data access, but it also seems likely that the same would apply if the U.S. ordered a cloud provider to suspend services to a European customer.

Then, in addition to tariffs and sanctions, another way you could get locked out of cloud services is if somebody blows up the data center. This is also not hypothetical, because Iran attacked three AWS data centers in the United Arab Emirates with drones and severely damaged at least one of them. Last time I checked, the services in that region were still severely disrupted. This is the AWS dashboard. Digital infrastructure is now so important to our society that it is becoming a military target. Also, data centers are just simply some of the most expensive buildings in the world right now. If you're a military wanting to cause economic damage, then bombing a data center is actually not a bad thing that you could do. This makes it an attractive target. In case you think it's only software, keep in mind that software is eating the world, as Marc Andreessen famously wrote.

Everything is software nowadays, including tractors. In 2022, there was this funny story where Russian troops in Ukraine stole a bunch of agricultural machinery from a John Deere dealership there, but the manufacturer, John Deere, just disabled it remotely. Then the Russians had a bunch of useless tractors that wouldn't drive. In this case, it seems like the remote access was perhaps put to good use, again, depending on your politics. You can very easily imagine a scenario in which this kind of thing goes horribly wrong.

This kind of technological or economic dependency can be exploited. For example, if you cast your minds back to when Russia invaded Ukraine in 2022, Europe put up sanctions against Russia. Europe was also in a difficult position because some countries, such as Germany, were highly dependent on Russian gas imports. Germany had to very quickly build a bunch of infrastructure in order to make up for the closure of the pipelines. The country got lucky because the following winter was fairly mild, and so the gas demand was lower than expected. We could have very easily ended up in a situation where the German population was freezing because they were unable to heat their homes in the winter. Cloud services are now just as critical a piece of infrastructure as gas supply. Our society would simply stop functioning if the cloud was suddenly turned off.

I really don't want to be alarmist here. I do think the actual probability of a major conflict between the U.S. and Europe is still very small. I really obviously hope that it doesn't happen. None of us wants a conflict to happen. We're all friends after all. I have plenty of friends in the U.S. I'm sure you do too. Nobody wants this thing to go wrong. Unfortunately, a year ago, it would have been unimaginable to have a conflict. Now the probability is not zero. It's non-zero. I don't know what the probability is. If things were to go pear-shaped, the impact of Europe getting locked out of U.S. cloud services would be absolutely huge. Therefore, it's a risk that we have to take seriously. We don't need to panic. We should never panic. We do need to start thinking actively about this risk and take some measured steps towards mitigating it. A lot of this work happens under the banner of technological sovereignty, which is a nice buzzword. It doesn't tell us how to actually achieve that. That's what I want to look at in the rest of the talk.

Outline

I want to talk about three areas of technology that can support technological sovereignty. Those are multi-cloud support, which applies to backend services that you deploy yourself. Then I want to talk about two technologies that I've been involved in designing. Firstly, the AT Protocol, which is the protocol underlying Bluesky, a social network. That's an example of applying this sovereignty idea to social media. Then the third thing I want to talk about is local-first software, which is a way of building collaboration software.

Backend Services You Deploy Yourself - Multi-Cloud

Let's start with backend services that you deploy yourself and multi-cloud. Generally, this stuff comes under the banner of decentralization. With decentralization, the goal is generally to avoid dependency on any one single thing. In the first instance, it would be avoiding dependency on a single server. Actually, going further, when we think about this in geopolitical terms, we want to avoid dependence on any single company and potentially even avoid dependence on any single country. The way we avoid dependency on a single server, that's classic distributed system stuff. We've been doing that for decades. We have these techniques such as replication, which means keeping a copy of the data on multiple servers, which then allows us to implement fault tolerance and avoid single points of failure. A single point of failure is a single machine that, if you turned it off, would stop the system as a whole from working.

Nowadays, we've actually got really good at building systems that don't have single points of failure, where you can just take out individual machines and the users don't even notice that anything is wrong. If all of the servers are hosted by the same company, then you've still got this risk of, what if the company, the provider, locks you out? In order to mitigate that risk, we need to make it easy to be able to switch providers, switch from one provider to another, or even better, actually use multiple providers side by side. That's what multi-cloud is essentially about. If we want to go even further and say we want to avoid dependency on any one country, now this is actually pretty similar to avoiding dependency on one company, just using providers in different countries as well.
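
As a minimal sketch of using multiple providers side by side, the Python below replicates each object to every reachable provider and reads from the first one that still answers. All names here (`ObjectStore`, `InMemoryStore`, `MultiCloudStore`, `ProviderUnavailable`) are hypothetical and not from the talk, and the in-memory class merely stands in for a real provider's object store.

```python
from typing import Dict, List, Protocol


class ProviderUnavailable(Exception):
    """Raised when a provider cannot be reached (outage, lockout, and so on)."""


class ObjectStore(Protocol):
    """Deliberately narrow interface that every provider is assumed to support."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for one provider's object store (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ProviderUnavailable(self.name)
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ProviderUnavailable(self.name)
        return self._objects[key]


class MultiCloudStore:
    """Replicates every write to all providers and reads from any that responds."""

    def __init__(self, providers: List[ObjectStore]) -> None:
        self.providers = providers

    def put(self, key: str, data: bytes) -> int:
        """Write to every reachable provider and return the number of copies made."""
        written = 0
        for provider in self.providers:
            try:
                provider.put(key, data)
                written += 1
            except ProviderUnavailable:
                continue
        if written == 0:
            raise ProviderUnavailable("no provider accepted the write")
        return written

    def get(self, key: str) -> bytes:
        """Return the object from the first provider that has it and is reachable."""
        for provider in self.providers:
            try:
                return provider.get(key)
            except (ProviderUnavailable, KeyError):
                continue
        raise KeyError(key)


provider_a = InMemoryStore("provider-a")
provider_b = InMemoryStore("provider-b")
store = MultiCloudStore([provider_a, provider_b])
store.put("report.txt", b"hello")
provider_a.available = False      # simulate being locked out of one provider
data = store.get("report.txt")    # still served, from provider-b
```

A real deployment would also need to re-sync a provider that missed writes while it was unreachable; this sketch leaves that out.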

If you look at this and just zoom out, it really seems the critical idea here is this ability to be able to switch providers. Let's dig into that a bit. How do you make it easy to switch providers? I would argue that the best way of making it easy to switch providers is commoditization, which is a term from economics saying essentially that rather than having differentiated products from a bunch of different suppliers, we standardize those products in such a way that you can take a product from one supplier and swap it for a product from a different supplier, and they should be more or less interchangeable. We know this, say, from agricultural commodities such as coffee or cocoa. Unless you're at the very high end, where it's single-origin cocoa beans or coffee beans, most of the coffee and cocoa being traded on the world market is a commodity.

You can take coffee from one farm and interchange it with coffee from another farm, and the consumer wouldn't notice. Of course, this applies to other types of products as well, manufactured products like steel and iron and so on. Those are all commodities. How do you achieve commoditization? It's through standardization. You need standardization so that the different suppliers are providing products that are compatible with each other. Obviously, this applies to computing services, but it's a very old idea. We've had standardization since the Industrial Revolution, since Joseph Whitworth in 1841 proposed a standard for screw thread sizes, which at the time was a big deal because there were lots of different companies manufacturing screws, and previously they used different thread sizes, which meant you couldn't use a screw from one manufacturer together with a bolt from another manufacturer. If you had to change manufacturers, you had to retool your whole production process. Just standardizing screws made a huge difference in terms of the ability to produce machinery during the Industrial Revolution.

What does this mean in the context of cloud services? In the context of cloud services, there are not very many really established standards in the sense that a standards body has decided on them, but there are quite a lot of emerging de facto standards. De facto means it's not actually a standard in the sense of a standards document, but it's a standard in the sense that one company designed some API and then a bunch of others started imitating it, and if they imitate each other well enough, then it means you can switch from one provider to another. In the context of object stores, for example, it seems that apart from the big three cloud providers, many other cloud vendors now provide object stores as well. They seem to have standardized around using the API of Amazon's S3 as the API for other object stores as well.

This means that if you have an app that is written using the AWS S3 API, and you want to move it to a different provider, it should hopefully work as well. We have this kind of trend for other forms of computation as well. When it comes to deploying services, it seems like Kubernetes and Linux containers are becoming essentially a de facto standard, in that if you write your app so that it runs on Kubernetes, you can probably run it on just about any provider. You can also take this further, and, for example, looking at streaming data, it seems that the Kafka API is becoming a de facto standard there. Obviously, Kafka itself implements the Kafka API, but there are also a bunch of competing streaming systems that have adopted the same Kafka API while providing a different internal implementation.

For relational databases, it seems like the Postgres wire protocol and the Postgres client library are becoming a de facto standard, where a bunch of other databases that are not actually Postgres are also providing Postgres-compatible clients. Then, for analytics purposes, finally, it seems like a data lakehouse architecture is becoming a de facto standard. What this means essentially is encode your data as a bunch of Parquet files, dump them in an object store, manage some metadata around that using something like Apache Iceberg, and then run an open-source query engine in order to actually query that data. All of these things are not written standards, but they are emerging de facto standards that ease the move from one provider to another.

Adopting these kinds of technologies and enabling a multi-cloud architecture does have some big advantages. One is that you're simply reducing the vendor lock-in. If you're in your pricing negotiations with one cloud provider, then you're in a much stronger situation if you can say, I could very easily move my stuff to this competing cloud provider. Why don't you offer me a lower price? Also, it avoids putting all of your eggs in one basket, which is important in the geopolitical context that we started off with. As always with every technical decision, there's always downsides, there's always tradeoffs. One big downside of taking a multi-cloud approach is the increased cost, because it means that you're probably storing copies of the same data on multiple providers, unless you're just choosing the option where you want to make it easy to migrate without actually running multiple providers side by side.

You're increasing the storage costs there. You're potentially incurring additional egress bandwidth fees to shift the data between the various providers. There's operational complexity involved in simply managing more moving parts. Another big downside of adopting this kind of approach is that you are stuck with the lowest common denominator features of all of the providers that you've chosen to use, because in the end, although various other object stores provide S3 compatible APIs, for example, although they call themselves S3 compatible, they don't actually implement exactly the same feature set. In fact, many of the more advanced features that S3 offers are not actually offered by some of the competing object stores.

That means you're limited to using the basic feature set. The basic operations like GET and PUT will be fine, but you then can't use the more advanced features which some of the best-of-breed technologies might offer, and that is a serious tradeoff to make there. Finally, it means that if you want to be able to move between cloud providers, and some of those cloud providers don't provide all of the services as managed hosted services that you can just use, it might mean that you have to self-host more of them. You might have to actually install the software on VMs yourself and operate it yourself.
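
The lowest-common-denominator effect is just a set intersection. The sketch below is purely illustrative: the provider names and feature labels are made up for the example and do not describe any real service.

```python
from typing import Dict, Set


def common_features(providers: Dict[str, Set[str]]) -> Set[str]:
    """Return the features that every provider in the mapping supports."""
    if not providers:
        return set()
    return set.intersection(*providers.values())


# Hypothetical feature matrix; names are assumptions for illustration.
feature_matrix = {
    "provider_a": {"get", "put", "delete", "list", "versioning", "object_lock"},
    "provider_b": {"get", "put", "delete", "list", "versioning"},
    "provider_c": {"get", "put", "delete", "list"},
}

portable = common_features(feature_matrix)
# portable == {"get", "put", "delete", "list"}
```

Adding a provider can only shrink the portable set or leave it unchanged, which is why each extra provider may cost you features.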

Is self-hosting really a problem? It depends really on your perspective and the skill set you have. I would argue that if you're a company that already has an operations team anyway, that is already running a bunch of services, then running a couple of additional services is probably not that much of a big deal. You might have to do a bit of learning to gain more operational experience with a new tool, but it's not really the end of the world. Do keep in mind, though, that you are somewhat unusual in that you are most likely in technically sophisticated companies that have this operational capability. Most of the rest of the world is not in that situation. Most individuals do not have the tech skills to run a Linux system and set it up in such a way that it will be reliable. Most small businesses do not really want to have a 24/7 on-call team to manage their services. They would much rather outsource that and not have an in-house team on PagerDuty.

Then, of course, there are non-tech businesses as well, and we can't expect everyone to know about maintaining these technical systems. If you are a company that doesn't already have technological expertise, then finding and hiring staff with that expertise is quite a significant challenge. From a personal point of view, if you want to run a personal service, some people might be willing to maintain it on the weekends. Personally, if I have spare time at the weekend, I'd much rather spend it with my family, or go to the pub, or something like that. In my opinion, life is too short to be a sysadmin unless you really like that, or unless it's your job to be on an operations team, in which case then, of course, you're doing this professionally and it's a very different matter.

These are just my personal opinions on this topic of self-hosting. Self-hosting does give you these advantages of not being bound to a single vendor and having this flexibility to move between services. As such, I would say it is better than just entirely relying on some proprietary service. Really, I think what we should be aiming towards is commoditized services, which are still way better than expecting people to self-host stuff. If you can just buy equivalent services from multiple different providers and you just pick the one that's cheapest or the one that you like best, then that just frees most people from doing operations and leaves a small number of service providers to specialize in these operational activities. That's all I wanted to say on multi-cloud.

Social Media - AT Protocol

Let's move on to the next topic. I wanted to talk a little bit about social media. How important or critical social media is might depend on your viewpoint. I personally have come to the belief that actually it's quite an important part of a civic society nowadays to have mechanisms through which politicians and journalists and so on can spread their ideas and communicate with each other. I want to talk a little bit about the AT Protocol, Authenticated Transfer, which is the protocol underlying Bluesky, a social network. Anyone here use Bluesky by any chance? It's a Twitter clone. In terms of the user interface and user interaction, it's very much familiar stuff. I'll tell you a little bit about the background story of how it came to be. Initially Bluesky was actually started as an initiative by Twitter, while it was still called Twitter, in 2019 when Jack Dorsey announced it.

It then took quite a while for it to actually materialize, and it was only in August '21 that finally the project was funded by Twitter and Jay Graber was chosen to lead it. Jay very wisely argued that Bluesky should be an independent organization, an independent company, not part of Twitter, but funded by Twitter initially. After the funding was in place, then Jay set out to actually hire a team. She hired an absolutely amazing engineering team. Around early '22, then the implementation started along with protocol design work. At this point, the goal was actually to develop an open standard protocol for decentralized social media. The goal was not to develop an app for end users. It was actually focusing on the backend protocol. The idea was that Twitter would become just one client on top of this protocol and other companies could provide alternative clients on top of the same protocol.

This early '22 is also when I got involved. Jay reached out to me and asked me whether I wanted to be involved as an advisor. I thought, this sounds really interesting. Since then, I've been involved in a very small capacity, really just speaking to the team for an hour a week or so. I've not written a single line of code in Bluesky, but I like to think that maybe I've had a little bit of influence on the design of the protocols and the underlying systems. If you remember, something else also happened in 2022, which was that Elon Musk then, very shortly after the team was hired, made a bid to acquire Twitter. That acquisition then closed in October of '22. That really changed things for Bluesky, because as I said, originally the design was for Twitter to be a client of this protocol, but it very quickly became clear that Musk was not interested in this decentralized protocol stuff.

At that point, then the Bluesky team decided to pivot and decided to actually build an app for end users themselves and compete directly with Twitter/X. Then by February '23, that app existed, was implemented, and a private beta was launched with then gradually increasing user numbers. A year later, it was launched to the public. Just before the public launch, there were about 3 million users in the private beta, which then very rapidly increased to 5 million users post-launch. Today we're at about 43 million users. It's still a lot smaller than X, but it is quite a significant social network now. I use it as essentially my exclusive social media thing now. At least for me, it works very well. It seems to be working very well for many other people too.

The really interesting thing I want to talk about here is not actually Bluesky the product, but AT Protocol, the underlying technology. Let's talk about that a bit. atproto is an open and decentralized protocol for social media. It's intended for public broadcast social media, so it doesn't currently have a way of privately sharing anything. There are mechanisms being developed to add that in the future, but right now it's public only. Another goal from the outset with atproto was that from a user's point of view, using it should be identical to using a centralized service. In particular, many efforts to build decentralized social media have had effects where there's stuff that is just a bit weird, or behaves strangely, or which is more complicated as a result of the decentralized architecture. We were very adamant from the start that we wanted none of that additional complexity in atproto.

From the view of users, it should behave just like a familiar centralized app like X, or Instagram, or any of those. Another thing I should say about atproto is that it's not just focused on the microblogging, Twitter-style form of social media, but it's really for any type of broadcast social media. Bluesky is just one mode that's been built on top of atproto, and other teams have been building other social modes on top of the same infrastructure. There's now blogging platforms built on top of atproto. There's social coding platforms in the style of GitHub. There's video and photo sharing. There's events management. There's tools for researchers. There's book reviews. There's all sorts of stuff that has all been built on top of atproto. Another core design decision behind atproto was this idea of credible exit, which means that if a provider of some part of this system were to go bad, it must be possible for users to switch to a competing provider without losing anything.

In particular they should not lose their username. They will not have to change their username as a result of switching to a different provider. They should not lose their social graph, who they're following and who they're being followed by. They should not lose any of their posts. They should not lose any of the replies that other people have made to their posts, and so on. Everything should stay there and be very easily migratable from one provider to another. Moreover, it should be easy and somewhat cheap to set up a new provider. Those are the goals of atproto.

Let's have a look at how it's actually implemented. I'll just walk you through the data flow essentially that happens. What we typically start with is we start from a user ID, and there's a user database that you can query by user ID. The user ID might actually come from the DNS. I won't go into that in more detail. If you want technical details of how all of this works, here's the URL, arxiv.org/abs/2402.03239, of a paper that describes the architecture of atproto in more detail. If you query this user database, you get back a bunch of information. You get the public key of that user, because it uses cryptographic keys. You also get back the URL of a server on which the data for that particular user is hosted. This server is called a Personal Data Server, or PDS.

The way that atproto stores the data is that each user has a repository. You can think of this like a git repository. Each user has their own repository, and any actions that a user performs turn into writes in that user's own repository. If you post something, that goes in your own repository. If you like somebody else's post, that also goes in your own repository. If you follow somebody, that goes in your repository. If you reply to somebody else's post, that goes in your own repository. Essentially, each repository is a collection of all of the actions taken by a particular user. These repositories can be hosted by anybody. The PDSs can be hosted by anyone. If you just sign up as a Bluesky user, then you will get a repository on one of the PDSs hosted by Bluesky, but you can migrate that to a different provider, or just sign up with a different provider as well.
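
The per-user repository idea can be sketched as a small data model. This is a simplification under stated assumptions: the `Record` and `Repository` classes, the `kind` labels, and the `alice/1` style post identifier are hypothetical and are not atproto's real record schema or wire format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """One action taken by a user (simplified; not the real atproto schema)."""

    kind: str               # e.g. "post", "like", "reply", "follow"
    body: Dict[str, str]    # details of the action


@dataclass
class Repository:
    """All actions of a single user; only the owner writes here."""

    owner: str
    records: List[Record] = field(default_factory=list)

    def write(self, kind: str, **body: str) -> Record:
        """Append a new record to this user's own repository."""
        record = Record(kind, dict(body))
        self.records.append(record)
        return record


alice = Repository("alice")
bob = Repository("bob")

alice.write("post", id="alice/1", text="Hello")

# Bob interacts with Alice's post, but every record lands in Bob's repository.
bob.write("like", subject="alice/1")
bob.write("reply", parent="alice/1", text="Welcome!")
bob.write("follow", subject="alice")
```

After these calls Alice's repository holds only her post, while the like, the reply, and the follow all live in Bob's, which is why finding every reply to a post means looking across repositories.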

There are a bunch of alternative providers now as well, such as Blacksky and Eurosky, and a bunch of enthusiasts who like self-hosting are running their own PDSs as well. All of those options are there. I told you that each user's writes go only into their own repository. If you want to know all of the replies to a particular post, that means you have to scan over potentially all of the repositories, because any of the repositories could contain a reply to this particular post. In order to aggregate across all of those repositories, atproto then has a component called the relay. This relay collects all of the events that are written to any of the repositories in the network and puts them together into one big firehose. You can then sit as a consumer of this firehose and process all of those events in real time.

Bluesky then runs something that we call the AppView, which is essentially a big service database that collects all of the events happening in any of the repositories and indexes them in whatever way is necessary in order to provide the functionality. For example, it then aggregates all of the replies on a particular post. It counts the number of likes on each post. It aggregates the social graphs so that you can figure out who's being followed by whom and who's following whom, and so on. All of that information is just collected in this secondary database. The important thing is that this AppView database is entirely derived from the data that lives in the PDSs. If somebody wanted to create an alternative AppView, they could do so because all of the data is there, public, in the PDSs, ready for anybody to download. You can build a new AppView.
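
A toy version of the relay and AppView data flow might look like the following. It is a sketch under assumptions: the function and class names, the event shape, and the `kind` labels are hypothetical, and the relay is modeled as a one-off merge of in-memory lists rather than a real-time stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

RepoRecord = Tuple[str, Dict[str, str]]      # (kind, body)
Event = Tuple[str, str, Dict[str, str]]      # (author, kind, body)


def relay(repositories: Dict[str, List[RepoRecord]]) -> Iterator[Event]:
    """Merge the records of every repository into one stream (the firehose)."""
    for author, records in repositories.items():
        for kind, body in records:
            yield (author, kind, body)


class AppView:
    """Secondary indexes derived entirely from the firehose."""

    def __init__(self) -> None:
        self.like_counts: Dict[str, int] = defaultdict(int)
        self.replies: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        self.followers: Dict[str, set] = defaultdict(set)

    def consume(self, firehose: Iterable[Event]) -> None:
        """Update the indexes from a stream of events."""
        for author, kind, body in firehose:
            if kind == "like":
                self.like_counts[body["subject"]] += 1
            elif kind == "reply":
                self.replies[body["parent"]].append((author, body["text"]))
            elif kind == "follow":
                self.followers[body["subject"]].add(author)


repositories = {
    "alice": [("post", {"id": "alice/1", "text": "Hello"})],
    "bob": [
        ("like", {"subject": "alice/1"}),
        ("reply", {"parent": "alice/1", "text": "Welcome!"}),
        ("follow", {"subject": "alice"}),
    ],
    "carol": [("like", {"subject": "alice/1"})],
}

view = AppView()
view.consume(relay(repositories))

# A second, independently run AppView derives the same indexes from the same data.
alternative_view = AppView()
alternative_view.consume(relay(repositories))
```

Because the indexes are pure functions of the repository contents, anyone with access to the same public data can rebuild an equivalent view.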

One option is to subscribe to the relay that's hosted by Bluesky. This is what a lot of the non-microblogging social modes on top of atproto do. They will rely on the PDSs and a relay hosted by Bluesky. The users can just post records for their book recommendations into the same repositories as they use for their Bluesky posts. It's just a general-purpose collection of records. Then they flow through the relay, and then other AppViews for the book recommendation site, for example, can just subscribe to the firehose of events there. The relay also can be hosted by anybody because anyone can go and download the data from any of the PDSs. For example, Blacksky is now hosting an alternative relay that aggregates data across all of the same PDSs that the Bluesky relay does. Blacksky can then have their own AppView that's derived from the data in their own relay.

You can see here that we've got all of these components hosted by different providers, and they're all somewhat interchangeable. This is really moving towards this idea of commoditization that I was talking about earlier, except for that user database that I started with, which still looks somewhat centralized. We do actually need a single user database, because if you had multiple user databases, they could have contradictory ideas of, for example, which server is a particular user hosted on. That would be a problem because in the end there does need to be an authoritative source of which server to look at if you're looking for a particular user. That user needs to be able to change that record. Of course, they need to be able to point their repository at a new server if they switch to a different provider. This is at the moment a centralized database. We call it the PLC directory.
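
To make the role of this directory concrete, here is a minimal sketch of a lookup table from user ID to public key and server URL, with a migration step that changes only the server. The class names, the `alice-id` identifier, the key string, and the `.example` URLs are all hypothetical, and the sketch deliberately omits the cryptographic signing of entries.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DirectoryEntry:
    """What a lookup returns: the user's public key and where their data lives."""

    public_key: str
    pds_url: str


class UserDirectory:
    """Single authoritative mapping from user ID to directory entry."""

    def __init__(self) -> None:
        self._entries: Dict[str, DirectoryEntry] = {}

    def register(self, user_id: str, public_key: str, pds_url: str) -> None:
        self._entries[user_id] = DirectoryEntry(public_key, pds_url)

    def resolve(self, user_id: str) -> DirectoryEntry:
        return self._entries[user_id]

    def migrate(self, user_id: str, new_pds_url: str) -> None:
        """Point the user at a new server; ID and key stay the same."""
        current = self._entries[user_id]
        self._entries[user_id] = DirectoryEntry(current.public_key, new_pds_url)


directory = UserDirectory()
directory.register("alice-id", "alice-public-key", "https://pds.provider-a.example")

before = directory.resolve("alice-id")
directory.migrate("alice-id", "https://pds.provider-b.example")
after = directory.resolve("alice-id")
```

Since everyone else refers to the user by the stable ID, followers and replies keep resolving correctly after the move; only the `pds_url` in the entry changes.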

PLC stands for Public Ledger of Credentials. There are a few mitigating aspects that make this less bad as a centralized component. One is that there are cryptographic integrity protections in place there. This is a database in which every entry is signed by the PDS of the user who controls that particular entry. Even if this user database were to go bad, it wouldn't be able to return fake data on behalf of users, for example. If the user database were to pretend that a user has moved to a different server when in fact they haven't, then that would become immediately apparent and the client querying the database would reject that response. The worst thing that a misbehaving user database could do is either to not return any responses at all to queries or to return outdated responses. The next mitigating factor is that this user database is actually fairly small and it's also available for anyone to clone.

Anyone can download a copy of it and host a full replica of it. That also provides a credible exit route because it means that if this PLC directory were to go bad and start censoring people, for example, or misbehave in other ways, then the community could rally around a new provider of this directory and switch over. This means that there's a backup plan in case this directory were to go bad. Another mitigating aspect is that Bluesky have actually moved the maintenance operations of this PLC directory into an independent Swiss association, a Swiss non-profit, that is at arm's length from Bluesky the company. This is also intended essentially as a protection in case the company were to be taken over by management that doesn't share our current values: then at least the PLC directory is separate and in a place that is hopefully organizationally independent. It would be possible to use a blockchain to make this user database essentially distributed across multiple providers that as a consortium could maintain this database. For various reasons, which I won't go into now, this is not currently our preferred option, but it is a possibility that I just wanted to mention.
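The integrity protection described above, where a client rejects a directory response whose signature does not check out, can be illustrated with a minimal sketch. Several things here are assumptions and not how the PLC directory is actually implemented: the entry format and the names `encode_entry` and `client_accepts` are hypothetical, the client is assumed to already know the signer's public key, and a Lamport one-time signature is swapped in for the real signature scheme purely because it can be written with the Python standard library alone.

```python
import hashlib
import json
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> list:
    """The 256 bits of SHA-256(message), one signature slot per bit."""
    return [(byte >> i) & 1 for byte in _h(message) for i in range(8)]


def keygen():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def sign(private, message: bytes) -> list:
    """Reveal one secret per message bit. A key must sign only ONE message."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    bits = _bits(message)
    if len(signature) != len(bits):
        return False
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, bits)))


def encode_entry(user: str, server: str) -> bytes:
    """Canonical bytes for a hypothetical directory entry."""
    return json.dumps({"user": user, "server": server},
                      sort_keys=True).encode()


def client_accepts(public, response) -> bool:
    """The client checks the signature itself; it does not trust the directory."""
    entry, signature = response
    return verify(public, entry, signature)


private, public = keygen()
entry = encode_entry("alice", "pds.example.com")
honest = (entry, sign(private, entry))
# A misbehaving directory swaps the server but cannot produce a new signature.
forged = (encode_entry("alice", "pds.attacker.example"), honest[1])
print(client_accepts(public, honest))  # True
print(client_accepts(public, forged))  # False
```

Note what the sketch does not prevent: the directory can still withhold the response or replay an older, validly signed entry, which matches the residual risks named in the talk.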

In practice, how decentralized is Bluesky and atproto really? To tell the truth, the vast majority of the users are hosted on the servers of Bluesky the company at the moment. In that sense, you could argue it's maybe not very decentralized, but there are a few things in its favor still. The list of alternative providers for all the different parts of the infrastructure is growing. There are now competing providers for all of the different components of atproto. The list of non-microblogging apps on top of atproto is rapidly growing as well. There's really an ecosystem forming around it. Also, Bluesky is taking the core protocols of atproto to the IETF for formal standardization. This means that Bluesky the company is actually giving up control over these protocols and saying they're becoming a community-maintained artifact that then really has the potential to become core internet infrastructure without corporate control over these protocols and data formats.

I think that's a really important part of really making this a credible open standard as well. Then, finally, this idea of credible exit means that even if most of the users are currently actually hosted on servers belonging to a single company, at least the option that users have to migrate to other providers is a way of keeping the company honest and keeping it from enshittifying their services, to use Cory Doctorow's term. Because the company knows that if it was to do things that users disagree with, they will just take their data and go somewhere else. This actually adds a profound safeguard against companies abusing their potentially dominant market position. That's all I wanted to say on atproto.

Collaboration Software - Local-First Software

As the last part of this talk, I wanted to talk a little bit about local-first software. This is an effort that I have been involved with for quite a few years now. The logo on my T-shirt is the logo of local-first software. In order to explain what local-first software actually is, I thought I would maybe start with a comparison of two pieces of software that you think of as very different, and that is Google Sheets, the web-based spreadsheet tool, and Git, the version control system that you probably all use. These are obviously very different pieces of software, but I think we can meaningfully compare them along various dimensions. Let's start with one strong point of Google Sheets. It's a spreadsheet. You can do all the spreadsheet-y things with it. You can put formulas in it. You can put rich text with formatting in it. You can put images in it. You can plot graphs, and so on.

Git is very limited in its data model. You can put plain text files in it, basically. You can obviously put binary files in Git as well. You can take an Excel spreadsheet file and check it into a Git repository. You just don't get any diffing support and you don't get any merging across branches. If two people on different branches make commits that both modify the same spreadsheet file, then good luck merging those together yourself again. Certainly the git merge tools don't give you any support with that. The merging is only really supported for plain text. That means if you're working in a domain where plain text is the main medium, such as software engineering, then actually Git is just fine, but actually a lot of the rest of the world works in media that are not plain text files.

Another difference between Google Sheets and Git is that Google Sheets has real-time collaboration. That means you can see your collaborators' cursors, you can see as they're typing live, so you get very fine-grained collaboration. With Git, by design, you get much coarser-grained collaboration. Whether this is good or bad, of course, depends somewhat on your perspective of what you're trying to achieve. With Git, the unit of collaboration is a commit. If you want your collaborator to see your changes, you have to make a commit, push it, your collaborator has to pull again, and only then do they see that you've made some changes. For fine-grained collaboration, having this real-time collaboration mode there as an option is very valuable, and Git by itself doesn't really offer that. Another difference is that a spreadsheet can be used by pretty much anyone. Anyone can at least enter some numbers into a spreadsheet and read it, whereas Git is famously arcane and difficult to use.

If you're a software engineer, then it's reasonable to expect you to learn to use Git, but for most of the rest of the world who are not software engineers, I don't think it's reasonable to expect them to learn to use Git. So far, I've just been talking bad about Git, so I should mention that there are lots of really amazing, great things about Git as well. In particular, Git is a version control system, to state the obvious, but it means we have branching, we have diffs across branches. You can see what changes your colleague made last week while you were on holiday. You can review the changes that other people have made. We can have pull requests, which are this ceremony that we do around reviewing changes and signing off on them. You can merge the pull requests. You can then even selectively revert some changes that were made in the past without reverting others.

That is all amazing capability that we take for granted often as software engineers, but most of the rest of the world does not have these version control capabilities. In Google Sheets, you get a linear version history. That's it. I think the most advanced form of revert is roll back to an old version, but that rolls back all of the changes since that old version. There's no capability of branching, no pull requests or any other form of allowing somebody to propose some changes and having somebody else review them. Even though people do really sophisticated things in spreadsheets, people write financial models that run the entire world economy in spreadsheets, and yet people using spreadsheets don't have these version control capabilities, which is mind-blowing if you think about it. They should really have the same capabilities that we software engineers have.

Another thing with Git is you can push the same repo to multiple hosting providers without thinking twice. You can have a repo that's a full clone on your local machine. You can push it to GitHub. You can push it to GitLab. You can push it to Bitbucket. You can self-host a Git hosting service if you like, and that all just works because they all speak the same Git wire protocol, and so you get this ability to move from one provider to another very easily. Obviously, this only applies to the Git repositories. If you're using GitHub Issues or any of the other tools, that isn't in a Git repository, and that's therefore much harder to migrate to another provider. For the actual Git repositories, it's amazing in terms of how easily you can move it between providers.

With Google Sheets, you don't have any of that. You're totally bound to Google servers. There's no way of changing Google Sheets to point it at a different server. It's absurd to even think about. Finally, Git is just files on your own machine. This is extremely empowering. This is really good for user agency because it means that if you want to write a program yourself that analyzes the data in your Git repository in some way, that's fine. You can just do it. It's easy. With Google Sheets, you're limited to whatever Google's API provides, and you don't have direct access to the actual data files. That hopefully is all quite clear so far.

What is local-first software? Local-first software is basically, what if we take the positive sides of Google Sheets and the positive sides of Git and drop all of the negatives and bring all of this together? What if we had all of this in a single system? What if we had software that allowed rich data formats that are not just plain text, that allowed real-time collaboration that's accessible to anyone in terms of usability, but also that supports all of those version control things that we love from Git, that supports having the same data on multiple different providers, and that allows you to have the files on your own machine where you can do anything you like with them? That's what local-first software is. You can think of this as a perspective change where with traditional cloud software, Software as a Service, essentially the primary data of your users lives in a database in the cloud, and then the users access this through a web browser.

It's essentially a thin client architecture. The web browser essentially just takes some data from a server and renders it, but it doesn't actually do very much locally. Then any interfacing with the server is typically done over a REST API or some kind of RPC. If the client wants to store some data, then the change that the user made locally on the client has to be persisted in the server-side database because the server-side database is the canonical source of truth here. If you don't write something to the server database, then it didn't happen. The idea with local-first is that we essentially invert this relationship and say that we make the copy of the data on the user's own machine the primary copy, and any copy of the data that lives in the cloud, that's just essentially a copy that exists for backup purposes and for the purposes of syncing data from one client to another, very much like with Git repository hosting.

Then, if you have this architecture here, we've moved all of the business logic to the clients. The cloud is not doing anything very complex in this case, and so then that makes it much easier to commoditize that cloud service and have compatible data syncing services on multiple different providers. It even potentially opens the possibility of doing peer-to-peer sync directly between clients without involving any cloud services at all. This is subject to limitations, like you have to have the clients online at the same time, so cloud services are still useful in this local-first world. By minimizing the role of the cloud services, we've opened up a lot of freedom and flexibility.
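The inverted relationship described above, with the primary copy on the user's machine and a deliberately simple sync service in the cloud, can be sketched as follows. All names here (`SyncServer`, `LocalReplica`, `outbox`, `cursor`) are hypothetical, and the conflict handling is a deliberate simplification in which the order of arrival at the server decides; it is not how any particular local-first library works.

```python
class SyncServer:
    """Stores changes and hands them out again. No business logic lives here."""

    def __init__(self):
        self.log = []

    def push(self, changes):
        self.log.extend(changes)

    def pull(self, since):
        return self.log[since:]


class LocalReplica:
    """The user's own copy of the data, which is the primary copy."""

    def __init__(self, name):
        self.name = name
        self.data = {}    # primary copy, readable and writable while offline
        self.outbox = []  # local changes not yet uploaded
        self.cursor = 0   # how much of the server log has been applied

    def set(self, key, value):
        # A write takes effect locally at once; no server round trip is needed.
        self.data[key] = value
        self.outbox.append((self.name, key, value))

    def sync(self, server):
        # Upload local changes, then apply everything new in server order.
        server.push(self.outbox)
        self.outbox = []
        changes = server.pull(self.cursor)
        for _origin, key, value in changes:
            self.data[key] = value
        self.cursor += len(changes)


server = SyncServer()
laptop, phone = LocalReplica("laptop"), LocalReplica("phone")

laptop.set("title", "Draft")  # works offline
phone.set("author", "Sam")    # works offline

laptop.sync(server)
phone.sync(server)
laptop.sync(server)
print(laptop.data == phone.data)  # True
```

Since the server only stores and forwards changes, any provider offering the same `push` and `pull` operations could replace it, which is what makes the service easy to commoditize.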

I don't want to pretend that local-first is great for everything, but it works really well for file editing type software, so text editors, word processors, spreadsheets, CAD software, making presentation slides, any of those kinds of software. Moreover, local-first is good for productivity software, loosely defined, things like note-taking, issue trackers, calendars, timekeeping, and so on. Basically, those are types of software where a user can edit the data in whatever way they like. Local-first is not a very good fit for apps that are managing some sort of physical resource, like money or inventory in a warehouse, because in those cases, it doesn't make sense for the user to arbitrarily edit the data. It doesn't make sense for the user to arbitrarily edit their bank account balance because, in the end, there's an authoritative copy of that data and it lives in the bank. For that, a centralized model still works just fine. For this class where local-first software works, I think it works really well.

Brief history of where local-first came from. In 2011, Marc Shapiro and some colleagues published some academic research on a type of algorithm called conflict-free replicated data types, which I won't go into in this talk, but they are essentially algorithms that allow merging and conflict resolution if multiple users have concurrently modified their local copies of the data, and now they need to merge them back together again. They offer a principled way of doing this. Then I started working on CRDTs about a decade ago. In 2017, my collaborators at Ink & Switch and I started an open-source project called Automerge, in which we implemented a bunch of these ideas. For the first couple of years, Automerge was really just a research prototype which we were using for our own exploration. By 2019, we decided that we needed a term to describe this type of software that we were building. We coined this term local-first in an essay.

You can find it online here as well, www.inkandswitch.com/essay/local-first/, in which we tried to explain this idea and give it a name so that we could talk about it better. Then for a couple of years, actually not that much happened. A couple of enthusiasts found our essay and said, this seems like a great idea. Then, other than that, not that much seemed to happen. We kept working on it. By 2022, we declared Automerge to be production-ready, and people were using it in production. Workshops and meetups started popping up around this topic of local-first. By 2024, things started really taking off. In '24, we got a local-first conference, a conference dedicated to the topic of local-first in Berlin, which has been running every year since. It's running again in July of this year. We got a podcast. We got a weekly newsletter. We got meetups, all dedicated to this topic of local-first.

Now there are dozens of startups building commercial products around local-first software, a bunch of open-source projects building open-source platforms, and so on. Here's just a short selection of the tools that exist now in this local-first landscape. This is the local-first conference that is coming up soon, and a short documentary of it was also released just a couple of weeks ago.
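The talk does not go into how conflict-free replicated data types work, but the basic idea of merging concurrent modifications can be illustrated with the textbook grow-only counter. This is a minimal sketch of that one classic CRDT, with hypothetical replica names; it is not how Automerge is implemented, since Automerge handles much richer data than a counter.

```python
class GCounter:
    """Grow-only counter: each replica only ever increments its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> number of increments seen from it

    def increment(self, amount=1):
        current = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = current + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the per-replica maximum makes merging commutative,
        # associative, and idempotent, so replicas converge no matter
        # in which order or how often they exchange state.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)


# Two users modify their local copies concurrently.
alice, bob = GCounter("alice"), GCounter("bob")
alice.increment(2)
bob.increment(3)

# They exchange state in either order and end up with the same result.
alice.merge(bob)
bob.merge(alice)
print(alice.value(), bob.value())  # 5 5
```

Merging the same state a second time changes nothing, which is why duplicated or reordered sync messages do no harm in this design.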

Putting it All Together

Bringing these ideas together, we have multi-cloud, atproto, and local-first as the main things that I've talked about today. What these all have in common is this idea of commoditization, allowing interoperability between different providers and decentralization. That enables easy switching between providers. What does easy switching between providers allow us to do? Part of it is mitigating the geopolitical risks that I started this talk with. Also, it's about reducing lock-in to any particular vendor. Also, I think it's important as a way of increasing user agency over their own data. We've come at this in the design of atproto and in the design of local-first software, really starting from this idealistic research project point of view without really necessarily knowing whether this was going to go anywhere. We just started with our values and principles where we wanted this to go, worked on it over years, and now increasingly these things are becoming mainstream industry trends, and that I find incredibly encouraging and actually an optimistic message with which I want to leave you.

It is possible through technological work to actually change the balance and change the outcomes that we can expect. A lot of these things are ultimately about power relationships. That's the core message of this geopolitical idea as well: dependency in technological terms creates power of one country over another. Using technology to give power back to the users is something that enables greater freedom and changes the balance of power. That's maybe something for you to think about in your own work as well: how does the technology that you work on shift the balance of power between different parties? What we've always tried to do with local-first and atproto is to empower the end users, ultimately.

 


 

Recorded at:

Jun 08, 2026
