Increasing Users' Data Agency: From BlueSky's AT Protocol to the Local-First Software Movement

Martin Kleppmann, an associate professor at Cambridge and author of Designing Data-Intensive Applications, discusses the evolution of data systems over the last decade, particularly the shift from monolithic databases to modular building blocks. Kleppmann underlines the importance of moving from cloud-centric data storage to decentralised data storage, as in Bluesky’s AT Protocol. He also explains the local-first movement and why it matters that users own their data.

Key Takeaways

  • In the last decade, systems have shifted to cloud-native architectures, in which data storage is fundamentally decoupled from compute and relies on inherently replicated object stores rather than on local disk replication.
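  • Shifting from monolithic data systems to modular data stacks lets you mix and match components (object stores, columnar file formats, table formats, query engines) to suit your application, increasing flexibility and making it easier to experiment.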
  • Evaluate decentralisation trade-offs by balancing pure federated models against the need for global indexing and consistency to ensure a seamless user experience, as seen in the design of the AT Protocol.
  • Prioritise user agency by implementing local-first principles, ensuring the primary copy of data resides on the client device to enable offline access and mitigate the risks of vendor lock-in or service shutdowns.
  • Libraries like Automerge provide the backbone for building local-first applications, enabling Git-like version control and real-time collaboration, even for non-textual data formats such as spreadsheets and CAD files.

Transcript

Olimpiu Pop: Hello everybody, I am Olimpiu Pop, an InfoQ editor, and I have in front of me Martin Kleppmann, who is probably one of the best-known people in the data space because he wrote Designing Data-Intensive Applications, whose second edition was released not long ago. So, without further ado, welcome, Martin. Can you please introduce yourself?

Martin Kleppmann: Yes, sure, it's great to be here. So, I am an associate professor at Cambridge, where I teach and conduct research on distributed systems and certain security protocols. I have now also written two editions of the Designing Data-Intensive Applications book for O'Reilly. I used to do startup software engineering back in a previous life before I switched over to academia.

Olimpiu Pop: Nice, thank you for writing it. I read the first edition, and I'm looking forward to starting the second. But for those of us who are not patient enough to read everything, what has changed between the two editions? If I remember correctly, it's been almost a decade since the first edition came out.

The Shift to Cloud-Native Architecture and Object Storage [01:46]

Martin Kleppmann: Yes, the first edition came out in 2017, so that's nine years old now. So, of course, technology does move on in that sort of time. So, part of the goal of the second edition was to bring it up to date with technological shifts. Partly, there were also things I've since come to understand better myself, so we rewrote some of the explanations as well to hopefully make them clearer.

But in terms of the technology that has changed, I would say one of the biggest things is the rise of cloud-native software architectures. The term cloud-native is a bit fuzzy and not exactly well-defined, but what we mean by it is systems that are built to take advantage of cloud services, as opposed to local operating system services, which was the case previously.

So historically, if you wanted to build a distributed database, you would write a piece of software that would store some data on the local disk of each node, and if you wanted to replicate it to another node, you would do that at the database software level, and then the replica would store its data on its local disk on that node. And with a bunch of modern systems, that's just not true anymore, because now people are building databases on top of object stores, for example, and so the underlying storage abstraction is an object store, not a local disk, and it's already inherently replicated at the object store level, which then changes the way you build things on top of it.
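As a rough illustration of this shift, here is a minimal Python sketch, assuming a hypothetical in-memory `ObjectStore` class that stands in for a replicated service such as an S3 bucket. The storage engine never replicates anything itself: it buffers writes and flushes them as immutable, uniquely named segment objects, leaving durability and replication to the object store. All class and method names are illustrative, not any real database's API.

```python
from __future__ import annotations

import json


class ObjectStore:
    """Hypothetical stand-in for a replicated object store (e.g. an S3 bucket).

    A real service replicates each object internally; here a dict plays that role.
    """

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Objects are treated as immutable: refuse to overwrite an existing key.
        if key in self._objects:
            raise ValueError(f"object {key!r} already exists")
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str) -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


class SegmentWriter:
    """Buffers key-value writes and flushes them as immutable segment objects."""

    def __init__(self, store: ObjectStore, table: str) -> None:
        self.store = store
        self.table = table
        self.buffer: dict[str, str] = {}
        self.next_segment = 0

    def write(self, key: str, value: str) -> None:
        self.buffer[key] = value

    def flush(self) -> str:
        # Zero-padded names keep segments in creation order when sorted.
        name = f"{self.table}/segment-{self.next_segment:06d}.json"
        self.store.put(name, json.dumps(self.buffer, sort_keys=True).encode())
        self.next_segment += 1
        self.buffer = {}
        return name

    def read(self, key: str) -> str | None:
        # Newest data wins: check the buffer, then segments from newest to oldest.
        if key in self.buffer:
            return self.buffer[key]
        for name in reversed(self.store.list_keys(self.table + "/")):
            segment = json.loads(self.store.get(name))
            if key in segment:
                return segment[key]
        return None
```

Because segments are never modified after they are written, other nodes could read them straight from the object store without coordinating with the writer, which is one reason this layout suits cloud-native designs.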

And so, of course, we still have traditional databases, which you run on local disks, but more and more systems now also take advantage of cloud services as the underlying abstractions. And that's really an important shift that we wanted to weave in throughout the book, so we've kind of incorporated that idea throughout various chapters.

Fragmenting Monolithic Data Systems into Modular Building Blocks [03:40]

Olimpiu Pop: Great. One thing that I saw in the data space—for me, data was, you know, the cylindrical block in the diagrams. I didn't touch it; I left it for the guys who really know how to handle that. But in the last couple of weeks, maybe months, I spoke a lot with people who are actually designing and building data systems, and then a new passion, a new interest in this arose in me.

So, one of the conclusions from those conversations, and also from what I was building myself, is that after a long period in which the data space also had only monolithic blocks... You had everything in one piece, and you either built something or you bought something; everything seemed huge. Data lakes and so on in the BI space.

But now it feels like the revolution has also reached the data space, and you have smaller pieces you can build with. For instance, I had a conversation with the guys who built InfluxDB, and they were talking about the FDAP stack, as they call it: Apache Arrow Flight, DataFusion, Apache Arrow, and Apache Parquet as building blocks, which make things a lot easier now. And, as you mentioned, object storage. Some things that first appeared as individual products, like S3 buckets back in the day, have now become de facto standards. S3 is one of these standards; then you have Apache Parquet, which became the common layer between data storage, data lakes, and analytics. It feels like a more liveable environment, and it's continuously changing. I'm very excited about what comes next.

Martin Kleppmann: Yes, and I think that's quite an interesting trend there, that we're seeing this fragmentation of these big monolithic systems into composing different building blocks together, where you might take an object store from one vendor and then a columnar data file encoding format, and then have a table format on top of that, and then a choice of different query engines that you could use to run queries over this. And I find that very exciting because, if you can mix and match different components to best suit the needs of your particular application, I think that increases flexibility for everyone. And it kind of opens it up for more people to experiment, because you don't have to build a huge system anymore, you can just build one component of a system and reuse other parts. So, it's more open for people to customise and experiment with new approaches.
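One of the building blocks mentioned above is a columnar data file encoding. The sketch below does not use Parquet or Arrow; it only illustrates, under simplified assumptions, the row-to-column pivot such formats are built around, plus a toy query function that reads only the columns a query needs. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per column (missing fields become None)."""
    names: list[str] = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return {name: [row.get(name) for row in rows] for name in names}


def sum_where(columns: dict[str, list[Any]], value_col: str, filter_col: str, equals: Any) -> Any:
    """A toy 'query engine': it touches only the two columns the query needs."""
    return sum(v for f, v in zip(columns[filter_col], columns[value_col]) if f == equals)
```

Any other column (for example a large free-text field) is never read by `sum_where`, which is the property that lets separate query engines scan columnar files efficiently.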

Olimpiu Pop: Another trend, which was around a bit earlier, before you wrote the book, was the SQL versus NoSQL movement. At that point, it felt like that was the whole revolution. More recently, I had a discussion with the guys who built QuestDB, and what they described was a multi-tiered architecture. What was interesting for me is that there's a write-ahead log where you just write, NoSQL style, without caring about anything else. Then you have a SQL-based query engine, so you can query data you normally wouldn't have thought was queryable via SQL.

And then you have the cold storage, which is Apache Parquet files. That's still queryable, but it's written to the file system, and, as you mentioned as well, it's easy to archive and easy to back up because, in the end, it's a simple file-based storage mechanism. I think there are a lot of options, maybe too many if you are in the position of choosing one for your system.
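The tiered design described here can be sketched as follows, under heavily simplified assumptions: an append-only write-ahead log for fast ingestion, an in-memory hot tier, and a cold tier of immutable files (CSV text here, standing in for Parquet), with one range query spanning both tiers. This is not QuestDB's implementation; the `TieredTable` class and its `hot_limit` parameter are hypothetical.

```python
from __future__ import annotations

import csv
import io


class TieredTable:
    """Hypothetical three-tier table: write-ahead log, hot rows, cold files."""

    def __init__(self, hot_limit: int = 3) -> None:
        # In a real system the log would be an on-disk file used for crash recovery.
        self.wal: list[tuple[int, float]] = []   # append-only log of (timestamp, value)
        self.hot: list[tuple[int, float]] = []   # recent rows, kept in memory
        self.cold: list[str] = []                # immutable CSV "files" (Parquet in practice)
        self.hot_limit = hot_limit

    def append(self, ts: int, value: float) -> None:
        # Ingestion only appends: no indexes or schema checks on the write path.
        self.wal.append((ts, value))
        self.hot.append((ts, value))
        if len(self.hot) >= self.hot_limit:
            self._move_to_cold()

    def _move_to_cold(self) -> None:
        # Serialise the hot rows into an immutable file that is easy to archive or back up.
        buf = io.StringIO()
        csv.writer(buf).writerows(sorted(self.hot))
        self.cold.append(buf.getvalue())
        self.hot = []
        self.wal = []  # the log can be truncated once its rows are in cold storage

    def query_range(self, start: int, end: int) -> list[tuple[int, float]]:
        """Answer a range query over both tiers, as if they were one table."""
        rows = [r for r in self.hot if start <= r[0] < end]
        for file in self.cold:
            for ts, value in csv.reader(io.StringIO(file)):
                if start <= int(ts) < end:
                    rows.append((int(ts), float(value)))
        return sorted(rows)
```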

Martin Kleppmann: Yes, definitely.

Decentralisation and Data Sovereignty: The Role of Protocols [07:16]

Olimpiu Pop: The other thing that was quite interesting was your keynote, and, by the way, kudos for the title. For listeners who don't know about it, Martin used what I would normally call a taboo word, "geopolitics", in the title of his keynote, and everybody was very interested in it. One of the points you made was that having more distributed systems will allow us to be more autonomous and more sovereign (the trend word these days) in technology, and less locked in by vendors, regardless of the country in which a service is provided. And one of the protocols you helped build was the AT Protocol, which underlies the BlueSky social network. Maybe you can tell us, in a couple of sentences, about your experience building it and which lessons from designing data-intensive applications you brought to the table while designing the protocol?

Martin Kleppmann: Yes, the AT Protocol is super interesting. Jay Graber contacted me quite early in the process, just as she was getting the team together to build BlueSky. At that time, we still believed the effort was to build decentralised technology for Twitter. This was before Elon Musk came in and acquired it. And so, the idea was that we would just be building a technology infrastructure layer for different social networking apps to run on.

And then Elon came along, plans changed, and Twitter got renamed to X. And, well, we decided that BlueSky needed to become a social network in its own right, competing directly with the incumbents. But we still managed to keep the ethos, principles, and values on which the AT Protocol was originally founded and designed, which essentially boil down to this: social media is too important to leave in the hands of a single company.

And there are different approaches to decentralising social media, so I guess one alternative is ActivityPub, which Mastodon is based on. And that has been pretty popular as well, and it works very well for many people. And that already existed when we started AT Protocol, but we deliberately designed AT Protocol to be different in various ways to address what we saw as issues with how ActivityPub worked.

But there were also trade-offs, and so, in particular, what we wanted for AT Protocol was a technology that would provide a user experience essentially indistinguishable from a centralised social network. So, we wanted a decentralised system at the technical level, without causing any weirdness in the user experience or any complicated steps or hoops users would have to jump through.

And you don't have this on Mastodon. For example, on Mastodon, if you look at the replies to a particular post on one server, you may see an entirely different reply thread than someone looking at the same post on a different server. That's because not every server knows about all of the replies to a post. I think for ActivityPub, that was an intentional decision to maximise the decentralised, federated nature of the system, but for AT Protocol, we wanted consistency. If somebody looks at a reply thread, everyone should see the same thing, basically. If you look at likes, everyone should see the same number of likes, and so on.

And that then implies that there has to be something that essentially indexes the entire network, that looks at any replies that are posted on any of the servers and brings them all together so that they can be shown in a single thread, and that counts all of the likes that happen on all of the servers and adds them up to a single number.

And that is done in the AT Protocol by essentially providing a firehose that combines all these activities from all the servers. There are still lots of different servers that store user data—they're called Personal Data Servers, or PDSs—but that data is aggregated into a big firehose via a service called the Relay, and then different organisations can build services that index this data set. And anyone can come along, take this data set, and compute whatever views they want on it.
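The aggregation pattern described above can be sketched in a few lines, assuming hypothetical event records rather than the real AT Protocol wire format: each PDS produces events for the users it hosts, a relay merges them into one ordered firehose, and any indexer can consume that stream to compute network-wide views such as like counts and reply threads.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical simplified event; real AT Protocol records look different."""

    author: str
    kind: str                   # "post", "like" or "reply"
    uri: str                    # identifier of the record this event creates
    subject: str | None = None  # for likes and replies: the post they refer to


def relay(pds_streams: dict[str, list[Event]]) -> list[tuple[int, str, Event]]:
    """Merge per-PDS event streams into one firehose with global sequence numbers."""
    pending = {name: list(events) for name, events in sorted(pds_streams.items())}
    firehose: list[tuple[int, str, Event]] = []
    # Round-robin interleaving stands in for events arriving from servers over time.
    while any(pending.values()):
        for name, events in pending.items():
            if events:
                firehose.append((len(firehose) + 1, name, events.pop(0)))
    return firehose


class Indexer:
    """Consumes the firehose and builds network-wide views, roughly an AppView's job."""

    def __init__(self) -> None:
        self.likes: defaultdict[str, int] = defaultdict(int)
        self.replies: defaultdict[str, list[str]] = defaultdict(list)

    def consume(self, firehose: list[tuple[int, str, Event]]) -> None:
        for _seq, _pds, event in firehose:
            if event.kind == "like" and event.subject is not None:
                self.likes[event.subject] += 1
            elif event.kind == "reply" and event.subject is not None:
                self.replies[event.subject].append(event.uri)
```

Anyone with access to the same firehose could run a different `Indexer` and arrive at the same counts, which is what allows alternative providers to compete.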

And to be honest, most of the people using BlueSky now are using the services provided by BlueSky the company, and as a result, it's somewhat more centralised than a purely federated system like Mastodon. But we decided that it was a reasonable trade-off, as long as there was an option for users to switch to a different service provider if their current provider ever stopped living up to their expectations.

And so, the core founding principle behind AT Protocol was that users should be able to switch to a different provider without changing anything or losing any of their data. So, they shouldn't lose their username or have to change their username by switching providers, they shouldn't lose any of their posts that they've created or any of the replies from other people on their posts, they shouldn't lose any of their social graph—neither the people they are following nor those who are following them. All of that should be easily portable between providers, and we have managed that with AT Protocol. There are now also alternative providers that run their own systems but give you essentially the same data. You can see the same posts and reply threads, but through a different provider. That seems to have worked out pretty well, I think, as a design principle, meaning you get a good user experience for decentralised social media.
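Portability of this kind rests on separating a user's stable identity from the server that currently hosts their data. In AT Protocol, an account is identified by a DID; the handle resolves to that DID, and the DID document points to the PDS that currently hosts the account, while follows refer to the DID rather than to a server. The sketch below models this with plain dictionaries; the `Directory` and `PDS` classes and the `migrate` steps are hypothetical simplifications, not the real account-migration flow.

```python
from __future__ import annotations


class PDS:
    """Hypothetical personal data server holding each hosted user's records."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.repos: dict[str, list[dict]] = {}  # DID -> list of records


class Directory:
    """Hypothetical stand-in for handle and DID resolution."""

    def __init__(self) -> None:
        self.handle_to_did: dict[str, str] = {}
        self.did_to_pds: dict[str, PDS] = {}

    def resolve(self, handle: str) -> tuple[str, PDS]:
        did = self.handle_to_did[handle]
        return did, self.did_to_pds[did]


def migrate(directory: Directory, did: str, new_pds: PDS) -> None:
    """Copy the user's repository to a new PDS and repoint the DID at it."""
    old_pds = directory.did_to_pds[did]
    new_pds.repos[did] = list(old_pds.repos[did])  # posts, follows, likes, ...
    directory.did_to_pds[did] = new_pds
    del old_pds.repos[did]
```

Because other users' records (such as a follow) store the DID, not the old server's address, they remain valid after the move without anyone else changing anything.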

Olimpiu Pop: Personally, I really like it because it's quite clean, and it seems to have all the attributes that Twitter used to have without the insanity. But to summarise, all the user-related data is stored in PDSs, which is Personal…

Martin Kleppmann: Personal Data Servers.

Olimpiu Pop: And then you have a backbone, a firehose in the middle, where all the data is aggregated. That's publicly available, so everybody can play with it and use it. And what BlueSky runs is just one implementation of the AT Protocol; whoever wants to can run their own. I think you even gave an example during your presentation of another firehose that appeared recently.

Martin Kleppmann: Yes, there are several alternative ones, like BlackSky runs an alternative firehose, for example.

Olimpiu Pop: If you were to design a similar protocol again, what would you do differently, if anything?

Martin Kleppmann: I'm pretty happy with how the design of the AT Protocol has ended up. I mean, I played only a small part in it, right? So, I didn't write any of the code in the implementation; that was all done by the BlueSky team. I was just an advisor for maybe an hour a week. But I would like to think that some of my thinking about building scalable data systems has fed into the design, and, on the whole, the design decisions have worked out pretty well.

Whether the system is decentralised or not depends, of course, not just on the protocols but also on which organisations are actually running services. So, you could have a perfectly decentralised protocol, but if no alternative providers are available, then it's all hypothetical. So, I'm very happy to see that more and more alternative providers to BlueSky are now offering AT Protocol services. But that's something that BlueSky itself cannot do, kind of by definition. It has to come from the community and other people, so there's actually a limit to what BlueSky can do to increase that decentralisation. Like, I think BlueSky has really done everything right by opening up the data so it's available for anyone to start competing services, but in the end, it's up to other people to step up. And it does require some funding, of course, because engineering and infrastructure are necessary to run these services. But I think we've done what we can to enable it, for example, through open-source software, good documentation, and so on.

The Local-First Movement: Empowering Users and Ensuring Data Agency [15:51]

Olimpiu Pop: Local-first is something that is closer to you. I know that you also wrote some software that you're very proud of. Can you tell us more about what the local-first movement actually is, and what its mission is? Why did you get it started?

Martin Kleppmann: You could argue that local-first is also about decentralisation, but for a different type of app. It's about collaboration software, where you've got a small group of trusted collaborators, for example, writing a document in Google Docs, working together on a spreadsheet, on some graphics files, or on a bug tracker, where it's really about collaboration between a small group of people.

And what local-first is, is a principle for designing this kind of collaboration software so that the primary copy of the data is not somewhere in the cloud, but on your own machine. You have a copy of the data locally on your own machine, which means that you can access it while offline, for example, and you can just keep using the software without an internet connection, and it'll just resync the next time you come back online again. It also means the software is simply faster to use because it doesn't have to wait for a network round-trip every time you click a button, and everything can work entirely from your local storage.
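A minimal sketch of this write path, assuming a hypothetical `LocalFirstStore` class and a `send` callback in place of a real network layer: every edit is applied to the local copy immediately, so reads never wait for a round-trip, and edits made while offline stay queued until they can be pushed.

```python
from __future__ import annotations

from typing import Callable


class LocalFirstStore:
    """Hypothetical local-first store: the local copy is the primary copy."""

    def __init__(self, send: Callable[[tuple[str, str]], bool]) -> None:
        self.data: dict[str, str] = {}           # primary copy, on this device
        self.outbox: list[tuple[str, str]] = []  # edits not yet synced
        self.send = send                         # returns False when offline

    def set(self, key: str, value: str) -> None:
        # Apply locally first: the UI can update without waiting for the network.
        self.data[key] = value
        self.outbox.append((key, value))
        self.sync()

    def get(self, key: str) -> str | None:
        return self.data.get(key)  # always served from local storage

    def sync(self) -> None:
        # Push queued edits in order; stop at the first failure and retry later.
        while self.outbox:
            if not self.send(self.outbox[0]):
                break
            self.outbox.pop(0)
```

This sketch only pushes edits; merging concurrent edits arriving from other collaborators is the harder part, which is what libraries such as Automerge take care of.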

I think this is important because it empowers users. It's really about user agency. And also, it's about reducing the risk of cloud providers suddenly disappearing. You know, I think all of us have probably had this experience: using some web-based software that was pretty useful and pretty nice, and then the provider just decided to shut it down. Google Reader is one example that people still cite even a decade after it was shut down because they loved it so much, and Google shut it down anyway.

That's just an inherent problem with software-as-a-service or web-based software. There's no way you can continue running a copy of that software locally on your own machine if the provider decides to take it away from you; it's gone. If the provider decides to lock you out of your account, well, then you lose all of the files that you ever created with that software. And I find that a really untenable situation where we're just completely dependent on some software providers not locking us out of all of our data.

And that's why I think local-first is so important: it means that if you have a copy of the data locally on your own machine, nobody can take it away from you. Even if all the servers go away and you can no longer collaborate with other users, at least you've still got a copy of the data yourself. I'm really happy that local-first is now beginning to really catch on as a wider industry trend. We're seeing dozens of companies now making local-first products and marketing them as local-first as well, and this is an attribute of software that people are increasingly caring about as desirable because it just gives us better software.

Olimpiu Pop: For me, it was a guideline I always pushed when architecting or designing a new application. It made no sense to me that, with quite powerful devices in our pockets, something like Instagram doesn't work even though it has cached a lot of data: you're in flight mode, you don't have access to it, and a banner says, "Okay, you have to connect". That was one example, and there are probably others where I knew the data was cached locally, so I didn't understand why you couldn't access it cleanly and even take it with you. So, the way you define it, it feels like a more evolved version of Git, because in the end that's Git's perspective: you have a copy on your local machine, any node can act as the master, and so on. Well, we wanted decentralisation, but then GitHub, GitLab, and all the others appeared, and we went back to a centralised approach.

Martin Kleppmann: Yes, but Git is actually a fantastic example here, I find, because it is local-first in the sense that all of the data, including the entire commit history, is local. You can inspect past versions of your repository without talking to any server. What's more, you mentioned GitHub and GitLab as points of centralisation, which is true, but at least we still have the Git protocol as an open, standard sync protocol, which means you can take the same Git repository and push it to GitHub, to GitLab, and to your self-hosted Git service, and it just works. You can have multiple remotes, and it works just fine.

This is something we don't have with Google Docs. With Google Docs, you can't change the Google Docs application to use a different syncing service. But really, we should have that, right? And that's one of the things local-first aims to get at. It's not saying that we cannot have cloud services, because, to be honest, cloud services are really useful for syncing data from one device to another. You know, we've tried to make stuff work as peer-to-peer, and actually, peer-to-peer is just really hard to make work reliably. It's hard to get away from cloud services entirely.

But if you can run a system in such a way that you can use multiple cloud services side by side, just the way you can use multiple Git hosting services side by side, then you've taken away most of the problems of centralisation. In fact, we're then back to exactly what we were talking about with AT Protocol: different providers can offer competing services, and if they use the same open, standard protocol, like the one Git uses for pulling and pushing commits, then you remove a lot of the lock-in that you would otherwise have with these providers.
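To illustrate the Git-like model of interchangeable providers, here is a sketch assuming a hypothetical sync protocol with just two operations, `push` and `pull`, over content-addressed changes. Because every provider speaks the same protocol, a device can use several side by side, like multiple Git remotes, and losing one provider loses no data. The `SyncServer` and `Replica` classes are illustrative only.

```python
from __future__ import annotations

import hashlib


class SyncServer:
    """Hypothetical provider implementing a minimal open sync protocol."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.changes: dict[str, str] = {}  # content hash -> change

    def push(self, changes: dict[str, str]) -> None:
        self.changes.update(changes)  # idempotent: same hash, same change

    def pull(self) -> dict[str, str]:
        return dict(self.changes)


class Replica:
    """A device's local copy, which can sync with any number of providers."""

    def __init__(self) -> None:
        self.changes: dict[str, str] = {}

    def commit(self, change: str) -> str:
        # Content addressing (as Git does) makes changes safe to copy between providers.
        key = hashlib.sha256(change.encode()).hexdigest()
        self.changes[key] = change
        return key

    def sync(self, remote: SyncServer) -> None:
        self.changes.update(remote.pull())
        remote.push(self.changes)
```

In the usage below, a laptop syncs with two providers and a phone with only one; both devices still converge, and the second provider alone holds everything if the first disappears.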

Of course, with GitHub now, people still use the issue tracker and the wiki and these other features, which are not stored in Git repositories and therefore not portable from one provider to another. So for those aspects, we still have the centralisation.

Olimpiu Pop: Yes, that's a whole different story.

Martin Kleppmann: I think it'd be great if those were in Git repositories as well, but that's maybe a longer story. But at least the core Git repository is very portable across providers, and I think that's a model other software should emulate.

Bringing Git-Like Version Control to Non-Text Data and the Wider World [22:17]

Olimpiu Pop: So, just to wrap my head around what you mentioned: the main target of local-first is more broadly used software, like your example of Google Docs, so text editors, but also issue trackers and similar tools, because Git, in particular, is used by more tech-savvy people. And local-first is about having the ability to actually move your stuff to another provider. We had that period when you bought software in boxes, on CDs, whatever; then we had the SaaS movement, and that meant, as you said, acquisitions, services that get killed, and so on, and then you're left with nothing, because more often than not they just give you an export that no other tool can import. And now we are trying to decouple that. I want to be able to work with my data, and whether I choose to keep it in the cloud or in backups, this should be democratised and available to a broad number of people, not only technology enthusiasts.

Martin Kleppmann: I think that's absolutely right. So, we software engineers use Git, but it has a brutal learning curve, and, moreover, you can't really use it for anything other than text files. Yes, you can put a spreadsheet, an Excel file in Git, but you won't get any meaningful diffing or merging of that; it'll just treat it as an opaque binary blob. So, part of our goal with local-first here is to take the power of systems like Git and make it accessible to people for whom using Git as a command-line tool is just not appropriate.

So, we want, for example, spreadsheets. We want them to have the same kind of real-time collaboration we've got used to with cloud software, but at the same time, we also want local data storage and the ability to sync via the sync providers we know from Git, for example. Also, we want the version control features from Git. This is another thing that, you know, we software engineers take for granted: yes, obviously we can review the changes our colleague made last week while I was on holiday, or do code reviews on pull requests. This is just part of the natural workflow of software engineering. Many other businesses, many other professions don't have this kind of version control tooling. You know, people build very sophisticated financial models in spreadsheets, and they don't have version control for spreadsheets. It's a shocking situation, really.

But part of the nice thing with local-first is that, in order to sync the edits that you make from one user's machine to another user's machine, we have to capture what those edits are, anyway, and encode them in a way that we can store on disk and send over the network. And so, if we're capturing the edits, anyway, it's actually not that big a leap to also make this into a version control system that can keep the version history of the editing over time and let people see what changes their colleague made last week, and to diff one version of a file against another version. It's actually exactly the same mechanism that's needed. That's another reason why I'm excited about local-first. It's empowering people, not just by ensuring they don't lose all their data if their cloud provider goes away, but also by bringing this power of version control to many other apps that currently don't have it.
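The point that sync and version control need the same mechanism can be sketched like this, assuming a hypothetical `EditLog` of simple set-cell operations on a spreadsheet-like document: because every edit is already captured for syncing, any past version can be rebuilt by replaying a prefix of the log, and two versions can be diffed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    author: str
    cell: str   # e.g. "B2"
    value: str


class EditLog:
    """Hypothetical log of edits; the same records would be sent over the network to sync."""

    def __init__(self) -> None:
        self.edits: list[Edit] = []

    def apply(self, edit: Edit) -> int:
        self.edits.append(edit)
        return len(self.edits)  # version number = number of edits so far

    def state_at(self, version: int) -> dict[str, str]:
        # Rebuild any historical version by replaying the first `version` edits.
        cells: dict[str, str] = {}
        for edit in self.edits[:version]:
            cells[edit.cell] = edit.value
        return cells

    def diff(self, old: int, new: int) -> dict[str, tuple[str | None, str | None]]:
        """Cells whose value changed between two versions: cell -> (before, after)."""
        before, after = self.state_at(old), self.state_at(new)
        changed: dict[str, tuple[str | None, str | None]] = {}
        for cell in sorted(before.keys() | after.keys()):
            if before.get(cell) != after.get(cell):
                changed[cell] = (before.get(cell), after.get(cell))
        return changed

    def authors_between(self, old: int, new: int) -> set[str]:
        """Who made the edits between two versions ("what did my colleague change?")."""
        return {edit.author for edit in self.edits[old:new]}
```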

Olimpiu Pop: One reason why I am very excited about it is that I have an ongoing saga with my colleagues—I'm currently working for an IoT company—and for me, it's a bit of a problem, not to call it otherwise, because we have the design of the hardware. It's impossible to version it, not in the way we are used to. We cannot compare, I don't know, iteration one of version three properly and just see what changed, and it's a whole different problem because you have like file storage and then only the person that actually built it knows what's there, and yes, I'm totally behind you.

Martin Kleppmann: Exactly, even CAD software used by engineers for like physical product design or electronic engineering for circuit diagrams, even that doesn't have version control. It's just unbelievable.

Olimpiu Pop: Not in the sense we know from software, where you have a nice little diff showing in red the component that was taken out and in green the component that was put in. That would be amazing.

Just technically speaking, what is local-first? Is it a set of principles, and then it's up to each piece of software to decide how to implement them? Because it's not a protocol.

Automerge: The Open-Source Library for Local-First Collaboration [27:09]

Martin Kleppmann: Yes, it's a set of principles and values, I would say. Then there are lots of different software implementations that interpret those principles in various ways. So, my collaborators and I work on a software implementation called Automerge. It's an open-source library that you can use to build collaboration software. So, it provides a data model for storing an application's data and can automatically sync it from one machine to another. It can do real-time collaboration, but it can also handle version-control-like use cases, where you want to go off on a branch for a while, then diff the branch against its base, and later decide whether to merge it. It can do all of those types of things, and it can do it for different file types. So, you can use it for text editing or for spreadsheets or for graphics or, you know, presentation software, or CAD software; people have built all sorts of different types of apps on top of Automerge already.
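Automerge's own algorithms are considerably more sophisticated, but the core idea of branching, diffing against the base, and merging automatically can be sketched with a much simpler CRDT swapped in for illustration: a last-writer-wins map. Assumptions: each value carries a (Lamport counter, replica ID) timestamp, every replica has a distinct ID, and merging keeps the entry with the larger timestamp, so merges commute and all replicas converge. The `LWWMap` class is hypothetical and is not Automerge's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LWWMap:
    """Last-writer-wins map CRDT: key -> (value, (counter, replica_id))."""

    replica_id: str
    entries: dict[str, tuple[str, tuple[int, str]]] = field(default_factory=dict)
    clock: int = 0

    def set(self, key: str, value: str) -> None:
        self.clock += 1
        self.entries[key] = (value, (self.clock, self.replica_id))

    def fork(self, replica_id: str) -> LWWMap:
        # A branch is just another replica that starts from the same state.
        return LWWMap(replica_id, dict(self.entries), self.clock)

    def diff(self, base: LWWMap) -> dict[str, str]:
        """Keys whose entry differs from `base`, mapped to their current values."""
        return {k: v for k, (v, ts) in self.entries.items() if base.entries.get(k) != (v, ts)}

    def merge(self, other: LWWMap) -> None:
        # Keep whichever entry has the larger (counter, replica_id) timestamp; with
        # distinct replica IDs, every replica picks the same winner.
        for key, (value, ts) in other.entries.items():
            if key not in self.entries or ts > self.entries[key][1]:
                self.entries[key] = (value, ts)
        # Lamport rule: later local edits must be ordered after everything seen so far.
        self.clock = max(self.clock, other.clock)

    def to_dict(self) -> dict[str, str]:
        return {k: v for k, (v, _ts) in self.entries.items()}
```

Last-writer-wins silently discards one of two concurrent edits to the same key, which is acceptable for a sketch but is one reason real libraries use richer data types, for example for text.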

Olimpiu Pop: Okay, given that it's a library, it means that it is implemented in a language. What languages are currently available?

Martin Kleppmann: It's implemented in Rust, which we chose because it offers very good portability across almost all operating systems and platforms. So, if you want to use it in a web browser, we compile Rust to WebAssembly and then wrap it in a TypeScript wrapper, and that's actually how most people use Automerge. But there are also native bindings in Swift for iOS and in Java for Android. There are, I think, Python bindings, there are Go bindings, C bindings, so pretty much all the popular programming languages.

Olimpiu Pop: Well, yes. Given that Rust compiles to WebAssembly, I think that's pretty good because WebAssembly is now incorporated in most of the technology stacks, so that's a nice way to take it on board.

Martin Kleppmann: Yes, Wasm is well supported now. It's still kind of an adventure to incorporate Wasm into the JavaScript ecosystem, like there's plenty of friction there across that boundary. But it works well enough, and it does mean that we can use the same optimised algorithms across all of the language implementations. We have put a lot of effort into optimising the internal data structures inside Automerge.

Olimpiu Pop: So, if I wanted to start working on something, the best place to start would be to incorporate Automerge and start playing around with it.

Martin Kleppmann: That's what we suggest, yes.

Olimpiu Pop: What would be a Hello World that you would recommend to somebody who wants to get started with something?

Martin Kleppmann: Well, I guess a common Hello World would be a simple to-do list app, or something similar, that syncs between devices. A to-do list would be very easy to represent in Automerge.

Olimpiu Pop: Now I understand why you chose that example: building a to-do list is the go-to example for every JavaScript framework.
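As a rough idea of what such a Hello World involves under the hood, here is a plain-Python sketch that does not use Automerge (which would handle this sync and merge logic for you): each device records to-do operations with unique item IDs, syncing exchanges the operations, and every device replays the union of operations in the same deterministic order, so both end up with the same list. The `Op` format and the `TodoReplica` class are hypothetical.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Op:
    counter: int       # Lamport-style counter used for ordering
    device: str        # tie-breaker, so every device sorts ops identically
    action: str        # "add", "set_done" or "remove"
    item_id: str
    text: str = ""
    done: bool = False


class TodoReplica:
    """Hypothetical to-do list replica that syncs by exchanging operations."""

    def __init__(self, device: str) -> None:
        self.device = device
        self.ops: set[Op] = set()
        self.counter = 0
        self._ids = itertools.count(1)

    def _record(self, action: str, item_id: str, text: str = "", done: bool = False) -> None:
        self.counter += 1
        self.ops.add(Op(self.counter, self.device, action, item_id, text, done))

    def add(self, text: str) -> str:
        item_id = f"{self.device}-{next(self._ids)}"  # unique as long as device names are
        self._record("add", item_id, text=text)
        return item_id

    def set_done(self, item_id: str, done: bool) -> None:
        self._record("set_done", item_id, done=done)

    def remove(self, item_id: str) -> None:
        self._record("remove", item_id)

    def sync(self, other: TodoReplica) -> None:
        # Exchange operations both ways; set union makes repeated syncs harmless.
        merged = self.ops | other.ops
        self.ops, other.ops = set(merged), set(merged)
        highest = max((op.counter for op in merged), default=0)
        self.counter = max(self.counter, highest)
        other.counter = max(other.counter, highest)

    def items(self) -> list[tuple[str, bool]]:
        """Replay all ops in one deterministic order to get (text, done) pairs."""
        todos: dict[str, tuple[str, bool]] = {}
        for op in sorted(self.ops):
            if op.action == "add":
                todos[op.item_id] = (op.text, False)
            elif op.action == "set_done" and op.item_id in todos:
                todos[op.item_id] = (todos[op.item_id][0], op.done)
            elif op.action == "remove":
                todos.pop(op.item_id, None)
        return list(todos.values())
```

Using an explicit `set_done` value rather than a toggle means two people ticking the same item at once still agree that it is done.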

Martin Kleppmann: But here, if you build it with Automerge, it'll sync across different devices, so that's something that JavaScript out of the box doesn't give you.

Olimpiu Pop: Well, if you solve the JavaScript problem, you should get the Nobel Peace Prize, because probably 10 or 20 new libraries have appeared in the JavaScript ecosystem since we started talking, so thank you for that. Okay, do you have a registry of all the local-first software built with Automerge? Is there something like an awesome list where we can go and see this kind of local-first…

Martin Kleppmann: That would be a really good thing to have. To be honest, we don't always hear about it because, you know, it's an open-source project, of course, anyone can just use it without letting us know. Occasionally, we get bug reports from people, and then that way we find out that they're actually using Automerge in production. It's always nice to hear the stories. But yes, we should actually try and put together a list like that. We do have a couple of companies voluntarily sponsoring the development of Automerge, and so we're very grateful to them for financially supporting the project. But it's not required, it's just an optional sponsorship.

Olimpiu Pop: What are the limitations? Where do you think it's currently not feasible to build this kind of application, whether with Automerge or by implementing it yourself? Is there somewhere you would say, "Okay, it's not good enough at this point, it has limits"?

Limitations and When Local-First Does Not Apply [31:24]

Martin Kleppmann: In general, I would use local-first for apps where users edit the data themselves in whatever way they want. So it works well for apps used for creative purposes, and also, as I keep mentioning, for spreadsheets or graphics. Those are all things where the user can edit the data however they like.

It makes less sense for data such as a bank account. Yes, in principle, you can edit the balance of your bank account and add an extra zero, but that doesn't actually give you more money. In the end, the bank has the authoritative copy of your account balance, and if you edit it locally on your own machine, that doesn't really mean anything. So, for that type of software, local-first doesn't really make sense, because there's an authoritative copy on a server somewhere, and it has to be that way.

The same goes for an online shop, for example. In principle, you could make an online shop local-first, but that would mean downloading the entire product catalogue to your local device. You could then browse offline, which may be convenient, but the catalogue will contain a million items of which you're only interested in three, so downloading everything upfront doesn't really make sense there either. Also, customers of the online shop are not going to edit the product descriptions in the catalogue, so that doesn't make sense either. That's why I say that local-first makes most sense for these kinds of creation apps where users create the data themselves.

Olimpiu Pop: Having a local-first copy of Amazon is just like having a leaflet: it doesn't guarantee that the product you want to buy is still in stock.

Martin Kleppmann: So, with something like the stock in a warehouse, that's just inherently centralised because there's a physical resource, and the computer just reflects what is actually on the shelf of the warehouse.

Olimpiu Pop: We discussed getting started with a new project. Is local-first something you can retrofit? Say we have an application that is currently centralised, and we would like to adopt this set of practices. Is that appropriate? Can we do something like that for the next generation of the application?

Martin Kleppmann: I think it depends a lot on how the app is built. One of the nice things about local-first is that essentially all of the application's business logic is client-side. And you can still build it as a web app, that's fine, or you can build it as desktop native software if you like; it doesn't really matter. If your software is currently built in a way that it already maintains its internal state client-side, you know, as JavaScript state variables in a web app, for example, then you can make it local-first without too much trouble usually, because essentially you would be taking that internal client-side data model that the application already has and putting a sync engine on it so that that data can sync to your collaborators.

We have done that kind of retrofit before: take an existing single-user React app, for example, replace its internal data model with Automerge, and you get a multi-user app. In some cases, that can be pretty straightforward. But if your current architecture relies on a server handling a lot of business logic, then it's harder to retrofit to a local-first approach, because you'd be shifting that logic from the server side to the client side.

Olimpiu Pop: So the short answer is, obviously, it depends. If you had the foresight, when you first built the application, to keep the parts of the model you actually need on the client side, then it's easier. Otherwise, it will probably be a bigger redesign, and it may be easier to start from scratch and bring over the features you need.

Martin Kleppmann: Yes, probably.

Joining the Movement: Building Apps and Contributing to Automerge [35:15]

Olimpiu Pop: How can people contribute?

Martin Kleppmann: I think just trying to build some apps with Automerge is probably a good place to get started. There are a bunch of other libraries also that aim to make it possible to play with local-first software, so it doesn't have to be Automerge; that's just the one that I happen to work on and like. But if you look at localfirst.fm, there's a nice comparison table of lots of different software libraries there that can all be used to build local-first software.

Yes, so I think building some apps is a good way to get started. If it turns out you're more interested in the underlying infrastructure behind the apps, then of course you can go deep on that; everything is open source and open to contributions. There's always ongoing work, for example on improving performance and on new designs for the data synchronisation protocols.

We're in the process of adding end-to-end encryption and a decentralised access control system to the protocol. So there's a lot of ongoing work, and for anyone interested in the infrastructure and implementation aspects, that's all open to contributors as well. For Automerge, we also have a Discord server where all the core developers are available, so it's a good place to communicate, find other interested people, and get questions answered.

Olimpiu Pop: Okay, and talking about community, I know that you and the local-first community are running a conference, I think it's in July in Berlin.

Martin Kleppmann: Yes, it's very exciting. The local-first conference started in 2024, so this is the third year that it's been running. At first, we were not sure, like, was local-first really big enough to justify having a conference dedicated just to that topic? But the first edition sold out, the second did as well, and I hear the tickets for this year's conference are going quickly as well.

And it's been really cool to just see so many different people all sharing the values of what local-first is about, and trying all sorts of software implementations. In some cases, you know, they're competing software implementations that are trying to solve the same problem, but that's okay, and that's healthy, and that's a good part of having an ecosystem, I think.

This year, the conference is broadening its scope a bit, focusing less on the detailed technical aspects of, say, sync engines, which are used to sync data between clients and servers. There are still lots of interesting algorithms and protocols to talk about there, but the conference is looking more at these themes of user agency and user empowerment, which is really the whole reason why we're doing local-first in the first place.

This connects very nicely to the AT Protocol discussion from earlier as well, because AT Protocol is similarly founded on the values of empowering users by reducing lock-in and allowing users to port their data from one service to another. It's just that AT Protocol does it for the social media domain, and local-first does it for the collaboration software domain; otherwise, in terms of the underlying principles, they're actually very similar. So I'm really happy to see the AT Protocol and local-first communities coming together. For example, Paul Frazee, who's the CTO of Bluesky and one of the core designers of the AT Protocol, will be speaking at the local-first conference.

Olimpiu Pop: So, that will be in July in Berlin for two days, and you can find it online. From what I know, the venue is limited, so it still feels like a tight-knit community.

Martin Kleppmann: Yes, I think about 350 people.

That's a good size. Big enough to be interesting, small enough that it still feels like a real community.

Olimpiu Pop: Yes, sounds like the proper size.

Okay, Martin, thank you for your time.

Martin Kleppmann: It's been a great discussion. Thanks.
