
Fast Eventual Consistency: Inside Corrosion, the Distributed System Powering Fly.io

50:13

Summary

Somtochi Onyekwere explains the architecture of Corrosion, a distributed system designed for low-latency state replication. She shares how Fly.io transitioned from Consul to a gossip-based SQLite solution to handle global machine data. By discussing CRDTs, the SWIM protocol, and QUIC, she shares how to build resilient systems that prioritize speed while managing the complexities of CAP theorem.

Bio

Somtochi Onyekwere is an InfraOps Engineer at Fly.io, currently working on distributed systems and networking software. She previously worked on Kubernetes at Weaveworks.

About the conference

Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Somtochi Onyekwere: I'm going to be talking about Corrosion, which is a distributed system that we built at Fly. It's eventually consistent. It's fast because we prioritize speed, basically. It's a choice we've made. My name is Somtochi. I'm a software engineer on the networking team at Fly. It might be a bit weird that I'm talking about distributed systems, but I'm on the networking team, which is not exactly the same thing. I'll explain that later on during the talk. Of course, I'm interested in networking, distributed systems, systems engineering.

What do we do at Fly? We are a developer-focused cloud platform. That means we make it easy for developers to get their apps deployed, up and running. Something I think that really differentiates us is that we make it easy to deploy your apps in different regions around the world. We are available in 40 different regions. It's basically like a CDN, but for your apps. You deploy your apps in the regions where you have the most users.

They basically get to access the one closest to them. It's pretty easy. We make it very easy. You give us a Docker image. We transform it into a Firecracker VM. We run it on our own hardware. It's also very easy to configure stuff like private networking, load balancing, scaling your app, all of that. It basically takes out a lot of the ops in DevOps. How do we do this? You want your users routed to the closest instance of your app that is running. How does this work? This works because we announce the same IP address from different servers. We have two types of servers. We call them edges, which are the ones that interface with the world. We announce your IP address from those ones. There's this thing called Anycast.

If you announce the same IP through BGP and the magic of the internet, your user will get routed to the one closest to them. It uses latency as a factor for selecting the best route. Once it gets to an edge, the edge then routes your request to the closest VM. This means that the networking layer, where we have a piece of software called the Fly Proxy, needs to know data. It needs to know, where is your app running? Is it healthy enough to handle the request? What ports are open? Is someone making a request to the wrong port? Little details like that. How it gets this data is also important, because it's factored into its routing decisions. If it's getting data late, it can't make the best routing decisions. It's going to be suboptimal. A pictographic overview of what I've just said: we have edges that announce the same IP address. Your app is actually running on a worker. The users get routed to the closest edge, and then the edge routes them to the closest instance.

Initial Tooling - Consul, and attache

We need to be able to read and write data pretty quickly. If someone creates a machine somewhere else, but a user is making a request from a different region, it still needs to know that data to be able to make its routing decisions. At first, we used Consul. We put data in this key-value store. Consul is this distributed database and service discovery platform created by HashiCorp. It uses Raft as a consensus protocol to agree on the log of changes that's been made in a cluster. You run a cluster of five or seven nodes, and they agree and communicate. This worked for a bit, but there were some issues with it. The cluster ran in a single region, IAD, which is in the U.S. Every other region, like Tokyo in Japan, needed to reach out to it and pull that data. It wasn't very scalable. There's a tool called consul-templaterb, which is written in Ruby, that can make updates to a file when the data in Consul changes. We basically had it updating a JSON file, and then we had the proxy read from that file and load it into its state.

As things were happening much quicker, as we got bigger, we had more users, so it was reading that file a lot. It's JSON, so it has to deserialize the data into its own internal state. It got unwieldy. Then we had a tool we call attache. attache ran on every node and tried to eliminate the problem of reading data from the JSON file. Instead, a separate process called attache would pull data from Consul. With Consul, you'd have an agent running on the node, but there's also a central API, which is where you get updates on everything. It needed to pull the central one, which still had the problem that you needed to reach out to the cluster in IAD; every request had to go there.

Then it would pull it, and it was running on every node, basically. It would insert that into an SQLite database, and then the proxy could read from that. It still had these problems, lots of latency and lots of data being pulled, because each attache was trying to pull the same data from Consul. We have 800 nodes, so imagine 800 processes trying to pull the same data, writing to SQLite, and each of those reading.

Corrosion

We had to step back and look at the problem. In doing everything to solve it, we created our own platform that we call Corrosion. Corrosion is a distributed system that replicates SQLite data to different nodes. Each node has, of course, an SQLite database. Then, when changes are made to that node locally within the cluster, it would disseminate that information to other nodes within the same cluster. Because it's eventually consistent, and of course, if you're going to replicate data, you have to handle conflict. We use CR-SQLite for conflict resolution using CRDTs. I'll go into this a bit more. Because we need to manage members in the cluster, like nodes will come up, go down, lag behind, we use Foca, which uses a SWIM protocol to manage cluster membership. We also periodically synchronize with each of the other nodes. This talk is basically looking at how Corrosion works and how it achieves low latency replication of data across a large number of nodes.

Just some features of Corrosion. We provide an HTTP API, and a PG endpoint too if you want it. You can make a POST request with the SQL statement that you want to run, and it will run it. You'd have to give Corrosion the schema of the tables you'd like to run; it needs to know about the schema too. You can make schema updates on the fly; you don't need to pause Corrosion to do that. Then there's HTTP streaming subscriptions. Because we had this platform component, the proxy, that needed updates as they were being made, we have subscription-based SQL queries. This means that you can subscribe to an SQL query, and Corrosion will send you updates as changes are made to that query.

If the data that the query would return changes, Corrosion will send you that update, so you don't have components constantly reading the database. The propagation of state is about replicating data across the cluster. We use the QUIC transport protocol. Not UDP, not TCP, it uses QUIC. We make changes using SQL statements through the HTTP or PG endpoint.
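As a sketch of what a write through that HTTP API could look like: the endpoint path and payload shape below are assumptions for illustration only, not the documented interface; check the Corrosion repository for the real API.

```python
import json

# Hypothetical sketch of a write to Corrosion's HTTP API. The path
# "/v1/transactions" and the payload shape are assumptions, not the
# documented interface; they only illustrate "POST an SQL statement".
def make_write_request(statements):
    """Build a request description: a batch of parameterized SQL."""
    return {
        "method": "POST",
        "path": "/v1/transactions",      # assumed endpoint name
        "body": json.dumps(statements),
    }

req = make_write_request([
    ["INSERT INTO machines (id, region) VALUES (?, ?)", "m-1", "lhr"],
])
```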

CRDTs: Conflict-free Replicated Data Types

Just a little about CRDTs, because it's central to how Corrosion works. CRDTs are a data type in distributed systems where you can make updates independently to replicas, and sync them. When they sync, the data type itself can resolve conflicts. The order of changes doesn't matter. This means that the replicas don't really have to agree on the sequence of updates. They can just have changes made to them independently, concurrently. Whenever they exchange their states, it could be the entire state, or it could just be the operations that have been run on each of those replicas. They would eventually converge to the same state. An example I like to use is counters. Say you had counters running on different nodes everywhere.

Say the counter is at three. If you decrement the counter, that's an operation, it becomes two somewhere else. This other one has two increment operations, it becomes five. Whatever order they exchange that log of operations in, they will still get the same state at the end. They don't need to agree, ok, we're going to do the decrement first and the increments later. Even if it gets the two increments first, because 3 plus 2 is 5, however the operations are interleaved, it doesn't matter, it will converge to the same state. Of course, counters are much simpler data types. You have something like a grow-only set, which is a set that you can't remove data from. A set has no duplicates. Even if two nodes insert the same data into the set, whenever they communicate, it would just be the same thing still within the set.
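As a minimal sketch of the commutativity being described, here's a toy counter in Python: the same set of increment and decrement operations converges to the same value no matter how they're ordered.

```python
from itertools import permutations

# A counter CRDT sketch: increments and decrements commute, so replicas
# that apply the same operations in any order converge to one value.
def apply_ops(start, ops):
    value = start
    for op in ops:
        value += op   # +1 for increment, -1 for decrement
    return value

ops = [+1, +1, -1]   # two increments, one decrement, as in the talk
results = {apply_ops(3, list(p)) for p in permutations(ops)}
assert results == {4}   # every ordering converges to the same value
```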

Of course, things get complicated as the data type changes. There are different ways that this conflict gets resolved within distributed systems. A very easy example is the last writer wins. You could have a timestamp operation between two nodes, and then compare the timestamps. Whoever has the bigger timestamp wins. This is the last writer wins. Most times in these kinds of distributed systems, you need a way to resolve conflict. It can depend on the nodes communicating with each other at the point when you're trying to resolve the conflict, or the order of changes. The changes can come in any order, any form, but you still have to resolve to the same state.
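A last-writer-wins register can be sketched in a few lines. The node-id tiebreak here is an illustrative detail not spelled out above; the key point is that merge is deterministic and order-independent.

```python
# Last-writer-wins register sketch: each write carries a timestamp,
# and merge keeps the value with the bigger timestamp. The node id
# breaks ties so every replica deterministically picks the same winner.
def lww_merge(a, b):
    """a, b are (timestamp, node_id, value) tuples."""
    return max(a, b)   # tuple comparison: timestamp first, then node_id

r1 = (10, "node-a", "blue")
r2 = (12, "node-b", "green")
assert lww_merge(r1, r2) == (12, "node-b", "green")
assert lww_merge(r2, r1) == lww_merge(r1, r2)  # order doesn't matter
```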

Just examples of CRDTs, like distributed counters, like we've said earlier, a lot of collaborative applications actually use CRDTs behind the scenes. Google Docs is a very famous one. A lot of applications where you have different people modifying something offline, and later they have to sync, tend to use CRDTs behind the scenes. Because you're letting people, you can basically make changes, and then merge later. You're picking the AP in CAP theorem. CAP theorem is this theorem that says that in a distributed system, you have to pick two out of three, consistency, availability, and partition tolerance.

Consistency is like every node has the same data. If you make a request to two different nodes, you won't have some discrepancies. Availability means your software is available: people can make reads, people can make writes. Partition tolerance means there's a networking problem, and there are some nodes that can't talk to each other, and it's still functional. Basically, what happens when there's a partition, means that the nodes keep on doing the work locally. Whenever that partition heals, it'll just exchange data, and they'll converge to the same state. Then the replicas might have times when they are not consistent, like before it receives states from other nodes. It's a choice that we've made in Corrosion.

CR-SQLite

Corrosion is built on top of this library called CR-SQLite. CR-SQLite is an SQLite extension. It allows you to merge SQLite databases together. I'm just going to play a small video that I've recorded. This is loading the extension. What CR-SQLite does is that it tracks the number of changes that have been made to a column within a row, in a table in the database. Each table in your database has a second table that is tracking changes to it. Each time you make a change, it would increment something called the column version. If you make another change, it increments to two, three, like that. When it's trying to merge data, it doesn't really use the concept of time, it prioritizes the rows that have had more changes made to them.

If this row has a higher column version, it would prioritize this node. Just to play it, just to see it in action, because it's easier when it's more concrete. Just creating a table and inserting values into it. It has this virtual table where you can see the views. It also has something called causal length, the cl in the table. That tracks how many times the row has been deleted and re-inserted, which is called a resurrection in CR-SQLite terms. Just to pause it here, which is what I'm getting at. Each time something is inserted into this table, this second table called crsql_changes is updated. It tracks the change to each primary key. This primary key has been encoded. The primary key 1a, because a is the primary key in this table, has been encoded. It uses the primary keys as the keys in a map, and then it tracks each column. It tracks it at a granular column level. Basically, it gives you this functionality, and we've built something over it to make it easy to communicate between nodes.

If you were accepting changes from a different node, you would insert them into this table. Before the insert happens, it would do this comparison, and that is how it determines which node wins. Just to see, the column b has been updated. Each column version that has changed is updated to 2. I'm trying to fast forward to where I made some changes, just to see how the decision goes. Basically, I'm pretending, sort of running a statement where I'm inserting changes into the CR-SQL table. The site_id is an identifier for the node. I'm changing up the site_id so it's from a different node. I've deleted the row a few times. You can see the cl is 3. If you've resurrected a row, as in you've deleted a row and re-inserted it, you would have a higher causal length value. Basically, it compares causal length first, then column version. There are situations where these might be the same. Then it will compare the actual values. If you have an integer as the value of the column, the higher value would win.

Lastly, we use the site_id as a tiebreaker. This is just inserting things from another node inside here. I'm just tweaking things so it's not the exact same site_id. For what I'm inserting, it has the same column version. I was trying to insert something, but the column version is 1. The second to last value is 1. You can see that even though I've inserted something into the table, nothing changed, because with the CR-SQL algorithm, it basically rejected the change, or not rejected, you'd say the local change won. If we tweak the causal length to be much higher, say 7 instead of 5, you can see that the one from the different node now wins.

If you check the actual table, you'd see that the change has been made. Just to give you an idea of how conflicts are resolved within tables. Basically, it has triggers. On insert, update, delete, it would do this comparison first. It's very possible that you would make a change that would not win. This is just tweaking stuff a bit, just to show more. This is basically the pattern. You can see that because we've inserted a winning change into crsql_changes, the information in the table itself has changed. This is basically how CR-SQL works under the hood.
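The comparison order narrated in the demo (causal length first, then column version, then the value itself, then site_id as the final tiebreaker) can be sketched like this; the field names are illustrative, not cr-sqlite's actual column names.

```python
# Sketch of the precedence described above for deciding which version
# of a column wins. Field names ("cl", "col_version", ...) are
# illustrative stand-ins, not cr-sqlite's real schema.
def winning_change(local, remote):
    key = lambda c: (c["cl"], c["col_version"], c["value"], c["site_id"])
    return local if key(local) >= key(remote) else remote

local  = {"cl": 1, "col_version": 2, "value": 5, "site_id": "aa"}
remote = {"cl": 3, "col_version": 1, "value": 2, "site_id": "bb"}
# Higher causal length wins even with a lower column version:
assert winning_change(local, remote) is remote

a = {"cl": 1, "col_version": 3, "value": 0, "site_id": "aa"}
b = {"cl": 1, "col_version": 2, "value": 9, "site_id": "bb"}
# Equal causal lengths: the higher column version decides.
assert winning_change(a, b) is a
```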

Steps in Corrosion

Corrosion is basically a wrapper around this. People using Corrosion don't really need to know about crsql_changes. That's what happens within Corrosion. You make changes to the table using the usual SQL statements, inserts, updates, deletes, and those changes generate the rows in the internal table, crsql_changes. Then we broadcast those changes to other nodes. When they apply each of them, regardless of the order, if a node receives an old change that has an older version, it won't win. It would basically just retain the newest version that it has. Modifications to the table generate crsql_changes, and each of those changes gets an internal version attached to it. This is basically how Corrosion tracks changes and how nodes communicate with each other.

First of all, we prioritize local nodes first. We broadcast the change to the nodes closest to us. We broadcast it a certain number of times, and those nodes will also broadcast to others. This is a gossip-based dissemination style. It's been written about a lot. Just a small animation to show you how the information spreads within the cluster. We send it to three nodes, and then those three nodes send it to three other nodes. It's also called an infection style; it's coined from how infections spread. You infect two people, and they infect four other people, and the next thing, everybody is sick. That's basically the protocol that we use when broadcasting changes. We broadcast it a certain number of times. The selection of nodes is random. We broadcast it first to all the close nodes, but we'll still broadcast it a certain number of times. This is to ensure that every node in the cluster gets the change, up to a certain limit. If a change has been broadcast a certain number of times, at some point, we want to stop.

If you look at this, one node broadcasts to three others, and then it forms a graph. You broadcast it log n times. Not exactly log n, but something times log n. There's some mathematics behind it to ensure that if you broadcast a change this number of times, it gets across to all the members of the cluster. Another thing is, because it has a broadcast mechanism, let's say we are overwhelmed with changes. We tend to prioritize those that have been broadcast less, because they are yet to spread as far as the others. Let's say there were a lot of changes, maybe due to some issues, CPU, whatever, and we're not broadcasting as fast as we should, so we have a backlog in the queue. We prioritize changes that have been broadcast less, so those ones get out too.
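A toy simulation of this infection-style spread, assuming a fanout of three and a fixed number of rounds; this glosses over the real protocol's per-change broadcast limit and random peer tracking, but shows how quickly a change covers a cluster of Fly's size.

```python
import random

# Toy gossip simulation: every node that has seen a change forwards it
# to `fanout` random peers each round. With fanout 3 and a number of
# rounds on the order of log(n), nearly all nodes hear about it.
def gossip(n_nodes, fanout, rounds, seed=0):
    rng = random.Random(seed)      # fixed seed keeps the run repeatable
    infected = {0}                 # node 0 originates the change
    for _ in range(rounds):
        for _node in list(infected):
            peers = rng.sample(range(n_nodes), fanout)
            infected.update(peers)
    return infected

reached = gossip(n_nodes=800, fanout=3, rounds=10)
assert len(reached) > 600          # the vast majority of 800 nodes
print(f"{len(reached)} of 800 nodes reached")
```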

Broadcasting is not our only mechanism for getting changes out. We also have a sync protocol between nodes, because the last levels of nodes get the change slightly later. We are prioritizing speed. Periodically, the nodes would initiate syncs with each other. Like I said earlier, Corrosion will attach a version to each change. It has its own version that it's tracking for each change. This node will say, I have versions 1 to 5, what do you have? It will initiate syncs with n nodes, and each of those nodes will send it their own state, this is what data they currently have.

Then it will look at their state and say, this node has version 1 to 5, and I have just version 1, that means I'm missing 2 to 5. It would request what it has found out that it needs basically. It would do a diff between their states and send requests for whichever is missing. Broadcasting alone, the messages could get dropped, a node could also be offline, really, maybe it's down. It's just really not even available to receive any of the changes during that time. Even if we have a lagging node that comes up, it would basically start syncing with every node at intervals until it comes up to date. The sync is constant. We initiate it periodically.
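The diffing step described here can be sketched as simple set subtraction over version numbers:

```python
# Sync sketch: each node advertises the versions it holds; a peer
# diffs that against its own and requests only what's missing.
def missing_versions(have, peer_has):
    """have / peer_has are sets of change version numbers."""
    return sorted(peer_has - have)

mine = {1}
theirs = set(range(1, 6))          # peer says: "I have versions 1 to 5"
assert missing_versions(mine, theirs) == [2, 3, 4, 5]
assert missing_versions(theirs, mine) == []   # nothing to request back
```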

Just a bit more on the subscriptions that we've talked about earlier. We basically provide an HTTP API where you can send a query, and as there are changes to the data that query would return, we'd send you updates. Basically, it helps so that clients of Corrosion don't have to keep reading from the database. The changes that we make to the databases, you can read straight from the CR-SQL tables using normal SQL statements, but just to make it easier, we also provide an API where we send updates. When we receive data from other nodes, if you look at the row, the PK is like a compacted primary key.

Basically, even when we receive data from other nodes, we know what primary key has changed. We just put that data in a struct and send it out. The CR-SQL internal table has the table name, the primary keys, and the particular column that the change is for. We can just look at this and then run a modified version of the query for those primary keys. It will be faster because we are checking for the particular primary keys that we know have changed. This is what it looks like. We send first the columns, then rows. This is the initial query. This is the first update. The row is the initial data, then eoq means end of query, it's done.

Subsequently, we send you updates for each one that has changed, like deletes, updates, inserts. You can see how that could be convenient. We also do just updates, basically where we just send you what has changed. Let's say there's a cache and it just needs to know what has changed so that it knows when to read from database. Here we don't really send updates about the data itself. We just say, this primary key, this is very light to do because we know what primary keys have changed already. We can tell if it's a delete or an upsert by looking at the causal length. When the causal length is even, because at first it's 1, and if you delete the row, it becomes 2.

Basically, deletes are even. We can, with the data that we've received, check if it's even or odd, and we would know if it's a delete or an upsert. This can just be a notification for it to go read the database. It doesn't need to just do it randomly. It can do it exactly when the data has changed. Just a bit about security within the cluster. We provide this TLS command for you to secure communications between each of them. Each would have the same certificate authority, and then you can generate certificates for the server.
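The even/odd causal-length rule she describes is simple enough to state directly:

```python
# Causal length starts at 1 on first insert and increments on every
# delete and re-insert, so an even value means the row is currently
# deleted and an odd value means it's live (an upsert).
def is_delete(causal_length: int) -> bool:
    return causal_length % 2 == 0

assert not is_delete(1)   # first insert: row exists
assert is_delete(2)       # deleted once
assert not is_delete(3)   # resurrected: row exists again
```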

Production at Fly

Just a bit on how this is running at Fly. It's running on about 800 physical servers. The subscriptions are used by the proxy basically to get updates on machines that have changed, health checks, all of that. It is not the source of truth. What that means is that the data that comes into Corrosion is already stored somewhere else, and then it syncs to Corrosion, and Corrosion gets it everywhere. Like flyd, which is the component that manages machines on a node, has its own database, and it will still write to Corrosion. Basically, if something happens, we can sync from the source of truth. It's a weird way. Corrosion works normally, but we've just evolved to use it like this. There's also minimal conflict with the data, because different components own different parts.

For something like the machines table, where we store information about the machines, each node owns its machines, and each node will only update the data about its own machines. There's minimal conflict resolution going on. The thing with CRDTs is that they can be surprising. We don't have a lot of surprises because there's not a lot of things competing over the same row in a database. That applies to most of the tables. There's just one component that manages this, and it is solely making the changes there. We have a lot of changes going on. It's in heavy use. From the graph, the p99 is under a second. It's pretty fast. With the syncing and broadcasting, there's always a lot of chatter going on, and nodes are mostly up to date within a reasonable amount of time. It's working well for us.

Caveats

Just some caveats, it's easy to shout about the good parts. We are bound by cr-sql constraints. It only works for tables with primary keys. If, for some reason, you don't need a primary key in your table, which is not very often, Corrosion wouldn't work for you. If you have a unique constraint on a column that is not the primary key, it also wouldn't work. This is central to how cr-sql works, because it uses the primary keys to track changes. If you have no primary key in your table, it doesn't know what to do. On-the-fly updates work very well, but destructive changes, not so much. We have to watch out for this, because when the schema changes, we still might not have communicated it to the other nodes. They could be requesting changes that have effectively gone. Destructive changes are currently not allowed. We don't have any authorization or authentication. We do not authenticate who is accessing the HTTP API. There's currently no inbuilt mechanism for that. That's just something of note.

Lessons Learned

Just some lessons learned. First of all, we started from Consul. Then we had this thing that was still bound to Consul. It pulled from Consul and put the data into SQLite. We had to take a step back and think, what are we actually trying to achieve? Even though we were using Consul and it was working for us, because we had started with Consul, we were thinking of the problem from that angle. We had to step back. Like, what are we trying to achieve? We want to be able to read and write data pretty quickly. Is there another easy way for us to do this? I didn't talk about this in the slides, but Corrosion also has what we call Corrosion Consul. Consul has services and checks, and its services API has a particular schema. You can pull that API to get updates on them. Basically, what Corrosion Consul does is it pulls the data from Consul, but this time, locally. It doesn't need to pull the central API, because each node pulls its own local agent. This is basically the evolution of the problem. It pulls its own local agent, it's not reaching out to the cluster in IAD, and then it inserts the data into the SQLite database.

Then that gets replicated to every node. That's basically how we solved the original problem of getting data from Consul. Right now we do not use Consul for health checks; we've phased that out. But we had a period of time where Consul was still being used at Fly for health checks, even though it was no longer used for storing data. We solved this by having Corrosion pull the Consul HTTP API, which is what attache did. The difference is that now each node is only pulling its local agent for its own services.

Because every node is pulling its local agent, which is quicker, it doesn't use up all that bandwidth, and then it inserts the data into Corrosion. Corrosion would then disseminate that information around the cluster, so every node would eventually have a global view of the services in the cluster. Something else that you need to be aware of, especially when dealing with a gossip-based system, is something I call the broadcast storm, which is when your cluster, probably due to an edge case, doesn't stop propagating changes. We had this incident with Corrosion because some changes failed to apply. We had a cache, just like a map, that marks, we broadcasted this change before. That map signifies applied changes. When a change failed to apply, we would take it out of the map. We received some changes, broadcasted them, and put them into the map so we know that we have broadcasted them.

Then, we'll try to apply. Then applying those changes failed, so we'll remove them from the map. The change might come in again, because we select nodes randomly for broadcasting. A node can be selected twice. You could keep getting the same changes, because the selection is basically randomized. We don't know what node has already seen the change. We just select n. The broadcast contains the number of times it's been broadcast. It just has a counter. Basically, when you receive the change, you increment the counter, select n nodes, broadcast again. When a node receives the change again, it's no longer in the map because it failed to apply it, so it would broadcast it again. We basically had this situation where the nodes just kept broadcasting things to each other, flooding the network links. When you're dealing with gossip-based systems, it's always good to be careful about what you're broadcasting. Be careful about the point at which you should stop broadcasting, and how you know something has already been broadcast.
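A minimal sketch of the bug pattern, under the simplifying assumption that the "seen" map is a plain set keyed by change id:

```python
# Sketch of the broadcast-storm bug described above: the "seen" set is
# meant to stop re-broadcasts, but evicting a change from it when apply
# fails means the same change can loop back and be re-broadcast forever.
def handle_change(change_id, seen, apply_ok, broadcasts):
    if change_id in seen:
        return                      # already broadcast, stop here
    broadcasts.append(change_id)    # forward to n random peers
    seen.add(change_id)
    if not apply_ok:
        seen.discard(change_id)     # the bug: the guard is gone again

seen, broadcasts = set(), []
for _ in range(5):                  # the same change keeps arriving...
    handle_change("v42", seen, apply_ok=False, broadcasts=broadcasts)
assert len(broadcasts) == 5         # ...and is re-broadcast every time
```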

Another thing is, don't forget the delay. As much as Corrosion is quick, I think occasionally it still catches people by surprise when they're like, I inserted this thing into the SQLite DB and it's not instantly available. I think eventually consistent systems are slightly harder to reason about. Build your systems to be resilient to the delay. For the proxy, it could receive a request for a destroyed machine, but it hasn't gotten that update yet. When it routes it to the worker node, the worker node, that's where the change is originating from. It knows that the machine is no longer in existence. It would basically say, I didn't find this machine.

The proxy knows to route it to another healthy machine. You also have to build your systems so they still make optimal decisions when they have stale data. Corrosion is relentless. If you insert bad data into Corrosion, it would quickly disseminate it. You could delete it, but it's just something to be aware of. You can't have a local-only table in the Corrosion schema; it disseminates everything. You can, however, have a table in the SQLite database that's not managed by Corrosion.

Resources

Corrosion is open source. You could take a look at it. It's written in Rust. Rust is becoming the go-to systems language, because of its memory safety and how much the compiler helps you to write the right thing. It's open source. You can give it a look if you're curious. A lot of what Corrosion does is not particularly new. Things have been written about it. We use the SWIM protocol for membership, which is basically how we track what nodes join and leave the cluster. Corrosion will keep going as nodes join and nodes leave. It just needs to take care of that. If it thinks a node is down, it won't select it for broadcasting. We use the SWIM protocol for membership management, and CR-SQLite, CRDTs, all of these are just interesting parts of Corrosion, and you can take a look at them if you're curious.

Questions and Answers

Participant 1: You mentioned this gossip algorithm and synchronization between the nodes, what happens if the node gets outdated? For instance, it's not active for some time.

Somtochi Onyekwere: Basically, it would be stale. It would have the wrong data. Whenever it comes online, it would start syncing again. When the node is outdated, maybe due to networking issues, other nodes would probably notice. They would say, this node is down. We can't depend on it to keep broadcasting the rest of the changes. What happens over time actually is that the node is just out of date. The cluster would keep going on, but whatever is trying to read data from that node would get outdated data. We actually alert on this. We check the deviation between the number of rows on different nodes and alert on that. This is something you have to take care of.

Corrosion wouldn't really scream if a node is behind. It just leaves it. It'll just be like, ok, we'll keep marching on. It's actually maybe a caveat of Corrosion, because whatever is depending on the data in that node would actually be out of date. Whenever it comes online, for whatever reason, because under normal operation it's supposed to be syncing, it's supposed to be broadcasting. Maybe there's a networking issue, or Corrosion is just suffering on the node, maybe because the node is overloaded. Whenever that heals, automatically it would send a ping out, and other nodes would know that it's back. It would start syncing with other nodes. It would lag behind, but it would also be able to recover on its own soon enough, if it's not down for an extended period of time. Even if it is, it will still be syncing, it just might take a while to catch up.

Participant 2: If data is synchronized based on a column version, how does it work across transactions? If I update a few columns at the same time, is it possible that it will synchronize partial transactions to different nodes?

Somtochi Onyekwere: You're making a transaction against the local node; you make it to the local Corrosion. Because it's in a transaction, the change to cr-sqlite only happens when the transaction succeeds. If the transaction is rolled back, cr-sqlite won't have recorded those rows. When you make the change to the local node, if the transaction succeeds, it will have those changes; if it didn't, it won't.
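The commit-only capture described here can be sketched with a tiny staging buffer: changes made inside a transaction are staged, recorded (and thus eligible for broadcast) only on commit, and discarded on rollback. All names here are hypothetical, standing in for cr-sqlite's internal change tracking.

```rust
// Minimal sketch of commit-only change capture, under the assumption that
// replication only ever reads from `committed`.
struct ChangeLog {
    committed: Vec<String>, // changes visible to replication
    staged: Vec<String>,    // changes made inside an open transaction
}

impl ChangeLog {
    fn new() -> Self {
        Self { committed: Vec::new(), staged: Vec::new() }
    }
    fn stage(&mut self, change: &str) {
        self.staged.push(change.to_string());
    }
    fn commit(&mut self) {
        self.committed.append(&mut self.staged);
    }
    fn rollback(&mut self) {
        self.staged.clear();
    }
}

fn main() {
    let mut log = ChangeLog::new();
    log.stage("UPDATE machines SET state = 'started'");
    log.rollback(); // nothing recorded, nothing to replicate
    assert!(log.committed.is_empty());

    log.stage("UPDATE machines SET state = 'stopped'");
    log.commit(); // now the change exists and can be broadcast
    assert_eq!(log.committed.len(), 1);
}
```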

Participant 2: If I have them, is it possible that other nodes will only see part of that?

Somtochi Onyekwere: Yes, maybe a node gets the changes for one table before the changes for another. But we do recognize the transaction within Corrosion, because we tag it with our internal version. We propagate all the changes at once, and they all carry the same internal version. When a node says, give me version 2, we send it everything with that version, because those changes were within the same transaction. That's basically how we work around it. There's no built-in cr-sqlite notion of a transaction here, but just by virtue of how Corrosion communicates, if you say, give me version 2, all the rows within that transaction are tagged with the same version within Corrosion.

Participant 2: Even across tables.

Somtochi Onyekwere: Yes, even across tables. This version is different from the per-column one. We batch all the related changes into one version together and send it across.
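The batching scheme in this exchange can be sketched as a map from internal version to the set of changes produced by one transaction, across any number of tables, so that a sync request for a version always returns the whole batch. The types here are illustrative, not Corrosion's actual structures.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
struct Change {
    table: String,
    row: String,
}

#[derive(Default)]
struct Node {
    next_version: u64,
    changes: BTreeMap<u64, Vec<Change>>, // internal version -> transaction's changes
}

impl Node {
    // Commit a transaction: all of its changes share one internal version.
    fn commit(&mut self, tx_changes: Vec<Change>) -> u64 {
        self.next_version += 1;
        self.changes.insert(self.next_version, tx_changes);
        self.next_version
    }

    // Sync request: everything tagged with `version`, across all tables.
    fn changes_for(&self, version: u64) -> &[Change] {
        self.changes.get(&version).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

fn main() {
    let mut node = Node::default();
    let v = node.commit(vec![
        Change { table: "machines".into(), row: "m1".into() },
        Change { table: "ip_assignments".into(), row: "ip1".into() },
    ]);
    // Both tables' rows come back together under the same version.
    assert_eq!(node.changes_for(v).len(), 2);
}
```

Because the unit of exchange is the version rather than the individual row, a peer either has all of a transaction's changes or none of them, which is what prevents partial transactions from being observed.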

Participant 3: How do you approach testing this?

Somtochi Onyekwere: We have integration and unit tests, and all our tests are also open source. The integration tests start up different processes that mimic different nodes, insert data into them, and let them sync. We're actually looking into testing this more, but our tests are the normal tests you would expect: start different processes, insert into both of them, let them sync.

After some time, we check that they've synced and have the same data at the end of it. Aside from that, a lot of the smaller components have their own unit tests: are we incrementing correctly? Are we tracking gaps? If the greatest version I know about for a node is 80 and I receive 100, are we tracking the gap of 81 to 99 correctly? Those small pieces have their own unit tests, but we also have integration tests. We're looking into testing it more, so you can check that out too.
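The gap bookkeeping in that example (highest known version 80, version 100 arrives, so 81 through 99 are outstanding) can be sketched as a set of missing versions that shrinks as they arrive. This is a hypothetical structure for illustration, not Corrosion's actual tracker.

```rust
use std::collections::BTreeSet;

#[derive(Default)]
struct VersionTracker {
    max_seen: u64,
    gaps: BTreeSet<u64>, // versions we know exist but have not received
}

impl VersionTracker {
    fn receive(&mut self, version: u64) {
        if version > self.max_seen {
            // Everything between the old max and this version is now missing.
            for v in (self.max_seen + 1)..version {
                self.gaps.insert(v);
            }
            self.max_seen = version;
        } else {
            self.gaps.remove(&version); // a gap just got filled
        }
    }
}

fn main() {
    let mut t = VersionTracker { max_seen: 80, gaps: BTreeSet::new() };
    t.receive(100);
    assert_eq!(t.gaps.len(), 19); // 81..=99 are outstanding
    t.receive(81);
    assert_eq!(t.gaps.len(), 18); // one gap filled
}
```

The remaining gap set is exactly what a node would ask peers for during sync, so getting this bookkeeping right is worth its own unit tests.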

Participant 4: You mentioned that you're still using Consul to do the health checks. Do you plan to actually retire this part?

Somtochi Onyekwere: Yes, it actually has been retired. It ran for a while and was retired recently. We now have something local on the node that checks health, because the machines are already running locally on the nodes. The new component still inserts into Corrosion, and that's how that data gets everywhere.

Participant 4: If you're retiring Consul, do you plan to offer the DNS interface so that you can also do service discovery via DNS on Corrosion?

Somtochi Onyekwere: It's not on the roadmap right now. We have a DNS component that also depends on the data from Corrosion. I could see that working, but it's deeply tied to the schema of our tables. Corrosion by default doesn't come with a schema; you define the schema yourself, so it's very flexible like that. Whatever you want to insert, whatever your table looks like, it will automatically accept statements into that table and disseminate the data.

The component that we have is wired to how our data is set up, because we have something called .internal: you can say your app name.internal, and we give you the IPs of your app. It's looking at Corrosion's data, but it's deeply tied to it, so we don't have a general offering right now. There is a part of Corrosion for disseminating Consul data quickly: because Consul's API is fixed, it creates the services table and checks table and inserts into those. And if you have an SQLite database that you've been meaning to scale and turn into a distributed application, Corrosion is something that works very well for that. But we don't have that DNS feature yet.
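A resolver like the .internal component described above could, very roughly, answer queries by looking app names up in locally replicated state. The sketch below uses an in-memory map standing in for Corrosion's SQLite data; the function name, suffix handling, and addresses are all illustrative assumptions.

```rust
use std::collections::HashMap;

// Resolve "appname.internal" against locally replicated state
// (a HashMap here, standing in for a local SQLite query).
fn resolve_internal(state: &HashMap<String, Vec<String>>, query: &str) -> Option<Vec<String>> {
    let app = query.strip_suffix(".internal")?;
    state.get(app).cloned()
}

fn main() {
    let mut state = HashMap::new();
    state.insert(
        "myapp".to_string(),
        vec!["fdaa::1".to_string(), "fdaa::2".to_string()],
    );
    // Known app resolves to its replicated IPs; unknown apps get no answer.
    assert_eq!(resolve_internal(&state, "myapp.internal").map(|v| v.len()), Some(2));
    assert_eq!(resolve_internal(&state, "other.internal"), None);
}
```

The appeal of this design is that every node can answer DNS queries from its local copy of the data, with no network round-trip at query time; the cost, as discussed above, is that answers from a lagging node can be stale.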



Recorded at:

Jan 09, 2026
