
Interview with RavenDB Founder Oren Eini


Key Takeaways

  • RavenDB is a database its founder, Oren Eini, created out of a need he saw repeatedly as a consultant
  • RavenDB includes dozens of hidden gems, such as retrieving a page of results and the total count in a single query
  • Understanding the beat of the code is essential for technical CEOs to make informed cost-benefit decisions
  • The RavenDB team sets a high bar before releasing support for new features
  • Reviewing and understanding the source code of others is a powerful tool for learning and inspiration

RavenDB is a NoSQL document database with multi-document ACID transactions and smart document compression. To learn more about the recent RavenDB 5 release and RavenDB in general, we’ve invited Oren Eini, creator of RavenDB and CEO of Hibernating Rhinos, to join us.

InfoQ: Thanks for joining us, Oren. RavenDB has been around for a while, but in many ways, it feels like it is really starting to emerge in relevance. Thinking back to the early days of RavenDB, what first motivated you to start the project?

Eini: I started out as a consultant, going from client to client and fixing their SQL database performance issues. It got to the point where I was doing the same thing every day, just with different clients. They all faced the same issues, mostly related to database structure and working with complex domain models in a relational database.

The time was 2007 or thereabouts, and I decided to take a look outside the relational pool and see what was going on there. I don’t know if you recall, but NoSQL databases were just starting out, and they were complicated beasts that required very careful orchestration to work properly.

I compare that to juggling knives. If everything works, this is awesome. If that isn’t the case, you are asking someone else to pick your fingers off the floor. 

RavenDB came about because I wanted a database that would fit the mold for building non-trivial business applications and wouldn’t decapitate me if I blinked.

There are a lot of things in RavenDB that make a big difference in how you build and work with your applications: automatic indexes, easy domain modeling, zero-admin mode, etc.

InfoQ: You are definitely not alone in the perspective of starting something to solve a need you faced as a consultant and growing it into something bigger. This is very much how I got started with Dojo as well!

From your perspective, what are some of the key use cases where RavenDB really excels?

Eini: It is a great database for business applications, for Online Transaction Processing (OLTP). That is what it was made for and it works well for this purpose. There are a lot of small things in the design that make things easier all around. For example, if you want to do paging, you’ll typically need to do two queries to the database, one to get the current page and another to get the total count. RavenDB allows you to do that in a single query. 
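To make that contrast concrete, here is a hypothetical in-memory sketch in TypeScript. The `queryPage` helper and the data are invented for illustration and are not the actual RavenDB client API; the point is that one call returns both the requested page and the total count, instead of the usual two round trips.

```typescript
// A single "query" that yields both the current page and the total
// result count, avoiding a separate count query.
interface Page<T> {
  items: T[];
  totalResults: number;
}

function queryPage<T>(all: T[], pageNumber: number, pageSize: number): Page<T> {
  const skip = (pageNumber - 1) * pageSize;
  return {
    items: all.slice(skip, skip + pageSize), // the current page
    totalResults: all.length,                // total count, no second query
  };
}

const products = Array.from({ length: 23 }, (_, i) => `product-${i + 1}`);
const page2 = queryPage(products, 2, 10);
console.log(page2.items[0], page2.items.length, page2.totalResults);
// → product-11 10 23
```

In the real client the server computes both pieces in one request; this sketch only illustrates the shape of the result the application sees.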

It is such a small feature, but we have dozens of things like that, and together they make it much easier to just get things working.

RavenDB is also a distributed database. You can deploy it as a single node or in a cluster, and everything will work together. Furthermore, it is also able to work as independently collaborating nodes, which has some really interesting implications for some systems. If you need to build integration between multiple locations, for example, RavenDB is a very natural fit.

InfoQ: Although RavenDB started out in the .NET ecosystem, it is now rather friendly and easy to use for JavaScript developers. Could you explain what JavaScript support is like within the various layers of RavenDB, and which specific flavors of JavaScript work within RavenDB?

Eini: RavenDB works with JSON documents, so using JavaScript is a very natural way to work with the database. There are a few ways that you can work with JavaScript in RavenDB. 

RavenDB has a built-in JavaScript interpreter (supporting ECMAScript 5.1 and large parts of ES6), which can be used in queries and in patch operations.

That gives you a lot of freedom to express what you want and apply logic on the database server.
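As a sketch of what that looks like, here is a hedged example of a patch-by-query written in RavenDB’s query language with a JavaScript update body; the `Orders` collection, the `companies/1-A` id, and the `Discount` field are hypothetical names used purely for illustration:

```
from Orders as o
where o.Company = 'companies/1-A'
update {
    // plain JavaScript, executed on the server for each matched document
    o.Discount = 0.1;
}
```

The filtering happens in the query, and the JavaScript body runs server-side against every matching document, so no documents need to travel to the client to be modified.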

We also have a fully-fledged client for Node.js and TypeScript, which means that you have really good support for working with your documents from the client-side. That includes Unit of Work, change tracking, automatic serialization / deserialization, etc.

When working in Node.js in particular, using RavenDB is very easy because the same model applies from the storage level all the way to the UI.
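To illustrate what a unit-of-work session with change tracking means in practice, here is a hypothetical in-memory sketch (this is not the actual RavenDB client API; the `Session` class and document shapes are invented for illustration): documents loaded through the session are snapshotted, and `saveChanges()` only writes back the ones whose state actually changed.

```typescript
type Doc = { id: string; [key: string]: unknown };

// A toy unit-of-work: load() snapshots each document, and saveChanges()
// compares current state against the snapshot to find modifications.
class Session {
  private tracked = new Map<string, { doc: Doc; snapshot: string }>();

  constructor(private store: Map<string, Doc>) {}

  load(id: string): Doc | undefined {
    const doc = this.store.get(id);
    if (!doc) return undefined;
    const copy = { ...doc };
    this.tracked.set(id, { doc: copy, snapshot: JSON.stringify(copy) });
    return copy;
  }

  // Returns the ids that were written back, mimicking change tracking.
  saveChanges(): string[] {
    const written: string[] = [];
    for (const [id, { doc, snapshot }] of this.tracked) {
      if (JSON.stringify(doc) !== snapshot) {
        this.store.set(id, { ...doc });
        written.push(id);
      }
    }
    return written;
  }
}

const store = new Map<string, Doc>([
  ["users/1", { id: "users/1", name: "Ada" }],
  ["users/2", { id: "users/2", name: "Alan" }],
]);

const session = new Session(store);
const ada = session.load("users/1")!;
session.load("users/2"); // loaded but never modified
ada.name = "Ada Lovelace";
console.log(session.saveChanges()); // only the modified document is written
```

The real client layers automatic serialization and identity management on top of this idea, but the core contract is the same: mutate loaded objects freely, then let the session figure out what to persist.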

InfoQ: I was a super early adopter of TypeScript (arguably too early!). I think it’s super helpful that your Node.js client is authored in TypeScript. Was that always the case or did TypeScript support come later in the evolution of the RavenDB Node.js client? Are there any particularly memorable benefits using TypeScript has provided to RavenDB and its users beyond the obvious TypeScript benefits?

Eini: Our first version of the RavenDB Client for Node.js was in TypeScript. To be honest, I have no idea how we would be able to write something of that magnitude without a proper compiler to lean on.

The Node.js client for RavenDB is currently at around 45,000 lines of code and has parity with our other clients.

I have to call out async/await in particular, which helped a lot when building the code. But the fact that we had type checking and were able to lean on the compiler is what made it possible to grow to that size.

InfoQ: Certainly a lot has changed over the history of RavenDB. What are some of the biggest changes that have occurred from your perspective?

Eini: In 2015, we reached a milestone with RavenDB: we were deployed at tens of thousands of locations across many customers. But we realized that our architecture at the time severely limited us in meeting our goals.

We decided to effectively rebuild RavenDB from the ground up, to take advantage of all the experience we had gained running RavenDB for half a decade.

It took us close to three years to complete this work, but the result blew our minds. We have a much better product, a minimum of ten times faster across every single benchmark we have and a much better foundation to build on.

That was a hugely risky decision, but I think that it paid off very nicely.

InfoQ: Definitely risky yet very familiar! With Dojo we’ve done three major rewrites over the years, the most significant being pre-2.0 to 2.0, which was a full reboot of the project from a “solve everything” JavaScript toolkit to a standards-aligned, progressive, reactive TypeScript framework.

The migration for Dojo users with such a significant change was rather painful. What was the migration like at the time for end users of RavenDB switching from significant amounts of client-side work to the server handling more of the details?

Eini: In retrospect, we messed that up, and I would do things differently. We made the decision early on to break compatibility with older versions. Doing that for the file format was reasonable, but we also did it for the wire protocol, and that was a mistake.

It allowed us a lot more freedom and flexibility and gave us the ability to get to a much higher tier of performance. But it also significantly slowed down adoption of the new version and made it a much harder task than it should have been.

Looking back, I think that it was fine to change the on-disk format, since we already provided mechanisms to upgrade that without any work for the users. However, we should have built an adapter layer of some kind that would allow clients to move to the new server without needing to upgrade the clients.

That would have made the transition much easier. 

InfoQ: Knowing what you know now, what do you wish you had done differently in your journey from the early days of RavenDB to today?

Eini: I mentioned the re-architecture of RavenDB, so that is obviously something that I would do differently.

I think that the one thing I would do differently, and had the capacity to do from the start, is not to use REST as the underlying transport mechanism. Thinking back, if I had used the MySQL or Postgres protocols for communication with RavenDB, I would be in a much better position, since I wouldn’t have needed to write a client API.

Because we had to write a client API, a lot of functionality ended up in the client. That is a good thing, because it means that you have a world class client-side experience, but it also means that we have to port that functionality to every new environment we support.

As part of the re-architecture of RavenDB, we moved a lot of functionality to the server-side, because it made a lot more sense and reduced the complexity of developing each new client.

InfoQ: There’s a lot of popularity in the JavaScript ecosystem currently with GraphQL and perhaps a healthy amount of misunderstanding in how it relates to REST. What are your thoughts on GraphQL today?

Eini: GraphQL is an interesting way to expose your data directly to your clients. It has the advantage of being simple and deliberately limited. That is great because those limits prevent your clients from sending queries that can be very expensive to compute (the usual problem when you open up a database directly to users’ queries).

The advantage of using GraphQL over REST is that there is a well-defined format, and you can take your expertise from one system to another.

The downside of GraphQL is that this is a format that is best suited for exposing data to external users, not something that you want to use as the query mechanism inside your own systems. Accepting the limitations of GraphQL for your own use between different components that are owned by the same project is a mistake, because you don’t need those limitations. You control both ends of the system, so you can create a proper tailored solution for the queries you need.

InfoQ: I’ve always thought of myself as a very technical CEO, but from what I’ve observed you’re even more so! I have a pretty amazing story to share where I learned this first hand by having a call with you recently. We were discussing some challenges our team was having with RavenDB and you immediately dove in and demonstrated solutions in code. It was late at night for you and it was super helpful and impressive. How do you find a healthy balance between the needs of engineering your product and the needs of running a growing company?

Eini: I initially created my company because I wanted to work on cool stuff and get paid for it. That was immediately after I had a major project where I had very limited control over the software and architecture that were chosen, and that went… poorly.

Leaving the office at 2AM for weeks on end kind of poorly.

InfoQ: I once pulled an all-nighter, slept two hours, and then pulled another all-nighter. After that, I made a very similar decision!

Eini: Working for myself was my way of ensuring that if I leave the office at 2AM, it means that I either am really enjoying myself or I’m paying for my own decisions. 

I love what I’m doing, and being able to work on RavenDB has been an absolute blast.

At the same time, as the company grew, I had to take on a more managerial role. It actually translates really well into numbers. 

If you look at the raw metrics, I wrote more code in RavenDB than the other top 10 team members combined. That means that I have a lot of insight into exactly how it works. 

But statistics lie. That only works if you look at the full history of the codebase (and likely only because I was doing the merges for a long time). If you look at the last three years, I’m only the second-largest contributor in terms of lines of code, and I’m not even in the top 5 if you look at the last two years.

InfoQ: In the early days of Dojo I used to get a lot of credit for what was effectively being a human linter before linting tools existed!

Eini: That corresponds to the growth of the company and me putting more time into the strategy of the company and the high level architecture of the product. I still write code that goes into RavenDB on a weekly basis, and I’m involved in the large scale architectural work. But I’m no longer able to keep up with all day-to-day changes in the code.

The good thing is that we have a really good team and a good culture in terms of how we express ourselves in code and in its history. That means that if I land in a piece of code that I’m not familiar with, I can still make sense of it and figure out what is going on.

Initially, I have to admit, that culture was for other people’s benefit, because I already knew the code. But it turns out that turnabout is fair play, and it also works the other way around.

I also cheat, I have to admit. As you are aware, 90% of the work is required to cover the edge cases that happen in 0.005% of the cases. So having a good grasp of what usually happens means that I can get away without always needing to dig into the code to understand what is going on.

InfoQ: The ability to cheat is really very undervalued. Before the days of TypeScript and auto-complete or intellisense for JavaScript, new engineers would ask me how I knew what the various properties and methods were in Dojo and JavaScript. My answer was that I just remembered ALL of them. This was effectively cheating by being there first and storing all the details in my mental cache.

Eini: There is also great value in not needing to come into something cold. We have been building the RavenDB codebase for longer than a decade; there are rarely any surprises there for me.

As for balance, one of the things that I try to ensure is that I’m working on the RavenDB codebase on a regular basis. I want to be sure that as the CEO, I get the… beat of the code. If we adopt a particular pattern or a tool, it has to be something that would pass my sensitivity to annoying things.

I found that it is easy to mandate Tool X for Reasons when you are the one seeing the pretty graphs at one end of the spectrum. When you are the one who needs to do the extra work to make it happen, there is a much higher barrier to accepting it, so the value needs to be there.

A good example of that is branching strategy. If you have multiple branches that you need to apply a fix to, who does that, and how does this work?

At one point we supported 4 different versions of RavenDB, and solving an issue sometimes meant that a developer had to do 6 times the work (because you had to test and adapt the fix on all those versions).

This is something that had to be dealt with at the CEO level, because deciding when the cost of supporting a version is too high is not something that the developer writing the code can do. 

To be perfectly honest, I would work on the code regardless; I love to code, and in some respects the growth of the company has pushed me away from my hobby, so I make time to open the IDE and write useful code.

InfoQ: Speaking of strategy, while you’ve just released RavenDB 5, I’m sure you are already planning for what’s next. What interesting next steps would you like to share with InfoQ?

Eini: That is an interesting question, because in many cases, I have to make a bet on what will be important a year or two down the line. We have a roadmap for RavenDB, but we adjust it based on what we see in the market and what customers are telling us.

There are a few things on our roadmap that I am really looking forward to. For example, in RavenDB 5.1 we are going to ship replication support for Byzantine networks. This is useful when you have RavenDB nodes deployed in an environment where you don’t trust the remote nodes. A good example is when you need to integrate with a RavenDB instance that is running on a user’s machine, and you want to allow that user’s RavenDB instance access to some of the data in the cloud.

That allows you to build systems that use RavenDB and collaborate, without needing to trust the remote locations. And conversely, the remote location doesn’t need to trust you. This will allow RavenDB to take on itself the role of synchronization between these locations. That is important because it frees the developers from having to write all the (decidedly non trivial) code to handle that and make things Just Work.

Another (very) long-term goal we have is to improve our searching / indexing capabilities. That is a Big Issue and something that we keep deferring because the size of the task is very large. We expect to see another major performance boost from this, but the scope is huge.

Another example of something that we want but need to consider carefully is sharding. This is something that is currently not implemented server-side by RavenDB, and we mostly see it as a feature that users ask for in order to satisfy a checklist. RavenDB runs on multi-TB datasets with no issue and has support for load balancing across the cluster safely and easily.

Sharding is a complex topic, and we have a high bar for what we consider feature complete. The biggest issue is that sharding forces you to think in advance about how to build your system, and that is usually not something you need early on.

Splitting your system so you are using microservices each with its own independent database is much better, from an architecture perspective. It also means that you don’t have to pay the complexity cost of sharding, which is quite high. 

The downside is that there are potential users who need sharding for a checklist and will skip us otherwise.

InfoQ: RavenDB has a mixture of licensing, open source in some scenarios and commercial in others. Given how some cloud providers leverage databases, I think this approach to licensing makes sense. Could you explain the approach to RavenDB’s licensing?

Eini: RavenDB is an open source project, with the entire codebase available on GitHub under the AGPL. At the same time, we consider RavenDB to be a commercial product. If you are using RavenDB in a commercial setting, you are expected to get a license. We also provide free licenses for the community, as well as a free tier on our cloud option.

At the same time, you are absolutely free to go to GitHub and run RavenDB on your own without talking to us.

This duality is intentional, but it is a rare model, so I want to explain it in depth. 

Open Source projects usually have a funding problem. We recently saw that with Mozilla, as a good example. If the code is free, what would you get money for?

There are several models for getting paid via open source. The usual ones are support / services and open core.

I don’t like the support / services model, because I think it creates the wrong incentives for the product. I want to create the absolute best project that I can, and having to consider if we should make the product easier to use (and maybe hurt revenue) is not something that I would like to contemplate.

Open core is also not something that I like, because in many cases you put just enough in the open core to be interesting, then lock all the critical parts in the paid version. It is far too common to put security features in the paid-only version, which leads to a proliferation of systems that are open to the public Internet, with predictable results. See the recent Meow database wipes for the latest in a long list of issues this causes.

By putting all the code in the open but behaving as a commercial product, I think we do a good job of bringing in the funds needed to maintain RavenDB. It helps that most of our customers have no desire to run an unofficial version of RavenDB.

It helps being a critical infrastructure component, I guess.

As for direct competition from cloud providers, we saw some changes with MongoDB, Elastic and Redis in response to this issue. One of the strengths of RavenDB is that it is commonly deployed on the edge. Cloud deployments are very common, but many users run them in conjunction with nodes deployed at physical locations.

I also think that the fact that we offer the RavenDB Cloud option at a competitive price means that we don’t have too much to worry about in this regard.

InfoQ: Do you have any RavenDB success stories you can share?

Eini: It’s easy for me to say how amazing RavenDB is, but let’s ask Trevor Hunter, CTO at Kobo:

Hunter: Thanks for reaching out! For us, RavenDB has been a real success in a few areas: Development Velocity and Operations. The developers who have ramped up on it seem to love it and are very productive, and the Operations side appreciates the performance, efficiency and reliability. We’ve discovered a few bugs along the way, but the RavenDB team have been super responsive and supportive with very quick turnarounds.

From a developer perspective, both the .NET and Java sides really like the API surface area… it’s very easy to learn, and best of all it “guides” very well away from bad practices and anti-patterns (something that other notable client APIs do a very poor job of). It does that without dumbing down… the advanced functionality is all there to be plugged into, but it’s not the first thing a new developer sees. The RavenDB test driver is opening up new possibilities for more effective unit tests that do away with the layers of redundant mocks we used to have to use.

In the performance realm, we’re seeing real benefits. We’re coming from Couchbase and in some cases are seeing 10-15x better latency and much better stability under load. Even during our exploratory chaos testing (e.g. forcibly powering off nodes), it really doesn’t compare… taking a node out of a Couchbase cluster used to be a dreaded event that would inevitably lead to some downtime or increased errors, but today we have no fear of taking a RavenDB node down for a disaster recovery test. On top of all this, we’re consolidating a lot of hardware while gaining a lot of performance – our RavenDB hardware footprint (CPU/Memory) is significantly smaller for the same size of dataset as our legacy Couchbase clusters, but our latency, throughput and stability are much better.

Our current usage is to use RavenDB as a private application database (i.e. the database is private to one microservice). We’ll likely expand this to do much more with the subscription functionality and enable many more event-driven use cases as we continue to expand our footprint. We hope to see RavenDB become more compatible and support some of the common open source monitoring and ops tools (e.g. Grafana, OpenTelemetry).

InfoQ: Do you have any personal habits around development or self-care that you would like to share with our audience?

Eini: I enjoy what I do, and I think that this is key. I keep track of the trends in the industry (although I don’t always agree with them) and I practice my craft. One of the most valuable things that I do is go and read other people’s code. 

I have a game that I play, where I go into what I hope will be an interesting codebase and just read the code in lexical order of the file names. That is absolutely not how you are expected to approach a codebase, but it gives me a chance to see some parts of the code before others and speculate about how the rest of the system works. If I’m right, that is a great feeling. If I’m wrong, I learn from it.

I have a bunch of code reviews of these projects on my blog, and I can point to several of them that translated directly into major improvements in how RavenDB works.

For example, I looked at a bunch of storage engines over the years. LevelDB, CouchDB and LMDB are the most notable of those, but also FASTER and Noise, among others. Being able to see so many different solutions to the same problem was very helpful. In the end, that helped me build Voron, RavenDB’s own storage engine, a high-performance storage engine library that we wrote specifically for RavenDB.

It was greatly influenced by LMDB’s design, but I also took some parts of LevelDB into account, and we ended up with something that can handle millions of queries per second on a single node and scales linearly with the number of cores on the machine. Voron can also write about 150,000 items per second on a sustained basis and is one of the primary reasons that RavenDB is as fast as it is.

When I looked at Noise, a full text search engine in Rust, that influenced the design of RQL, the RavenDB Query Language. 

For that matter, the fact that I read through the CouchDB codebase directly led me to create RavenDB. It made me understand that a database is not some magic thing beyond my comprehension, but something that I could actually grasp, and eventually build.

InfoQ: Thank you, Oren, for taking the time to tell InfoQ and our readers about RavenDB!

About the Interviewee

Oren Eini is the creator of RavenDB and CEO of Hibernating Rhinos. Eini is a frequent blogger with over 15 years of experience in the development world and a strong focus on the Microsoft and .NET ecosystem. He has been recognized as a Microsoft Most Valuable Professional since 2007. An internationally acclaimed presenter, he has appeared at DevTeach, JAOO, QCon, Oredev, NDC, Yow! and Progressive.NET conferences, sharing his knowledge through talks and written works such as "DSLs in Boo: Domain Specific Languages in .NET", published by Manning, and another book in the works, "Inside RavenDB". He remains dedicated and focused on architecture and best practices that promote quality software and zero-friction development.
