00:19:38 video length
Bio Damien Katz has worked for Lotus, MySQL, IBM, and is the creator of CouchDB. Damien will be doing this for a very long time to come.
I am the creator and project leader of CouchDB. I currently work for IBM. Before that I worked for MySQL and before that I worked at IBM again on a project called Lotus Notes.
So, it's most like Lotus Notes. Because I worked so many years on Lotus Notes, I got a really good grasp on Lotus Notes' whole platform and what is actually good about it. There was a whole lot of crap piled on Lotus Notes and a lot of people really disliked it, but it's been successful for a reason. It's been around for a long time and it's still got like a hundred million users. So there is something there, and I felt like I had a pretty good idea of the core of Lotus Notes, what was actually powerful about it, so that's what I tried to distill down and build into CouchDB. It was that document model. So it definitely works most like Lotus Notes. XML databases, where you have these very, very large single XML documents, aren't really quite the same model, so I don't know. I haven't used them that much but I know they are kind of different.
Yes. Generally each document won't have a random structure. The documents will have some sort of predefined structure, but it's not enforced by the database; it's enforced by the application layer. Eventually we will have hooks into the database where you can enforce, when documents are saved, that they adhere to a specific format or schema. But generally speaking the documents don't have to follow any sort of schema; that's up to the application. So it gives you a lot of flexibility in how you want to display the data. If you have a discussion database, for example, and you want to display the main topics, or you want to display the comments, or you want to display all the documents by a certain user, it's really easy to do that with CouchDB.
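As a minimal sketch of that idea (not CouchDB's actual API), the Python below treats documents as plain dictionaries with no enforced schema and builds three views over a small discussion database. The field names (`type`, `author`, `title`, `topic_id`) and the helper `build_view` are hypothetical, chosen only for illustration.

```python
# Hypothetical discussion database: documents are free-form dicts,
# and nothing forces them to share a structure.
docs = [
    {"_id": "t1", "type": "topic", "title": "Erlang strings", "author": "ann"},
    {"_id": "c1", "type": "comment", "topic_id": "t1", "author": "bob", "text": "Use binaries."},
    {"_id": "c2", "type": "comment", "topic_id": "t1", "author": "ann", "text": "Agreed."},
    {"_id": "n1", "note": "a document with a completely different shape"},
]


def build_view(documents, map_fn):
    """Run map_fn over every document and return the emitted rows sorted by key."""
    rows = []
    for doc in documents:
        for key, value in map_fn(doc):
            rows.append((key, doc["_id"], value))
    return sorted(rows, key=lambda r: (r[0], r[1]))


def topics(doc):
    # Emit one row per main topic, keyed by title.
    if doc.get("type") == "topic":
        yield doc["title"], None


def comments_by_topic(doc):
    # Emit one row per comment, keyed by the topic it belongs to.
    if doc.get("type") == "comment":
        yield doc["topic_id"], doc["text"]


def by_author(doc):
    # Emit a row for any document that happens to have an author field.
    if "author" in doc:
        yield doc["author"], doc.get("type")


main_topics = build_view(docs, topics)
topic_comments = build_view(docs, comments_by_topic)
author_docs = build_view(docs, by_author)
```

Each map function simply skips documents that lack the fields it cares about, so the oddly shaped `n1` document coexists with the others without breaking any view.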
It runs over all the documents, but it keeps a persisted index so that every time you run it, it just has to use the index. When documents are updated, say five documents are updated, it doesn't have to run over all the documents in the database to recreate this index. Instead, it just figures out which documents have changed, recomputes their results, eliminates the old results, and adjusts the index, and then you can query it. All that happens automatically; all you do is create the view definition and access it, and CouchDB handles doing all these things for you.
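The incremental update described here can be sketched roughly as follows. This is a toy, in-memory illustration under stated assumptions, not CouchDB's storage engine: the class `IncrementalView`, its `map_calls` counter, and the `by_author` map function are hypothetical names, and the sketch re-sorts rows on every query, which a real on-disk index would avoid.

```python
class IncrementalView:
    """Toy view index that is updated per changed document."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}  # doc id -> rows that document emitted
        self.map_calls = 0     # counts how often the map function runs

    def update(self, changed_docs):
        """Re-index only the given documents; a value of None means deleted."""
        for doc_id, doc in changed_docs.items():
            # Eliminate the old results for this document.
            self.rows_by_doc.pop(doc_id, None)
            if doc is not None:
                # Recompute the results for the new version only.
                self.map_calls += 1
                rows = list(self.map_fn(doc))
                if rows:
                    self.rows_by_doc[doc_id] = rows

    def query(self):
        """Return all (key, doc_id, value) rows sorted by key."""
        rows = [
            (key, doc_id, value)
            for doc_id, emitted in self.rows_by_doc.items()
            for key, value in emitted
        ]
        return sorted(rows, key=lambda r: (r[0], r[1]))


def by_author(doc):
    if "author" in doc:
        yield doc["author"], doc.get("title")


view = IncrementalView(by_author)
database = {
    f"doc{i}": {"author": f"user{i % 3}", "title": f"Post {i}"} for i in range(100)
}
view.update(database)  # the initial build touches all 100 documents
calls_after_build = view.map_calls

# Five documents change; only those five are re-mapped.
changes = {
    f"doc{i}": {"author": "editor", "title": f"Post {i} (edited)"} for i in range(5)
}
database.update(changes)
view.update(changes)
calls_for_update = view.map_calls - calls_after_build
```

After the second `update`, the map function has run only for the five changed documents, while `view.query()` still returns rows for all one hundred.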
Yes, so the original versions of CouchDB were written in C++, and I kind of hit the wall. I had a storage engine, I had a view engine and I had a query language that I had written in C++, and I hit the wall with the concurrency issues. I always had to do conventional threading with locks and messaging and things like that. And I read about Erlang on Lambda the Ultimate or something like that, as being a really good concurrent language, so I decided I was going to figure out how I could integrate that with my code base. So I downloaded it and played around with it, and it didn't take long before I decided that it was perfectly suited for writing a database engine and server. I threw away all my C and C++ code and rewrote everything in Erlang, and it's been fantastically productive for that. It's excellent for infrastructure type stuff; it's designed for telecom. Telecom has a lot of the same issues that you have with databases: lots of input and output, it has to be reliable, it has to deal with failure gracefully. So it ended up being a perfect language for that.
Definitely for concurrency. Somebody did some early benchmarks on CouchDB and they could get probably twenty thousand simultaneous connections. That was pretty impressive. And we haven't even done any profiling yet. Definitely Erlang helps us in that area. If I had written this using a conventional threading model, you would be lucky to get five hundred active connections, so definitely Erlang helps with single machine scalability. Erlang will also help with multi machine scalability, but we are not really using Erlang for that yet. It has a whole lot of tools and libraries and things like that to allow for multi machine Erlang environments, for automated failover and efficient messaging and things like that. We just haven't taken advantage of it yet.
I haven't profiled it. I think the person who profiled it did so with the multiprocessor version, so I don't think anybody has actually profiled it with the single processor version of Erlang.
Yes, most of those complaints are still there, but every language has things that you dislike about it. Some of the things in Erlang are old. If you were designing it today, you wouldn't make these decisions. And some of it is kind of inherent to the programming paradigm, like you can't fix that without breaking other things that are very right with Erlang. So there are things that are frustrating about it. It has very poor string handling, which is something that I think could be improved. But if your problems don't fit well into the functional paradigm, then maybe you should just use a different language.
I think string handling is the issue that is always there. It's really inefficient right now. Not only is it cumbersome doing a lot of the string stuff that would be much easier and much cleaner in a language like Ruby or Python, so the code is harder to write, it's also slower. So yes, that's an issue. But we are also trying to address that by using a different style of strings. Erlang's standard strings are lists where each character is an element in the list, and it actually ends up taking sixteen bytes just to store a single character. And then it has binary strings, which are more like conventional strings in other programming languages. The syntax for those is ugly of course, but I think we are going to switch over to them anyway because it's way more efficient.
A good analogy I like to use, as an exercise for you to figure out what's a good application for a document database: if you weren't doing this application on a computer, if it weren't computerized, how would you do it in the real world? If it ends up being lots of pieces of paper that are filed away and passed around to different people, that's a really good indication that a document database is the right place. If it ends up being the kind of problem like a paint program or a single spreadsheet, it's all locked down. For an accountant, what do they call them? They are spreadsheets on paper. Those are the things that can't be split up; they have to be a single document. In those cases that means it probably should be a single application. But if you have a bunch of these documents that are constantly getting spread around, like to-do lists, bug lists, customer complaints, these are the kind of things that in the real world would be generating stacks of paper. And that's when you should really start to consider that maybe a relational database isn't the ideal place and maybe a document database is. And with CouchDB's nature, where you can actually take the documents with you offline, edit them, and then later when you are online replicate the changes back, that's definitely something that is very difficult to do with a relational database. So any time you need that offline capability to access your data and edit your data, that's when a document database like CouchDB would really excel.
Yes, Lotus is still extensively used. A lot of the time people think of Lotus as just an email platform, but it's actually an application development platform for documents.
It started off that way, and one of the first applications built on top of it as a document oriented database was email. So it is email, and that's one of its top applications, but it's just another application in the Lotus Notes stack. Lotus also has all these other applications, so you can do bug tracking and customer reports and CRM type stuff, and it's used extensively by a lot of companies for that sort of thing.
17. So would you say that this document database concept has been around for some time with Lotus Notes and has been available in other products, or does it seem to be becoming popular with CouchDB?
Exchange wanted to be something like that a long time ago. They had the Exchange server, and they had this concept of shared folders where you were supposed to build the applications on that. That never really worked out; nobody really used it for that. So there have been other attempts to build things like that, and then of course anything on the web like SharePoint works very much like Lotus Notes, but it's a single instance web server of Lotus Notes and you use your browser as the client. It's still doing a lot of the same things; it's still a very document oriented type of environment. So there have been other things, but Lotus Notes is in my mind the only thing that really got it right, even though it got a whole lot wrong in addition to that. I always thought that Lotus Notes was still kind of unique in the marketplace, and that is why I really wanted to build CouchDB, because I thought that model had been underappreciated and underexplored.
18. You want to bring that document model to open source. There seem to be a lot of other database paradigms competing with relational databases, like Google's BigTable. What do you think about that?
BigTable - I don't really quite get the benefits of it, other than scalability. I definitely see the benefits in that way but for most applications that people want to build, they don't need that sort of scalability. It's kind of a limited platform but I haven't actually used it, so I don't know, I just read some of the complaints.
I do think it's a different model; they don't have the same view model. The real key to CouchDB is the view model, where you can create these views that are generated to index all your data. BigTable is just this big key-value table store, and I'm not really sure that's powerful enough to build interesting applications.
Lambda the Ultimate