00:41:56 video length
Bio Avi Bryant is the co-CEO of Dabble DB, a Vancouver startup focused on web-based data management and collaboration tools. He is the author of the Seaside web application framework, and is active in the open source Squeak Smalltalk community.
Sure. I am Avi Bryant. I am the co-founder and co-CEO of Dabble DB and I am also the creator of the Seaside framework for Smalltalk. So my time is sort of split between working on the product Dabble and working on the framework and platform Seaside and Squeak that the product is on.
It's quite clear, I mean Seaside came first; Seaside originally came about as a framework to support consulting work that we were doing. And I did consulting with the framework for probably three or four years before starting to work on Dabble, and we then really sort of moved the direction of the company towards only doing Dabble, and that's really all our focus right now. But the framework came first.
Most of the work that we were doing was either for clients who really didn't care about the technology, they just wanted a solution, or for clients who had already made the decision, one way or another, to use Seaside or at least to use Smalltalk, and so it was a natural fit. So that wasn't a problem that we ever had in consulting; certainly I know it's something that people are very concerned about with unusual technologies. But really, when you are a small company you don't need to find that many clients that are OK with your technology choices. And I think that in some ways it helps if you are in a niche technology, because there aren't so many people they can go to, and in some ways you may be more likely to get work by choosing a less mainstream technology.
Yes, I was a Ruby hacker; this is sort of back in the pre-Rails Ruby days. And an Objective-C hacker, and also getting interested in XP, getting interested in wikis, getting interested in sort of a lot of things where one way or another I could trace the roots back to Smalltalk, and so I decided that I wanted to follow the roots back and see what was there. And really just never left. It was a period where I was doing a lot of exploration in a lot of different languages, and that all just kind of, the wave function collapsed into Smalltalk. And so for the last seven years or so that's pretty much exclusively what I have been working on.
Definitely a new generation, and that's really interesting to see. I have had a lot of people telling me that having Seaside there, having a good solution for doing web development in Smalltalk, has really helped a newer generation of Smalltalkers sort of come about. And also, to some extent, to have people return to Smalltalk who had maybe left for the greener pastures of Java at one point, or elsewhere.
Yes, exactly. I gave this talk at Smalltalk Solutions a number of years ago where I basically stole the argument that Paul Graham makes in "Beating the Averages", which is that for web applications it really doesn't matter what technology you use, because the user has no way of knowing. It's not like, you know, Ted [Neward] in the previous interview was saying "Well, you can't deploy an application with Squeak because the user has to bring up the Squeak application and they have no clue what to make of it", and that would be true if you were actually shipping them the application on Squeak. But of course if you are just writing the application on Squeak and users are interacting with it over a web browser, which is the case for Dabble DB, none of our users know that it is Smalltalk, none of our competitors know, none of the people that we talked to. We always get this sort of, it's actually really interesting, the reaction that we get, because it's sort of disbelief and sort of amusement but also I think kind of a respect, and I mean people really like the fact that people are still using Smalltalk. Because I think a lot of people have these fond memories of Smalltalk as this system that could have been big and wasn't.
Not necessarily, I mean just people who were there, maybe never used it but thought it was an interesting thing. But people never have a negative reaction to it, which is not necessarily something I expected.
Yes, I mean I said I was sort of an Objective-C hacker at the time, and Apple's, NeXT's framework was called WebObjects. It's still around, although Apple ported it to Java, which I think was a mistake. Before Cocoa was really big and before there were a lot of Objective-C programmers because of that, I think they saw Java as being the future and they ported it to Java, and since then I think Objective-C has really seen a resurgence because Cocoa has really won as the way of developing applications, and I think they probably wouldn't have done that if they knew what they know now. But that really sort of killed it in a sense, because all of the Objective-C WebObjects developers who loved it now had to contend with it being in Java. But yes, WebObjects was what really opened my eyes to the idea of having extreme amounts of session state, and extreme amounts of session state in the form of a tree of stateful objects that represented parts of the web page, and that by keeping that around between requests you could keep much more information there and you could have a much higher level of abstraction when you were building a web application than using the kind of more traditional approach, which is to have a small amount of session state either stored in cookies or stored on the server, but have the vast majority of the state be passed in the URL or passed in the form parameters. And that core idea certainly was taken from WebObjects.
That's exactly right. So every GUI element in the Seaside application has a listener attached to it, and in Smalltalk that means it just has a Block attached to it. And when you submit a request, all that happens is that all of the event listeners associated with whichever links or form fields you submitted get evaluated.
Well, the listener can close over as much state as it needs to. So a block will have references to any server-side state that is necessary, and also Seaside maintains for each session sort of a tree of stateful components, the same way again that a desktop application would have a widget tree. And so you've got the stateful widget tree attached to the session, and the expectation is that what these events are going to trigger probably is changes to the state of that tree, and then you have a phase where you go through and you render the tree again in HTML. So that's really the cycle in Seaside: you have this tree of widgets, the tree of widgets renders itself, in the act of rendering itself it registers all these listeners, a request comes in and the listener gets triggered and that would change the tree somehow, so you render it again with some new page.
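The render/callback cycle described here can be sketched in a few lines of Python. This is only an illustration of the idea, not Seaside's actual API: the names CallbackRegistry, Counter, Session and the "cb" query parameter are all made up for the example, and a real framework would keep old callbacks around for longer than this sketch does.

```python
from typing import Callable, Dict


class CallbackRegistry:
    """Maps generated callback ids to closures for one render pass."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, Callable[[], None]] = {}

    def register(self, action: Callable[[], None]) -> str:
        key = str(len(self._callbacks) + 1)
        self._callbacks[key] = action
        return key

    def trigger(self, key: str) -> None:
        self._callbacks[key]()


class Counter:
    """A stateful component that lives in the session between requests."""

    def __init__(self) -> None:
        self.count = 0

    def increment(self) -> None:
        self.count += 1

    def render(self, registry: CallbackRegistry) -> str:
        # Rendering registers a listener (a bound method closing over self).
        key = registry.register(self.increment)
        return f'<p>{self.count}</p><a href="?cb={key}">++</a>'


class Session:
    """Holds the component tree (here a single root) for one user."""

    def __init__(self, root: Counter) -> None:
        self.root = root
        self.registry = CallbackRegistry()

    def render(self) -> str:
        self.registry = CallbackRegistry()  # fresh listeners for this page
        return self.root.render(self.registry)

    def handle_request(self, callback_key: str) -> str:
        self.registry.trigger(callback_key)  # the listener changes the tree
        return self.render()                 # then the tree renders again
```

Calling Session(Counter()).render() produces a page whose link carries a callback id; passing that id to handle_request runs the closure and returns the re-rendered page with the new count.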
Yes, it is very resource intensive in that you have a lot of state associated with each session and that state in general needs to be kept in memory on the web server.
13. Can you mess with the closures with the state in some way? Can you access them in some way so you can persist them in some other form, for instance to replicate them or store them in a database, is that even a problem?
No, I mean that is a reasonable question. It's probably useful for me to talk about how Dabble DB works. So the thing to understand about Smalltalk is that Smalltalk is a Virtual Machine, and like Virtual Machines such as VMware, for example, it can snapshot the entire memory state of the running machine. In Smalltalk we call these images, and so if you save an image to disk, that will have all of the in-memory state associated with that process: the current execution stack, the current program counter, and, if it has a GUI, where the mouse was at that point in time, I mean just everything. Just like if you are using Windows on VMware and then you save it. And so for Dabble DB we have one of these memory images for every customer, for every database. And that includes their data, it includes basically a checkout of the code, and it includes all of their session state. And we keep this in memory during the time they are using the application, but we can save it to disk at any point and we do that quite frequently, so if they are idle even for a couple of minutes we basically swap them out to disk, and of course once it is on disk we could migrate it to another server. If that server goes down we could bring it up on a different server, so you have sort of failover and persistence both of the session state and of course of all the other state associated with their database.
Yes, we have very frequent checkpoints as well. Yes, and it's very, very fast both to checkpoint and to come back from a checkpoint. I mean basically, almost literally all that happens is that it's a core dump when you are checkpointing, just the entire memory state gets written to disk, so that's quite fast. And then you just mmap it back into memory when the Virtual Machine comes up. So these VMs spin up with almost no perceptible load time. And that's why we can get away with just bringing it down whenever possible. So on any given server for Dabble DB we'll have thousands and thousands of these images, for thousands of customers who are at any one time mapped to that server, but maybe only twenty or thirty of them will be in memory. So typically this would be maybe hundred-meg memory images; we'll have twenty or thirty of them, so we're using 2-3 gigs of memory.
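As a rough illustration of the checkpoint-and-restore idea, here is a Python sketch under loud assumptions: pickle stands in for the Smalltalk image format, an ordinary file read stands in for mmap, and the dictionary layout and file name are invented for the example. It is not how Squeak or Dabble DB actually store images.

```python
import os
import pickle
import tempfile


def checkpoint(image: dict, path: str) -> None:
    """Write the whole in-memory state to disk in one pass."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(image, f)
    # Rename over the old snapshot so a crash never leaves a half-written file.
    os.replace(tmp, path)


def restore(path: str) -> dict:
    """Bring a previously saved snapshot back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)


def demo() -> dict:
    # Hypothetical contents: data, session state and a code version together.
    image = {
        "data": [{"name": "Ada", "instrument": "violin"}],
        "sessions": {"s1": {"current_view": "musicians"}},
        "code_version": 42,
    }
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "customer-1.image")
        checkpoint(image, path)
        return restore(path)
```

The point of the sketch is that data and session state travel together in one snapshot, so restoring the file on any machine brings back both.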
Yes, Apache is running and so the request comes in to Apache and the way that we have it set up is that each of those images has a particular port that it's configured to listen on. And so when a request comes in, Apache will look at the URL figure out which port it should be proxying the request to, but if nothing is listening on that port it knows it has to spin up the image first. There is a project recently for Rails, written in Ruby, that does similar things and I am trying to remember the name of it.
16. Rack, is it?
No, it's something... SwitchPipe, maybe. Yes, that is right.
Yes, so the SwitchPipe architecture actually is the closest thing I have seen to what we do in Dabble DB. It's not identical but it's fairly similar. So you can think of us as having a SwitchPipe in front of potentially thousands of VMs, of which only a handful will be operating at any one time.
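The on-demand arrangement described above (route each request to its customer's image, start the image if nothing is running, and swap idle ones out) might be sketched like this in Python. Everything here is hypothetical: ImageRouter, its fields and the counters are stand-ins, and the actual Apache configuration and process launching are not shown.

```python
import time
from typing import Callable, Dict, List


class ImageRouter:
    """Routes requests to per-customer images, starting them on demand."""

    def __init__(self, idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_seconds = idle_seconds
        self.clock = clock
        self.running: Dict[str, float] = {}  # customer -> time of last request
        self.spin_ups = 0

    def handle(self, customer: str) -> str:
        if customer not in self.running:
            # Stand-in for launching the VM that listens on this customer's port.
            self.spin_ups += 1
        self.running[customer] = self.clock()
        return f"proxied to image for {customer}"

    def swap_out_idle(self) -> List[str]:
        now = self.clock()
        idle = [c for c, last in self.running.items()
                if now - last >= self.idle_seconds]
        for customer in idle:
            # Stand-in for checkpointing the image and stopping its VM.
            del self.running[customer]
        return idle
```

With an idle limit of a couple of minutes, only the customers who made a request recently stay in the running set, which is what keeps the in-memory count small relative to the number of images mapped to the server.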
No, I mean we are just relying on the statistics of it. And of course if that happened, then we would have to bring up, we would have to migrate them to other servers. Which is something that we can do very quickly.
That's exactly right. But so far I think the most we have ever had on any one server, the ratio of servers to customers is such that I don't think we have ever seen more than maybe thirty-five running images on any one server, and that is certainly within what we can handle.
Exactly. One of the things that is perpetually fascinating is to see how many things that were in Smalltalk twenty, twenty-five years ago are now becoming mainstream and accepted. Obviously just-in-time compilers, bytecode compilation, and I am not saying that Smalltalk invented these things, but they were things that Smalltalk used quite a while ago, and Virtual Machines and Virtual Images. Virtualization on the server is something that is very common, and we have virtualization on the server. It's just that the machines that we're virtualizing are Smalltalk machines, they are not Linux machines.
Yes, there are many Smalltalk implementations and I think all of them at this point, with the possible exception of VisualAge, have a Seaside port and have people maintaining that port. So VisualWorks has a fairly active community of people using Seaside on it; I have never used that. GNU Smalltalk has just announced, maybe last week, that they have a Seaside port. Dolphin has a Seaside port that I did a little bit of work on, and that seems to get a certain amount of use; obviously Dolphin is a Windows-only Smalltalk, so there are fewer people writing server applications on that. And Gemstone for the last year has had a Seaside port, and that's actually, of all the other Smalltalks, the only one that I've ever done any work with, Gemstone's Seaside. And I think that's a very natural fit in that Gemstone is a Smalltalk without a UI, it's just a server-side platform, but it's very well suited to the server side, and to have Seaside running on top of that, especially with their persistence engine, makes a huge amount of sense.
That's sort of the first level of persistence for us, just that we checkpoint the image. Because that's something that we can do extremely quickly and so extremely frequently. We do also have background processes that are extracting the data from the images and storing it in other forms that are more compact and don't have all of the baggage that a Smalltalk image carries with it. We can use these for backup, and we can also use these if we need to deploy a new version of the code: we can build a clean image that doesn't have any data in it, push that out to everyone and then incorporate their data into it to build them a new environment, and that all sort of happens automatically.
Well it's still a binary format, still a binary representation of objects but it's one that is not entangled with all of the kind of Smalltalk process machinery and all of the code. I mean in the Smalltalk image there is a full copy of all of the code of the application, which has some interesting properties, it means that every customer every one of these thousands and thousands of images has potentially, we try for this not to be true, but potentially a different version of the code and so if we were having a problem with one database in particular we could just go in and put breakpoints or test code, or experimental modifications on their version of the code and only their version of the code, which has been in some cases extremely useful.
Right. So we have this version of the data that is separate from the image. And so what we do is just push out a new version of the image to everybody and then combine that with their data and then get a new image. But I mean different people do different things. So Monticello, which is a version control system for Smalltalk.
Yes, I developed it along with Colin Putney and it's the most common one used in Squeak right now. And it is able to take a running Smalltalk image and update it to a new version of the code, which is to say it might have to remove a bunch of methods that are in the image as well as add methods, and the Smalltalk image has to know how to handle, if you added instance variables for existing objects, how to migrate existing instances from one to the other. It's a very different problem to push out a new version of the code when you've got these constantly running Virtual Machines than it is when you kind of kill all processes and start them up again and load the code in from scratch.
Yes, the naïve thing that happens in Squeak is that if you add any new instance variables, all existing instances will have them as nil. And so what you tend to do is use a lot of lazy initialization approaches. If you are adding a new instance variable you'll make sure that its accessor has lazy initialization so that if it finds it nil, it can do the right thing. We also have for Dabble DB a series of migrations that are similar to what you would have for database migrations. When you are using a version of the code for the first time, it will check to see if there are any migrations that need to be applied. And if so it will apply them and sort of go through all of the existing instances and make whatever modifications are needed for them to work with the new code.
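Both techniques mentioned here, a lazily initializing accessor and versioned migrations applied to existing instances, can be sketched in Python. The Record class, the schema_version field and the migration table are assumptions made for illustration; the interview does not describe how Dabble DB tracks versions.

```python
class Record:
    SCHEMA_VERSION = 2

    def __init__(self, name):
        self.name = name
        # Pretend this instance was created before `tags` existed.
        self.schema_version = 1

    @property
    def tags(self):
        # Lazy initialization: an old instance has no value yet (the
        # equivalent of finding nil), so fill in a default on first access.
        if getattr(self, "_tags", None) is None:
            self._tags = []
        return self._tags


def migrate_1_to_2(record):
    # Example step: version 2 expects names without surrounding whitespace.
    record.name = record.name.strip()


# Maps "from" version to the step that upgrades an instance by one version.
MIGRATIONS = {1: migrate_1_to_2}


def apply_migrations(records, target=Record.SCHEMA_VERSION):
    """Walk existing instances and bring each one up to the target version."""
    for record in records:
        while record.schema_version < target:
            MIGRATIONS[record.schema_version](record)
            record.schema_version += 1
```

The lazy accessor covers the common case of a new field with an obvious default, while the migration table handles changes that need existing data rewritten.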
That's right you can iterate through all of the instances.
Well, the method in Smalltalk is allInstances. So you can send the allInstances message to a class and you get all of them.
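For readers more familiar with Python, a rough analogue of allInstances can be built on the garbage collector's list of tracked objects. This is only an approximation of the Smalltalk facility, and the Musician class is a made-up example.

```python
import gc


def all_instances(cls):
    """Return the live, GC-tracked objects whose exact class is `cls`."""
    return [obj for obj in gc.get_objects() if type(obj) is cls]


class Musician:
    def __init__(self, name):
        self.name = name
```

As with the Smalltalk version, this scans the whole object memory, so it suits one-off tasks like migrations rather than code that runs on every request.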
Yes, sorry. So Dabble DB is a tool for collaboratively managing data on the web. And the market really is people that probably right now are using spreadsheets, emailing around spreadsheets to manage whatever the really specific data is for their organization. So this isn't for sort of generic project management or things that there might be kind of a vertical application for; it's for "I am managing a symphony orchestra, and I need to have a database of who plays what instrument and which concerts they are going to be playing and who the donors are to the symphony, and are there any relationships between the donors and the musicians that I need to track", and these end up being things that are extremely specific to this organization. But the people, especially somewhere like an orchestra or a small business, don't have a lot of IT support, they probably don't have a lot of money to maintain a traditional Access database or FileMaker database or something like that. Certainly they don't have anyone who is going to build them a custom web application for it. And so Dabble DB is a tool that lets them collaboratively, online, build a mini application that has their data model, and they can do a lot of the things that they might expect from a custom web application. Like they can put forms on their website that feed stuff into the database, they can get reports out of it, they can get visualizations like maps or charts out of it, but without having to know how to write any code, and without forcing them to make the kind of upfront decisions you might for a database, which are going to cause trouble when they inevitably realize later that they need to extend the system or change the way the system works. And so we worked a lot on having a real-time exploratory interface to the data and on having very flexible migrations if you need to change the structure of the data model to support that.
And also having a much deeper notion of data types than most databases do, and so for example we have a location data type, where if you type in an address it knows that "Oh, this is an address in New York state". And so if you are grouping by country it will come up in the US, if you are grouping by continent it will come up in North America, if you group by state it will come up in New York, or it can show you a map that rolls up all of your sales state by state or whatever. All just from putting in an address, which is a data type that most people have but that traditionally would probably just end up as a text field in the database.
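To make the rollup idea concrete, here is a small Python sketch. It assumes the address has already been resolved into state, country and continent; how Dabble DB does that resolution is not described in the interview, and the Location class and roll_up function are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """An address plus the hierarchy levels it was resolved to."""
    address: str
    state: str
    country: str
    continent: str


def roll_up(rows, level):
    """Sum amounts grouped by one level of the location hierarchy.

    `rows` is an iterable of (Location, amount) pairs and `level` is one of
    "state", "country" or "continent".
    """
    totals = defaultdict(float)
    for location, amount in rows:
        totals[getattr(location, level)] += amount
    return dict(totals)
```

Because every value carries its whole hierarchy, the same column can be grouped at any level without the user defining extra fields.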
Yes, our design rule is that it should not be possible to have a syntax error. And so there are formulas, in that if you have multiple columns of data you can say "I want to create a new column that is this one times this one". Or if you have, for example, a value that is a date range, you can extract a new column that is the duration of that date range, and build up things like that in a way similar to what you might do in a spreadsheet, but it's all done in an interactive kind of point-and-click style rather than by typing something in.
Yes, it's my focus right now, absolutely. It's what the company is doing.
Well, I think people get put off by the way Squeak looks, and reasonably so. It is a real barrier to entry that when you bring up a Squeak image it looks completely different from what you are expecting, and it probably looks like it was geared towards, I mean a lot of the design choices were aimed towards children rather than towards professional software developers. On the other hand, the technology is extremely solid. The core VM has a great garbage collector and a very solid implementation. The I/O support isn't as good as, say, the JVM's, I mean the filesystem support is not as good, the socket support probably isn't as performant, but it's certainly good enough, and really we've had no problems with it as a platform. If we had, we would have just moved to a commercial Smalltalk, but so far, and I mean if there were any problems we would have seen them by now.
Well, Squeak has green threads, and I think for a web application there's no reason to use anything else. Which is to say that we have at any one time, as I said, twenty or thirty VMs running at once. That makes ample use of however many processors the machine might have. To have real threads, native threads, within one of those VMs would be a waste. I mean there is no point. And the flipside is that because all the threads are within Smalltalk, they are very lightweight, which means you never worry about thread pooling, because they are extremely cheap to create; you can have designs that, if necessary, spin off two thousand threads and it just works. I mean this is a lot of the same stuff that people are discovering with Erlang, that having lightweight processes can actually be very valuable. And as long as your architecture is such that you do have a few native processes, so that you can take full advantage of the multi-CPU architecture, I think there is nothing wrong with that.
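The claim that lightweight threads are cheap enough to create by the thousand can be illustrated with a toy scheduler in Python. Generators stand in for Squeak's processes here, which is a simplification: this sketch is purely cooperative, and the worker and run_all names are made up.

```python
from collections import deque


def worker(task_id, steps, results):
    """A lightweight task: does `steps` units of work, yielding between them."""
    for _ in range(steps):
        yield  # switch point: hand control back to the scheduler
    results.append(task_id)


def run_all(tasks):
    """Round-robin scheduler: a single OS thread runs every task."""
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
            ready.append(task)  # still has work left, so requeue it
            switches += 1
        except StopIteration:
            pass                # task finished
    return switches
```

Creating two thousand of these tasks allocates two thousand small generator objects rather than two thousand OS threads, which is why no pooling is needed.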
No, all of the I/O is non-blocking. And the VM takes care of that; I mean it looks to your Smalltalk process like it's blocking, but other Smalltalk processes run just fine at the same time. And realistically any one of our VMs is probably rarely running even multiple lightweight Smalltalk processes. In general there are probably only a couple of users at a time using any one database, and each database has its own image and its own VM, so the chances of there being concurrent HTTP requests inside one VM are fairly low. It's not something we enforce, I mean if there are, there are. But it tends not to be. So it's really just kind of a moot point. I do find that for whatever reason, and I haven't even really looked into this, you get better throughput if you have your requests spread over a number of Squeak VMs. Having twenty concurrent requests being serviced by twenty Squeak VMs on the same machine performs better than having twenty concurrent requests serviced by, let's say, four Squeak VMs on the same machine. Even though in theory the four Squeak VMs ought to be able to exercise the four CPUs or whatever the machine has. The concurrency in the Linux kernel works better than the concurrency within a Squeak VM, but that's fine, I mean we just know that and take advantage of it.
Yes, so I should say that what we expect, the profile of customers that we expect is a reasonably small team with a reasonably small data set. So that it makes sense. The number of people concurrently accessing the database is something that can be handled by one Squeak VM, the dataset is something that can all fit into memory, into the Squeak image. And the thing is that there are obviously millions of people for whom that's true. I mean there are millions of people with these small data management problems, where they have under a hundred megs of data, under twenty people that need to access this data, and those are really the people that we are targeting which isn't to say that we don't support people with larger data sets or larger teams, but it's not the majority of our customers.
There is a 64-bit version; we use the 32-bit version of Squeak, and so we have sort of a hard upper limit: we can't have an image that gets bigger than four gigs.
I believe it's four gigs; to be honest with you, we have never come close. The largest images we see are hundreds of megs, not gigabytes.
Yes, so Gemstone is a Smalltalk implementation that is designed, rather than having the entire virtual image in memory, to have a virtual virtual image; it's designed to have sort of an infinitely large virtual image that is shared between many different VMs running at the same time, each of which has only its current working set loaded into memory. And so if I am deploying a web application, then I would have however many VMs, again probably twenty or thirty VMs running on a machine, but rather than them mmap-ing the entire image into memory with all the objects, they would send requests for objects to basically a database server, as they needed them. And so if one VM is working on one customer's data, it will sort of be bringing in those objects, or even just a part of the data, and kind of lazily bringing them in as they get accessed. So if I have one object that refers to another one, it's only when you need to traverse that reference that it would go and fetch it. What this means is that you can have sort of an object space that is terabytes big, and you can spread the load of accessing that object space over however many processes or even machines you need, because you can have many VMs that are all accessing that same shared set of objects. And so that would be the obvious thing; they have done a lot of work recently on supporting Seaside and on supporting Monticello, which is the version control system I mentioned. And generally sort of supporting a compatibility layer, so loading Squeak code into it is very simple. So it would be totally reasonable for us, I think, to port Dabble to Gemstone, should the need arise to support a very large customer. That's not something that we necessarily have any plans to do right now, because that is not what our customer base is.
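The "fetch only when the reference is traversed" behaviour can be sketched with a small proxy in Python. This illustrates the general lazy-loading pattern only; ObjectStore and LazyRef are hypothetical names and say nothing about how Gemstone is actually implemented.

```python
class ObjectStore:
    """Stand-in for the shared object server; counts how often it is asked."""

    def __init__(self, objects):
        self._objects = objects
        self.fetches = 0

    def fetch(self, oid):
        self.fetches += 1
        return self._objects[oid]


class LazyRef:
    """Holds an object id and loads the object the first time it is needed."""

    def __init__(self, store, oid):
        self._store = store
        self._oid = oid
        self._loaded = False
        self._target = None

    def get(self):
        if not self._loaded:
            # Only now does the VM ask the server for the object.
            self._target = self._store.fetch(self._oid)
            self._loaded = True
        return self._target
```

A VM holding thousands of such references pays for only the ones it follows, which is how the working set in memory can stay far smaller than the shared object space.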
But if there were a customer out there who came to us with some real need for a large dataset then it makes sense, then we can do that.
I'm not sure exactly how old Gemstone is, but it has also been around for quite a long time, and I mean there is certainly the architectural difference that I mentioned, of having many VMs running but all kind of transactionally sharing the same large set of objects, so that's quite different. Squeak expects all of its objects to be in memory. Squeak, I should say, like almost any language you are used to, expects all of its objects to be in memory. Gemstone can swap them in as it needs.
Yes, but I think it's a little bit dangerous just to think of it as an object-oriented database, because it is also a dynamic language VM and the two are very tightly coupled. So I think it's almost better to think of it as a VM that has persistence baked in right from the start. There are object databases out there that are separate from the VM and where you have some kind of a client interface to them. And that ends up in a little bit of a different situation than Gemstone is. Gemstone also is a native code compiler and so performs better than Squeak does. And Gemstone is 64-bit, and I think it's mostly used these days in 64-bit. Squeak is 32-bit. I mean the obvious difference is that Gemstone is a commercial product and Squeak is open source.
It seems to me, I don't have a huge amount of experience with this, but it seems to be pretty easy at this point to commit code from Squeak and load it into Gemstone, as long as you are not using, obviously there are some libraries that are available for Squeak and not for Gemstone, and if you were using those it might be a little bit harder. But I mean I think one of the nice things about the Smalltalk world is that, like the Java world and somewhat unlike the Ruby world, there is a bit of an obsession with keeping everything sort of in pure Smalltalk. And if you need a PDF generation library or an XML parser, rather than linking in some C library you would just build it in Smalltalk, rewrite it in Smalltalk.
Yes, I mean that is part of it. And right, Squeak goes to this extent, I mean the compiler is also written in Smalltalk, the development environment is written in Smalltalk, everything in a Smalltalk image typically is written in Smalltalk. And what this means is that as long as Gemstone or any Smalltalk can understand the kind of trivial stuff, like the format that you commit Squeak source code in, as long as it loads that in, as long as it understands the syntactic peculiarities (so Squeak sometimes uses underscore for assignment instead of colon-equals), as long as the parser can deal with that, then you just have to bootstrap however many libraries you need, because it's Smalltalk all the way down. I don't think it would be a massive engineering effort to get Dabble going on Gemstone; I think it would be reasonably straightforward, I mean it would take a little bit of time.
No, I don't believe so. I know of one other startup that does use something quite similar, but that's because they talked to us about it. And I think probably it's more common for people to use, for example, a relational database for storage, or kind of an external object database, like OmniBase, which is one that is available for Smalltalk; there is GOODS, which is kind of a language-agnostic object database that I wrote a Smalltalk client for a while ago. And probably just to have four or five Squeak VMs running on a server, multi-threading, something that looks a little more like a traditional deployment, more like a Ruby deployment, I think that is more common. But we have a very particular problem with very particular constraints and possibilities because of how partitioned our dataset is, and that's really the thing: because of the nature of our business we have thousands upon thousands of separate small datasets, and that lets us have a very different architecture than someone doing, like, a social network, which has one massive, totally interconnected dataset.
44. OK, so now on to the idea of scaling: you said 99.9% of your customers fit into that, they have low interaction, but when you get that one customer who wants to hit one VM a lot, how do you scale that one VM?
I mean the one answer is that one customer is not someone that we want, right? And it certainly would be a reasonable business decision simply to say these are the customers that we can support and these are the one percent of the customers that we are not going to support. In practice there haven't been any problems like that.
Yes, exactly. We haven't had to make that decision yet and I hope that if that did happen that we would find ways to accommodate them rather than having to tell them to go away but part of being a business is figuring out who your market is and who isn't. And if we have to do that we have to do that.