
Wilson Bilkovich Discusses Rubinius


1. We are here at RubyConf 2007. I am sitting here with Wilson Bilkovich. How about we start with you introducing yourself and telling us a bit about what you do?

All right. I am Wilson Bilkovich, as you just said. I am a freelance Ruby developer and I have been exclusively using Ruby to get paid since some time in 2005. I have done various freelance Ruby and Ruby on Rails projects since then, and more recently I have gotten involved in the Rubinius project and the Ruby Hit Squad and similar things. My open source involvement has expanded quite a bit over the last year. In parallel I am still handling Rails projects of various levels of catastrophe.


2. Let's start with Rubinius. How did you get involved with the Rubinius project?

I was here at the last RubyConf and I saw Evan's talk, and it was extremely blue-sky, particularly in comparison to this year's version of the talk, and there was a particular slide that really struck me. It was a slide with a code sample from his prototype, something along the lines of `10.times do`, and now you have 10 CPUs. I saw that and I said "Wow". For several years before that I had been worrying heavily about parallelism and multi-core CPUs and all that stuff, and I was starting to despair that Ruby was ever going to be able to handle that sort of problem. And I saw that slide and I said: "Hey, that looks like the right way to do it". If you can hide as much as possible from the language, then you can just decide to make up CPUs on the fly and it is going to happen.


3. Can I just interject: what's a CPU? What does a CPU mean in Rubinius: is that like one core, a thread, a process?

That's an interesting question. Being a virtual machine everything is virtual so you are making a virtual CPU. And since there is this whole platform underneath you can decide what you want to do with that. You can map one to an operating system thread or you can map one and three of its friends to one operating system thread and schedule it. You can map all of them to one; you can do whatever you want. And depending on the hardware or depending on the work load you might make different decisions about using native threads or green threads or whatever. There is a whole universe of decisions you can make all the way from the way Erlang does it to the way Ruby 1.8 does it, to the way Java does it and they each go for different things. If there was one that was perfect for everything people would have just picked it.


4. So, what's the current state of Rubinius and threading? What's your approach? Which of these approaches do you take, an Erlang approach? I saw something about the actor library by MenTaLguY, I think that is his name.

I confess I actually don't know his real name; I think some people do, but I am not privy to that secret. I am assuming he is a head of state somewhere or something; it's got to be a secret. He's like _why; he has to have his reasons, I just don't happen to know them. So, relatively early on we faced those basic questions: what's a CPU? What is a task? How do you split these things up? Those are all relatively hard questions. Not tricky, but they are questions that computer science papers have been debating for a long time. And MenTaLguY apparently has read all those papers, because we had a discussion and decided that the right thing to do was to implement the most primitive possible things: a scheduler that schedules tasks, tasks that represent the work to be done, and channels, which tasks use to have conversations with each other.

And those are not really particularly concrete things; they are just really basic constructions. None of those is a thread in Ruby, none of those is a file, none of those is really anything; it's just the most abstract, simplified thing. And it turns out that with those primitive things you can build pretty much any concurrency model you want. And MenTaLguY built a bunch of those. He built actors, which I confess to knowing only a little bit about. I witnessed a several-hour-long pi calculus explanation that in the end actually kind of made sense to me, which was worrisome. But there are a number of different ways to think about it, and some people like being able to just pass messages between two things and being able to reason about how those work in a sort of mathematical way.
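To make the scheduler/task/channel idea concrete, here is a toy sketch built on plain Ruby Fibers. The names and APIs here are made up for illustration; they are not Rubinius' actual interfaces.

```ruby
# Toy scheduler/task/channel primitives, sketched with Ruby Fibers.
# Illustrative only: not Rubinius' real API.
class Channel
  def initialize
    @queue = []                      # values sent but not yet received
  end

  def send(value)
    @queue << value
  end

  def receive
    Fiber.yield while @queue.empty?  # cooperatively block until data arrives
    @queue.shift
  end
end

class Scheduler
  def initialize
    @tasks = []
  end

  # A "task" is just a fiber the scheduler knows how to resume.
  def spawn(&block)
    @tasks << Fiber.new(&block)
  end

  # Round-robin scheduling: resume each task until all have finished.
  def run
    until @tasks.empty?
      task = @tasks.shift
      task.resume
      @tasks << task if task.alive?
    end
  end
end

results   = []
chan      = Channel.new
scheduler = Scheduler.new
scheduler.spawn { 3.times { |i| chan.send(i * 10) } }
scheduler.spawn { 3.times { results << chan.receive } }
scheduler.run
results  # => [0, 10, 20]
```

The point of the sketch is the one Wilson makes: nothing here is an OS thread or a Ruby thread; tasks and channels are abstract enough that you could map them onto native threads, green threads, or actors without the client code changing.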

And I think we haven't actually talked about this in the last several months, but for a while we were big on what's called "Software Transactional Memory". In Ruby syntax (and actually the syntax from the paper it was introduced in really looks like Ruby for some reason, I am not totally sure why) you say `atomic do` some code `end`, and anything that takes place inside that is like a database transaction: it only sees its own copy of the universe. What you are doing behind the scenes is keeping track of what changes it made, and then if you decide at the end that it didn't raise an exception, you go ahead and apply them. Obviously there is some overhead involved in that, but the advantage, supposedly, is that you can combine multiple libraries that are using it themselves. One of the classic nasty concurrency problems is: you write a library and you use locks and all kinds of cool stuff, and then someone else does the same thing, and they want to call your library from theirs.

And you get some horrible deadlock or race condition, or basically the whole thing falls over. Every time you add a library to your project, particularly in C projects, you have to re-audit all of the code and make sure it is going to play nicely together, and that's a nightmare. So software transactional memory is seen as one possible cool way to deal with that, and my understanding is that Intel is looking at adding hardware acceleration for it in CPUs in the future, to make it cheap; right now it's kind of expensive. But we looked at that and we said "Huh, that's cool, but really we just need the most primitive thing here, and STM would just be a library". You can't require STM right now... but if someone wrote it you could easily do that, and you could require actors and require whatever. So instead of only having one real primitive to work with, as in Ruby 1.8, you are going to be able to build your own from scratch, which hopefully is a good thing.
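The atomic-block idea described above can be sketched in a few dozen lines of Ruby. This is a toy optimistic STM (versioned refs, buffered writes, validate-and-commit under a lock, retry on conflict); it is purely illustrative and has no relation to any actual Rubinius or STM library API.

```ruby
# Toy software transactional memory: buffered writes plus retry on conflict.
class Ref
  attr_reader :value, :version

  def initialize(value)
    @value, @version = value, 0
  end

  def commit!(new_value)       # called only while COMMIT_LOCK is held
    @value    = new_value
    @version += 1
  end
end

COMMIT_LOCK = Mutex.new

class Txn
  def initialize
    @reads  = {}   # ref => version observed when first read
    @writes = {}   # ref => buffered new value (our private "copy of the universe")
  end

  def read(ref)
    return @writes[ref] if @writes.key?(ref)
    @reads[ref] ||= ref.version
    ref.value
  end

  def write(ref, value)
    @writes[ref] = value
  end

  def commit
    COMMIT_LOCK.synchronize do
      # Validate: every ref we read must be unchanged since we read it.
      return false if @reads.any? { |ref, ver| ref.version != ver }
      @writes.each { |ref, val| ref.commit!(val) }
      true
    end
  end
end

def atomic
  loop do
    txn = Txn.new
    yield txn
    return if txn.commit   # conflict? rerun the whole block
  end
end

account = Ref.new(100)
atomic { |t| t.write(account, t.read(account) + 25) }
account.value  # => 125
```

The composability benefit Wilson mentions comes from the fact that nested `atomic` blocks from different libraries could share one transaction log instead of each taking ad-hoc locks.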


5. Are you planning to do multiple implementations? Could these be exchanged at runtime or at compile time? What's the plan, and what's the situation now and for 1.0?

The idea is that some of these things, like the scheduler, are at this point written in C; it is just fantastically hard to write them in Ruby right now. And I like to think that we are eventually going to move pieces that are hard to conceive of writing in Ruby over to the Ruby side at some point, though not for 1.0. But right now threads are implemented on top of those channels, the scheduler, and tasks. And in that thread library, the threads are written in Ruby. The VM doesn't care about threads; it doesn't know what those are. It just knows about tasks and knows how to schedule them. This doesn't work in Rubinius yet, but probably by 1.1-ish you will be able to say you want a native thread and, inside that, make some other regular threads. And that will be the classic M:N mapping of green threads onto native threads. We hope to make that totally painless to use in client code. But to my knowledge 1.0 is only going to ship with green thread support, rather than exposing native threads, and I believe that's mostly just due to thinking about how the API is going to look. Because this is something that, to my knowledge, none of the other implementations have done: exposing both kinds of threads. We want to make sure that we do it in a way that's reusable for other people if they decide they want to implement threads this way. We don't want it to be Rubinius-specific code unless it absolutely has to be that way. If it is something that no one else is going to be able to support, then that's fine, but we don't want to make that judgment; they may end up implementing it.


6. Going back to your involvement with Rubinius, what was the first thing you contributed? What was the first thing you committed, the first thing you looked at in Rubinius at home after RubyConf?

I think right after RubyConf I emailed Evan a number of times until he finally put his Subversion repository online, and then I think the very first thing I contributed was actually some sort of Makefile change that made it actually build properly on my computer. And I am not a big fan of Makefiles and I am not good at them, so luckily it was not a foreshadowing of what I would be contributing. At that point there was really no documentation, and I won't say there's a ton of it now, but things change too quickly to really get bogged down in cool-looking diagrams (I like diagrams). At that point it was just me asking questions of Evan, saying "How does this work? Is this the thing I am supposed to be looking at?" and he would say "No, that's from the old prototype, that code doesn't even get run anymore" and I would delete it. I did a lot of svn removes in a row, getting rid of clutter that only a person new to the project would have run into, because Evan by then had long since stopped looking at those directories. I suppose after that I involved myself in the tests; this was back when we had Test::Unit-style tests instead of mini RSpec and other kinds of behavior-driven styles of speccing the whole thing out. And then I guess I just chose the hardest-looking thing I could find, which was the compiler, and beat on it until I understood it.


7. How was the first compiler implemented? What were you using? Were you using Ruby or C code?

The very first compiler was written in Ruby; there has always been a Ruby compiler. At that time the compiler only ran under MRI; you actually had to have Ruby 1.8 installed. Official Ruby would compile your code for you and then Rubinius would run it. At that point we were not fully self-hosted. You couldn't install Rubinius on a computer that didn't have Ruby already, which was lame. But to speed up the whole process, Rubinius borrowed, and forked I guess, the C-based grammar that Ruby 1.8 has. Ruby's grammar is notoriously complex; I happen to think the language is beautiful, but the grammar file that implements it is not. That was taken and massaged a little bit to make it a little less Ruby-specific.

The parsing was also separated out from the runtime piece: there are now two files that are more separate in Rubinius than I believe they are in 1.8. What you got out at that time was essentially identical to what you would see using the ParseTree gem in Ruby 1.8. And Evan had already made some changes; he had worked on a project called Sydney that had various additional features added, so at that point we actually had keyword arguments in the grammar, because he had seen Matz talk about them at one RubyConf and thought it was cool, so we added it. Actually it was really easy. Those have since gone by the wayside because we want to focus on 1.8 compatibility and we don't really care about those quite yet. So we borrowed the C grammar, and it spat out a parse tree that looked essentially identical to the Ruby 1.8 parse tree. Ruby 1.8's grammar emits relatively strange things sometimes. An example I think is good is the case statement.

Ruby has two kinds of case statements. One is `case foo`, then `when` something, `when` something else, and so on; various presentations at this conference have shown that you can use it to select on what kind of object you are looking at, or you can use regular expressions. It is really powerful. The other kind is `case` with nothing after it. You say `case` and then `when x == 5`, `when x == 6`. And those are really completely different. They don't perform the same task; they don't really even compare things the same way. It's just a coincidence that they are called the same thing, in my opinion. We didn't really want to have to write a single method that dealt with both of those problems at once. That was going to be a hassle. At that time we had a step called the normalizer, and it would take the input from the grammar and manipulate it in various ways. One of those ways was to take case statements that don't have anything after them and rename them to something called "mini if". We thought that was a cool name. And it is completely different; it just so happens they both use the word case, but it was painful to figure out which one you were actually looking at in the compiler. The original compiler was 5 passes, which was a lot.
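The two case forms described above look like this in practice; the first dispatches with the `===` operator against an expression, while the second is really just an if/elsif chain in disguise:

```ruby
# Form 1: `case expr` tests each `when` clause against expr using === .
def describe(obj)
  case obj
  when Integer then "an integer"          # Integer === obj
  when /^rub/i then "something ruby-ish"  # regexp match
  when 1..10   then "a small number"      # range membership
  else              "no idea"
  end
end

# Form 2: `case` with no expression just evaluates each condition in turn.
def sign(x)
  case
  when x > 0 then :positive
  when x < 0 then :negative
  else            :zero
  end
end

describe(5)       # => "an integer"
describe("Ruby")  # => "something ruby-ish"
sign(-3)          # => :negative
```

Since the second form never calls `===` at all, a compiler gains nothing by treating the two forms as one construct, which is why the normalizer split them apart.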

It was done that way because it wasn't clear at the time what was going to be painful. I don't think anyone had written a Ruby compiler in Ruby before, and it wasn't clear what was hard and what was easy. Some of the most complex-looking constructions in Ruby are actually totally painless to implement, because they don't have funky edge cases and they don't need to store a bunch of information for later passes; you just get it done. And some of the things that I thought were going to be easy were nightmarishly hard.


8. Do you have an example for that?

Yes, an example is this unbelievably hideous test case that I think Ola from JRuby contributed, or at least contributed the idea for. It is a method where you have default values for several of the arguments, so `def example(x = 5, y = {})`. Those were all fine. Well, he came up with an example where one of the default values was a lambda that referred to one of the other arguments in the list, something like `lambda { |n| n * x }`. That was so hard, I will never forget how hard it was; it was just agonizing. We basically had to rework the entire way that method arguments were processed, because we were not yet laying out a local variable for the earlier argument, so the lambda didn't have any idea what it was. That's a scenario I wouldn't even have thought of; these are incredibly creepy things. Ruby is defined mostly by what you can do, not what you can't. And that's cool, we like that; it lets you do all sorts of fancy domain-specific languages and cool tricks, and it is beautiful. But it leaves some funky edge cases. An example that turned out to be fairly easy to implement, though I don't have any real idea why, is a method that has a method definition as the default value of one of its arguments. The first time you call it, it does one thing, and the second time it has redefined itself, because it evaluated the default argument and replaced itself. I'm sure Matz never thought of that idea, and hopefully it has never been used in real production code; if it ever is, I will have to hunt those people down. But there is nothing in Ruby that says you can't do that, so it just works. And that is interesting; it's one of the things that I like most about Ruby, but sometimes when you have to implement it, it's a little painful.
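Both edge cases can be written down in a few lines. These are illustrative reconstructions of the kind of test case described above, not the actual Rubinius specs:

```ruby
# Edge case 1: a default value that is a lambda closing over an
# earlier parameter. Defaults are evaluated left to right, so x is
# already bound when f's default is evaluated.
def apply(x = 5, f = lambda { |n| n * x })
  f.call(2)
end

apply     # => 10  (f sees x = 5)
apply(3)  # => 6   (f sees x = 3)

# Edge case 2: a default value that redefines the method itself.
# The first call evaluates the default, which replaces `tricky`;
# subsequent calls hit the new definition.
def tricky(x = (def tricky(x = 2); x; end; 1))
  x
end

tricky  # => 1, and tricky has now been replaced
tricky  # => 2, the inner definition
```

Nothing in Ruby forbids either of these, so an implementation has to support them, which is exactly the "defined by what you can do" point being made.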


9. Could you maybe explain some of the passes that the compiler goes through?

That's true, I sort of lost my train of thought there


10. Maybe you could explain the format that you get the parse tree in; that is from the ParseTree library by Ryan Davis?

The format that we receive is essentially an array of symbols. And some of the entries in that array might be other arrays that are arrays of symbols, so the word tree in that library name is correct. It's really a tree: it has an outermost piece and then branches following on down, and that's basically how Ruby 1.8 thinks about the problem. It's a fairly classical way to handle languages; in this case you get arrays of symbols, and other languages have other ways of handling it. You might have a node called if and one called case, and then you have specialized ones you never see. For example `||=`, which everyone calls "or-equals" or whatever: you get a node named for that, and you have all kinds of fun things you have to deal with there. Arguably you get handed that by the parser.

People would probably complain that I'm conflating a couple of steps there, but basically you have got, at least in Rubinius-think, the C stuff that parses the grammar and spits out a parse tree, and then you have everything else. Eventually we'll have a different grammar and it will be more and more integrated, but the way I think of it personally, and I hope this is not crippling, is that there is this C box that spits out arrays of symbols and then you do stuff with it. The 5 passes: I don't know if I am able to remember these on the fly, but the 5 passes we used to have started with the normalizer, which took the parse tree straight from the grammar and manipulated it to make it a little cleaner. Then we had local scoping. Ruby is full of complex local variable evaluation stuff; an example of that is what happens when you create a new local variable inside a block, or what happens when you have a block inside a block. You need to carefully manage which n we are looking at right now.

The worst case scenario would be: you have an argument to the method, then you have locals inside the method body, then you have blocks that have block arguments with the same name, and so on. You have to be very careful about that. A surprising amount of code actually cares; Rails in particular is full of this kind of stuff. You need to be careful about it, so we decided we needed a whole pass: we look at the whole thing and completely resolve all the locals. And it was done that way because one of the initial goals of having a compiler was to be able to optimize out local variables you don't need. The JRuby talks spoke about that briefly. It turns out there are a lot of locals in Ruby, more than just the ones you type in the method. If you are doing class_eval in a class body, you are surrounded by things that Ruby 1.8 has to keep in memory because it is not sure when you are going to use them. We decided to be as aggressive as possible about stripping out locals we don't need.

So if you don't use a local variable inside a block or inside the method, then we just throw it out. We threw out all the names and everything like that and just stored numbers, so when you are actually running the code it says "OK, I'm going to get this local, it is the third one". Bam. And in order to do that you have to resolve all the locals; there is a whole pass for that. And then there is a sort of translation phase. The original compiler was a little too generic; again this is perhaps a concise way to put it, it was built to be able to have arbitrarily many passes, so that you could create new processors and stack them up and they would be called in order. That's a cool idea, and it turned out to be fairly complex. And the reason it was complex is that, to take the normalizer for example, let's say it manipulates something and adds an extra symbol to the end. And it does that to pass some information along.
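The "throw away the names, keep slot numbers" idea can be sketched as a tiny scope chain. This is a toy, not Rubinius' actual local-scoping pass; the `[depth, slot]` representation is an assumption for illustration.

```ruby
# Toy compile-time local variable resolution: names become numbered
# slots, and nested scopes (blocks) chain to their enclosing scope.
class LocalScope
  def initialize(parent = nil)
    @parent = parent
    @slots  = {}            # name => slot index within this scope
  end

  # Allocate (or reuse) a numbered slot for a named local.
  def declare(name)
    @slots[name] ||= @slots.size
  end

  # Find a name here or in an enclosing scope; returns [depth, slot].
  def resolve(name, depth = 0)
    return [depth, @slots[name]] if @slots.key?(name)
    @parent && @parent.resolve(name, depth + 1)
  end
end

method_scope = LocalScope.new
method_scope.declare(:n)                 # method argument n  -> slot 0
method_scope.declare(:total)             # method-body local  -> slot 1

block_scope = LocalScope.new(method_scope)
block_scope.declare(:n)                  # block argument also named n

block_scope.resolve(:n)      # => [0, 0]  the block's own n shadows the method's
block_scope.resolve(:total)  # => [1, 1]  found one scope up, in slot 1
```

At runtime the generated code can then say "fetch slot 1, one scope up" without ever consulting a name, which is what makes stripping unused locals possible.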

Now you can't just look at the parse tree and know what the compiler is actually going to see later on. You have to remember: "Oh, we have already passed through this earlier stage and we manipulated it, so now we are expecting something other than what we would get by just asking the ParseTree gem to tell us what that is". As time went on that became more and more of a hassle. We ended up needing to add all sorts of things, just plain old Ruby objects, to the array we were passing around, to store the information we needed. Another pass was called "local state": now we know what our local variables are and where they are used, and now we need all sorts of information about whether this has blocks, and that pass did way too much. There were so many things in it that I'm not even going to try to explain it, but that was where we stored the context that the compiler needed, and there is a fair amount of that in Ruby.

The next step after that is the thing that actually takes the expression you have been handed through all that stuff, which hopefully is still mostly an array of symbols, and spits out Rubinius CPU instructions. This is what I mean by Rubinius not actually knowing how to execute Ruby code. The virtual machine doesn't know anything about Ruby and it doesn't care. And the reason it doesn't care is that the compiler emits instructions for things like checking the argument count, sending a method, and all kinds of stuff. So it takes this array of symbols, figures out what you actually wanted to do, and spits out the instructions that are needed to do that.
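That final translation step can be sketched as a small recursive pass over the symbol tree that emits stack-machine-style instructions. The instruction names below are made up for illustration; they are not Rubinius' real opcode set.

```ruby
# Toy code-generation pass: symbol tree in, instruction stream out.
# Opcode names are invented, not Rubinius' actual instructions.
def compile(sexp, out = [])
  case sexp.first
  when :lit
    out << [:push_literal, sexp[1]]
  when :call
    _, receiver, method_name, args = sexp
    compile(receiver, out)                  # push the receiver
    args.each { |a| compile(a, out) }       # push each argument
    out << [:send_message, method_name, args.size]
  end
  out
end

compile([:call, [:lit, 1], :+, [[:lit, 2]]])
# => [[:push_literal, 1], [:push_literal, 2], [:send_message, :+, 1]]
```

Nothing in the output mentions Ruby as such, which is the point: the VM only ever sees pushes and sends, never the source language.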


11. Do you know which VM architecture the VM is based on? Is it Smalltalk; are the byte code or the instructions Smalltalk-like?

It's sort of its own thing now. It certainly started its life based on the Smalltalk Blue Book. After the last RubyConf I went and bought that book and read it, and it's fascinating; if you can find a copy you should absolutely read it.


12. It's available online partly.

It's actually cheaper than I expected; I expected it to be rare, but it was around 30 dollars, which in computer-textbook land is essentially free. I just bought a book on efficient polymorphic calls for an amount I will not disclose because it was hateful. Anyway, cool book.


13. Can you give us the title? Do you remember it?

Yes, "Efficient Polymorphic Calls", by someone I am shaming by not remembering the name (Karel Driesen). It's basically a whole treatise on making dynamic languages fast. And sadly it's horrendously expensive; I guess it is a small publisher, or they are just making textbooks expensive. It is probably a textbook they might use at MIT or something, and they brutalized everyone. So Rubinius started life as a Smalltalk-inspired design, but it turns out Smalltalk is actually fairly different from Ruby; Smalltalk people like Ruby and Ruby people tend to like Smalltalk. As an aside, one of my favorite books is one that was mentioned at this conference, "Smalltalk Best Practice Patterns" by Kent Beck. I didn't know any Smalltalk when I first read that book and it didn't matter, because it looks like Ruby basically.

That is a beautiful book that taught me how to write code that I didn't want to set fire to a week later. So the VM started life as that, but it turns out you actually need a different set of instructions to implement Ruby, due to its dynamism. Even ignoring eval, there is just a truly ludicrous number of tricks you can play in Ruby; you need a lot of information and you need to be able to do really flexible things. define_method is a fancy trick that has its own set of needs. I believe the set of instructions that Rubinius uses is starting to fade from its Blue Book origins; it is fairly different now. The Smalltalk Blue Book has a concept called primitives, which are not CPU instructions but methods implemented inside the virtual machine, at a level much lower than a regular method. You do that when you need something to be efficient, or you need to talk to the platform in a way that you aren't ready to expose, or whatever; there are various reasons to do it. And I know our list of primitives is completely different from the set in the Blue Book at this point.


14. So primitives are a kind of API of the VM? Is that correct?

That's a good way of thinking about it. They look like methods, but you have written them in C, so I guess they are like regular Ruby 1.8 core methods at that level. In a sense everything in Ruby 1.8 is a primitive.


15. Do the Ruby-based implementations of the standard library, like the file system access calls, call these primitives? Is that it, or both?

We try not to; we try to keep that set as small as possible, because clearly we like writing Ruby more than we like writing C code. I am trying to think of a good example that still exists: creating a block is something that would be pretty tricky to get right in Ruby. You would have to have a special way of saying "Hey, this code here is not actually meaning to create a block literally; it doesn't want a block, it wants to make a block for someone else". There are times when you'd have to clutter the syntax of the whole thing quite a bit in order to bootstrap it up to that level. Hopefully we'll find a nice balance for that; there are fewer things written in C as time goes on. But sometimes it's 6 lines of C code where it would have been 14 lines of Ruby; it can actually go the other way, which is an interesting thing I have noticed in Rubinius. One time I was writing C code and I realized it was valid Ruby. So that can get strange as well. The primitives are sort of the API of the virtual machine; that makes sense. One way of thinking of it is: if CPU instructions are the language the VM speaks, primitives are a layer you put between the VM and a language that is going to run on it. Perhaps if you wanted to make a Java front end that ran on the Rubinius VM, you might end up with different primitives. I think ours are generic enough that you would probably be able to reuse them, but that is sort of a shim layer that you need to put in place sometimes to get things done in an efficient way. As time has gone on we have moved some things out of that layer and moved other things back into it. Hopefully we have a plausible balance right now.


16. You mentioned the old compiler that you were just talking about, and now there is a new compiler. Can you give us a quick overview of it?

Sure. I wouldn't call them mistakes, but various lessons were learned in the old compiler, and it was very slow. Given that we have a compiler, not an interpreter, everything gets compiled, and if something hasn't already been compiled, the time the compiler takes is part of your runtime. It's part of what you see when you type random Ruby code at the console and hit enter. We need to be fast, and we explored various options, like using something other than arrays to represent the parse tree, making an SEXP (S-expression) class that could get used. The reason we thought about that is that the Array#[] method in Ruby does all kinds of things. It takes 2 numbers, one number, a range, all kinds of things that I can't recall off the top of my head. When you say some_array[0] to get the first thing in the array, that method has to figure out what you actually wanted first. It takes all kinds of different things and it says: "Did I get a Range? No. Did I get a Fixnum? Yes.

Did I get another one? No. All right, you just want one thing". And it does that every time. In Ruby 1.8 it's all done in C, and all that checking is fairly inexpensive. I personally haven't profiled what the expense of doing that is in 1.8; I expect it's actually significant, but not having done it at all I will pretend it is free. Rubinius' Array#[] is written in Ruby, so the overhead of checking all those things is much higher, and I have a feeling Array#[] is going to be our greatest battle in making the thing perform. We may end up moving it into C at some point just to keep moving forward. But it made sense to us that, since we had total control over the code the compiler was written in, why not use something that doesn't need to do that checking? Why not use the most primitive possible thing that would work? That sounded like a cool idea, and we were also looking at ropes, which are a cool flavor of string optimized for very different things.
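The argument-sniffing that Array#[] has to do can be made visible with a simplified pure-Ruby dispatcher. This is an illustration of the kind of checking described above, not Rubinius' actual implementation; `fetch_from` is an invented name that delegates to the built-in behavior once it has classified the arguments.

```ruby
# Simplified sketch of the per-call type dispatch inside Array#[].
def fetch_from(items, arg, length = nil)
  if length                        # items[start, length] form
    items[arg, length]
  elsif arg.is_a?(Range)           # items[range] form
    items[arg]
  elsif arg.is_a?(Integer)         # items[index], the common case
    items.at(arg)
  else
    raise TypeError, "no implicit conversion of #{arg.class} into Integer"
  end
end

a = [10, 20, 30, 40]
fetch_from(a, 0)     # => 10
fetch_from(a, 1, 2)  # => [20, 30]
fetch_from(a, 1..2)  # => [20, 30]
```

When those `is_a?` checks are themselves Ruby method calls rather than C, every single index lookup pays for the whole chain, which is the performance battle Wilson anticipates.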

And we said "OK, we could use ropes instead of strings, and we could use this instead of that, etc.". And then we were like: "Hmmm, or we could just have way fewer passes". The current compiler, as of yesterday, so not exactly old, has two passes: it has all of the front-end stuff previously mentioned, all the normalizing and handling of local variables, and then it has the spit-out-the-bytecode step. Bam. That sounds like it would just be collapsing a bunch of work into fewer files and would be more cluttered, but along with that it has a whole new design for how to think about the problem. The old compiler had one method per node. It had a method for handling case, and it had methods for handling all kinds of crazy things, like dynamic regular expressions and all sorts of cool Ruby tricks. And sometimes those would be so complex that they would need to hand off some work to other helper methods.

But generally the idea was that you would have a method named after a thing and it would handle it. And some of those had gotten wildly complicated; in particular the splat multiple-assignment thing is truly deep, and it is full of things that I am not going to reveal, because someone might use them in actual code and then I would have to support them. It's full of tricks that no one should use; keep it simple. But that had gotten pretty hard to deal with, and we got to a point where contributors would routinely try to fix it as their first thing. They would say "Oh, the return value of this is different than in MRI, I am going to fix that". And we had already tried and failed to do that, or tried and felt it was too expensive.

Sometimes we had good comments saying "Don't touch this" or "Don't attempt to make this work the way Ruby 1.8 works". And sometimes we would forget that comment and people would spend several hours of their valuable time working on it. We decided we needed a much cleaner way to represent all that. So the new compiler has one class per node. There is a class called Rescue and there is a class called Case, and they wrap up all the stuff you need. That sounds like it should be equivalent, but what it gets you is a single place in that code that actually generates the result. It spits out instructions in one place, instead of adding them to the stream all the time as you go by and then having to remember what we did and say "Oh, we are actually inside a rescue clause and we hit a break, so we want to do something completely different than we would inside the body of a block". You can zero in immediately on "Yep, that's the part of the code that actually emits this". And as a side effect, that made running specs vastly easier.

The old compiler specs were useful; write a spec first and then change code until it passes, that's great, that's how I write code. However, you ended up needing to write a spec that said "this thing should equal this string full of CPU instructions". And some of those strings are quite long, and you have to find the piece right in the middle that you need to edit to make the change you were going to make. It was getting irritating. So now it is just a block that says "Hi, I'm a pretend compiler and here are the commands I am expecting to get", and then: did the real compiler actually do that to what it was given? Bam, done. So it reads like Ruby code now instead of like a string, and it is going to be vastly easier to make quick changes. Given what I have seen of future Ruby code, it looks like we are definitely making compiler changes and additions to the parser; it looks like there's some fun new stuff.
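The "pretend compiler" spec style described above can be sketched with a recorder object: the spec states the expected instruction sequence as ordinary Ruby calls, and the assertion is just a comparison of two recorded streams. The class and opcode names here are invented for illustration, not Rubinius' actual spec helpers.

```ruby
# A fake generator that records every instruction "emitted" to it.
class FakeGenerator
  attr_reader :stream

  def initialize
    @stream = []
  end

  # Any instruction call (push_literal, send_message, ...) is recorded.
  def method_missing(op, *args)
    @stream << [op, *args]
  end

  def respond_to_missing?(_op, _include_private = false)
    true
  end
end

# The "spec": what we expect a compiler to emit for `1 + 2`.
expected = FakeGenerator.new
expected.push_literal 1
expected.push_literal 2
expected.send_message :+, 1

# Stand-in for the real compiler driving a generator the same way.
actual = FakeGenerator.new
actual.push_literal 1
actual.push_literal 2
actual.send_message :+, 1

expected.stream == actual.stream  # => true
```

Because the expectation is written as method calls rather than one long string of bytecode, editing a spec means changing one line in the middle of readable Ruby instead of hunting through a wall of text.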


17. Can you say whether you kept most of the old logic that compiled the byte code, so that this was mostly a structural change? Is that a way to put it, or not?

We didn't really keep any of the code, though I am sure it was open at all times while writing the new version. The logic was kept; the logic survived numerous battles with very complex test code. The logic is quite good. Charles Nutter was saying that JRuby had the only complete compiler for Ruby, and I am sure he is correct; they probably have some things we don't do. But that list of things is shorter every day. In the smallest amount of time we can manage, we expect that this will no longer be the case. Most things were hammered out and worked well. So the logic stayed, but the structure is pretty different, and that really necessitated writing fresh code. Luckily the whole thing is already fairly short, so it's not true rocket science, it is just tricky.


18. That was a lot of Rubinius stuff. When you are not focused on Rubinius you also do a lot of other work. For instance, you are part of the Ruby Hit Squad. Can you give a quick explanation of what you do, and which government you work for?

I can't disclose that. However, the Ruby Hit Squad is a group of people who have seen too many spy movies, perhaps, and at the moment it is myself, Ryan Davis, Eric Hodel, and Shane Becker, who sometimes goes by Shaners; I don't understand why, but he has his reasons, I am sure. So far we have one project, and we are hoping to have more; we've had various others cross the sights of the rifle. The first project was "Vlad the Deployer", which I think is a great name. Unfortunately the name of the gentleman who suggested it slips my mind; I will feel horrible after this that I didn't remember it. Anyway, that was a name originally suggested for Capistrano when it was changing names from SwitchTower due to a licensing issue. A bunch of cool names came up, and my very favorite was "Vlad the Deployer", and I was unhappy that there was nothing called that when it wasn't chosen.

I have made a lot of use of Capistrano in a number of projects, and I have done semi-horrifyingly complex things with it, like deployments that were different per host they were being deployed on, dynamically generated config files, and all kinds of distributed file system mayhem. I liked it well enough, but it's very complicated. I was feeling the pain of dealing with tasks like writing a deployment task that did something slightly different on each host; the easiest way to do that was to write your own custom deployment code of various types, and it got complicated very quickly, and you ended up duplicating the work you had done. And that is fine; Capistrano wasn't designed for that, and I was abusing it to make it work that way.

At the same time, and I don't recall the original issue, Ryan Davis was screaming at me over IRC or instant messenger about how he was having all this trouble with Capistrano, and I don't have any recollection at all of what his problem was. I suspect at this point we could probably fix it quickly. But the general complaint was that it doesn't actually use SSH; it re-implements SSH, which is handy because it works the same everywhere. But we like SSH; it works great. When I type "ssh somehost" I get in, I don't type my password, everything happens the way I want: I've got my config file, it is all dealt with. We had these two opposing but compatible viewpoints about what we didn't like. I said "How hard can this really be? Does it really have to be that many lines of code? I don't know, let's find out". So we got to a neutral location, which meant I was the only person who traveled anywhere, but we went to Seattle, we watched some more spy movies, and we sat down and built it, and it turned out better than I expected. It's remarkably small and it works remarkably well; I am using it most of the time now and I am fairly happy with the results. I understand we have a fairly controversial website and various people have taken offense, but I am sure none is intended; we just like pretending to be spies.
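The "just use the real ssh binary" approach can be sketched roughly like this (hypothetical helper names, not Vlad's actual API); shelling out to the system ssh client means the user's ~/.ssh/config, keys, and agent all apply automatically.

```ruby
# Rough sketch of shelling out to the real ssh client instead of
# re-implementing the protocol. Helper names are invented.
def remote_command(host, command)
  # In real use you would execute this, e.g. system('ssh', host, command),
  # and raise if it fails; here we just build the command string.
  ['ssh', host, command].join(' ')
end

hosts = %w[app1.example.com app2.example.com]
hosts.each do |host|
  puts remote_command(host, 'cd /var/www/app && git pull')
end
# => ssh app1.example.com cd /var/www/app && git pull
# => ssh app2.example.com cd /var/www/app && git pull
```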


19. Is development still going on? Will you change it, fix it? Will you have another little meeting?

That would be fun; there is certainly plenty of good food in Seattle and I would be happy to eat it again. We have a new version coming fairly soon, and at this point we are adding newly contributed support for things like Git; we have Darcs support now, and I think Mercurial, and various other cool things.


20. Distributed version control systems?

Various people say "I like Vlad, but I use [fill in the blank]", and a number of times those people have written the code themselves, because it is very simple, and we are pleased with that. We will continue to do new releases and to package contributor code, which is cool. We are planning Windows support at some point, and I keep pushing for that. I actually don't use Windows very much myself, but I recognize there is a real population of Rubyists on it and I want to be able to support those people. At the moment we really only work inside Cygwin, which is a hack, and real-deal Windows developers prefer to use their own tools and don't use it. Hopefully we'll get there soon.
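Pluggable SCM support of the kind those contributors added might look roughly like this (hypothetical structure, not Vlad's actual code): each SCM contributes the shell command used to fetch the code, so adding support is just adding an entry.

```ruby
# Hypothetical sketch of pluggable SCM support (invented structure,
# not Vlad's real code). Each SCM maps to a command builder.
SCM_COMMANDS = {
  git:       ->(repo, dir) { "git clone #{repo} #{dir}" },
  darcs:     ->(repo, dir) { "darcs get #{repo} #{dir}" },
  mercurial: ->(repo, dir) { "hg clone #{repo} #{dir}" },
}

def checkout_command(scm, repo, dir)
  builder = SCM_COMMANDS.fetch(scm) { raise ArgumentError, "unknown SCM: #{scm}" }
  builder.call(repo, dir)
end

puts checkout_command(:git, 'git://example.com/app.git', '/var/www/app')
# => git clone git://example.com/app.git /var/www/app
```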


21. Before we wrap up, I would also like to mention that you also work on the RubyGems project, the package management system. I think you were recently involved in a rewrite that should make many people happy and speed up index performance. Could you quickly say what you did there?

Sure. I ran into this actually working on Rubinius, because Rubinius needs to support RubyGems, so I made a list of things that RubyGems was doing and that Rubinius needed. One of those was very heavy use of YAML, and at the time we didn't have YAML support in Rubinius; now we do. I looked at the way RubyGems used YAML, and Evan in particular noticed that it was using 120 megs of RAM every time you wanted to do something that used the gem index, like install a gem. That seemed like a lot. We looked at that, and it turned out the whole index of all gems was one giant YAML file, which probably sounded like a really cool idea when there were 100 gems, but there are a lot of gems now, and that file is big and uses over a hundred megs of RAM in Ruby 1.8 just loading it. It uses vastly more in Rubinius because our YAML support is unfinished. Either way it's plenty. And I was like "Why? We have plenty of ways to store data in Ruby, and YAML is nice, but this is not a human-editable file; this is a huge thing that would probably break your text editor", and it seemed like not the best format for that. So I asked Eric Hodel, who is the maintainer of RubyGems now, what would happen if I wrote a patch for this. He was like "Would you like commit rights?" Apparently that meant he was happy with the idea, and so I wrote it, and along the way exposed what I think are a number of marshalling bugs in Ruby. It ended up just dumping the objects directly out to disk. I was surprised by the amount of RAM even that uses; it went down from 120 megs to around 30, which is better, but arguably not perfect. With the next release, the indexing process which runs on RubyForge is updated to spit out that file, and the client fetches it first and falls back to the YAML if it needs to. I was surprised by how readable and usable the RubyGems code is. So if people are put off by the thought of hacking on it, it's really not bad; it has comprehensive tests and it is really easy to work with.
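The switch away from YAML can be illustrated with a tiny made-up index (structure and filenames are invented for this sketch, not RubyGems' real format): dumping the index objects with Marshal produces a binary snapshot that loads back without parsing any text.

```ruby
# Illustrative sketch: dump index objects straight to disk with Marshal
# instead of one giant YAML document. Index structure is invented.
require 'tempfile'

# A made-up index: [name, version, platform] tuples.
index = Array.new(1000) { |i| ["gem-#{i}", "1.0.#{i}", "ruby"] }

file = Tempfile.new('gem-index')
file.binmode
file.write(Marshal.dump(index))  # compact binary snapshot of the objects
file.rewind

loaded = Marshal.load(file.read) # loads back without parsing any text
puts loaded.length               # => 1000
puts(loaded == index)            # => true
```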

Feb 17, 2008