Chris Wanstrath on GitHub


1. I am here with Chris Wanstrath of GitHub, who has been having a lot of success lately, and I want to talk to you a little bit about that success, where GitHub is going and the interesting stuff about it. So, first of all, for readers that might not be exactly familiar with what GitHub does, just give us the 50,000-foot view.

We like to call it Social Coding because it's similar to the sites that have been around 10 years like SourceForge or even Google Code, where that's a place you go and you put up an open source project so you can share it with everyone else. GitHub is different, we think, because you are not just sharing it with random people who are googling for whatever and then landing on your site incidentally. You are sharing it with your coworkers, you are sharing it with your friends, you are sharing it with people that you know either by reputation or because you've met them in person. And so GitHub is a lot more about the people behind the Open Source than just putting up Open Source code. So I think that's what makes it different and what makes it really fun for us and the people who are very into the site to use.

   

2. Are companies using it to host their source code, versus having their own Git installs internally?

Absolutely. There are tons of companies that use it to host their private commercial code.

   

3. You can tell that by what's public and what's private, right?

Yes. There is a very strong line between the two. So you can host public Open source code or private commercial code on the same account if you want, and we try to make it so there's never any confusion about which is which. So it's nice because you can have the same account that you are using during the week to do your work on, and if you ever want to experiment with something or play with some Open source that you might even pull into your company's code, you can do that from the same place.

   

4. Is GitHub just the playground for Rubyists or have other language communities adopted it?

It used to be the playground for Rubyists, and Ruby is still definitely the strongest community on the site. But we've seen a lot of growth lately in .NET, which has been pretty cool for us. Python is the second biggest dynamic language on the site besides JavaScript. And then JavaScript: almost all the major libraries in JavaScript use it for their code hosting. And then Perl, we see a lot of that going on. So it's pretty exciting for us because we came out of the Ruby community and that's what we knew best, and this is giving us an opportunity to learn what other communities are doing and kind of get ideas from them. It's pretty cool because everyone is on the same page, and a lot of times there are solutions in other languages that we haven't even thought of. We see a Perl project pop up and we say: "That's a good idea, let's do that."

   

5. I've heard GitHub described or compared to the introduction of sexual reproduction in the evolutionary stream. Is that hyperbole or do you think that?

It's an interesting analogy. I mean, the idea is just that with GitHub, one of the things we always say (it sounds kind of funny when I say it out loud, but it's one of our mantras) is lowering the barrier to contributing code and releasing code. What we think that's doing, and people have written some blog posts suggesting this as well, is that when it's easier for you to publish Open source code or modify someone else's Open source code, the rate at which Open source code evolves just gets faster. That's a lot like the analogy you are talking about with reproduction: at first single cells would just split, it was all just asexual reproduction, and then you got to the stage where they would run into another cell and some of their DNA would cross over the cell wall, and then you get a higher rate of evolution. That's kind of what we are talking about: it's a lot easier to take code from someone else, to share your code and to kind of evolve a project with something like GitHub and this newer generation of code hosting sites than with the classical ones, where all you are doing is publishing a project. So with GitHub it's more about the code and contributing than about publishing, and that's why some people think it's making it go faster.

   

6. Forking used to be a dirty word.

Forking used to be a dirty word. That is one of the things I am most proud of: even now in some communities a fork is a bad thing, and something we wanted to do was make it a good word. We have shirts that say "Fork You" and it kind of sounds like another phrase, but the idea isn't "Fork you and get away from me"; it's that this phrase that at first looks negative is really inclusive and inviting. And so for us, GitHub is all about forking, where I see your code that you published, either internally in a commercial business or as Open source, and I can fork it, which is a branch. I can make my changes to it without asking for your permission, and then if you like what I've done you can pull them in; if not, I can keep going. And the truth is really that this has been happening for a long time.

Even with older tools like Subversion you do that: you get an Open source piece of code, you modify it, but you never share that with the outside world and maybe you even forget what it was you did. You have a little patch that you add to your framework, and then when you upgrade your framework there's a conflict or a regression and you don't even remember adding that patch. When it's easy to keep track of that patch that you made, you are a lot more likely to do anything, I think, and it kind of helps practicing.

   

7. It helps contribute to the overall growth of that community.

Absolutely. We see lots of projects where just one person threw out some code, and they are pretty good about watching what people are doing, and the project just explodes. The best example is "ClickToFlash". It's a Safari plug-in that lets you block Flash on a page. It was posted on Google Code, and then Google Code or the person who posted it just removed it. He said: "I deleted my project".

Someone who had a copy of the source code, which is Open source, posted it on GitHub; he wasn't the original author of it. Then people started forking it and adding features, and that person has been really diligent about merging in changes and kind of guarding the code. It just has this huge network, some forks are doing their own thing, and it's just this growth that I don't know would have happened anywhere else. And so seeing stories like that pop up more and more is great, because this whole model really helps Open source in that way.

   

8. You guys, and by guys I mean you and Tom and I guess other members of your team, are known for championing "Not Invented Here" as a good thing. And generally that's been a bad thing, so why do you guys have such innovation going on at GitHub compared to other places?

I think that whole thing of our going about and saying Not Invented Here is a good thing is not about us trying to oppose the mainstream. The message we want to get across is "think about what you are doing". Just because there is a phrase called "Not invented here" that has an acronym that you can post in a reddit comment doesn't mean that that's a good thing in all circumstances, so what we are trying to do in our company is always think about everything.

We are the company of the compelling argument, and so it doesn't matter if there is a phrase that has 1000 years of history behind it if it doesn't make sense in the circumstance we are looking at. So we try and kind of advertise that by saying: sometimes Not Invented Here is a good thing, and we did this for this reason. In all those cases it was because we thought about the situation, we looked at what was out there that would fit our needs, we drew up a list of requirements, and if something met our needs we went with that system, and if it didn't we wrote our own thing. I mean, for every system that we write that's Not Invented Here, there are a hundred other pieces of open source code that we are using somewhere else.

   

9. The latest example of that is an RPC framework called BERT and Ernie. How did that come about and what is it?

That is mostly Tom Preston-Werner's innovation. BERT is to Erlang as JSON is to JavaScript, so it's a subset of the language that can be used to describe data structures in the language. And the beautiful thing about Erlang is that because it's dynamic, it has many of the same data structures that other languages have: Python, Ruby, JavaScript. So you can express Ruby data structures in Erlang's binary term format pretty simply. And so what Tom has done is he's given it a name and he has written a specification for how to use BERT and what BERT is exactly. So this is already going on between Erlang processes.
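As a rough illustration of what "expressing data structures in Erlang's binary term format" means, here is a minimal sketch that encodes a few Python values into that format. It is not Tom's library and not the BERT specification itself: the `Atom` class and the `encode` and `encode_term` functions are hypothetical names chosen for this example, only a handful of term types are covered, and the tag bytes are the ones documented for Erlang's external term format.

```python
import struct

VERSION = 131  # leading byte of Erlang's external term format


class Atom(str):
    """Hypothetical marker type: a Python string to be encoded as an Erlang atom."""


def encode_term(term):
    """Encode one value, without the leading version byte."""
    if isinstance(term, bool):
        # Booleans need special handling and are left out of this sketch.
        raise TypeError("booleans are outside this sketch")
    if isinstance(term, Atom):
        name = term.encode("latin-1")
        return struct.pack(">BH", 100, len(name)) + name  # ATOM_EXT
    if isinstance(term, int):
        if 0 <= term <= 255:
            return struct.pack(">BB", 97, term)  # SMALL_INTEGER_EXT
        # INTEGER_EXT holds a signed 32-bit value; struct raises if it does not fit.
        return struct.pack(">Bi", 98, term)
    if isinstance(term, bytes):
        # BINARY_EXT: a 4-byte length followed by the raw bytes, unescaped.
        return struct.pack(">BI", 109, len(term)) + term
    if isinstance(term, tuple):
        if len(term) > 255:
            raise ValueError("large tuples are outside this sketch")
        body = b"".join(encode_term(item) for item in term)
        return struct.pack(">BB", 104, len(term)) + body  # SMALL_TUPLE_EXT
    if isinstance(term, list):
        if not term:
            return bytes([106])  # NIL_EXT, the empty list
        body = b"".join(encode_term(item) for item in term)
        # LIST_EXT: element count, the elements, then a NIL_EXT tail.
        return struct.pack(">BI", 108, len(term)) + body + bytes([106])
    raise TypeError(f"unsupported type in this sketch: {type(term).__name__}")


def encode(term):
    """Encode a value with the version byte in front."""
    return bytes([VERSION]) + encode_term(term)


if __name__ == "__main__":
    print(encode((Atom("ok"), 1, b"hi")).hex(" "))
```

A tuple such as `(Atom("ok"), 1, b"hi")` becomes a short byte string that an Erlang process could read back as `{ok, 1, <<"hi">>}`, which is the sense in which data structures from other dynamic languages map onto Erlang terms.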

   

10. So this is already production ready.

And then the libraries are BERT RPC and Ernie; those are what we use on GitHub right now. The reason we specifically wanted to do something different is that we handle lots of large binary data. Big commits, big diffs, big pieces of code: we're constantly sending them back and forth, so for us we needed that kind of edge case to be the priority. Many systems like JSON are great for almost everything except large binary data, let's say, and so for us that was the main requirement. So that's why we didn't go with a JSON RPC system and why we had to evaluate all the systems from that point of view.
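One common reason JSON is awkward for large binary data, offered here as an assumption rather than as GitHub's stated reasoning, is that JSON strings cannot carry arbitrary bytes, so binary payloads are usually base64-encoded first. The sketch below compares that with a length-prefixed binary field of the kind Erlang's term format uses. The function names, the `"diff"` key and the payload are all hypothetical.

```python
import base64
import json
import struct


def json_envelope(payload: bytes) -> bytes:
    """Wrap raw bytes in JSON; they must be base64-encoded to survive as a string."""
    text = base64.b64encode(payload).decode("ascii")
    return json.dumps({"diff": text}).encode("utf-8")


def length_prefixed(payload: bytes) -> bytes:
    """Wrap raw bytes as a tag byte plus 4-byte length, followed by the bytes as-is."""
    return struct.pack(">BI", 109, len(payload)) + payload


if __name__ == "__main__":
    payload = bytes(range(256)) * 12  # 3072 bytes of made-up binary data
    as_json = json_envelope(payload)
    as_binary = length_prefixed(payload)
    print("raw bytes:      ", len(payload))
    print("JSON + base64:  ", len(as_json))
    print("length-prefixed:", len(as_binary))
```

Base64 turns every 3 bytes into 4 characters, so the JSON form is roughly a third larger and needs an encode and decode pass on each side, while the length-prefixed form adds a fixed 5 bytes regardless of payload size.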

   

11. GitHub has occasionally been plagued by down time. You guys are a young company. What sort of challenges were you facing that caused the down time? And for those of us that depend on GitHub to make our livings, what kind of expectations should we have for the future?

I think what you shouldn't expect is that there will be no down time right now, that all the problems are solved. Site stability is a big focus of ours. We've been plagued by down time and also slowness, so we've had problems where even when it was up it was almost unusably slow.

   

12. You were a victim of your own success, right?

Yes. People say it's a happy problem to have, but none of us are happy about it.

   

13. Is it further proof that Rails doesn't scale, along the lines of Twitter?

I wish, because I know that a lot of people are really good at Rails, and if Rails were our problem I could just call them up and have them fix it. But no, we've had lots of other problems, specifically dealing with Unix file systems: I am talking about storing large amounts of data and accessing it really quickly from all these web servers that are spread across different machines. Keeping it up to date, keeping it backed up, keeping it secure, making it fast: this is something that none of us had any sort of experience with, and I don't think a lot of people do, especially with Git.

I mean, we definitely don't even need to look: we are the biggest Git hosting provider in the world, so we are dealing with challenges with Git itself and Unix file systems that no one else had dealt with before. So it definitely was really hard for us for the first year, especially when we started getting really big, to kind of figure out how we can make this fast and how we can make it stable. So our first priority wasn't stability, definitely, but making it fast; that's what we wanted to do and that's what we focused on. Then once we'd made it fast we started talking to Anchor, which is the support team that we use, about: "How do we make it stable? You guys are Unix pros; tell us what you think is the best way to keep machines up, to do high availability, to do swap over. I mean, what have you guys learned about dealing with big sites, how do we make this site never go down?"

And so we are not there yet, but that is the goal: the site will never go down, and what will happen instead is that part of the site will die in isolation in the worst case scenario. The idea that one of our file servers goes down and brings down the web site, or brings down something unrelated, that should never happen. Right now there is still a chance, because with the new system we are growing into we're figuring out all the pain points and getting all the high availability in place. But that's our goal: no down time for the site as a whole, just isolated instances of failure.

   

14. GitHub is still relatively young, correct? What is the future of GitHub, what is the future of Chris Wanstrath? Is it going to become a huge company, or do you guys plan to stay very small and focused on what you are doing?

I wish I knew. We are definitely staying focused. We want to hire more people. We have 8 people full-time right now, but we want to grow the engineering team. We want to add more people; we want to add really good people. Luckily working on a coding site gives us access to really good people. We meet people really easily through the site and it gives us kind of a big available pool of hires which is sort of frustrating when we can't afford them all right now and we are not really big enough to really need them all. But I think we want to have a big company.

   

15. GitHub has a commercial model that seems to make sense. So I for instance from Hashrocket have given you guys quite a bit of money because we have maybe 100 projects on there. Are there a lot of other companies like us that are using GitHub in that way? Does that provide room for you to grow?

We have almost 10,000 paying customer accounts, we are growing really quickly, and the rate of growth is increasing, and what that lets us do is make good decisions. We are able to fund our website, fund the software and the hardware, and pay people salaries without having to worry about what people with money want us to do. I think it's a really good thing; it's put us in a really good position, because what we can do now is try and make good decisions based on what our customers want. So our future plans are always based around: "What can we do for Hashrocket, what can we do for the people working at Hashrocket, for the people coding there?" Right now, for instance, one of our primary focuses is organizational support: getting better permissions in place, having one person be able to administer a whole group, being able to remove people and add people easily, and doing things that businesses want, making it easier for businesses, because that's who our customers are. So we want to be constantly adding features for businesses and Open source users, and we want to be doing that because they tell us that they need it and not because we took a big check from some big company. So for us in our rate of growth having grown as we would have as far as staff goes if we had taken a bunch of money...

   

16. Do you have any specific philosophies about hiring staff? Can you describe some of them?

I don't know. We are the company of the compelling argument, so to hire someone, everyone on the team has to be OK with it. And it's not just a situation where everyone on the team has a veto right; it's that everyone on the team has to think it's a great idea. That makes it a little bit harder for us to find people, I think, but I think it makes the people that we find really valuable. Our 2 most recent hires were Ryan Tomayko and Kyle Neath, and everyone was completely ecstatic about it. We were so happy, and there was never a question in anyone's mind either them. That is kind of the way that we do it: if everyone in the company is super excited about hiring someone, then we're going to hire that person. What I want to do is have a company that maintains that. I don't know if that is really scalable as you get bigger, but for now, where we're at, it's really important that we prove ourselves as a company, and I think that is the way we should do it.

Mar 31, 2010
