Interview: Dan Farino About MySpace’s Architecture
In this interview conducted by InfoQ’s Ryan Slobojan, Dan Farino, Chief Systems Architect at MySpace, talks about the system architecture and the challenges faced when building a very large online community. Because MySpace is built almost entirely on the .NET Framework, Dan explains how a .NET product scales across hundreds of servers.
Watch: Dan Farino About MySpace’s Architecture (28 min.)
At the beginning of this interview, Dan speaks about the general challenges encountered and the solutions used to make a very large web site run smoothly. He goes into detail about dealing with performance bottlenecks and database or system failures, and about the need for debugging and logging applications. He mentions database bottlenecks, which were addressed by implementing a custom cache.
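The summary does not describe how MySpace's custom cache is built, so the following is only a generic illustration of the idea of putting a cache in front of a database to relieve a bottleneck. It is a minimal read-through cache sketch in Python (MySpace itself runs on .NET); the class name `ReadThroughCache`, the `loader` callback, and the time-to-live parameter are all hypothetical and not taken from the interview.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadThroughCache:
    """Hypothetical in-memory cache that sits in front of a slow data source."""

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # assumed stand-in for a database query
        self._ttl = ttl_seconds    # how long an entry stays fresh
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        """Return the cached value, calling the loader only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry, for example after the underlying row is updated."""
        self._entries.pop(key, None)
```

Repeated reads of the same key within the time-to-live are served from memory, so only the first read (and reads after expiry or invalidation) reach the database.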
Dan also talks about the .NET platform used to support the MySpace web site. He says that .NET has scaled well for them, serving millions of users from hundreds of servers. One of the problems mentioned is the garbage collector, which was intermittently introducing a significant delay in the web site’s response time. Administering hundreds of servers is a difficult and time-consuming task. Dan says they have been using Microsoft’s PowerShell since the technology was still in research under the code name Monad, and he is very happy with it.
MySpace is comparable to Twitter in my mind - the site's really slow, doesn't really do many complicated things, and has a tendency to break a lot.
So, listen to this and learn what not to do ;)
How do you test it?
listening to you (or similarly Dan Pritchett talking about eBay architecture) I have the same questions in my mind:
how do you test new features before you roll them out? how do you test that they scale? you don't have a test lab with the same level of scale as your production farm, do you? that makes me think you can't guarantee or reliably predict the exact level of performance until real users hit the new feature in real time. how do you deal with that, and what did you have to build into your system to support deploying new features, as well as rolling features back quickly if they did not work out as you had expected?
and another question is what did you have to do to support the existence of "practical" development environments that behave like your production system but do not require each developer to work on dozens of servers, partitioned databases, and cache instances. How did this change your system's architecture?
Re: How do you test it?
you might be surprised how many companies don't have proper development/QA setups that replicate the production environment well.
I have witnessed a number of scenarios where code passed developer and QA testing with flying colors but utterly failed in the production environment. The failure was caused (as you have already guessed) by differences between the development/QA and production environment setups. Developers were writing code for one environment while being completely oblivious (through no fault of their own) to the production environment's pains and hazards.
Now, this might not be the case with MySpace. However, I certainly agree with you on the fact that Dan didn't say a word about testing their 'hydra'. But on the other hand, he wasn't asked that question.
NOTE to Ryan: It certainly would be great to hear from MySpace on how they test the code.