Stack Overflow has recently completed migrating their system architecture from .NET Framework to .NET Core. This is the platform that powers not only Stack Overflow but also 170+ question and answer communities, as well as private Q&A sites for companies.
"Here at Stack Overflow we're migrating many things to .NET Core. Along the way we have to swap out parts that existed in the old .NET world but don't in the new."
Nick Craver - Architecture Lead
From: About - Stack Overflow
The latest changes affect the web and service tiers of their architecture, which are moving from ASP.NET MVC to ASP.NET Core.
The web and service tiers provide the main Q&A application, Careers, Mobile API, API v2 and the Tag Service. High availability is provided in these services through multiple layers of redundancy. The persistent storage is SQL Server which is also made highly available via Availability Groups.
The front-end CDN, DNS and load balancing components, and the back-end Elasticsearch and Redis caching components, are not affected.
In parallel with the move to .NET Core, Stack Overflow has also been changing its user interface to improve accessibility and to add a dark mode. The back end has also been updated to support SQL Server 2019.
All of these changes have been made while the system continued to serve its millions of monthly users.
InfoQ recently sat down with Nick Craver, architecture lead on the migration, and explored the challenges in more detail:
InfoQ: Why did you decide that now was the time to move to .NET Core?
Craver: We didn't really decide anytime recently, we decided years ago that we wanted to go this direction. Marc Gravell and I began making our open source libraries (like Dapper, MiniProfiler, StackExchange.Redis, etc.) compatible back in the pre-.NET Core 1.x days. We knew these dependencies would have to be done before any app using them (including our main applications). After we shipped Stack Overflow for Teams which was a significant Architecture (my team!) time expenditure, we then moved on to .NET Core as our primary project (not only, but the primary). That migration took about 1.5-2 dev person years to complete.
InfoQ: What components needed rewriting?
Craver: So this is a little complicated, bear with me. In .NET Core 1.x days, the APIs changed drastically, so when we were making our libraries compatible, we had significant rewrites to quite a few components to make them work. In .NET Core 2.x where ".NET Standard" became a thing and compatibility increased tremendously, most of that rewrite effort was obsolete. In other words, if we were later adopters, we would have done less work. But then others would have been blocked for months to years...life is full of tradeoffs :)
There were few rewrites, but there were adjustments and shims put in place for the migration, since we did not move in one "big bang" approach. We'll be blogging about those in more detail soon.
From: Stack Overflow The Architecture 2016 - Nick Craver's Blog
InfoQ: Were there any specific areas that caused problems?
Craver: For the apps, basically wherever we've gone outside the framework to extend, replace, hack, etc. to make Stack Overflow work the way we want or scale the way we need - that's where pain usually lies. So our request pipeline, the order we spin up authentication, and custom filters in ASP.NET were the main areas. Form validation is also much more aggressive in .NET Core so bots trying to slam us initially caused more noise, but that's just a nuisance and a consequence of being a big website.
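Editor's note: to illustrate why the order of a request pipeline matters, here is a minimal conceptual sketch in Python. It is not ASP.NET Core's API and not Stack Overflow's code; the names `build_pipeline`, `authenticate`, `require_user` and `app`, and the token value, are all hypothetical. It only shows that a step which depends on authentication fails if it runs before the authentication step.

```python
# Conceptual sketch only: hypothetical names, not ASP.NET Core's API.

def build_pipeline(middlewares, handler):
    """Wrap handler so that the first middleware in the list runs first."""
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler


def authenticate(next_handler):
    """Attach a user to the request when the (made-up) token is valid."""
    def run(request):
        request["user"] = "alice" if request.get("token") == "secret" else None
        return next_handler(request)
    return run


def require_user(next_handler):
    """Reject the request unless an earlier step has set a user."""
    def run(request):
        if request.get("user") is None:
            return 401
        return next_handler(request)
    return run


def app(request):
    """The final handler: reached only if every earlier step lets it through."""
    return 200


# Authentication first: the filter sees the user and lets the request through.
good = build_pipeline([authenticate, require_user], app)
# Filter first: the user is not set yet, so even a valid token is rejected.
bad = build_pipeline([require_user, authenticate], app)
```

With a valid token, `good({"token": "secret"})` reaches the handler, while `bad({"token": "secret"})` is rejected before authentication has had a chance to run.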
InfoQ: Were there any unexpected benefits or spin-offs from the migration?
Craver: I wouldn't say much was unexpected by the time we converted from the migration. We're looking into HTTP trailer headers for some timing purposes as a result of .NET Core being much more output-stream oriented. It was something we expected but not to the degree we saw it and had other complicating factors unrelated to the Core move. Overall, we want trailer headers and we want to be Linux compatible, etc.
We've already got macOS and Linux very close to running, which means our designers on Macs won't need a Windows VM and we're about ready to run in containers for some deployment scenarios. For example, imagine a URL for a PR you can go visit to poke at, test, etc. We want to do that with containers and this gets us one step closer.
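Editor's note: for readers unfamiliar with the HTTP trailer headers mentioned above, the following Python sketch builds the raw bytes of an HTTP/1.1 chunked response whose timing value is sent after the body. It is an illustration under assumptions, not Stack Overflow's implementation: the function name `chunked_response_with_trailer` is hypothetical, and the choice of `Server-Timing` as the trailer field is an assumption made for the example. The point is that once output is streamed, a total render time is only known after the body has been written, which is what a trailer allows.

```python
import time


def chunked_response_with_trailer(body_parts):
    """Return raw HTTP/1.1 bytes: a chunked body followed by a timing trailer."""
    start = time.perf_counter()
    out = [
        b"HTTP/1.1 200 OK\r\n",
        b"Content-Type: text/html\r\n",
        b"Transfer-Encoding: chunked\r\n",
        # Announce up front which field will arrive after the body.
        b"Trailer: Server-Timing\r\n",
        b"\r\n",
    ]
    for part in body_parts:
        data = part.encode("utf-8")
        # Each chunk is: size in hexadecimal, CRLF, data, CRLF.
        out.append(f"{len(data):X}\r\n".encode("ascii") + data + b"\r\n")
    elapsed_ms = (time.perf_counter() - start) * 1000
    # A zero-length chunk ends the body; trailer fields follow it.
    out.append(b"0\r\n")
    out.append(f"Server-Timing: total;dur={elapsed_ms:.1f}\r\n".encode("ascii"))
    # An empty line ends the trailer section and the message.
    out.append(b"\r\n")
    return b"".join(out)


raw = chunked_response_with_trailer(["<p>hello</p>", "<p>world</p>"])
```

The timing value is computed only after the last body chunk has been appended, so it can cover the whole response without buffering the body first.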
Interested readers can learn more at the Stack Overflow blog and podcast or on Nick’s blog. If you want to ask questions about their sites, you can post to Meta Stack Overflow. You can reach Nick on Twitter at @Nick_Craver or on Stack Overflow as user nick-craver.