
Stack Overflow Becomes HTTPS by Default


Nick Craver, architecture lead at Stack Overflow, has published a blog post announcing Stack Overflow's migration to HTTPS. The technical challenges along the way included supporting hundreds of domains, migrating URLs, handling user-generated content, and meeting the site's stringent performance requirements.

The migration took four years overall, but Craver highlights that the work was not always a priority. For example, the site handles no financial information or credit card payments, and the data it does hold is not especially sensitive. In fact, Craver states that Stack Overflow has always prioritised performance over security:

I’m okay saying that our primary drive is performance, and security for the site is not. We want security, but security alone in our situation is not enough justification for the time investment needed to deploy HTTPS across our network. 

Craver explains that the main driver for rolling out HTTPS was, in fact, the wider adoption of HTTP/2 and its performance benefits. These include request/response multiplexing, server push, header compression, stream prioritization, and fewer origin connections. As browsers only support HTTP/2 over encrypted connections, HTTPS has become a performance requirement as much as a security requirement.

Stack Overflow has hundreds of domains and subdomains. This has led to the site's main certificate containing all of Stack Overflow's primary domains and wildcards. This gives a performance benefit: HTTP/2 allows browsers to use a single shared connection for multiple domains when the certificate and IP addresses for them are the same.

Some domains also had to be moved, such as the "meta" sites, which went from "meta.*.stackexchange.com" to "*.meta.stackexchange.com". Craver highlights that this is because a wildcard must be the leftmost part of the domain, and having a single wildcard is more maintainable.
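The wildcard rule and the connection-sharing condition described above can be sketched roughly as follows. This is a simplified illustration, not browser or Stack Overflow code: the names `name_covers`, `Connection` and `can_share` are hypothetical, the IP addresses are documentation placeholders, and real clients apply additional checks.

```python
from dataclasses import dataclass


def name_covers(pattern: str, hostname: str) -> bool:
    """Simplified certificate name check.

    A '*' is honoured only as the whole leftmost label, and it stands in
    for exactly one label.
    """
    p = pattern.lower().split(".")
    h = hostname.lower().split(".")
    if len(p) != len(h) or not h[0]:
        return False
    if any("*" in label for label in p[1:]):
        return False  # e.g. "meta.*.stackexchange.com" is not usable
    if p[0] == "*":
        return p[1:] == h[1:]
    return p == h


@dataclass(frozen=True)
class Connection:
    ip: str            # address the open connection is using
    cert_names: tuple  # names listed on the certificate the server presented


def can_share(conn: Connection, hostname: str, resolved_ips: set) -> bool:
    """True if an open HTTP/2 connection could also serve `hostname`."""
    covered = any(name_covers(name, hostname) for name in conn.cert_names)
    return covered and conn.ip in resolved_ips
```

Under these assumptions, "*.meta.stackexchange.com" covers "gaming.meta.stackexchange.com", while no single wildcard can cover the old "meta.gaming.stackexchange.com" layout for every site.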

Also, as sensitive cookies on the site are now set on the top-level domain and inherited by its subdomains, services that should not be able to access them have been moved to separate domains. An example Craver gives is SendGrid, which now lives at "stackoverflow.email".
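To see why moving a third-party service to a separate domain keeps it away from those cookies, here is a minimal sketch of cookie domain matching in the spirit of RFC 6265. The function name and the host "links.stackoverflow.com" are hypothetical, and the check is simplified (it ignores IP-address hosts, paths and the Secure flag).

```python
def cookie_domain_matches(cookie_domain: str, request_host: str) -> bool:
    """Would a cookie scoped to `cookie_domain` be sent to `request_host`?

    Simplified: the host must equal the cookie domain or be a subdomain of it.
    """
    cookie_domain = cookie_domain.lower().lstrip(".")
    request_host = request_host.lower()
    return (
        request_host == cookie_domain
        or request_host.endswith("." + cookie_domain)
    )


# A service on a subdomain (hypothetical host) would receive the cookie...
print(cookie_domain_matches("stackoverflow.com", "links.stackoverflow.com"))
# ...while one on a separate domain would not.
print(cookie_domain_matches("stackoverflow.com", "stackoverflow.email"))
```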

Craver also describes the large volume of user-generated content on the site that was served over HTTP, such as images in questions, user profiles, YouTube videos and many others. The first step of the migration was to force HTTPS for any new user-generated content, which limited the HTTP content to legacy posts only. The next stage was to move the existing resources. For the majority of content on the site, this involved a simple find and replace. Content the team was unsure about was either converted to HTTPS or, if it did not work over HTTPS, turned into a link.
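The find-and-replace step could look something like the sketch below. It is an assumption-laden illustration rather than the team's actual tooling: `upgrade_urls` is a hypothetical name, the allow-list of hosts known to work over HTTPS is supplied by the caller, and the URL pattern is deliberately crude.

```python
import re
from urllib.parse import urlsplit

# Crude pattern for http:// URLs embedded in post text or markup.
_HTTP_URL = re.compile(r"http://[^\s\"'<>)]+")


def upgrade_urls(text: str, https_hosts: set) -> str:
    """Rewrite http:// URLs to https:// for hosts known to support HTTPS."""

    def _swap(match: re.Match) -> str:
        url = match.group(0)
        try:
            host = urlsplit(url).hostname
        except ValueError:
            return url  # malformed URL: leave it untouched
        if host in https_hosts:
            return "https://" + url[len("http://"):]
        return url  # unknown host: leave for manual review

    return _HTTP_URL.sub(_swap, text)


post = '<img src="http://i.stack.imgur.com/a.png"> and http://example.org/x'
print(upgrade_urls(post, {"i.stack.imgur.com"}))
```

Hosts outside the allow-list are left unchanged here, so they can be handled separately, for example by converting the embed to a plain link.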

On the JavaScript side, there were also thousands of links in code which made assumptions about HTTP and even about the different "meta" domains. Craver explains how they had to go through each of them and replace them with calls to "<site>.Url('/path')", which made the links switch to HTTPS dynamically whenever the feature flag was enabled.
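The idea behind such a helper can be sketched as below. This is not Stack Overflow's implementation: the `Site` class, its `https_enabled` flag and the `url` method are hypothetical stand-ins for the "<site>.Url('/path')" call mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Site:
    host: str
    https_enabled: bool = False  # hypothetical per-site feature flag

    def url(self, path: str) -> str:
        """Build an absolute URL whose scheme follows the feature flag."""
        scheme = "https" if self.https_enabled else "http"
        return f"{scheme}://{self.host}/{path.lstrip('/')}"


site = Site("stackoverflow.com")
print(site.url("/questions"))   # scheme is http while the flag is off
site.https_enabled = True
print(site.url("/questions"))   # same call now yields an https URL
```

Because callers never hard-code the scheme or host, flipping the flag changes every generated link at once.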

Craver also emphasised that not losing any traffic from Google was of the utmost importance, as this is where the majority of traffic, and thus revenue, comes from. So whilst the change required for Google was straightforward (a 301 redirect from HTTP to HTTPS, and updated canonical links), the team had to be careful not to make a mistake.
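The two search-engine-facing changes mentioned above can be illustrated with a small framework-independent sketch. The function names are hypothetical, and a real deployment would typically do the redirect at the load balancer or web server rather than in application code.

```python
from html import escape


def https_redirect(host: str, target: str) -> tuple:
    """Return (status, headers) for a permanent redirect to the HTTPS URL."""
    if not target.startswith("/"):
        target = "/" + target
    return 301, {"Location": f"https://{host}{target}"}


def canonical_link(host: str, path: str) -> str:
    """Return the <link rel="canonical"> tag pointing at the HTTPS URL."""
    href = escape(f"https://{host}{path}", quote=True)
    return f'<link rel="canonical" href="{href}">'


status, headers = https_redirect("stackoverflow.com", "/questions?tab=newest")
print(status, headers["Location"])
print(canonical_link("stackoverflow.com", "/questions"))
```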

The final part of the migration involved WebSockets. Every connection had to be moved to secure WebSockets (wss://), a change that was functionally straightforward but must not impact the performance of the site. According to Craver, over half a million concurrent WebSocket connections can be open at once.

The full blog post is very long, and worth reading in its entirety.
