Theme Systems at Scale: How to Build Highly Customizable Software

Summary

Shopify Staff Engineer Guilherme Carreiro discusses building and scaling highly customizable platforms. Using Shopify’s Liquid theme system as a case study, he explains how to balance extreme design flexibility with low-latency performance under massive traffic. He shares insights on implementing secure domain-specific languages, native code extensions, and resilient developer tooling.

Bio

Guilherme Carreiro is a Staff Developer at Shopify, where he champions the evolution of Liquid. With over 14 years in software development, he helped build the foundational tooling that set Liquid on its path to greater adoption. Prior to Shopify, Guilherme led the DMN tooling team at Red Hat, delivering open‑source solutions that empowered users to build decision models with no‑code approaches.

About the conference

InfoQ Dev Summit Munich software development conference focuses on the critical software challenges senior dev teams face today. Gain valuable real-world technical insights from 20+ senior software developers, connect with speakers and peers, and enjoy social events.

Transcript

Guilherme Carreiro: I'm really excited to talk about how we can make our software more customizable today. A lot of the software we use is already customizable and we don't even notice: our operating system, our cell phones, our apps, there are a lot of customizable features there. That's no different at Shopify. Shopify is a platform where our users, our merchants, can create and manage stores. Shopify takes care of everything that you need to build your store, like inventory and shipping, including storefronts. This is a Shopify storefront, and that's the thing that we're going to focus on today. Everything that you're seeing on this storefront is customizable, and it's based on the same Liquid theme system. Because it's highly customizable, Shopify stores look very different from each other. However, high customizability is not the only challenge that we have. A second one is that these stores receive a massive amount of requests.

During the peak of this thing that we call BFCM, which is Black Friday, Cyber Monday, we have almost 6 million requests per minute, with merchants applying like last minute updates to their stores, with buyers refreshing their stores because they want to get the best deals, and the platform needs to respond to that. That pretty much frames the challenge. We need to build a software that looks very different when it renders for the end users, and at the same time, this needs to perform well. That's the thing that we're going to learn today, how we can do that. Also, to show a little bit about how things work under the hood for merchants, I'm going to show this demo here. Here we have a storefront. It feels very weird, this bar here in the middle, so I'm going to update this. I can open the admin, click on this section here, change the background. I can move the image to the left, increase the size a little bit.

Maybe I'm going to add a little review component here. I can just click in the interface there, search for a review, and here we have a review app, and now we have our review component. I can simplify this a little bit and increase the number of stars because this is a 4 stars t-shirt. I'm happy with this customization. I can just save it, and then I refresh my storefront and it's already there. Besides allowing developers to build highly customizable storefronts, they also are customizable by non-technical personas. This is the thing that we're going to cover, all the architectural components that glues everything together. Everything is based on Liquid. Liquid is our template language, but there's a lot of other stuff around Liquid to make this kind of experience possible. We're going to go over all these architectural elements. I could cover this from many perspectives, but my choice was to cover this in a chronological way.

First, we're going to understand how the language was built, and how later we made that customizable by non-technical personas. Then we're going to understand how we can run that really fast in production. Finally, how to build tooling around a language like that.

Personal Profile

My name is Guilherme Carreiro. I'm a staff engineer at Shopify. I work in the Liquid developer tooling team. I work pretty much in two fronts. One is building tools for developers that write Liquid templates, so for example, the CLI that runs in their machines, the language server that runs in their VS Code extension. I build those tools that run more locally. I also work on evolving the language to make it easier for them to write templates, so they can do more with less. There are these two fronts, code that runs in their machines, and code that runs in the platform to make new features possible. My work pretty much focuses on themes.

Themes, At Shopify

Today, when we talk about themes, we can think of a theme as a package containing two things. One is the information about the graphical appearance of some software, in this case here is storefront, but it can be your software as well. The second is the features it can do. It's not only about how it looks, but what it can do. When we talk about theme systems, we're talking about a set of primitives that enable different personas to collaborate together and build some experience. That feels a bit more subjective, so let's dig a little bit into this theme system part. One persona that we have is the merchant. The merchant wants to build a unique store, and we need to help the merchant to get that. The buyer wants to see the store, and that's all that they care about. These are the two non-technical folks involved in the process.

Consider that the merchant wants to participate in building the store. In the middle, we have theme developers. They build themes, and they sell those themes in the theme store. However, those themes, they frequently don't have everything that merchants need. Like in the example, we were building the storefront, and we needed that review component. We needed a little something different that's not already inside of the theme. That's the reason that we have this other developer here, the app developers that develop theme extensions. This is another kind of developer involved in this process. The last developer that we have that's a bit closer to the merchant is the theme developer that does some fine-tuning at the end of the code to make sure that the store looks like the merchant wants. We have three kinds of developers to keep in mind, and two non-technical personas. That pretty much frames the challenge.

We're going to talk a lot about themes. To clarify what a theme is for at Shopify, it's pretty much a directory with some files inside. Some files are Liquid templates. Some files are JSON files. We're going to understand today their importance and what they mean, because then when you're developing your own theme system, you can understand why these abstractions matter. For our sake, while we are talking today, they're just files in a directory.

These files, they render a page. To break down a concept of the page, everyone knows what an HTML page is. There are some meanings that I think are nice to bring today. We have the layout, this green part here, the frame. It's where we have a header, your scripts into the head element. Then we have the sections, these are these horizontal elements. They take the full width of the page. We have this thing that we call blocks. The blocks, they are these more granular elements that exist inside of the section. This is how we call the different elements of the theme. This really frames our challenge. We need to build a platform that accepts those ZIP files with that directory structure that represents the layout of a theme. That theme needs to be customizable by merchants.

Language

Then we start our journey. The first part of the journey is the language. Why do we need a language to represent our theme? Why do we need another DSL, when most languages already have a template language? Why was Liquid built? This is pretty much the summary. We wanted to allow developers to show the price of a product anywhere on the page. We didn't want a fixed, predefined spot on the page to be the only place where developers could show the price. We wanted to allow that level of flexibility. Shopify is a Ruby application, and when it was built, the template language available was ERB, which is very similar to the template languages that we have in other languages as well. The problem with this kind of template language is that it is very permissive. It allows any kind of code in your view. For example, scenarios like this are very difficult to prevent in ERB templates.

Here we are iterating over some products that we have in the database. Then for each product, for example, a t-shirt, we are loading the variants, for example, blue t-shirt, red t-shirt. When we do this, we have the classic N+1 query problem, which is no good. When you work in your own company and you're building your own app, that's ok. You can set some guidelines for developers, let's be careful with this, and then you can avoid the problem. In our case, we accept templates from third-party developers. This is something that we need to be really strict and smart to prevent. Also, to illustrate how permissive ERB templates are, even this kind of code here works. This really paints the picture that ERB templates are not the solution. That was the motivation, and that's why Liquid was built.
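To make the N+1 problem concrete, here is a minimal Ruby sketch. It is not Shopify code: `FakeDb` and both render helpers are hypothetical names, and the "database" is just an in-memory object that counts how many queries it serves.

```ruby
# Hypothetical in-memory "database" that counts every query it serves.
class FakeDb
  attr_reader :query_count

  def initialize
    @query_count = 0
    @products = [{ id: 1, title: "T-shirt" }, { id: 2, title: "Mug" }, { id: 3, title: "Cap" }]
    @variants = [
      { product_id: 1, title: "Blue" }, { product_id: 1, title: "Red" },
      { product_id: 2, title: "Large" }, { product_id: 3, title: "Black" }
    ]
  end

  def all_products
    @query_count += 1
    @products
  end

  # One query per product: this is what a permissive template invites.
  def variants_for(product_id)
    @query_count += 1
    @variants.select { |v| v[:product_id] == product_id }
  end

  # One query for every product at once.
  def variants_for_all(product_ids)
    @query_count += 1
    @variants.select { |v| product_ids.include?(v[:product_id]) }
             .group_by { |v| v[:product_id] }
  end
end

# N+1: one query for the list, then one more query per product.
def render_n_plus_one(db)
  db.all_products.map do |product|
    names = db.variants_for(product[:id]).map { |v| v[:title] }
    "#{product[:title]}: #{names.join(', ')}"
  end
end

# Batched: two queries in total, however many products there are.
def render_batched(db)
  products = db.all_products
  by_product = db.variants_for_all(products.map { |p| p[:id] })
  products.map do |product|
    names = by_product.fetch(product[:id], []).map { |v| v[:title] }
    "#{product[:title]}: #{names.join(', ')}"
  end
end

slow_db = FakeDb.new
fast_db = FakeDb.new
render_n_plus_one(slow_db)
render_batched(fast_db)
puts "N+1 queries: #{slow_db.query_count}, batched queries: #{fast_db.query_count}"
```

Both helpers produce the same lines; only the number of queries differs, and that number grows with the product count in the first version.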

We need a way to express templates in a way that's safe. Liquid works almost like an allow list of things that a template can do. You can have conditions. You can perform loops. You can print values. You can transform data just like we're transforming here. We're taking the price, and then we're converting that price using this money filter that we have here. It's very safe, too, because we really know the thing that we're exposing for developers. The thing is it cannot execute any kind of Ruby code, it cannot access existing resources, access the database directly. It cannot do all these things that we can do with ERB templates. In the end, this is how a Liquid template looks like. It's very human-friendly. When it was built, it was intended to be updatable by developers, but also designers without a strong technical background. It's a very human-friendly language.
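The allow-list idea can be sketched in a few lines of Ruby. This is a toy renderer, not Liquid's implementation: `render_allow_listed`, `ALLOWED_FILTERS`, and the dollar formatting of the `money` filter are assumptions made for illustration. The point is that a template can only look up values and apply filters that the platform explicitly registered.

```ruby
# Only these filters exist; anything else is rejected.
# The money format used here is an assumption for the sketch.
ALLOWED_FILTERS = {
  "upcase" => ->(value) { value.to_s.upcase },
  "money"  => ->(cents) { format("$%.2f", cents.to_i / 100.0) }
}.freeze

class UnknownFilterError < StandardError; end

# Renders `{{ name }}` and `{{ name | filter }}` tags.
# Values come from a plain hash; no Ruby code from the template is evaluated.
def render_allow_listed(source, data)
  source.gsub(/\{\{\s*(\w+)\s*(?:\|\s*(\w+)\s*)?\}\}/) do
    name = Regexp.last_match(1)
    filter = Regexp.last_match(2)
    value = data[name]
    next value.to_s if filter.nil?

    handler = ALLOWED_FILTERS.fetch(filter) do
      raise UnknownFilterError, "unknown filter: #{filter}"
    end
    handler.call(value)
  end
end

puts render_allow_listed("{{ title | upcase }}: {{ price | money }}",
                         "title" => "T-shirt", "price" => 1999)
```

A template that asks for a filter outside the list, for example `{{ price | system }}`, fails with `UnknownFilterError` instead of reaching anything in the host application.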

Besides the language itself, Liquid also enforces some patterns for users writing Liquid templates. This is also something that would be nice for you to keep in mind if you want to define your own DSL to express your theme. Liquid drops are the strongest part of the pattern. What does that mean? Here we have the entire public API for Liquid. On the top, we take a string, just a string that can come from a file, and then we parse that string and we create a template. That template is what we call the AST. Then we take that AST and we can render that template as many times as we need, passing in the data. Then we're printing the output. To pass the data here, we're passing a hash, a hashmap. Instead of passing the hash, we could pass an instance of this thing that we call a drop.

This instance, this product drop thing here is actually an instance of this code. I don't think that a lot of folks here have a Ruby background, but I feel like, in an intuitive way, the thing that we do at this level here is that we are memoizing our variables. Or, for example, calling product.title is an expensive thing, at this layer, we can save that thing. It's pretty much this, the drop layer. Another thing that we do with the drop layer is that we create the list of methods that we want to make available for third-party developers. If the product has some method that's really heavy, that doesn't get exposed externally. That's the role of the drops. They protect what we are exposing and they also save us some resources because of this cache layer. Another thing that we do at the DSL level is making sure that we have a safe playground for users when we're rendering their templates.
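The two roles of a drop described above, an explicit list of exposed methods plus memoization, can be sketched as follows. This is not Liquid's actual `Drop` class: `ProductRecord`, `ProductDrop`, and the `EXPOSED` list are hypothetical names for the sketch.

```ruby
# Hypothetical model whose title lookup we pretend is expensive.
class ProductRecord
  attr_reader :title_lookups

  def initialize(title)
    @title = title
    @title_lookups = 0
  end

  def title
    @title_lookups += 1
    @title
  end

  # A heavy operation that must never be reachable from a template.
  def recalculate_inventory!
    raise "should not be called from a template"
  end
end

# Drop-style wrapper: exposes an allow list of names and caches each value.
class ProductDrop
  EXPOSED = %w[title handle].freeze

  def initialize(record)
    @record = record
    @cache = {}
  end

  # The only entry point a template engine would use in this sketch.
  def [](name)
    key = name.to_s
    return nil unless EXPOSED.include?(key)

    @cache.fetch(key) { @cache[key] = send(key) }
  end

  private

  def title
    @record.title
  end

  def handle
    self["title"].downcase.gsub(/[^a-z0-9]+/, "-")
  end
end

record = ProductRecord.new("Blue T-Shirt")
drop = ProductDrop.new(record)
3.times { drop["title"] }
puts "lookups on the record: #{record.title_lookups}"
puts "handle: #{drop['handle']}"
puts "hidden method: #{drop['recalculate_inventory!'].inspect}"
```

Rendering the same template many times hits the underlying record once per value, and anything not on the list simply resolves to nothing.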

How do we do that? We do that by limiting the amount of work that templates can do when they are rendering in the platform, in the backend. Let's double check how that works. Here we have a simple Liquid template. We have a for loop. We are printing some numbers here. Then, at the backend, something that we set is the render_length_limit. We're setting it to 50 there, which means that's the maximum amount of rendered output. In this example, I'm passing an array of numbers between 1 and 10. If I try to render this template, that works. If I try to render something too large, we get a memory error because we're being explicit about the size that we want to allow. This is one kind of resource limit that we enforce. This is also something to keep in mind when we create a DSL and accept templates from external developers. We need to be really careful about how much computation we are providing to users, because this way we have this safe playground and everyone can render their templates without impacting each other's performance.

Another kind of resource limit that we set is for variable creation. For example, here we have two variables being created in the Liquid template. Our assign_score_limit is 3. If you render this, this just works. If you reduce the assign_score_limit to 2, then we get a memory error as well. This is what makes sense for Liquid. If you're creating a different DSL that better expresses your intention when you're defining your own template language, maybe you have different requirements. For Liquid, because we allow, for example, variable definitions, this kind of resource limit makes a lot of sense.
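For a DSL of your own, both kinds of limit can be enforced inside the renderer. The sketch below is a simplification under stated assumptions: `LimitedRenderer` and `ResourceLimitError` are hypothetical names, the output limit is counted in bytes, and the variable limit simply counts distinct variables, which is not how Liquid computes its own assign score.

```ruby
class ResourceLimitError < StandardError; end

# Toy renderer that refuses to exceed its output and variable budgets.
class LimitedRenderer
  attr_reader :output

  def initialize(render_length_limit:, assign_limit:)
    @render_length_limit = render_length_limit
    @assign_limit = assign_limit
    @output = +""
    @assigns = {}
  end

  # Creating a new variable consumes one unit of the (simplified) budget.
  def assign(name, value)
    if !@assigns.key?(name) && @assigns.size >= @assign_limit
      raise ResourceLimitError, "too many variables"
    end
    @assigns[name] = value
  end

  # Every write is checked against the output budget before it is appended.
  def write(text)
    if @output.bytesize + text.bytesize > @render_length_limit
      raise ResourceLimitError, "rendered output too large"
    end
    @output << text
  end
end

# Equivalent of a template that loops over numbers and prints each one.
def render_numbers(numbers, render_length_limit:)
  renderer = LimitedRenderer.new(render_length_limit: render_length_limit, assign_limit: 3)
  numbers.each { |n| renderer.write("#{n} ") }
  renderer.output
end

puts render_numbers((1..10).to_a, render_length_limit: 50)

begin
  render_numbers((1..100).to_a, render_length_limit: 50)
rescue ResourceLimitError => e
  puts "stopped: #{e.message}"
end
```

Failing fast on a budget keeps one tenant's template from consuming the resources that every other tenant's render depends on.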

Another thing to keep in mind is backward compatibility when you're building your own DSL for themes. Backward compatibility is non-negotiable, because if we build a platform and then break the assumptions that we defined for developers, the theme that they build today is not going to work tomorrow. This frustrates developers, and they're going to stop making themes for your platform. Also, if you break backward compatibility, a merchant may install a theme that works today, but maybe it's going to break during Black Friday. Backward compatibility is a contract that we need to sign. As soon as we sign this contract, it impacts everyone. This is another strong concept that we need to keep in mind when we build our own DSL. I'd like to share a case here.

Here we have a template. We're comparing temp with template, and we're printing that value. Most of us can look at this template and think that it is going to print false on the page. However, by default, Liquid doesn't support expressions like this; it doesn't support comparisons when it's printing values. By default, Liquid just ignores the thing that it doesn't support. Instead of printing false, Liquid prints temp, which is the last thing that the parser understood when it was processing the template. Let's zoom in on this. Here, we have another similar example. We're taking a Liquid template, this simple string here, and then we're parsing it and rendering it. This renders hello for the same reason the first one renders temp, because hello is the value of the index variable there.

This maybe sounds like a good idea. We don't support comparisons, so instead of printing an error, we just render the part that we understood. In the second example here, if we take the same template and parse it again using a stricter approach, then we fail with an error. Frequently when building DSLs, we're going to face this dilemma: should we be forgiving about errors and just render something, because in production maybe it's better to render something than to render an error, or should we be strict from the beginning? Think about backward compatibility. It's much better to follow the second approach in production and be strict about how you parse and interpret your DSL, because you can always evolve the error space.

In the second example here, you can always decide to start supporting comparisons, for example, in your templates. Then you can evolve the language. In the first one, where we have this more forgiving approach, we're pretty much signing a contract with developers. If they write an invalid template like this, the contract is that you're going to print the value of the left side of that comparison. That's no good because it limits a lot how we can evolve the language. This is something to keep in mind when you're building your own DSL: be as strict as possible, because you can always evolve the error space. As soon as you sign a contract with everyone and your contract is forgiving, you're going to have a much harder time evolving the language syntax.
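The forgiving-versus-strict dilemma can be reproduced with a toy output tag. This is a sketch, not Liquid's parser: the function names, `TemplateSyntaxError`, and the markup `index == other` are hypothetical. The lax parser keeps the first identifier and silently drops the rest; the strict parser rejects anything it does not fully understand.

```ruby
class TemplateSyntaxError < StandardError; end

# Lax: take the first identifier and silently ignore whatever follows.
def parse_output_lax(markup)
  markup[/\w+/]
end

# Strict: the whole markup must be a single variable name.
def parse_output_strict(markup)
  match = /\A\s*(\w+)\s*\z/.match(markup)
  raise TemplateSyntaxError, "unexpected markup: #{markup.strip}" unless match

  match[1]
end

# Renders the contents of one output tag against a hash of data.
def render_output(markup, data, mode:)
  name = mode == :strict ? parse_output_strict(markup) : parse_output_lax(markup)
  data.fetch(name, "").to_s
end

data = { "index" => "hello", "other" => "world" }

# The lax mode prints the left side of the comparison, not "false".
puts render_output(" index == other ", data, mode: :lax)

begin
  render_output(" index == other ", data, mode: :strict)
rescue TemplateSyntaxError => e
  puts "strict mode: #{e.message}"
end
```

Under the strict parser, comparisons can be introduced later without changing the output of any template that already renders. Under the lax parser, the "print the left side" behavior is something existing themes may already depend on.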

This frames the challenges that over the years we faced with Liquid, thinking about how we keep the playground there running using resource limits to make sure that playgrounds are healthy, and how we evolved the syntax of the language. When thinking about your own theme system, maybe you're going to say, I really need to invent a DSL. I really need to build my own programming language. You're likely going to face similar challenges as these ones. If you need to build your template language. If you really need to do that from scratch and you're not so sure where to start, I highly recommend this book here, "Crafting Interpreters". It's very friendly, very fun, very rewarding to read and follow. This is a good way of starting to write your own language, if you really need to do that. That's not necessarily something that I always recommend. It's something that maybe is nice.

Extending

With Liquid, developers can write their own templates. They can put the price wherever they want in the page. Now we have the second challenge. How can we make this extensible, because this must be customizable by merchants. Merchants, maybe they have a storefront like this and they actually want to have something like this. Today, if merchants want to do that, this is the thing that they need to do. They need to open a Liquid template and update the code there. The code is friendly, but it's still not so friendly for someone that's not technical. How can we make our Liquid templates extensible by non-technical personas? The first thing that we need to do is to start talking the same language between developers and merchants and the platform itself. If we all talk the same language, if we all refer to the elements in the page using the same nomenclature, then you can collab and build things together.

That's the thing that we made. We created this concept of sections. Here we have a header section in our template. Here we have the product detail section. Here we have another section. With this, everyone knows what is a section, being technical or not being technical. Let's zoom in, into this section here and double check how we can build, how can we write some Liquid that defines what we're seeing here that's customizable. Let's double check how we can write this template. In the top of this image here, we have the details about this template. In the left side, we render a snippet that represents the image, and in the right side, we have the description that represents the description of the product. Here below, we have this schema tag. This is the most important thing. I feel like this is the most important slide for us today when thinking about extending our code.

Into this schema tag here, we're doing something special. We are declaring a property called image_position. Then we're saying the name of this property is image_position. We're saying this property, image_position, can have two values, left and right. We're declaring this into this JSON form. Why? We're going to see why. The special thing is that we're defining this property here and we're using it there. We know that that CSS class can switch between left and right, change the position of that element into the screen. Because we're defining this schema using this JSON form here, we have a visual editor that understands JSON. Then our visual editor knows that when merchants click there, we can just show this right and left values, so merchants can interact with that component.

This is how developers created bridges with non-technical folks. The developers, they can choose a set of settings that they want to expose using the schema tag. Then the visual editor knows how to read JSON and make those properties available. Now the non-technical folks, they can use those properties to change the position of elements in the page. With this, we have now a better picture of how each element of the page is built, and it's built in a customizable way. Now let's review the entire request lifecycle so we can understand how the other elements of the theme impact the state of the page. Here we have a timeline. When a request reaches Shopify, the first thing that we look at is at the URL. Here we have this fun-food site. Because we know the URL, we know which shop we are accessing and which theme we should render. With the information about the shop and the theme, we can load that theme in memory.
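The bridge between the schema and the visual editor can be sketched like this. The JSON below is a trimmed, illustrative schema modeled on the `image_position` example from the talk; `editor_controls` and `section_css_class` are hypothetical helpers, and the CSS class names are assumptions.

```ruby
require "json"

# Illustrative schema for the product details section (simplified).
SECTION_SCHEMA = <<~SCHEMA
  {
    "name": "Product details",
    "settings": [
      {
        "type": "select",
        "id": "image_position",
        "label": "Image position",
        "default": "left",
        "options": [
          { "value": "left", "label": "Left" },
          { "value": "right", "label": "Right" }
        ]
      }
    ]
  }
SCHEMA

# What a visual editor needs: which controls to show and which choices each offers.
def editor_controls(schema_json)
  JSON.parse(schema_json).fetch("settings").map do |setting|
    {
      id: setting.fetch("id"),
      label: setting.fetch("label"),
      choices: setting.fetch("options", []).map { |option| option.fetch("value") }
    }
  end
end

# What the template needs: the merchant's choice, validated against the schema.
def section_css_class(schema_json, merchant_choice)
  setting = JSON.parse(schema_json).fetch("settings").find { |s| s["id"] == "image_position" }
  allowed = setting.fetch("options").map { |option| option.fetch("value") }
  position = allowed.include?(merchant_choice) ? merchant_choice : setting.fetch("default")
  "product-details product-details--image-#{position}"
end

puts editor_controls(SECTION_SCHEMA).inspect
puts section_css_class(SECTION_SCHEMA, "right")
```

Because the schema is plain data, the editor and the renderer can both read it without executing any template code.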

The next thing that we do is let's load the global settings that we save as JSON files. Global settings are things like the background of the page. Developers can, just like we have the schemas exposing properties about a specific Liquid section, we have these JSON files that store global state about the entire page. Here, however, they expose that thing, because they are exposing background into that JSON file, merchants can update that visually using the editor and save that file automatically into the theme. This is the first thing that we do, we load the global properties. Then with the global properties loaded, we have the frame of the page set there. Then the next thing that we look at is, is this a product page? Because of this URL, we know that is a product page. It's not an index or anything like that. We know that you're going to use this JSON file to render this page.

This JSON file has information about all the sections, all those elements that take the full width of the page, they are stored in this file. Then we use them to render here into the JSON file. You can clearly see the order of the sections being rendered into the page there. Then we render the sections. Inside of that JSON file, we also define the blocks into that page. At this moment is where we replace the image_position that we were seeing from that example with the real value that the merchant set into the editor. Then we load all the sections and blocks with the proper values and the page is loaded. This is the entire lifecycle of a page when you open a Shopify shop.
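The last steps of that lifecycle, reading the template JSON, walking the sections in order, and replacing schema defaults with the values the merchant saved, can be sketched as follows. The file contents, section types, and the `resolve_sections` helper are assumptions for illustration, not Shopify's renderer.

```ruby
require "json"

# Hypothetical, trimmed-down contents of a product template JSON file.
PRODUCT_TEMPLATE = <<~TEMPLATE
  {
    "sections": {
      "main": { "type": "product-details", "settings": { "image_position": "right" } },
      "reviews": { "type": "reviews", "settings": {} }
    },
    "order": ["main", "reviews"]
  }
TEMPLATE

# Defaults that would normally come from each section's schema.
SECTION_DEFAULTS = {
  "product-details" => { "image_position" => "left" },
  "reviews" => { "stars" => 5 }
}.freeze

# Returns the sections in render order, with merchant values over defaults.
def resolve_sections(template_json, defaults)
  template = JSON.parse(template_json)
  sections = template.fetch("sections")
  template.fetch("order").map do |section_id|
    section = sections.fetch(section_id)
    type = section.fetch("type")
    {
      id: section_id,
      type: type,
      settings: defaults.fetch(type, {}).merge(section.fetch("settings", {}))
    }
  end
end

resolve_sections(PRODUCT_TEMPLATE, SECTION_DEFAULTS).each do |section|
  puts "#{section[:id]} (#{section[:type]}): #{section[:settings]}"
end
```

Each resolved settings hash is what the section's Liquid template would then receive when it renders.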

Looking back at all the personas, it feels like we're covering everyone here. The buyers can see the page. The merchants can customize parts of the page. The developers can build a theme that's customizable. We're missing one person here, which is the app developer. How do they participate in the process of building our theme page? Again, just as a reminder here. The app developers develop this component here. That's something that we can add into the page, just like this. You can search for your component and that just appears there. How is that possible?

As maybe you're probably guessing, this component is just another Liquid template with a schema. If you were building, for example, a discount component, you would have an HTML showing, for example, the discount number there. You would expose the amount of discount as a property that's customizable by non-technical users, and then you use that property in your template. In the end, that's what the app developers do. They write this Liquid code snippet here, and then they upload into the app store so merchants can buy those components and install on top of their themes. This will pretty much close all the request lifecycle.

Running in Prod

We have the language where the developers can put the price anywhere. We have some technique to make those components customizable by non-technical folks using the editor. How can you run this in production in a really fast way, because Black Friday is coming, and with Black Friday, this amount of requests is coming? How can we scale this? Talking a little bit about the runtime infrastructure at Shopify, Shopify runs on top of Google Cloud and we adopt a multi-tenant architecture. That means that we pretty much have a set of applications with independent databases, and we use the shop ID as our sharding key. That's how we scale our databases which is the most difficult part of scaling into the application. The other components that are not the database, they're easier to scale because they are stateless. We have a Kubernetes autoscaler that scales the app based on the traffic, so this is a huge summary of how it works.
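Using the shop ID as the sharding key means every query for a shop is routed to one database. The sketch below assumes a simple modulo mapping and made-up database names; the talk does not say how Shopify maps shops to databases, so treat `ShardRouter` purely as an illustration of the idea.

```ruby
# Routes each shop to exactly one database, keyed by shop ID.
# The modulo mapping is an assumption for this sketch.
class ShardRouter
  def initialize(shard_names)
    raise ArgumentError, "at least one shard is required" if shard_names.empty?

    @shard_names = shard_names
  end

  def shard_for(shop_id)
    @shard_names[shop_id % @shard_names.size]
  end
end

router = ShardRouter.new(%w[db-0 db-1 db-2])
[7, 8, 9].each { |shop_id| puts "shop #{shop_id} -> #{router.shard_for(shop_id)}" }
```

Because a shop's data lives in a single database, a traffic spike on one store puts load on one shard rather than on every database at once.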

Because the database is the hardest part to scale, one thing that we do is we use active data replication to protect our primary databases, so we have these two key flows here. Merchants, they go to the admin and they do some store management, they persist the new Liquid templates, they change the image from the left to the right there. Then they persist that decision, that goes through core. Core performs the reads and writes, and that's our primary database. Into the buyer side, which is the more difficult part to scale, we have active data replication, and then we rely on those databases to make sure that we render things really fast. We have this component called the storefront renderer, and the responsibility of this component is pretty much render Liquid templates really fast. One thing that we have to make that happen is this approach that we do to handle data replication.

Another approach that we adopt to make sure that we're rendering templates really fast is the usage of native extensions. This is something that if you build your own DSL or if you're using another person's DSL, you likely are going to adopt a native extension. What is that? We are all programmers, we code a lot. Because we do this frequently, over the years we adopted some tools to help us, so for example when you create a string, you don't need to clean that string in memory, so we've adopted these high-level languages, and they have the VMs, and those have positions in memory, they're garbage collected. We have all this benefit of the high-level language. They're very productive, and that's a good thing because that allows us to code a lot of stuff and have all that support from the runtime. However, some challenges tend to not play so well with these high-level languages, and that's when we adopt native extensions.

We write this kind of code in a lower-level language, something like C, C++, or Rust, and we solve part of the problem there. Then we are able to call those native extensions from a high-level language. This exists in a lot of languages. At Shopify we generally use Ruby as the high-level language and Rust as the low-level one, but a lot of Java applications adopt Java native extensions, and Python has C extensions. This is a common practice for solving the kind of problem that's quite isolated and quite computationally intensive. A lot of languages, for example, implement JSON parsers using native extensions for that reason. That's something to consider if you write your own DSL, to make sure that it renders fast in production.

What are the benefits of the native extension? Because it's something very low-level, you don't impact the garbage collector. Some algorithms that we need to run, sometimes it creates too many instances of objects and that creates some abnormal behavior in the garbage collector. In other words, the app needs to pause and clean the references, and that's no good. If you have an algorithm that creates a lot of instances, and while benchmarking you notice the bad impact that it has in your garbage collector, this is likely a good candidate to extract that logic to a native extension and then you can manage the memory yourself. Of course, this is a blessing and also a curse because then you can have memory leaks so it's very dangerous. This is something to keep in mind. Garbage collector is one thing. Another thing, another benefit that native extensions bring is performance. Native code runs faster. Even though it's harder to write, it runs faster.

Another reason to use native extensions is reusability. For example, if you already have a very good JSON parser written in C, why can't we just reuse that JSON parser instead of rewriting it in a high-level language? Reusability is a fair option. Just to clarify how it works in practice, and this is a bit of a massive overview. When we create our high-level application, our Ruby application, we have our Ruby VM running there. It runs our Ruby scripts. It runs the garbage collector, which does much more stuff for us to make sure that the runtime is healthy. This is how it works. However, when we're using a native extension, as soon as we start our program and create the process, one thing that we do is linking this kind of .so file into the process that gets a dynamic link there.

What is this .so file? This file is something that you compiled. Maybe you wrote your native code in C or Rust, and the thing that you do is you compile that code into an .so file, that stands for shared object, and when you're starting your Ruby program, you dynamically link this file. Because this file is compiled following certain conventions, generally the C conventions, the functions that exist here, they're accessible from the high-level language, so we can call this native code from our high-level language in the same process. It's a very cheap call that we do here, because everything is happening in the same process. However, there is still a little bit of a cost, because when you create, for example, a string in your high-level language, be it Ruby or Python or another language, that string is created with some set benefits.

I'll talk again about the garbage collector. That string is managed by it, so you know that the memory is going to be freed. When you create a string in your native code, you manage that memory, and that string exists only in the context of your native extension. Because we have this kind of difference between data types, every time that you perform a call from a high-level language to a low-level language, you have some cost of serialization and deserialization of these data types. The speed of the native code generally pays off that cost, but sometimes it doesn't. That's why I always like to show this example when I talk about native extensions.

Here we have some code fully written in a high-level language, fully written in Ruby, and here we're doing some computation. Let's say that the computation part is the core of my business, and you know what? I'm going to extract this to a native extension, because then it's going to run faster, and the application is going to be better. Let's say that I've extracted this part to a low-level language, that language being Rust here. Then let's say that I got excited, and I extracted the entire thing to Rust. We have three versions of the same code, with different levels of usage of the native extension. Looking at these three versions, it's a bit obvious that the last one is going to be the fastest. Everything is running in native code, so it's going to be faster.

However, the interesting part is that the second option is the slowest one. Even though we have the native extension being used there, because we're not being smart about this, we have so much back and forth between the high-level code and the low-level code that the cost of serialization and deserialization matters, and you end up with slower code than you would have by just using a high-level language. Every time that you think, I want to extract this part of the code to a native extension, it's really important to keep in mind how frequently you're going to perform these calls. Benchmarking is the best way to make sure that all the effort you're putting into adopting native code pays off.
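
The back-and-forth problem can be sketched without any real native code. The TypeScript model below is only an illustration under stated assumptions: `Boundary`, `shout`, `chatty`, and `batched` are hypothetical names, and the model counts boundary crossings instead of measuring time, on the assumption that every crossing carries a fixed conversion overhead.

```typescript
// Hypothetical model of the border between high-level and native code.
// Nothing here is a real native extension: the class only counts crossings.
class Boundary {
  crossings = 0;

  // Simulates one call into native code. In a real extension, the argument
  // would be converted on the way in and the result on the way out.
  call(input: string, nativeFn: (s: string) => string): string {
    this.crossings += 1;
    return nativeFn(input);
  }
}

// Stand-in for the "fast" native computation.
const shout = (s: string): string => s.toUpperCase();

// Chatty version: one boundary crossing per item.
function chatty(items: string[], boundary: Boundary): string[] {
  return items.map((item) => boundary.call(item, shout));
}

// Batched version: one crossing for the whole list.
// Assumes that items never contain a newline character.
function batched(items: string[], boundary: Boundary): string[] {
  if (items.length === 0) return [];
  return boundary.call(items.join("\n"), shout).split("\n");
}
```

Both functions return the same result, but for a list of N items the chatty version crosses the boundary N times and the batched version crosses it once. A count like this is no substitute for a benchmark on the real extension.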

Tooling

Again, we have the language, we can put the price anywhere. We have the schemas, we can make our code customizable by merchants. Now we know a little bit more about native extensions and how we can make our DSL in our theme to render fast in production. Now, let's say that developers started building themes for your application. However, they're not good themes. That's where tooling really matters. The tools that we make, they must inspire developers to write the best code possible. That's something that we needed to solve at Shopify. The first tool that we built, one of the first ones was the profiler tool. We render our templates in the backend. Developers, they have no way to optimize their time to first byte if we don't provide them a profiler tool, because they don't know what's happening in our backend.

We render their templates for them, so how would they know the slowest part of their Liquid templates? I created this command here, shopify theme profile, and they can run it on their machine and get performance data about how long it takes to render each part of their templates. This inspired developers to write code that has a better time to first byte, because now they have the visibility and they know what part of their code they can optimize. Another tool that we built to inspire developers to write better code is theme check, which is our linter. Theme check helps in interesting areas like this. Here we have a parser-blocking script, and while developers are writing their template, we show this squiggly line here, and we say, here we have a parser-blocking script. You can use defer or async here, and make your template better.
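
A check of this kind can be sketched in a few lines. The code below is not theme check's implementation. It is a minimal illustration, and the `Offense` shape and the `findParserBlockingScripts` name are assumptions made for this example. It uses regular expressions, where a real linter would more likely walk a syntax tree.

```typescript
// Hypothetical shape of a lint result: a message plus the range to underline.
interface Offense {
  message: string;
  start: number; // index of the first character of the offending tag
  end: number; // index just past the last character of the tag
}

// Flags <script src="..."> tags that have neither `defer` nor `async`.
function findParserBlockingScripts(source: string): Offense[] {
  const offenses: Offense[] = [];
  const scriptTag = /<script\b[^>]*>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptTag.exec(source)) !== null) {
    const tag = match[0];
    // Drop attribute values so a file named "async.js" is not mistaken
    // for the `async` attribute.
    const withoutValues = tag.replace(/=\s*("[^"]*"|'[^']*'|[^\s>]+)/g, "");
    const hasSrc = /\bsrc\s*=/i.test(tag);
    const isDeferred = /\b(defer|async)\b/i.test(withoutValues);
    if (hasSrc && !isDeferred) {
      offenses.push({
        message: "Parser-blocking script: consider adding defer or async.",
        start: match.index,
        end: match.index + tag.length,
      });
    }
  }
  return offenses;
}
```

The `start` and `end` offsets are what an editor integration would need to draw the squiggly line under the tag.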

This was a channel that we created with developers to engage them in writing better Liquid templates. Another thing that we introduced in the linter is checks like this one. Here in the template, they're using this market object, but this object does not exist. This is another opportunity for the tool to say to the developer, this object does not exist. Otherwise, they would need to upload their template to the platform and only then see that it's not working as expected. That's no good. Another thing to keep in mind is that it must be easy for folks to write their templates. These days, it's almost an expectation that while folks are writing templates, they will have some code completion support. So we also needed to build a language server, and that's also something to keep in mind when you build a DSL.
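
The undefined-object check can be sketched in the same spirit. This is a simplified illustration, not the linter's real logic: `KNOWN_OBJECTS` is a hypothetical, shortened list, `findUnknownObjects` is a name made up for this example, and the sketch ignores variables that a template defines locally, for instance with assign or as a loop variable.

```typescript
// Hypothetical list of global objects. A real platform would ship the full
// catalogue together with its documentation.
const KNOWN_OBJECTS: string[] = ["product", "shop", "cart"];

// Returns the names used at the start of {{ ... }} outputs that are not in
// the list of known objects. Each unknown name is reported once.
function findUnknownObjects(template: string, known: string[] = KNOWN_OBJECTS): string[] {
  const unknown: string[] = [];
  const output = /\{\{-?\s*([A-Za-z_][A-Za-z0-9_]*)/g;
  let match: RegExpExecArray | null;
  while ((match = output.exec(template)) !== null) {
    const name = match[1];
    if (name === undefined) continue;
    if (known.indexOf(name) === -1 && unknown.indexOf(name) === -1) {
      unknown.push(name);
    }
  }
  return unknown;
}
```

The same catalogue of objects and properties could also feed code completion, which is one reason to keep it as data rather than hard-coding it in each check.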

When you invent a language, it's not only about inventing the language. We need to invent the tooling around it, so that when folks write code in your DSL, they have a good time writing that code. This is a demo of our language server. As you may notice, here we know that these properties are available there, and then it's pretty easy for developers to double check the documentation for that thing. Before, they really needed to guess the properties or refer to the documentation, and that's not an easy experience for writing templates at all.

To make sure that you can build tooling around your own DSL, another component that you're going to need is a tolerant parser. What is that? Here we have a Liquid template, and as you may notice, we have an error there because we have a for loop, and that loop is not closed. Even though we have that error in the template, we are able to parse the template, infer the type of product, and then know these properties are available there. Again, when inventing your own language, you need a good parser to parse in the backend, but you're also going to need a parser that's more specialized for tooling to handle this kind of template, a parser that's tolerant when templates are broken. Because when folks are writing their code in VS Code, their templates are always in an unfinished state.

They're always going to be broken, and the tooling must be able to parse those templates and understand them. Some languages have a single parser for the backend, for doing the DSL job, and for the tooling. That parser is fast, and it's error tolerant. For Liquid, for our template language, we have a parser that's really fast to render in production, but it's just Liquid aware. However, for the tooling, we have another parser implementation that understands Liquid, and also knows how to parse HTML. For that reason, when folks forget to close an HTML tag, for example, we're also able to give that hint to developers, and say, you can close that tag.
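
The idea of error tolerance can be shown with a deliberately tiny parser. This sketch is not the Liquid tooling parser: it only understands outputs, for, and endfor, and every name in it (`parseTolerant`, `ForNode`, and so on) is an assumption made for this example. What it illustrates is the technique: when a block is left open, the parser records an error and still returns a usable tree instead of giving up.

```typescript
interface TextNode { kind: "text"; value: string }
interface OutputNode { kind: "output"; expression: string }
interface ForNode {
  kind: "for";
  variable: string;
  collection: string;
  body: TemplateNode[];
  closed: boolean; // false when the template never closed the block
}
type TemplateNode = TextNode | OutputNode | ForNode;

interface ParseResult { nodes: TemplateNode[]; errors: string[] }

function parseTolerant(source: string): ParseResult {
  const root: TemplateNode[] = [];
  const errors: string[] = [];
  const open: ForNode[] = []; // stack of for blocks that are still open
  const token = /\{\{\s*(.*?)\s*\}\}|\{%\s*(.*?)\s*%\}/g;
  let cursor = 0;

  // Adds a node to the innermost open block, or to the root.
  const append = (node: TemplateNode): void => {
    const top = open[open.length - 1];
    (top ? top.body : root).push(node);
  };

  let match: RegExpExecArray | null;
  while ((match = token.exec(source)) !== null) {
    if (match.index > cursor) {
      append({ kind: "text", value: source.slice(cursor, match.index) });
    }
    cursor = match.index + match[0].length;

    const expression = match[1];
    const tag = match[2];
    if (expression !== undefined) {
      append({ kind: "output", expression });
    } else if (tag !== undefined) {
      const forTag = /^for\s+(\w+)\s+in\s+(\S+)/.exec(tag);
      if (forTag) {
        const node: ForNode = {
          kind: "for",
          variable: forTag[1] ?? "",
          collection: forTag[2] ?? "",
          body: [],
          closed: false,
        };
        append(node);
        open.push(node);
      } else if (tag === "endfor") {
        const node = open.pop();
        if (node) node.closed = true;
        else errors.push("Unexpected endfor");
      } else {
        errors.push(`Unsupported tag: ${tag}`);
      }
    }
  }
  if (cursor < source.length) {
    append({ kind: "text", value: source.slice(cursor) });
  }
  // Tolerance: report blocks that were left open, but keep the tree.
  for (const node of open) {
    errors.push(`Unclosed for loop over ${node.collection}`);
  }
  return { nodes: root, errors };
}
```

Given a template with an unclosed loop, the result still contains the for node with its loop variable and body, which is the information that completion and type inference would build on.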

One suggestion that I'll leave for you is, if you need to invent your language, this library here, Ohm, is an excellent option to build tooling. You don't need to implement the parser manually, it uses a grammar file, so you can express the rules of your language, and it's going to generate the parser for you. It's quite fast. It actually got faster in a recent release. It's a great tool to build parsers faster, and then you can write your tooling. This is a TypeScript library, so it's a way for you to build the tooling in TypeScript. It's very friendly to build VS Code extensions.

Another piece of context: here we have a Liquid template, and as a reminder, the content of that schema tag is JSON. As you're probably guessing, we also need a resilient JSON parser to be able to provide useful suggestions for folks while they are writing their JSON there. We didn't implement a JSON parser from scratch to provide these kinds of suggestions for users, because Microsoft already has an implementation of a JSON language server.

All that we need to do to make sure that our extension provides code completion for JSON files is instantiate this language server. When we do that instantiation, we use a JSON schema to express the expected shape of the JSON file, what the rules are. With the JSON language server and the JSON schema specifying what valid JSON is, we already have the suggestions for free here. This was also a lesson learned for me. Because we invented Liquid, we need to do all this work implementing tooling for Liquid, but because JSON is already something that exists, we can just reuse tooling that's already built for JSON.
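
To make the role of the schema concrete, here is a small sketch of how a schema alone can drive suggestions. It does not use the real JSON language server, and it covers only a tiny subset of JSON Schema. The `settingSchema` object and its property names are hypothetical and are not copied from any real platform schema.

```typescript
// A tiny subset of JSON Schema, enough for this sketch.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string; description: string }>;
  required: string[];
}

// Hypothetical schema for one setting entry in a theme.
const settingSchema: ObjectSchema = {
  type: "object",
  properties: {
    type: { type: "string", description: "Kind of input shown in the editor" },
    id: { type: "string", description: "Key used to read the value in the template" },
    label: { type: "string", description: "Text shown to the merchant" },
  },
  required: ["type", "id"],
};

// Suggests properties that the schema allows, that the partial JSON does not
// have yet, and that start with what the user has typed so far.
function suggestProperties(
  schema: ObjectSchema,
  partial: Record<string, unknown>,
  prefix: string = "",
): string[] {
  return Object.keys(schema.properties)
    .filter((name) => !(name in partial))
    .filter((name) => name.indexOf(prefix) === 0);
}

// Lists required properties that are still missing, as a linter might report.
function missingRequired(schema: ObjectSchema, partial: Record<string, unknown>): string[] {
  return schema.required.filter((name) => !(name in partial));
}
```

Neither function knows anything about settings. All of the domain knowledge lives in the schema, which is why an existing JSON language server can be reused once a schema is supplied.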

Summary

This is a bit of an overview of the loop of all the elements that you need to consider when you are creating your own theme system. Maybe you want to consider a language, you need to keep non-technical folks in mind. It needs to render really fast. If you are inventing a new language, then you need some tooling to engage developers on writing themes in the way that you expect, so you can have high quality themes in your platform. These are the four items.

If I needed to choose my favorite one from these four items here, if I needed to implement a theme system from scratch today, I believe that this one is the most important, because this one is the thing that invites non-technical folks to express their expectations about their themes. I would look at my application and think, how can I invite non-technical personas to update this part of my application? Then I would evolve from there. Maybe you need the language, and then we're going to have all that work that I mentioned to make sure that the language that's invented renders fast in production. You need to implement some tooling there. If you can adopt a language that already exists, then you can save a lot of cost in that decision.

Questions and Answers

Participant 1: You mentioned that you wanted a language which runs on both the client and server. The obvious choice would be JavaScript. Why didn't you choose that and instead went for Ruby?

Guilherme Carreiro: The question is why we chose Ruby as the language to write the application?

Participant 1: To write the runtime. I think the templates are defined using some Ruby code. Ruby runs on both server and client. I was expecting the choice to be JavaScript so that the template could run on server and on client.

Guilherme Carreiro: Shopify was built as a Ruby application for many reasons in the past. The reason that we implemented the tooling in TypeScript, it was because generally the VS Code extensions, they are built in TypeScript, that's why we had different decisions about the parser.

Participant 2: Regarding theme profiling, so you mentioned how as a developer, we already know which piece of code runs the fastest, which piece of code runs the slowest, so why are we not exposing that information to the non-technical person? Why do they have to run it locally to check it themselves?

Guilherme Carreiro: Currently, we expose it only to developers while they are writing the template, because they have more ownership to act on those fixes. Generally, when code is slow to render in the backend and impacts time to first byte, it's because folks have nested for loops. Sometimes it seems a bit obvious for us, like there's a nested for loop there, and it's just clear that you need to remove that for loop. Sometimes you have, for example, nested templates, with one file rendering the other, and that nesting nature gets hidden. That's something that developers have a lot of ownership to fix. On the merchant side, non-technical folks cannot act on the fix. We are transparent in sharing the Core Web Vitals with non-technical folks, so they know how fast their templates are rendering, but they don't have the granularity of knowing which part takes more time to render, because they can do little about that. We don't expose that today.

Participant 2: They don't need to go into the granularity, but as a whole, can we share information about the whole theme, that which theme is the slowest, which theme loads the fastest?

Guilherme Carreiro: Exactly, they do know that because we expose those Core Web Vitals. For example, if merchants install a theme into their store, and it's slow, they're going to know, now my time to first byte here is bad, but they're not going to know which section is the slowest. They need support from a developer to run the command and debug that to get extra information. I totally agree that in some contexts, depending on the theme system that you adopt and the language that you adopt, maybe it makes sense to expose that to the non-technical personas. In our context, currently, that doesn't make sense, but it also really depends on how things are going to evolve over the years. We currently don't expose it.

Participant 3: I saw the diagram where you showed replication from the merchant's database to the store's database, the rendered text is copied. You said partial replication. By that, you meant the primary copies to their database. The first one is synchronous, and the rest, replication happens asynchronously, or you meant something else there?

Guilherme Carreiro: Yes, the replication is something that happens asynchronously, but the writes are synchronous, indeed.

Participant 3: In this merchant-sized database, but all the replication happens asynchronously, isn't it?

Guilherme Carreiro: Yes. Because we adopted this multi-tenant architecture, keep in mind that the synchronous part happens inside the silo of that specific database, because the shops are sharded into isolated instances based on the shop ID.

Recorded at:

Jun 01, 2026

Guilherme Carreiro
