Theme Systems at Scale: How to Build Highly Customizable Software

Summary

Guilherme Carreiro discusses the architecture behind Shopify’s theme system, focusing on balancing extreme customizability with platform stability. He explains how they leverage Liquid as a safe DSL, optimize performance via native extensions (Rust/C), and use JSON schemas to bridge the gap between developers and merchants.

Bio

Guilherme Carreiro is a Staff Developer at Shopify, where he champions the evolution of Liquid. Prior to Shopify, Guilherme led the DMN tooling team at Red Hat, delivering open‑source solutions that empowered users to build decision models with no‑code approaches. He is passionate about crafting developer tools that streamline workflows and boost productivity.

About the conference

The InfoQ Dev Summit Boston software development conference focuses on the critical software challenges senior dev teams face today. Gain valuable real-world technical insights from 20+ senior software developers, connect with speakers and peers, and enjoy social events.

Transcript

Guilherme Carreiro: I am so excited to talk about theme systems today. I really like customizable software. Most good software and good apps we use today have some level of customizability: you can change fonts, colors, backgrounds. Shopify is no different. Shopify is a platform that allows merchants to create their online stores. You have shipping. You have inventory. You have everything you need to run your business. One important thing you have is storefronts, which means merchants can build their stores the way they want. This is an example of a storefront that we have at Shopify.

One important thing is that merchants can really change anything here. This is very special because we have millions of merchants, and even though they share the same foundations, their stores are very different from each other. How do we scale software that can render anything, where pages can contain any kind of element? How can you do this while still setting some standards for your users, the buyers? Just to give you an idea, during the peak of BFCM, the Black Friday Cyber Monday weekend, we have almost 60 million requests per minute. Merchants are applying last-minute updates to their stores, and buyers are refreshing those stores, because the stores are generally closed until the deals become available, so everyone is hitting the application at once. How do we render all those very different stores and still provide stable performance?

I'm going to show a demo of how customizable a store is. Here we have a store. It's a little bit dark, so I'm going to update this storefront and make the background a bit more yellow. This is how it works: I go to my editor and change the color scheme. I can change the position of this image and make it a bit larger. I can also add a component that shows reviews of your product; here I'm looking for a review score.

Then I go into the app and add the component there. I can, for example, increase the number of stars in this component to four for this T-shirt. It's a four-star T-shirt. Then I just go back to my storefront, refresh, and it's there. It literally took less than one minute to change the shape of the storefront. If I get tired of this theme, I can go to my admin, select a different theme, and install it. I refresh, and it was even faster to completely change the look of the store. Of course, we can see some patterns here: we have the image, we have the buy buttons. These are all conventions; folks can really do whatever they want in today's stores.

Background

This is what we're going to do today: understand how we can build this kind of software and how we do it at Shopify. My name is Guilherme Carreiro. I work on the Liquid Developer Tools team, which takes care of the Liquid theme system at Shopify. I have the privilege of building tools for other developers: I build the language server and the CLI that developers use to download themes to their machines and upload them back. I also introduce features at the platform level, for example new features in Liquid, the template language, so it gets a bit easier for developers to do more with less work. My work has these two sides: the backend and the developer tooling.

Theme

To set up a common language, today we're going to talk about themes and theme systems. When I say theme, I mean a package that contains all the information about the graphical interface of a store and the functionality it provides. It controls both: not only the background, but also which elements are present on the screen and how they are used. When I say theme system, I mean a set of primitives. Multiple personas can collaborate by understanding the same set of primitives to define the layout of a storefront. The theme system is this set of primitives that allows everyone to work together to build a store. Who are the personas involved in a theme? We have the merchant, the person who wants to create a store. It's a non-technical person, but they have an idea. Then you have the buyer. It's an important persona as well: the person who wants the storefront to be rendered as fast and as reliably as possible.

Then in the middle, we have the theme devs. They develop themes. Some theme devs work inside our company. Other developers have their own businesses and only build themes for Shopify. These folks create themes that are unique and customizable by merchants: merchants can install them and change elements using our editors. However, sometimes you install a theme and it doesn't have everything you need. For example, when I was updating my store, I needed to add an extra component, that review component with the stars. That's where another kind of developer walks into this ecosystem: the app developers. They build components that can be installed on top of themes. Finally, we have another kind of theme developer: folks who work directly with merchants, updating the code of their stores to make their themes unique. So we have three kinds of developers and two non-technical personas, each with their own requirements.

Liquid - Why Another DSL?

This is the shape of the problem we're going to look at today. The problem is this: we want to render a page, and this page has some important elements. We have the layout, the thing that contains everything, this green part here. We have the sections, which are the horizontal elements on the page; they always occupy the full width of the page. We have the blocks, which are these smaller components, for example the mini shoe there, the orange one. These are the three main components where all these personas collaborate. As the platform developer, this is my challenge: I need to build a system where other developers can build themes, and I must also support non-technical personas, because merchants should be able to edit those themes. This encapsulates the challenge we need to solve.

The first foundational element we're going to use to solve this problem is Liquid. What is Liquid, and why another DSL? The thing is, we really want developers to be able to write anything they want in their storefront. If they want a completely blank page with one line of text, the name of the product, they can do that. They can do anything. However, when you have a platform and you allow developers to write any kind of code that will run in your backend, that can pose some challenges.

For example, when we created Liquid, our own template language and our own DSL, 20 years ago, this was the kind of template we had at the time, an ERB template. Here we have a very common problem with web applications that allow complex code in the view. We iterate over an array of products, loading them with a database query, and then, for every product, we load its variants. When we do this, we have the N+1 query problem. It's a classic problem, and it's not good. When you work at your own company and only your own first-party developers can update your templates, that's not a problem, because you can set your own conventions and avoid it.
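To make the cost of the N+1 pattern concrete, here is a small Python simulation. It is not the ERB template from the slide; `FakeDB` and its methods are hypothetical stand-ins that only count how many queries each rendering strategy issues. With three products, the per-product loop issues one query for the list plus one per product, while a batched load needs two queries regardless of how many products there are.

```python
class FakeDB:
    """Hypothetical in-memory 'database' that counts issued queries."""

    def __init__(self, products, variants):
        self.products = products   # list of {"id", "title"}
        self.variants = variants   # list of {"product_id", "name"}
        self.query_count = 0

    def all_products(self):
        self.query_count += 1
        return list(self.products)

    def variants_for(self, product_id):
        # One query per call: this is what a template triggers inside its loop.
        self.query_count += 1
        return [v for v in self.variants if v["product_id"] == product_id]

    def variants_for_many(self, product_ids):
        # One query for all ids, like WHERE product_id IN (...).
        self.query_count += 1
        grouped = {pid: [] for pid in product_ids}
        for v in self.variants:
            if v["product_id"] in grouped:
                grouped[v["product_id"]].append(v)
        return grouped


def render_n_plus_one(db):
    lines = []
    for p in db.all_products():                # 1 query
        for v in db.variants_for(p["id"]):     # +1 query per product
            lines.append(f'{p["title"]} / {v["name"]}')
    return lines


def render_batched(db):
    products = db.all_products()                                   # 1 query
    by_product = db.variants_for_many([p["id"] for p in products])  # 1 query
    return [f'{p["title"]} / {v["name"]}' for p in products for v in by_product[p["id"]]]


products = [{"id": i, "title": f"Shirt {i}"} for i in range(1, 4)]
variants = [{"product_id": i, "name": size} for i in range(1, 4) for size in ("S", "M")]

db1, db2 = FakeDB(products, variants), FakeDB(products, variants)
assert render_n_plus_one(db1) == render_batched(db2)   # same output...
print(db1.query_count, db2.query_count)                 # ...4 queries vs 2
```

On a platform where any third party can write the loop, the number of queries grows with whatever data the template happens to iterate over, which is exactly the exposure described next.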

However, when you build a platform where folks upload templates to create their stores, you're exposing your platform to this very slow experience. You're risking the stability of your platform, and you're risking the experience for everyone. Also, to be really clear about ERB templates, even this code here is legal and really executes, because ERB allows any kind of Ruby code to be executed inside the template. This was a challenge. Shopify was a Ruby app at the time, and it still is, so we were considering the template languages used in Ruby.

Then we said, we need a custom DSL to support developers. These are the key values of Liquid: safety and boundaries. Liquid works a bit differently because it's almost an allow list of the things you can do in your view. You can have conditions. You can perform iterations. You can print things. You can transform data. For example, here we have a product, and we're transforming its price to format it. We have that pipe syntax, which we call filters. This was the reason we created the DSL: we wanted folks to write their own templates to consume our data, but to consume it in a safe way, so they wouldn't shoot themselves in the foot.
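The allow-list idea behind filters can be sketched in a few lines. This is a toy Python evaluator for output tags like `{{ product.price | money }}`, not Liquid's implementation; the filter names echo Liquid/Shopify filters, but their behavior here (including prices stored in cents) is a simplifying assumption. The point is that a template can only reach data through dotted lookups on plain dictionaries and can only call filters the platform registered.

```python
import re

# Allow list: the only filters a template may call. Anything else is rejected.
FILTERS = {
    "money": lambda cents: f"${cents / 100:.2f}",   # assumption: prices stored in cents
    "upcase": lambda s: str(s).upper(),
}

OUTPUT_TAG = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def lookup(path, state):
    """Resolve a dotted path like 'product.price' against plain dicts only."""
    value = state
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""   # unknown variables render as empty output
        value = value[key]
    return value


def evaluate(expr, state):
    variable, *filters = [part.strip() for part in expr.split("|")]
    value = lookup(variable, state)
    for name in filters:
        if name not in FILTERS:
            raise ValueError(f"unknown filter: {name}")
        value = FILTERS[name](value)
    return str(value)


def render(template, state):
    return OUTPUT_TAG.sub(lambda m: evaluate(m.group(1), state), template)


state = {"product": {"title": "T-shirt", "price": 2500}}
print(render("{{ product.title | upcase }}: {{ product.price | money }}", state))
# T-SHIRT: $25.00
```

Because there is no construct for calling arbitrary methods, a template written against this evaluator has no way to issue a query or touch system resources, which is the property the next paragraph describes.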

Now, the things Liquid cannot do. Liquid cannot execute queries. Once we pass the state to Liquid, Liquid cannot modify that state, cannot interact with the database, cannot consume system resources, cannot do complex things; it can only do the things you allow. Another strong value in Liquid is that it must be as simple as HTML. Any designer, or a less experienced frontend developer who doesn't know much about the backend, can understand this code and build a theme. Simplicity was a strong value in Liquid.

How can we load data into our templates while avoiding the N+1 problem we saw with the products? How can we load data in a way that's safe? Liquid drops are the pattern we adopted to make data available to templates, in a safe way, when they're rendering storefronts. Here is how drops work. Let's go through this snippet and understand how Liquid works a little better. First we require Liquid. Then we take this template string and parse it. When you parse a template string, we validate its syntax. It's the first step when folks upload a theme to the platform: the first thing we do is parse the theme to double-check the syntax.

Then we build the AST. Of course, we also parse the templates when we are rendering storefronts. Once we have parsed a template, we can render it as many times as we want: parse once, then render many times. Here, we are rendering it while passing some state. The process we follow is this: we take our AST, the result of the parse, and visit all the nodes using this state, replacing the product name with the real product name and transforming the price into money. This is an overview of how a Liquid template gets rendered. Here, we're setting the state as a hash, which is a bit of a primitive way to set the state of our page: we're passing the product, with its title and price, as a hash.
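The parse-once, render-many flow can be illustrated with a deliberately tiny Python template engine. It is a sketch, not Liquid: it only understands literal text and `{{ dotted.path }}` output tags. `parse` plays the role of upload-time validation (it rejects anything it does not understand) and builds a flat AST; `render` walks that AST with a given state and can be called repeatedly without re-parsing.

```python
import re
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class Variable:
    path: list   # e.g. ["product", "title"]


TOKEN = re.compile(r"(\{\{.*?\}\})", re.S)
IDENT_PATH = re.compile(r"^[A-Za-z_]\w*(\.[A-Za-z_]\w*)*$")


def parse(source):
    """Validate syntax and build a flat AST; raise on anything not understood."""
    nodes = []
    # With one capturing group, re.split alternates: text, tag, text, tag, ...
    for i, token in enumerate(TOKEN.split(source)):
        if i % 2 == 1:
            inner = token[2:-2].strip()
            if not IDENT_PATH.match(inner):
                raise SyntaxError(f"invalid output tag: {token!r}")
            nodes.append(Variable(inner.split(".")))
        elif "{{" in token or "}}" in token:
            raise SyntaxError(f"unbalanced delimiters in: {token!r}")
        elif token:
            nodes.append(Text(token))
    return nodes


def render(nodes, state):
    """Visit every node, replacing variables with values from the state."""
    out = []
    for node in nodes:
        if isinstance(node, Text):
            out.append(node.value)
        else:
            value = state
            for key in node.path:
                value = value.get(key, "") if isinstance(value, dict) else ""
            out.append(str(value))
    return "".join(out)


ast = parse("Buy {{ product.title }} now")   # parse once (e.g. at theme upload)
for title in ["T-shirt", "Hoodie"]:          # render many times with different state
    print(render(ast, {"product": {"title": title}}))
```

Keeping the parsed AST around is what makes repeated renders cheap: the syntax work happens once, and each request only pays for the tree walk.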

Instead of doing this, we can pass something we call product_drop. What is this product_drop instance? It's a simple instance of this class. This is the lowest-level cache we have in our templates. It's a common technique: every time you program something you know is expensive, you generally memoize it. That is what Liquid proposes in the end, a very low-level caching layer. Another important thing is that we define the list of methods we want to expose externally. Internally, our product has a lot of information and can do a lot of things, but because we adopt this product_drop abstraction, we only expose the fields we want. This is the first thing: it's how we expose data in a way that's fast for users.
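The two properties of a drop described here, memoization and an explicit list of exposed fields, can be sketched in Python. Liquid's real `Liquid::Drop` is a Ruby class with its own rules for which methods are invocable; the `Product`, `ProductDrop`, `EXPOSED`, and `lookup` names below are hypothetical and only mirror the idea.

```python
from functools import cached_property


class Product:
    """Hypothetical internal model: holds more than templates should see."""

    def __init__(self, title, price_cents, cost_cents):
        self.title = title
        self.price_cents = price_cents
        self.cost_cents = cost_cents   # internal, must not leak to templates
        self.variant_loads = 0

    def load_variants(self):
        self.variant_loads += 1        # stands in for an expensive query
        return ["S", "M", "L"]


class ProductDrop:
    # Allow list: the only attributes a template can read.
    EXPOSED = frozenset({"title", "price", "variants"})

    def __init__(self, product):
        self._product = product

    @property
    def title(self):
        return self._product.title

    @property
    def price(self):
        return self._product.price_cents

    @cached_property
    def variants(self):
        # Memoized per drop instance: the expensive load runs at most once.
        return self._product.load_variants()

    def lookup(self, name):
        """What the template engine calls; anything not allow-listed is invisible."""
        if name not in self.EXPOSED:
            return None
        return getattr(self, name)


product = Product("T-shirt", 2500, 900)
drop = ProductDrop(product)
drop.lookup("variants")
drop.lookup("variants")
print(product.variant_loads)        # 1: the second lookup hits the cache
print(drop.lookup("cost_cents"))    # None: not exposed to templates
```

A template that mentions `product.variants` ten times still triggers one load, and fields such as internal cost never become reachable from theme code.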

Another thing we do with Liquid, and if you're considering adopting a DSL in your application you should consider doing this too, is to be really strict about what code the users uploading to our application can run in the DSL. This is where resource limits come in. Let's understand what that means. Here we have a template with a for loop iterating over numbers. I'm setting render_length_limit to 50. If I try to render a very small array, it just works. Now say a theme developer creates a very large array and tries to render a very large template, like the second example here, where I'm rendering an array of 100 numbers.

In this case, I get a memory limit error. This is one way, when you create a DSL, to be really strict and make sure you're providing a safe sandbox for the developers building things for your application. render_length_limit is one resource limit. Another is the number of variables developers can create. For example, here I have a template where I create three variables, a, b, and c. If I set assign_score_limit to 3, I can render it, and it just works. If I change assign_score_limit to 2, I get an error. This is how you can be strict from a second perspective.
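Here is a minimal Python sketch of how such limits can be enforced during rendering. The limit names mirror the ones in the talk, but `LimitedRenderer`, `MemoryLimitError`, and the helper functions are hypothetical, and the scoring is simplified: output length is counted in characters, and the assign score is one point per new variable, following the talk's description (the real gem's scoring may differ).

```python
class MemoryLimitError(Exception):
    pass


class LimitedRenderer:
    """Sketch of resource limits; names mirror the talk, semantics are simplified."""

    def __init__(self, render_length_limit=None, assign_score_limit=None):
        self.render_length_limit = render_length_limit
        self.assign_score_limit = assign_score_limit
        self.output = []
        self.length = 0
        self.assigns = {}

    def write(self, text):
        self.length += len(text)
        if self.render_length_limit is not None and self.length > self.render_length_limit:
            raise MemoryLimitError("render length limit exceeded")
        self.output.append(text)

    def assign(self, name, value):
        # Simplification: one point per new variable.
        if name not in self.assigns and self.assign_score_limit is not None:
            if len(self.assigns) + 1 > self.assign_score_limit:
                raise MemoryLimitError("assign score limit exceeded")
        self.assigns[name] = value

    def result(self):
        return "".join(self.output)


def render_numbers(numbers, limit):
    # Roughly {% for n in numbers %}{{ n }} {% endfor %}
    r = LimitedRenderer(render_length_limit=limit)
    for n in numbers:
        r.write(f"{n} ")
    return r.result()


def assign_abc(limit):
    # Roughly {% assign a = 1 %}{% assign b = 2 %}{% assign c = 3 %}
    r = LimitedRenderer(assign_score_limit=limit)
    for name, value in (("a", 1), ("b", 2), ("c", 3)):
        r.assign(name, value)
    return r.assigns


print(render_numbers(range(5), limit=50))   # small array: fits
try:
    render_numbers(range(100), limit=50)    # large array: aborted mid-render
except MemoryLimitError as e:
    print("error:", e)
```

Checking the limit on every write, rather than after rendering finishes, is what stops a runaway template before it has consumed the memory.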

Speaking of being strict, another strong value in Liquid is ensuring backward compatibility. Why? Because if you have an application and want external folks building themes for it, you don't want their themes to keep breaking. It's not rewarding for theme developers to create something that works today and doesn't work tomorrow. It's terrible for merchants, for buyers, for everyone; nobody wants your application to break. When you expose a DSL externally, backward compatibility can be very tricky. Let me show an example. Here we have a super simple one-line template: it prints index with an == sign in it. When you look at this template, it seems harmless; it's just a comparison. However, what's special about this Liquid template is that it's invalid: Liquid doesn't support comparisons when it's printing values into your view. It's a controversial Liquid template. One thing I can do is parse this template using error_mode lax, so I'm more forgiving during parsing and no error is raised when I parse this template.

Then I can try to render it, passing my variable index as hello. When I execute this code, what happens is that hello gets printed. Why? Because when you parse a template in lax mode, you're more forgiving: as soon as the parser reaches the == sign and notices that this is not valid Liquid syntax, it just stops parsing there. We consider the template valid and just render it. That's why we render hello: the last valid thing here is the index variable. This is one way of parsing a template, the forgiving way.

Another way is to parse the template strictly. In that case you get an error, and you get it during parsing, before rendering even starts. So we have two ways of processing the same template. There's the strict way, where we raise an error when something we don't understand, like that == sign, appears in the template. And there's the forgiving way: as soon as we find something invalid, we ignore it and just render index. When you create a DSL, it can be very tempting to be forgiving about parsing and say, ok, I'm going to adopt the lax parsing mode and be more forgiving about the shape of the DSL I'm processing.
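The difference between the two modes can be reproduced with a toy Python parser for the inside of an output tag. The template `{{ index == 1 }}` below is a hypothetical stand-in for the slide's example, and `parse_output`, `render_output`, and `LiquidSyntaxError` are illustrative names rather than Liquid's API. In this toy grammar only a single variable name is valid: strict mode rejects the trailing `== 1`, while lax mode silently discards it and renders the variable.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


class LiquidSyntaxError(Exception):
    pass


def parse_output(markup, error_mode="strict"):
    """Parse the inside of an output tag, e.g. 'index == 1'.

    Only a single variable name is valid. In lax mode, parsing stops at the
    first token it doesn't understand and keeps what came before it.
    """
    markup = markup.strip()
    match = IDENT.match(markup)
    if not match:
        raise LiquidSyntaxError(f"expected a variable in {markup!r}")
    rest = markup[match.end():].strip()
    if rest and error_mode == "strict":
        raise LiquidSyntaxError(f"unexpected {rest!r} after {match.group()!r}")
    return match.group()   # lax: 'rest' is silently dropped


def render_output(markup, state, error_mode="strict"):
    name = parse_output(markup, error_mode)
    return str(state.get(name, ""))


print(render_output("index == 1", {"index": "hello"}, error_mode="lax"))   # hello
try:
    render_output("index == 1", {"index": "hello"})
except LiquidSyntaxError as e:
    print("strict:", e)
```

Note that in lax mode the output `hello` is now observable behavior: any later change that gives `==` a meaning inside output tags would change what existing stores render.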

Then I avoid errors in production. Maybe I want to adopt the lax parser and be more forgiving about errors. However, the problem with this approach is that as soon as we start considering this kind of code valid, we're signing a contract with everyone: with merchants, that this template will always render hello, and with developers, that this kind of code works.

Then, if we want to start supporting comparisons in Liquid when printing values, we cannot do that. We cannot evolve the language anymore, because we've already signed a contract with everyone. It's much better to be strict about what you render, strict about your DSL, and clear with everyone about what's possible and impossible, so you can evolve your DSL. This isn't specific to Liquid. Every time you create a DSL and allow third parties to write code in it, be very strict. Be strict about the runtime. Be strict about how you handle that code. Don't be forgiving. Be clear about what's valid and invalid. That's the first part, Liquid. With Liquid, the first building block we have is that developers can write any kind of template, and it renders fast because we have strict limits on how we render those templates.

Schemas

The next step is that this must be customizable. Right now, say merchants have a store like this and want to do this thing here. They just can't, because the only building block we have so far is Liquid: to change the position of that element, merchants would need to edit the Liquid code. That's not good. We want to make our templates customizable. How can we do this? The best way to make developers and non-technical folks collaborate is to set a common language, so everyone understands the same terms and they can work on the same thing together.

The way we did this with Liquid was to establish a common language for the names of the things on the page. We created the concept of sections: components that occupy the full width of the page and are always sequential. Everyone understands what a section is. Here we have the header section, here the product details section, and here another section. Let's focus a bit more on this product details section. How can we provide a developer experience where developers write code that non-technical folks can update? This is how we do it. I feel this is one of the most important slides about making code editable by non-technical folks. There are two important elements here.

At the top, we have the HTML code for the product section: the image rendered in a div, with the description beside it, so the two elements sit side by side. Below that, we have the schema. The schema is where we establish an interface with non-technical folks. In this schema, we're saying that this template has a property called image position, and this property has two valid values: left or right. This is what merchants can select in the editor. We then use this image position in the template. This is how developers use the value set by merchants.

This JSON structure inside the schema tag at the bottom is the bridge between developers and non-technical folks. The cool thing is that our editor understands this JSON; it's the shared language between developers and our editors. Here's how it gets rendered in the editor: if we select the section, we see a photo position setting, and we can just pick a value. Now merchants are part of the editing process: they can set those values, and developers decide which values they expose. This is how each part of a template becomes editable, and you can add as many properties as you want.
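As a rough illustration of the schema acting as a contract, here is a Python sketch that reads a section schema, merges merchant-saved values over the schema defaults, and rejects values outside the declared options. The JSON is modeled loosely on the kind of content found inside a section's schema tag; the `resolve_settings` and `render_section` functions, and the exact field set, are assumptions for illustration rather than Shopify's implementation.

```python
import json

# A section schema, modeled loosely on the JSON inside a {% schema %} tag.
SCHEMA = json.loads("""
{
  "name": "Product details",
  "settings": [
    {
      "type": "select",
      "id": "image_position",
      "label": "Image position",
      "options": [
        {"value": "left", "label": "Left"},
        {"value": "right", "label": "Right"}
      ],
      "default": "left"
    }
  ]
}
""")


def resolve_settings(schema, saved):
    """Merge merchant-saved values over schema defaults, rejecting invalid choices."""
    resolved = {}
    for setting in schema["settings"]:
        value = saved.get(setting["id"], setting.get("default"))
        if setting["type"] == "select":
            allowed = {opt["value"] for opt in setting["options"]}
            if value not in allowed:
                raise ValueError(f'{setting["id"]}: {value!r} not in {sorted(allowed)}')
        resolved[setting["id"]] = value
    return resolved


def render_section(settings):
    # Stand-in for the Liquid markup that reads the image_position setting.
    return f'<div class="product product--image-{settings["image_position"]}">...</div>'


print(render_section(resolve_settings(SCHEMA, {})))                          # default: left
print(render_section(resolve_settings(SCHEMA, {"image_position": "right"})))  # merchant choice
```

The same JSON can drive both sides: an editor can build a dropdown from `options`, and the renderer can trust that `image_position` is one of the declared values.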

Maybe you're thinking: we have this Liquid file, but how do things really get rendered? How are this template and its state actually set? I'm going to go over the entire request lifecycle when a buyer enters a Shopify store, so you can see how these settings are loaded and how the state of a page is set. Here we have our request; nothing has happened yet. A buyer opens a product page, and there's a lot of information in this URL. From the URL, we already know the shop ID, so we know which merchant owns the store, and because every shop has a single live theme, we know which theme file system to load into memory. Here we have the theme file system loaded in memory.

The next thing that happens is that we load the global state, following a convention. Similar to how section files define the shape of the data we expect and its values, we have the same thing globally: the settings schema and the settings data. That means developers can expose global state using JSON data structures as well. Non-technical folks interact with these structures through the editor and set their own values, which are persisted as JSON data too. Again, the JSON files work as the bridge between non-technical and technical folks.

Then, because we know from the URL that we're on the product page, we know the state of this page is dictated by the product.json file. This is another element that sets the state of the page. The product.json file is the one merchants, the non-technical folks, edit through the editor to set the sequence of sections. Everyone knows what a section is, so merchants use the editor to just select the sequence of sections they want to see on their page, and this gets persisted into the JSON as well. This is how the state of the page is set: that information is saved in product.json.

Remember when we talked about putting the product image to the left or right of the page? That data is set in product.json, so when we render this Liquid template, we know which state to use. This is how we compose the state of the page and how we know which Liquid files to load. The template lists the proper sections and the proper blocks, the smaller elements within the template, that should be loaded. The template has information about everything: the layout, the sections, and the tiny blocks. Then buyers get their page rendered. This is the entire lifecycle of how state is loaded in a way that lets non-technical folks participate in editing a template.
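To show how a JSON template drives rendering, here is a Python sketch of a page composed from a `product.json`-style document: a map of section instances, each with a `type` and merchant-chosen `settings`, plus an `order` list. This shape follows the general structure of Shopify JSON templates, but the section types and the renderer functions below are hypothetical stand-ins for the theme's section Liquid files.

```python
import json

# A JSON template in the spirit of templates/product.json: which sections
# appear, in what order, and with what merchant-chosen settings.
PRODUCT_TEMPLATE = json.loads("""
{
  "sections": {
    "main": {"type": "product-details", "settings": {"image_position": "right"}},
    "recs": {"type": "related-products", "settings": {"heading": "You may also like"}}
  },
  "order": ["main", "recs"]
}
""")


def render_product_details(settings, product):
    return f'<section class="image-{settings["image_position"]}">{product["title"]}</section>'


def render_related(settings, product):
    return f'<section><h2>{settings["heading"]}</h2></section>'


# Hypothetical mapping from section type to the code that renders it.
SECTION_RENDERERS = {
    "product-details": render_product_details,
    "related-products": render_related,
}


def render_page(template, product):
    html = []
    for key in template["order"]:                       # merchant-controlled sequence
        section = template["sections"][key]
        renderer = SECTION_RENDERERS[section["type"]]   # which section file to load
        html.append(renderer(section.get("settings", {}), product))
    return "\n".join(html)


print(render_page(PRODUCT_TEMPLATE, {"title": "T-shirt"}))
```

Reordering sections in the editor only rewrites `order` in the JSON; no Liquid code changes, which is why merchants can do it safely.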

Back to our personas. Who are we satisfying with all this architecture, with all these approaches and conventions for setting a theme's state? We're satisfying merchants: they can update templates. We're satisfying buyers, because they see the templates being rendered. And theme developers can write their Liquid and expose the properties they want in the editor using those JSON structures. Currently, we're not satisfying the app developers; they're the folks missing. How can app developers participate in this process and render their own widgets on the page? Because themes must be extensible. Developers should be able to build themes, and themes should be customizable, but they should also be extensible. For example, you have your page here, and maybe merchants want to install an app that shows a discount or an even more complex component.

Then you might be thinking, what is an app? What can you install into a theme? In the end, an app is just another piece of Liquid code with a schema, an interface. Just like blocks and sections, it's Liquid code that exposes a schema. Its properties can be set using our editors, so non-technical folks can click this plus button and add, for example, the review component we saw before. Then they can interact with that component and, for example, increase the number of stars. The number of stars is a property the developer exposed through those settings. That's pretty much it. When we think about our request lifecycle in terms of state, this is everything that affects the state of the page.
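Installing an app component can be thought of as an edit to the template's JSON. The Python sketch below appends a block to a section and validates its settings against the block's schema. The `app-reviews` type, the `stars` range setting, and the `add_block` and `validate` helpers are all hypothetical; the `blocks`/`block_order` keys follow the general shape of JSON templates, but real app blocks use their own type identifiers.

```python
import copy
import json

# Hypothetical schema exposed by an app's review block.
REVIEW_BLOCK_SCHEMA = {
    "name": "Review stars",
    "settings": [{"type": "range", "id": "stars", "min": 1, "max": 5, "step": 1, "default": 3}],
}


def validate(schema, settings):
    """Fill in defaults and enforce range bounds declared by the block's schema."""
    for s in schema["settings"]:
        value = settings.get(s["id"], s["default"])
        if s["type"] == "range" and not (s["min"] <= value <= s["max"]):
            raise ValueError(f'{s["id"]} must be between {s["min"]} and {s["max"]}')
        settings[s["id"]] = value
    return settings


def add_block(template, section_key, block_key, block_type, schema, settings=None):
    """Return a new template with the block appended to the section's block order."""
    updated = copy.deepcopy(template)
    section = updated["sections"][section_key]
    section.setdefault("blocks", {})[block_key] = {
        "type": block_type,
        "settings": validate(schema, dict(settings or {})),
    }
    section.setdefault("block_order", []).append(block_key)
    return updated


base = {"sections": {"main": {"type": "product-details", "settings": {}}}, "order": ["main"]}
with_reviews = add_block(base, "main", "reviews", "app-reviews", REVIEW_BLOCK_SCHEMA, {"stars": 4})
print(json.dumps(with_reviews["sections"]["main"], indent=2))
```

Returning a copy instead of mutating keeps the previously saved template intact, which makes it easy to preview an edit before persisting it.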

The URL affects the state because it dictates which product is loaded, based on the product handle in the URL. Then we have the global settings, which hold things like the background color and the other important global state. Then the template has all the information about the sequence of elements appearing on the page, and the sections and blocks are the elements themselves, referenced by the template. When folks install an app, that information is set in the template. The template is the most important element from the perspective of the page's state.
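The first step of that lifecycle, going from a URL to "which shop, which theme, which template, which product", can be sketched as a small resolver. The `/products/<handle>` path shape matches Shopify storefront URLs, but the `SHOPS` table, the ids, and the `resolve_request` function are hypothetical placeholders for whatever lookup the platform actually performs.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table standing in for shop -> live theme resolution.
SHOPS = {"example-store.myshopify.com": {"shop_id": 42, "live_theme_id": 7}}

# Path prefix -> JSON template name (assumed subset of routes).
ROUTES = {"products": "product", "collections": "collection"}


def resolve_request(url):
    parts = urlsplit(url)
    shop = SHOPS[parts.hostname]                 # the host identifies the merchant
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        template, handle = "index", None
    elif len(segments) == 2 and segments[0] in ROUTES:
        template, handle = ROUTES[segments[0]], segments[1]
    else:
        raise LookupError(f"no route for {parts.path!r}")
    return {
        "shop_id": shop["shop_id"],
        "theme_id": shop["live_theme_id"],        # one live theme per shop
        "template": f"templates/{template}.json", # which JSON template drives the page
        "handle": handle,                         # which product/collection to load
    }


print(resolve_request("https://example-store.myshopify.com/products/four-star-tshirt"))
```

Everything after this step (global settings, the JSON template, sections, and blocks) is loaded from the theme identified here.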

Runtimes

We have Liquid, our DSL, so developers can write any kind of template. We have schemas, the way developers expose properties in their code to the editor so non-technical folks can participate. And everything must be super fast, because BFCM is coming, and with BFCM we have millions of requests hitting our application. How can we make rendering fast? Here's an overview of the runtime we have at Shopify. Shopify runs on Kubernetes on Google Cloud Platform (GCP), and we adopted a multi-tenant architecture. That means we have isolated MySQL databases, sharded by shop ID, so we have that kind of isolation. Databases are the hardest part to scale because they're stateful, but the rest of our application is stateless, so we have a custom Kubernetes autoscaler that scales the application based on traffic. The database really is the most complicated part. Now, zooming in a little on how each instance works at Shopify and how we handle databases versus rendering, there are two important elements. We have core, the component we use to persist state.
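Sharding by shop ID means every query for a shop can be routed to one database without consulting the others. The talk does not say how Shopify maps shops to shards, so the Python sketch below uses a stable hash (`zlib.crc32`) modulo a list of hypothetical shard names purely as an illustration; a lookup table that records each shop's shard is a common alternative when shops need to move between shards.

```python
import zlib

# Hypothetical shard names.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(shop_id: int) -> str:
    # Stable hash (unlike Python's randomized str hash) so the same shop
    # always lands on the same shard across processes and restarts.
    digest = zlib.crc32(str(shop_id).encode())
    return SHARDS[digest % len(SHARDS)]


print(shard_for(42))
```

The stateless application tier can then scale freely, since any instance can compute where a given shop's data lives.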

For example, when merchants interact with the editors and save colors or Liquid templates, core is the component we rely on to write into the databases. Then we have active data replication, so rendering storefronts, which is super heavy, doesn't affect our primary databases. That's an overview of how we organize our components. Focusing a bit more on rendering: rendering is quite an isolated component at Shopify because it's the heaviest one; in the end, we have many more buyers than merchants. Because it's isolated, we're very strict about how we write Ruby there, what we allow developers to write, and how we manage memory. It's a much more critical application from that perspective.
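The write/read split described here can be sketched as a tiny router: writes from the editor go to the primary, while storefront reads are spread across replicas. `Connection` and `Router` are hypothetical in-memory classes that only record which statements each database would receive. If replication is asynchronous, a read served by a replica can briefly lag behind the latest write.

```python
class Connection:
    """Hypothetical database connection that records executed statements."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def execute(self, sql):
        self.log.append(sql)
        return f"{self.name}: {sql}"


class Router:
    """Writes (e.g. merchant saves) go to the primary; storefront reads go to
    replicas, so heavy rendering traffic doesn't load the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next = 0

    def write(self, sql):
        return self.primary.execute(sql)

    def read(self, sql):
        replica = self.replicas[self._next % len(self.replicas)]   # simple round-robin
        self._next += 1
        return replica.execute(sql)


primary = Connection("primary")
replicas = [Connection("replica-1"), Connection("replica-2")]
router = Router(primary, replicas)
print(router.write("UPDATE theme_settings SET color = 'yellow'"))
print(router.read("SELECT * FROM products WHERE handle = 'four-star-tshirt'"))
```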

How can we scale this application? There are many techniques. Here's an overview of our infrastructure that runs SFR, the storefront renderer, and core. One important technique we adopted to optimize our applications is native extensions. This isn't specific to Ruby: in Python we have C extensions, and in Java we have the Java Native Interface (JNI). Native extensions are a common technique in high-level languages to optimize execution and make code much faster. You have a high-level language like Ruby, and in the middle of your Ruby code you can call native code written in C, C++, or Rust. So why adopt native extensions? There are some good reasons when you're building something very critical that needs to run really fast. The first good reason is garbage collection.

For example, you have a super complex algorithm that, for some reason, needs to create a lot of instances of something. This can have a massive impact on garbage collection, because as soon as you create these instances, they need to be collected. One good thing about native extensions is that the code there isn't part of the regular code managed by your runtime. When you create a string inside a native extension, it's your responsibility to allocate that memory and free it. You can be much more intentional, and you can avoid the pauses that garbage collection causes.

Another reason is that native code is generally faster than high-level language code. That's not always true, and we're going to talk about this, but it's another reason you might consider adopting native extensions in your high-level application. My favorite reason is reusability. If the industry already has, for example, a very good JSON parser, there's no reason to write a new JSON parser in Ruby, Python, or Java just for the sake of having it in that language. You can adopt an existing parser already written in C, for example, and just call it from your high-level language. These are some good reasons.

To give a clearer overview of how native extensions work: we have our high-level application, which in Shopify's case is a Ruby application running on the Ruby virtual machine. You load your Ruby files and write your business logic, and at some point you need to do something very performance-critical, for example, parsing and rendering a Liquid template. It's something you know you'll do very frequently, and something that can possibly be optimized with a native extension. So you don't write this code in Ruby. You write it in a lower-level language, for example Rust or C, and then compile it following a set of conventions.

The binary you generate can then be loaded into a Ruby program, and your Ruby functions can call this native code. This only works if you write your native code following those conventions and compile it following those conventions. Then you can load it into your Ruby program and call these low-level functions from the Ruby VM. It's pretty much the same process in other high-level languages, like Python and Java. The mechanism is a foreign function interface (FFI) call; this is how your high-level code calls the native code.
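A hypothetical sketch of that set of conventions on the native side, written in Rust: `extern "C"` selects the C calling convention, and `no_mangle` keeps the exported symbol name predictable so a loader can look it up by name. The function name and its tag-counting body are made up. A real Ruby extension also goes through Ruby's C API (an `Init_` entry point, `VALUE` arguments), which this sketch leaves out.

```rust
/// Hypothetical exported entry point following C conventions:
/// - `extern "C"` uses the C calling convention;
/// - `no_mangle` keeps the symbol name as written, so a loader can find it.
/// (`#[unsafe(no_mangle)]` is the Rust 2024 spelling, accepted since Rust 1.82.)
///
/// # Safety
/// `src` must point to `len` readable bytes, or be null.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn theme_count_tags(src: *const u8, len: usize) -> i64 {
    if src.is_null() {
        // Report errors with a return code; a Rust panic must not unwind
        // into the calling runtime.
        return -1;
    }
    // SAFETY: guaranteed by the caller, per the contract above.
    let bytes = unsafe { std::slice::from_raw_parts(src, len) };
    // Count Liquid tag openers "{%" as a stand-in for real parsing work.
    bytes
        .windows(2)
        .filter(|w| w[0] == b'{' && w[1] == b'%')
        .count() as i64
}
```

Note that the signature only uses C-compatible types (a pointer, a length, an integer). Converting Ruby objects into that pair is the job of the binding layer on the Ruby side.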

As you may notice, there is some complexity here. On the Ruby side, you have all the conveniences a virtual machine provides. You can create your instances knowing they will be garbage collected. You have YJIT, Ruby's just-in-time compiler, which optimizes your code while it runs, so it gets faster the longer it has been running. In your native code, you don't have those benefits. You need to allocate memory yourself, and you need to free memory yourself. It's a blessing that we're not creating new instances and new allocations that add work for the garbage collector. It's also a curse, because then we need to manage that memory ourselves, and that's where memory leaks and other dangerous bugs come from. Native extensions can bring more performance to your application, but they can bring more trouble as well.
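The curse side can be sketched with the handle pattern commonly used at FFI boundaries. Native code allocates an object, hands the VM a raw pointer, and exposes an explicit free function. The names `template_new`, `template_len`, and `template_free` are hypothetical. Nothing reclaims this memory automatically: forgetting `template_free` leaks, and calling it twice is a double free.

```rust
/// Hypothetical native object that the VM only ever sees as an opaque pointer.
pub struct ParsedTemplate {
    source: String,
}

/// Allocates a template on the native heap and transfers ownership to the caller.
/// The VM's garbage collector does not track this memory.
///
/// # Safety
/// `src` must point to `len` readable bytes, or be null.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn template_new(src: *const u8, len: usize) -> *mut ParsedTemplate {
    if src.is_null() {
        return std::ptr::null_mut();
    }
    let bytes = unsafe { std::slice::from_raw_parts(src, len) };
    let source = String::from_utf8_lossy(bytes).into_owned();
    // Box::into_raw gives up Rust's automatic cleanup; freeing is now manual.
    Box::into_raw(Box::new(ParsedTemplate { source }))
}

/// # Safety
/// `t` must be null or a live pointer returned by `template_new`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn template_len(t: *const ParsedTemplate) -> usize {
    match unsafe { t.as_ref() } {
        Some(t) => t.source.len(),
        None => 0,
    }
}

/// Frees a template. Must be called exactly once per handle: skipping it
/// leaks, and calling it twice is a double free (undefined behavior).
///
/// # Safety
/// `t` must be null or a pointer from `template_new` that was not freed yet.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn template_free(t: *mut ParsedTemplate) {
    if !t.is_null() {
        // Rebuilding the Box and dropping it releases the allocation.
        drop(unsafe { Box::from_raw(t) });
    }
}
```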

That's why, every time I talk about native extensions, I like to show this comparison between Ruby and Rust performance. Shopify generally defaults to Rust when we need to write code that will run as a native extension. Here we have a compute1 method. compute1 receives a value and does some computation with it. Let's say that this computation is the core of your business. Then you say, I want the core of my business to be faster, so I'm going to extract it to a native extension like this: I call compute from Rust, and Rust runs it faster. That's one option. Then let's say you get excited and extract everything to Rust.

That gives us three versions of the same code. The first one is pure Ruby, the second one is partially extracted to Rust, and the last one is fully in Rust. Which one is fastest? The full Rust version is. The surprising part is that the second fastest is pure Ruby, because the partially extracted version makes so many FFI calls that their overhead adds up. Every call between Ruby code and native code has some friction, some cost: for example, we need to convert a Ruby integer to a native integer, and vice versa. This marshalling and unmarshalling has a cost. So when you adopt a native extension in your application, avoid making a lot of FFI calls, because each one has a price.
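A sketch of why chattiness matters, assuming a stand-in computation (`x * x + 1`) because the transcript doesn't show compute1's body. Rather than timing anything, it counts boundary crossings. The chatty API crosses once per element, while the batched API crosses once for the whole input and loops natively. In a real Ruby extension, each crossing would also pay for converting values between Ruby and native representations. All names are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts native-boundary crossings. In a real extension, each crossing also
/// pays for marshalling (e.g. converting a Ruby Integer to an i64 and back).
pub static CROSSINGS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the business computation.
fn compute(x: i64) -> i64 {
    x * x + 1
}

/// Chatty API: the caller loops and crosses the boundary once per element.
pub extern "C" fn compute_one(x: i64) -> i64 {
    CROSSINGS.fetch_add(1, Ordering::Relaxed);
    compute(x)
}

/// Batched API: one crossing for the whole input; the loop runs natively.
///
/// # Safety
/// `xs` must point to `len` readable i64 values, or be null.
pub unsafe extern "C" fn compute_batch(xs: *const i64, len: usize) -> i64 {
    CROSSINGS.fetch_add(1, Ordering::Relaxed);
    if xs.is_null() {
        return 0;
    }
    let xs = unsafe { std::slice::from_raw_parts(xs, len) };
    xs.iter().map(|&x| compute(x)).sum()
}
```

Both APIs produce the same result for the same input. The design choice is to move the loop, not just the loop body, to the native side, so 1,000 elements cost one crossing instead of 1,000.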

Tools

This brings us to another piece of the system. We have the Liquid templates: developers can write whatever they want, but we are very strict there, so we can render these templates really fast. We have the schemas, through which merchants interact with the code using our editors and change their stores with the settings they want. We have our runtime, which runs fast, and the common technique we now know of using native extensions; that's why we can parse templates really fast. That's pretty much the entire ecosystem, but maybe developers are not writing the themes you want. Maybe they're writing Liquid code that's slow to render, or maybe they're not creating the schemas: they build their themes, but merchants have nothing to change in the editor because developers just aren't adopting the schemas. This is where tools play an important role. Tools are what help developers write the kind of code we want them to write. Tools make it easy to do the right thing.

For example, Liquid is rendered on the server. That means the more complex your Liquid is, the longer your time to first byte will be. How can developers optimize the time to first byte of their templates if they don't know how Liquid is rendered in our backend? That's why we created this CLI command, shopify theme profile. When developers run it, we give them a report showing how each part of the template renders, so they can spot the bottlenecks in their templates. This is something important to keep in mind: if you're building a theme system, or any other kind of DSL where developers don't have access to the runtime, you are the one responsible for giving them this information.
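The idea behind a render profile can be sketched as timing each named part of a template and listing the slowest first. This is a hypothetical Rust sketch, not how Shopify's profiler is implemented, and section names such as `sections/product-grid` are made up.

```rust
use std::time::{Duration, Instant};

/// Minimal render profiler sketch (hypothetical).
#[derive(Default)]
pub struct Profiler {
    entries: Vec<(String, Duration)>,
}

impl Profiler {
    /// Runs `render` for the named template part and records how long it took.
    pub fn measure<T>(&mut self, name: &str, render: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = render();
        self.entries.push((name.to_string(), start.elapsed()));
        out
    }

    /// Returns the recorded parts, slowest first, so bottlenecks sit at the top.
    pub fn report(&self) -> Vec<(String, Duration)> {
        let mut rows = self.entries.clone();
        rows.sort_by(|a, b| b.1.cmp(&a.1));
        rows
    }
}
```

Wrapping each render call keeps the report tied to template structure, which is what a developer without access to the runtime needs in order to find a bottleneck.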

Another kind of code we wanted to minimize in the code base was parser-blocking scripts. They're bad for performance, and buyers have a bad experience because of them. So we introduced a linter called Shopify Theme Check. When folks install our extension, they automatically get this kind of error in their code. If they hover over it, we teach them: you should avoid parser-blocking scripts in your template, so you can write better templates. Another common error we noticed in code bases was referencing objects that are not even available in that context, like market here. We created another check for that: you're using this object, but this object is actually not available in your template.
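A toy version of the parser-blocking script check. It deliberately uses a naive string scan rather than a real HTML parser. It flags external `<script src=...>` tags that have neither `async` nor `defer`, and reports their line numbers. Theme Check's actual implementation is different, and the function name is hypothetical.

```rust
/// Toy lint: returns the 1-based line numbers of external `<script src=...>`
/// tags that have neither `async` nor `defer`. A naive string scan, not an
/// HTML parser.
pub fn parser_blocking_scripts(html: &str) -> Vec<usize> {
    // ASCII lowercasing keeps byte offsets identical to the original text.
    let lower = html.to_ascii_lowercase();
    let mut lines = Vec::new();
    let mut pos = 0;
    while let Some(rel) = lower[pos..].find("<script") {
        let start = pos + rel;
        // The opening tag ends at the next '>'; an unterminated tag ends the scan.
        let end = match lower[start..].find('>') {
            Some(offset) => start + offset,
            None => break,
        };
        let tag = &lower[start..end];
        let external = tag.contains(" src=");
        let non_blocking = tag.contains(" async") || tag.contains(" defer");
        if external && !non_blocking {
            // Line number = newlines before the tag, plus one.
            lines.push(lower[..start].matches('\n').count() + 1);
        }
        pos = end;
    }
    lines
}
```

Inline scripts and scripts marked `async` or `defer` are left alone. Returning line numbers is what lets an editor extension underline the offending tag and show the explanation on hover.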

Another example is templates with syntax errors. In the past, folks had to upload their files to find out they had an error; with the linter, we made the feedback loop much faster for developers. Another massively important tool, when we create DSLs for developers, is the language server. Most languages we use these days have a language server. This is the kind of experience our language server provides: for example, when folks interact with the product object, we list all the available fields of that object. As you may notice, it works at many levels. It really understands the context of the code and provides context-aware completions for developers.
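Context-aware completion comes down to knowing the type of the expression before the cursor. In this hypothetical sketch, a hand-written table maps two Liquid object types to a few of their fields, and a scope maps variable names to types. Completing `product.t` then filters the fields of `product` by prefix. The real language server's object model is far larger, and these function names are made up.

```rust
use std::collections::HashMap;

/// Hypothetical, tiny slice of a Liquid object model: type name -> field names.
pub fn object_model() -> HashMap<&'static str, Vec<&'static str>> {
    let mut m = HashMap::new();
    m.insert("product", vec!["available", "price", "tags", "title", "variants"]);
    m.insert("collection", vec!["products", "title"]);
    m
}

/// Completes the text before the cursor (e.g. "{{ product.t") using the types
/// of in-scope variables (`scope`: variable name -> type name).
pub fn complete(before_cursor: &str, scope: &HashMap<&str, &str>) -> Vec<&'static str> {
    let model = object_model();
    // Take the last whitespace-separated token and split it at the final '.'.
    let token = before_cursor.rsplit(char::is_whitespace).next().unwrap_or("");
    let (var, prefix) = match token.rsplit_once('.') {
        Some(pair) => pair,
        None => return Vec::new(), // only member completion is sketched here
    };
    // Unknown variables (like an unavailable object) get no completions.
    let ty = match scope.get(var) {
        Some(t) => *t,
        None => return Vec::new(),
    };
    model
        .get(ty)
        .map(|fields| fields.iter().copied().filter(|f| f.starts_with(prefix)).collect::<Vec<_>>())
        .unwrap_or_default()
}
```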

To make this possible, we needed to make an important decision. The parser we use in our backend to render templates, and to render stores to buyers as fast as possible, is focused on performance. It doesn't know about HTML, for example. It only knows about Liquid, and it just generates the output for that Liquid template. It doesn't know about HTML nodes or anything else. However, when we build a tool for developers who are going to write HTML code, we really want to give them some extra clues.

For example, when they forget to close a div or another HTML element. One difficult decision we needed to make was writing a new parser for our tooling that is both Liquid-aware and HTML-aware. It can provide linting for both languages, which the backend parser can't. We also needed this parser to be error-tolerant. When you're building a tool for developers, keep in mind that the code they're writing is a work in progress. It's not finished, so it will likely be broken every time you parse it and try to understand it. For example, here we have a developer writing a for loop with a syntax error: the for loop is not closed. Even so, we need to be able to parse this template and understand that product is an instance of product, so we can complete valid product fields for the developer. The only way to make this possible was to build an error-tolerant parser.
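A hypothetical sketch of error tolerance in Rust. The talk says Shopify's tooling parser is generated with OhmJS; this hand-rolled scan is a substitute that only illustrates the behavior. Instead of failing on an unclosed `{% for %}`, it records a diagnostic and keeps the loop variable in scope, which is what completion needs while the developer is mid-edit.

```rust
/// Result of a tolerant scan: variables still in scope at the end of the
/// input, plus diagnostics instead of a hard failure.
#[derive(Debug, Default)]
pub struct TolerantParse {
    pub scope: Vec<(String, String)>, // (loop variable, collection expression)
    pub errors: Vec<String>,
}

/// Hypothetical sketch: unmatched `{% for %}` tags are reported, but their
/// bindings stay in scope so the editor can still offer completions.
pub fn parse_tolerant(src: &str) -> TolerantParse {
    let mut result = TolerantParse::default();
    let mut rest = src;
    while let Some(open) = rest.find("{%") {
        let after = &rest[open + 2..];
        // An unterminated tag is recorded and the scan stops, instead of failing.
        let close = match after.find("%}") {
            Some(c) => c,
            None => {
                result.errors.push("unterminated tag".to_string());
                break;
            }
        };
        let words: Vec<&str> = after[..close].split_whitespace().collect();
        match words.as_slice() {
            ["for", var, "in", coll, ..] => result.scope.push((var.to_string(), coll.to_string())),
            ["endfor"] => {
                if result.scope.pop().is_none() {
                    result.errors.push("endfor without for".to_string());
                }
            }
            _ => {} // other tags are ignored in this sketch
        }
        rest = &after[close + 2..];
    }
    // Loops still open at the end of input become diagnostics, not failures.
    for (var, _) in &result.scope {
        result.errors.push(format!("for loop over '{}' is not closed", var));
    }
    result
}
```

Because the unclosed loop's binding survives, a completion step can still look up `collection.products`, infer that `product` is a product, and offer its fields.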

I'm not trying to discourage anyone from building DSLs, but if you're inventing a language, you also need to invent all these other tools to support its developers. To build our error-tolerant parser, we adopted an excellent library called OhmJS. It's a parser generator, and it has an excellent editor as well. If you're building a VS Code extension and need an easy way to parse some piece of code, it's worth a look. We created our error-tolerant parser with it, so we're able to parse Liquid in the frontend without writing a parser by hand.

Another important thing to keep in mind is that our schemas, the mechanism we use to expose properties to merchants in the editor, are JSON, not Liquid code. To provide code completion and linting for these JSON objects, we also need a language server. You're probably thinking that we wrote an error-tolerant JSON parser too. My team didn't need to, because there is already a JSON language server maintained by Microsoft. In the end, we just reused the JSON language server that Microsoft had already built.

The way this JSON language server works is that you instantiate the service, pass it a JSON schema describing the shape of the JSON you expect, and it works out of the box. This is one benefit of using a language that already exists: it already has a language server. It was much more straightforward to implement than our Liquid language server, which we needed to build from scratch. With this, developers can also write these JSON objects more easily. In the past, they were much more challenging to write because the feedback loop was much longer; there was no linter or code completion.

Your Domain

With this, we close the loop on the key elements of the Shopify theme system. We have Liquid as a primitive, and the schemas as the bridge between technical and non-technical folks. We have the runtime, where we adopt several techniques to keep it healthy; the main one I'd suggest taking home is native extensions. We don't often write them in our day-to-day work, but they're widely used in the libraries we rely on every day. Many JSON parsers, for example, rely on native extensions. And we have the tools. As soon as you create your ecosystem, tools are what make it easy for the developers you invite into it to write the code you expect them to write. Tools have this cultural impact.

Then there's your own domain. Think about your application and ask yourself: which of these building blocks can I adopt to make it more customizable? If I had to pick the most important one, it would be the schemas, because schemas are where we build bridges with non-technical folks. Schemas are also how we define the most granular elements of our interface. Think about VS Code, for example: when we change its settings, like the background, we're setting global settings all the time.
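To make exposing properties of a granular element concrete, here is a hypothetical Rust sketch of a component schema. The setting kinds are loosely modeled on Shopify section setting types such as text, checkbox, and range. The structs and the `resolve` function are illustrative assumptions. Valid merchant input wins. Anything missing or invalid falls back to the schema default, and invalid input is reported rather than rendered.

```rust
use std::collections::HashMap;

/// Kinds of settings a component can expose (a simplification).
#[derive(Debug, Clone)]
pub enum SettingKind {
    Text { default: String },
    Checkbox { default: bool },
    Range { min: i64, max: i64, default: i64 },
}

/// A value a merchant can set in the editor.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Text(String),
    Bool(bool),
    Int(i64),
}

/// One exposed property of a building block (hypothetical shape).
pub struct Setting {
    pub id: &'static str,
    pub label: &'static str,
    pub kind: SettingKind,
}

/// Resolves the values the renderer will use: valid merchant input where
/// present, schema defaults everywhere else, plus editor-style error messages.
pub fn resolve(
    schema: &[Setting],
    input: &HashMap<String, Value>,
) -> (HashMap<&'static str, Value>, Vec<String>) {
    let mut values = HashMap::new();
    let mut errors = Vec::new();
    for s in schema {
        let chosen = match (&s.kind, input.get(s.id)) {
            (SettingKind::Text { .. }, Some(Value::Text(t))) => Value::Text(t.clone()),
            (SettingKind::Checkbox { .. }, Some(Value::Bool(b))) => Value::Bool(*b),
            (SettingKind::Range { min, max, .. }, Some(Value::Int(n))) if n >= min && n <= max => {
                Value::Int(*n)
            }
            (kind, provided) => {
                // Wrong type or out of range: report it and use the default.
                if provided.is_some() {
                    errors.push(format!("invalid value for '{}' ({})", s.id, s.label));
                }
                match kind {
                    SettingKind::Text { default } => Value::Text(default.clone()),
                    SettingKind::Checkbox { default } => Value::Bool(*default),
                    SettingKind::Range { default, .. } => Value::Int(*default),
                }
            }
        };
        values.insert(s.id, chosen);
    }
    // Keys the schema does not declare are rejected, not silently rendered.
    for key in input.keys() {
        if !schema.iter().any(|s| s.id == key.as_str()) {
            errors.push(format!("unknown setting '{}'", key));
        }
    }
    (values, errors)
}
```

The design choice mirrors the talk's point: the schema, not the merchant, decides what can be changed and within which bounds, so the editor can offer free customization while the renderer only ever sees valid values.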

The important and cool thing about schemas is that you expose properties of these building blocks and invite merchants to place them on the screen in the order they want, with the settings they want. So what's the most granular element in your application? You can create this kind of interface for it and invite non-technical folks to set its values. That makes your domain more editable. I hope you make your applications more customizable as well.


Recorded at: Jan 26, 2026

Guilherme Carreiro
