Rendering Large Models in the Browser in Real-Time



Shwetha Nagaraja & Federico Rocha explore in detail some of the most interesting heavily-optimized techniques and strategies that Autodesk Forge Viewer introduces for viewing extremely large 2D and 3D models in the browser in real-time. These include progressive rendering to reduce time to first pixel, geometry consolidation to reduce draw overhead, etc.


Shwetha Nagaraja is a software engineer at Autodesk. Federico Rocha is a senior software engineer at Autodesk.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Nagaraja: The world we live in is very complex. Today, we have 7.5 billion people living in this world. In 2050, it's going to be 10 billion people. All these people are going to need houses and hospitals, cars and cell phones, and a variety of other products and infrastructure. Not only do we need a lot of things, but these things are getting more and more complex as technology advances. All these things need to be designed and built.

Consider the complexity of a building, or a car, or even something as small as a pair of headphones. All these things consist of a large number of specialized parts that need to be designed and built. Let's dig deeper into how a building is designed and constructed. Most buildings are designed on a desktop computer in an office somewhere, potentially far from the actual construction site. In the recent past, people would flatten and print these designs on paper and share this paper with the actual construction workers. But paper is fragile, heavy, and has to be carried around. Later, workers started using mobile devices like iPads, tablets, and smartphones to replace paper. Having the 2D images on these devices ensured that they were always at their fingertips, but static 2D images are not the best at conveying sophisticated 3D structures. They have limited ability to show the most relevant information for a given task. Furthermore, designs can change, even while they're being built.

What we really want is for design changes to be reflected immediately, and for the information to be viewable in both 2D and 3D and visualized in different ways. None of these buildings are built alone, and a viewing tool with the properties just described could facilitate collaboration. Wouldn't it be great if this viewing technology could be implemented using standard web technologies and be accessible in any web browser? This is what I work on.

I'm Shwetha Nagaraja. At Autodesk, I develop tools and services that help people to view, share, and collaborate on their designs in real time. Our goal is to create a portable model viewer for the fields of architecture, engineering, manufacturing, and construction. You might be thinking, "Why not use special software?" It wouldn't be practical to install complex, specialized, expensive design and engineering software on every mobile device that will be used for viewing and collaborating on the design in the field.

You might be thinking, "Why not use cloud-based rendering?" For several reasons. An internet connection is not always available. Transferring images between the server and the client has latency. Cloud computing can cost a considerable amount of money. And modern mobile devices are actually quite powerful. As of today, technologies like WebGL and Three.js are supported on 98% of smartphones, tablets, and desktops. We can leverage the mobile devices themselves to do the rendering for us.

The solution that we have come to, is to render on the device itself in the browser in real time. The particular viewer that I work on is called Forge Viewer. When WebGL was initially released in 2011, we quickly recognized that it was perfect for our use case. We started building Forge Viewer in 2013. The first stable release of WebGL wasn't until 2017, so we were actually quite early adopters. Today I'll share with you some of the technical challenges we have had, and the things we've learned during these several years of development. In particular, we'll be discussing some interesting optimization techniques that address the challenges noted here, such as rendering the data at interactive rates, using memory efficiently, and so on.

File Format

A typical design can contain more than one gigabyte of data. In order to solve the challenge of making the geometry data available for viewing as soon as possible, we have designed a file format that contains spatial metadata that the viewer can use to load the file in the most efficient way. Our file format divides the geometry into chunks, and prefaces them with a header containing a description of the spatial layout of these chunks. The viewer requests chunks of geometry from the file in order of importance, based on their visibility and their estimated size on the screen. This allows the viewer to render the most important chunks onto the screen while the file is still loading. Furthermore, the geometric data is laid out such that the GPU can use it directly.

Here's a schematic view of the file containing a header in orange and chunks of geometry in blue. Naturally we load the header first; it turns green to signify that it's been loaded. The header tells us how the geometry chunks are laid out in space, and the approximate size of their bounding boxes. Here, we visualize that layout. Now, let's show the camera through which we will view the geometry inside Forge Viewer. Based on this camera view, we can determine a good order for loading and rendering the geometry chunks. Only the chunks within the camera's field of view, represented here by yellow lines, will be rendered. We'll start by loading and rendering the chunk that will appear the largest on the screen, followed by the next largest and closest on the screen, and the next, and will continue this process until all of the visible geometry has been loaded and rendered.
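The loading order described above can be sketched roughly as follows. This is a minimal illustration, not Forge Viewer's actual code: the chunk records, the view-cone visibility test, and the screen-size estimate (bounding-sphere radius divided by distance to the camera) are all hypothetical stand-ins for what the file header and the camera would provide.

```javascript
// Hypothetical chunk descriptors, as they might be read from the file header.
// Each one carries only a bounding sphere, not the geometry itself.
const chunks = [
  { id: "A", center: [0, 0, -10], radius: 1 },
  { id: "B", center: [0, 0, -5], radius: 2 },
  { id: "C", center: [0, 0, 20], radius: 5 }, // behind the camera
  { id: "D", center: [1, 0, -50], radius: 4 },
];

// Assumed camera: at the origin, looking down -Z, with a 90-degree view cone.
const camera = {
  position: [0, 0, 0],
  forward: [0, 0, -1],
  halfFovRadians: Math.PI / 4,
};

function describe(chunk) {
  const to = chunk.center.map((c, i) => c - camera.position[i]);
  const distance = Math.hypot(to[0], to[1], to[2]);
  const cosAngle =
    (to[0] * camera.forward[0] + to[1] * camera.forward[1] + to[2] * camera.forward[2]) /
    distance;
  return {
    chunk,
    // Crude visibility test: is the sphere's center inside the view cone?
    visible: cosAngle >= Math.cos(camera.halfFovRadians),
    // Crude screen-size estimate: bigger and closer chunks score higher.
    screenSize: chunk.radius / distance,
  };
}

// Keep visible chunks only, then request the largest on screen first.
const loadOrder = chunks
  .map(describe)
  .filter((d) => d.visible)
  .sort((a, b) => b.screenSize - a.screenSize)
  .map((d) => d.chunk.id);

console.log(loadOrder);
```

A real viewer would test the whole bounding box against the view frustum and re-sort whenever the camera moves; the sketch only shows the prioritization idea.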

The next challenge is how to render this data at interactive rates. I'm going to quickly review how traditional rendering works for those of you who are less familiar with computer graphics concepts. Traditionally, everything on the screen is rendered every frame. Usually, each frame is 1/60th of a second. In other words, you have to draw everything on the screen in less than 17 milliseconds. This approach works well for games, because in games you can tailor the scene complexity, if necessary, to fit within the 17-millisecond frame budget. But in our case, we have to render anything and everything that comes in the design file. Because the file is representing complex real-world geometry, it is likely to have far more data than we can really render in 17 milliseconds.

A building could easily be composed of millions of meshes and triangles, and could take minutes to load and several seconds to render. Here you see an example where the frame time fits within the frame budget. In other words, we are able to render all the objects in 1/60th of a second. On the bottom, you see an example where there are too many objects, and the frame time does not fit within the frame budget. This means the frame takes more than 1/60th of a second to render, and the frame rate drops below 60 frames per second, causing choppy motion. All the objects are rendered again on the second frame.

Let's look at how interaction suffers if the frame time is more than the frame budget. If the user interacts at some point during the first frame, the interaction isn't handled until after the first frame is done, causing lag. The user might even attempt to interact multiple times before the first interaction is handled. In order to solve these problems, we implemented something called progressive rendering.

Progressive Rendering

Progressive rendering saves the state of the first image and reuses it in the next frame, picking up where it left off. If the model is simple, this approach produces an image that is indistinguishable from the image produced by traditional rendering because everything fits in the frame budget. When the model is very heavy, this approach allows more frequent updates without lag. Let's see how interactivity is better than it was in traditional rendering.
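The idea of picking up where the previous frame left off can be sketched as a renderer that keeps a cursor into its draw list. This is a minimal sketch under assumptions, not the viewer's implementation: `createProgressiveRenderer`, `drawItem`, and the injected `now` clock are hypothetical names, and the fake clock exists only so the example runs outside a browser. In a browser, `now` would typically be `performance.now()`, `renderFrame` would be called from `requestAnimationFrame`, and the partially drawn image would be kept between frames rather than cleared.

```javascript
// Minimal sketch of a progressive renderer with a per-frame time budget.
function createProgressiveRenderer(items, drawItem, now, budgetMs) {
  let cursor = 0; // index of the next item to draw; persists across frames

  return {
    // Call when the camera moves: the partial image is stale, so start over.
    restart() {
      cursor = 0;
    },

    // Draw items until the budget is used up or nothing is left.
    // Returns true once the whole model has been drawn.
    renderFrame() {
      const start = now();
      while (cursor < items.length && now() - start < budgetMs) {
        drawItem(items[cursor]);
        cursor++;
      }
      return cursor >= items.length;
    },
  };
}

// Usage with a fake clock in which every draw costs 5 ms.
let fakeTime = 0;
const drawn = [];
const renderer = createProgressiveRenderer(
  ["wall", "roof", "door", "window", "chair", "lamp", "cable"],
  (item) => {
    drawn.push(item);
    fakeTime += 5;
  },
  () => fakeTime,
  15 // budget in milliseconds, chosen for the example
);

let frames = 0;
let done = false;
while (!done) {
  done = renderer.renderFrame();
  frames++;
}
console.log(frames, drawn.length);
```

Because the budget is checked before each draw, a single expensive draw can still overshoot it, which is one reason very large draw calls work against this scheme.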

As you can see here, we are able to finish each frame in 1/60th of a second. If the user interacts at some point during the first frame, the interaction is handled right after the first frame is done, resulting in smooth navigation. The render starts over from scratch after the interaction, but that's fine for our use case. Here are videos of the viewer comparing progressive rendering on the left with traditional rendering on the right. The first image is shown sooner in the progressive rendering. As you can see, the progressive rendering version is much more responsive. In the non-progressive version on the right, I have a hard time zooming the camera. Here's another video comparison. This time, I have a hard time rotating the camera in the non-progressive version on the right.

Typed Flat Arrays

We can only use a fraction of the machine's memory. For us, optimizing this memory is very important. Typed flat arrays help us to do this. Typed flat arrays consist of a fixed-size buffer of arbitrary binary data, plus one or more views for interpreting it. The buffer can be read from a binary file or created in JavaScript itself. The views are decoupled from this underlying data buffer and provide interfaces into this buffer. Typed flat arrays are similar to arrays in C or buffers in GPU programming. While objects are easier to use in most cases, typed flat arrays have their own advantages. Using typed flat arrays instead of objects reduces lookup time by simply indexing into memory instead of potentially doing hash table lookups. It also reduces memory allocation and copying overhead, and it reduces memory usage by storing data compactly in a contiguous block of memory.

Optimizing memory usage can be particularly important when working with large models, because browsers impose per-tab or per-website limits on how much data can be stored in the JavaScript heap. In Chrome on desktop, websites are currently limited to allocating about two gigabytes. On mobile, that limit is much lower and not well-documented. If we tried to allocate our big models in memory without special care, we could easily exceed those limits. Luckily, most of our data is composed of numbers. In general, all numbers in JavaScript are represented using 64-bit double-precision floating-point numbers. Typed flat arrays allow us to use smaller number types instead. For example, 32-bit floats are sufficient for matrix elements and vertex position components.

Here's an example of what JavaScript typed arrays look like. This example shows a few different types of views into the same buffer. In this code, first we create an untyped buffer with the specified number of bytes. Then we create a view, view1, that we use to write a 16-bit unsigned integer to the buffer. We could use the same view to read data as well, but for this example, we create a second view, view2, and use that to read data from the same buffer. Here's my colleague, Federico Rocha, who will tell you about how we use typed flat arrays in Forge Viewer, and will discuss a couple more optimizations.
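The slide itself is not part of the transcript, so the following is only a reconstruction of the kind of code being described. The buffer size and the choice of `Uint8Array` for the second view are assumptions; `ArrayBuffer`, `Uint16Array`, and `Uint8Array` are standard JavaScript.

```javascript
// Create an untyped buffer of 8 bytes.
const buffer = new ArrayBuffer(8);

// view1 interprets the buffer as 16-bit unsigned integers (4 elements).
const view1 = new Uint16Array(buffer);
view1[0] = 0x1234; // writes two bytes at the start of the buffer

// view2 interprets the very same bytes as 8-bit unsigned integers (8 elements).
const view2 = new Uint8Array(buffer);

// Typed arrays use the platform's byte order, so which of the two bytes
// comes first depends on the machine the code runs on.
const bytes = [view2[0], view2[1]];
console.log(bytes.map((b) => b.toString(16)));
```

No data is copied when the second view is created; both views read and write the same underlying memory.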

Rocha: I'll talk a little about why we use flat arrays in the viewer. Before we dig into how we use typed flat arrays in the Forge Viewer, let's look at this example and calculate a rough estimate of how much memory we can save. Here, we can see a diagram of how a four-by-four matrix might be laid out in memory as a JavaScript object, and how much space it occupies in the heap.

This kind of matrix is useful to represent transformations in 3D space. Each element in the matrix is a regular JavaScript number, so it occupies 8 bytes. The matrix object has a reference to the array of elements. Each reference in JavaScript takes 8 bytes of memory, making the whole object 136 bytes in size (16 x 8 bytes for the elements plus 8 bytes for the reference). Also, the internal representation of objects in JavaScript has additional references. In Google's V8 virtual machine, for example, each object has a reference to its hidden class and other structures. If we consider the hidden class of the matrix and the hidden class of the array, we arrive at a total of 152 bytes.

Let's see the case with typed arrays. In our case, single precision is enough for our transformations. If we store these matrices in a single Float32 typed array, each one occupies just 64 bytes. Getting rid of the references and cutting our matrix elements in half saves 88 bytes per matrix, around 58% of the original space. The snippet of code shows how we would retrieve a particular matrix. If we want the matrix in position N, then we multiply N by the number of entries that the structure occupies in the array. In this case, it's 16 entries.
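The snippet referred to here is not included in the transcript. A minimal sketch of the indexing scheme, with hypothetical helper names (`setMatrix`, `getMatrix`) and an arbitrary capacity, might look like this:

```javascript
const ENTRIES_PER_MATRIX = 16; // a 4 x 4 matrix
const matrixCount = 1000; // arbitrary capacity for the example

// One flat array holds every matrix back to back: 16 floats of 4 bytes each.
const matrices = new Float32Array(matrixCount * ENTRIES_PER_MATRIX);

// Copies 16 numbers into slot n.
function setMatrix(n, elements) {
  matrices.set(elements, n * ENTRIES_PER_MATRIX);
}

// Returns a view (not a copy) of the 16 entries of matrix n.
function getMatrix(n) {
  const start = n * ENTRIES_PER_MATRIX;
  return matrices.subarray(start, start + ENTRIES_PER_MATRIX);
}

const identity = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
setMatrix(42, identity);
const m = getMatrix(42);

// 16 entries x 4 bytes = 64 bytes per matrix, with no per-object overhead.
const bytesPerMatrix = ENTRIES_PER_MATRIX * Float32Array.BYTES_PER_ELEMENT;
console.log(bytesPerMatrix, m[0], m[5]);
```

Since `subarray` returns a view onto the same buffer, writing through `m` changes the stored matrix directly.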

We use this approach to store vertices, indices, mesh IDs, transformations, bounding boxes, and instance trees of potentially millions of meshes. We save a lot of memory. This approach works not only for homogeneous data. For example, meshes can have vertices with wildly different attributes and layouts. We use just one big array buffer to store them all. Here we have two meshes, each with a different vertex layout. The first one has a position, a normal, and texture coordinates; the second has a different set of attributes. To store them, we create an array buffer and several typed arrays to access it. When we need to set the values of different elements of a mesh, we use the correct typed array. In this example, I can set the position using a Float32 typed array, and I can set the texture coordinates using a Uint16 typed array.
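As an illustration of storing meshes with different vertex layouts in one buffer, here is a small sketch. The two layouts, the strides, and the helper functions are assumptions made for the example and are not the viewer's real formats.

```javascript
// Assumed layouts:
//   mesh 1: position (3 floats) + normal (3 floats) + uv (2 uint16) = 28 bytes
//   mesh 2: position (3 floats) only                                = 12 bytes
const MESH1_STRIDE = 28;
const MESH1_UV_OFFSET = 24; // uv starts after position and normal
const MESH1_VERTICES = 2;
const MESH2_STRIDE = 12;
const MESH2_VERTICES = 3;

const mesh1Offset = 0;
const mesh2Offset = mesh1Offset + MESH1_STRIDE * MESH1_VERTICES; // byte 56

// One buffer for both meshes, and one typed view per number type.
const buffer = new ArrayBuffer(mesh2Offset + MESH2_STRIDE * MESH2_VERTICES);
const floats = new Float32Array(buffer);
const shorts = new Uint16Array(buffer);

function setPosition(meshOffset, stride, vertex, x, y, z) {
  const i = (meshOffset + vertex * stride) / 4; // byte offset -> float index
  floats[i] = x;
  floats[i + 1] = y;
  floats[i + 2] = z;
}

function setTexCoord(meshOffset, stride, uvOffset, vertex, u, v) {
  const i = (meshOffset + vertex * stride + uvOffset) / 2; // byte offset -> short index
  shorts[i] = u;
  shorts[i + 1] = v;
}

setPosition(mesh1Offset, MESH1_STRIDE, 1, 1.5, 2.5, 3.5);
setTexCoord(mesh1Offset, MESH1_STRIDE, MESH1_UV_OFFSET, 1, 0, 65535);
setPosition(mesh2Offset, MESH2_STRIDE, 2, -1, -2, -3);
console.log(buffer.byteLength);
```

The index arithmetic only works because every offset is a multiple of the element size of the view used to access it, which is the alignment concern discussed later in the talk.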

Another advantage of using array buffers comes from the communication between the main thread and web workers. Web workers are the way JavaScript allows code to run in different threads. Array buffers make the passage of information between workers and the main thread very fast. One example of this can be seen in this diagram. We use web workers to load and parse our design files, so we can create meshes and other structures. Geometry is loaded and assembled in the background and passed to the main thread when it's ready. This communication between web workers and our main thread comes almost for free when using array buffers.

Web workers cannot share regular memory between them and the main thread. The typical way to pass data between workers is with messages. When you pass a regular object, it gets cloned: it is serialized, copied, and deserialized on the other side. These three operations, serialization, copy, and deserialization, can be costly depending on the complexity of the data and its size.

With array buffers, things are different. In JavaScript, there is an interface called Transferable, which a few objects implement: ArrayBuffer, MessagePort, ImageBitmap, and OffscreenCanvas. This interface has no methods and exists only to indicate to the browser that the object can be transferred between execution contexts without copying. They live in shared memory. Transferring large array buffers between threads is very cheap.
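The effect of transferring a buffer can be observed without setting up a worker. The sketch below uses the standard `structuredClone` function with a `transfer` list, which applies the same transfer semantics as `worker.postMessage(message, [buffer])`; the 16 MB size is an arbitrary choice for the example. In a real worker, the call would look like `self.postMessage({ vertices: buffer }, [buffer])`.

```javascript
// 16 MB of geometry data, as a worker might produce after parsing a file.
const geometry = new Float32Array(4 * 1024 * 1024);
geometry[0] = 42;
const original = geometry.buffer;

// Transfer ownership of the buffer instead of copying its contents.
const received = structuredClone(original, { transfer: [original] });

// After a transfer the sender's buffer is detached: it has zero length
// and can no longer be used on the sending side.
const detached = original.byteLength === 0;

// The receiver sees the original contents.
const firstValue = new Float32Array(received)[0];
console.log(detached, received.byteLength, firstValue);
```

The detached sender-side buffer is the price of the cheap transfer: only one side can own the data at a time.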

Data Alignment

Compacting our memory usage for efficiency is good, but we have to be careful about data alignment. Modern processors access memory in chunks of typically 32 or 64 bits. For example, if your computer has a 32-bit architecture, it reads memory in windows with 4-byte boundaries. If your data is misaligned with those boundaries, memory access will be slower. JavaScript does a great job of not allowing you to misalign your data, but there is at least one place where you can misalign it. For GPUs, the smaller you can make your vertices, the better the chance they stay in the cache longer, making rendering faster. That being said, it's always a good idea to keep the attributes of your vertices 4-byte aligned, especially on mobile. Let's see an example.
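The way JavaScript guards against misaligned typed-array views can be seen in a few lines. This sketch uses only standard APIs; the transcript does not say which "one place" the speaker had in mind, so the `DataView` part is shown merely as one standard way to read and write at arbitrary byte offsets.

```javascript
const buffer = new ArrayBuffer(16);

// Aligned: byte offset 4 is a multiple of Float32Array.BYTES_PER_ELEMENT (4).
const aligned = new Float32Array(buffer, 4, 1);

// Misaligned: byte offset 2 is not a multiple of 4, so the constructor throws.
let misalignedError = null;
try {
  new Float32Array(buffer, 2, 1);
} catch (err) {
  misalignedError = err;
}

// DataView accepts any byte offset, so alignment becomes the caller's job.
const view = new DataView(buffer);
view.setFloat32(2, 1.5, true); // write little-endian at byte offset 2
const roundTrip = view.getFloat32(2, true);

console.log(misalignedError instanceof RangeError, roundTrip);
```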

Here we have a vertex with position, normal, and texture coordinates. Each component is four bytes long. All attributes are aligned with a 32-bit architecture. One memory optimization we can do is to store the normals using shorts as components, but in this case, the texture coordinates get misaligned. In other words, the address of the texture coordinates doesn't start at a 4-byte boundary. One possible solution for this is adding padding to the normal. One extra short of padding will make the texture coordinates align again. There is also another solution. Taking advantage of the fact that the length of the normal vector has to be one, it's possible to calculate the third component from the other two in the shader. So it's possible to remove one short from the normal and only store two of its components, making the texture coordinates align again.
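The three layouts can be checked with a little arithmetic. In this sketch the attribute sizes follow the description above (float positions and texture coordinates, short normals); the helper names are hypothetical, and the reconstruction function assumes the third component is non-negative, so a real encoding would also need to preserve its sign somehow.

```javascript
const FLOAT = 4; // bytes
const SHORT = 2; // bytes

// Returns the byte offset of each attribute when packed back to back.
function layoutOffsets(attributes) {
  const offsets = {};
  let cursor = 0;
  for (const { name, count, size } of attributes) {
    offsets[name] = cursor;
    cursor += count * size;
  }
  offsets.stride = cursor;
  return offsets;
}

const position = { name: "position", count: 3, size: FLOAT };
const uv = { name: "uv", count: 2, size: FLOAT };

// Normal as three shorts: uv lands at byte 18, which is not a multiple of 4.
const misaligned = layoutOffsets([position, { name: "normal", count: 3, size: SHORT }, uv]);

// One extra short of padding: uv moves to byte 20.
const padded = layoutOffsets([
  position,
  { name: "normal", count: 3, size: SHORT },
  { name: "padding", count: 1, size: SHORT },
  uv,
]);

// Store only two normal components: uv moves to byte 16 and the vertex shrinks.
const twoComponent = layoutOffsets([position, { name: "normal", count: 2, size: SHORT }, uv]);

// Rebuild the third component of a unit-length normal from the other two.
// A shader would do the same arithmetic. Assumes z >= 0.
function reconstructNormalZ(x, y) {
  return Math.sqrt(Math.max(0, 1 - x * x - y * y));
}

console.log(misaligned.uv, padded.uv, twoComponent.uv, reconstructNormalZ(0.6, 0));
```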

Geometry Consolidation

Now let's talk about another optimization we made. This time it's about draw calls. Every time a group of vertices is sent to the renderer or to the GPU, we call it a draw call. Draw calls are expensive because the driver has to set up a lot of internal state and perform a lot of validation before drawing. To achieve high performance as developers, we need to think of clever ways to reduce the number of draw calls. Consolidating geometry, making a big mesh from several smaller ones, is a way to achieve that.

Let's see a simplified example of what we mean by geometry consolidation. In this diagram, we can see a bunch of meshes. Each one has a shape, a material color, and a transformation that places it in the world. In order to render each mesh, we have to make a different draw call. Think of a draw call like a function that receives as parameters a material with its color, a transformation, and the geometry in the form of an offset into the vertex buffer and a primitive count to render. Every time we need to specify different values for those parameters, we need to issue a different draw call.

The first thing we can do is to unify the materials each mesh uses to render. In this case, we can make the material take the color from the vertex data, instead of it being a property of the material itself. Now, all our geometry can be rendered using just one material. The second thing we can do is to pre-multiply the vertices by the world transform matrix and put the result back into the vertex positions. Now, we don't need to specify a different transformation for each mesh, because all vertices in the mesh are already transformed. Rendering using the identity matrix will produce the same result as before.

Finally, we can rearrange the vertex buffer, so all consolidated meshes are placed contiguously without gaps or other geometries in between. As a result, it is now possible to make one draw call using one offset and size into the vertex buffer, one material, and one transformation, the identity. Reducing draw calls is awesome but we cannot go overboard with it. Generally, as we consolidate geometry, we get bigger and bigger bounding boxes with more and more empty space inside. That can mess up occlusion culling algorithms. Also, the more vertices we render per draw call, the higher the chances of messing up progressive rendering. Each draw call takes substantially more time, so it's more difficult to cut off the rendering of a frame before the budget runs out. We have some heuristics about which meshes to consolidate. We prioritize combining small meshes of a few hundred vertices. In a building, you can think about lightbulbs, tubes, cables, windows, door knobs, chairs, and so on. Now, I hand it back to Shwetha [Nagaraja] to talk about non-photorealistic rendering.
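The three consolidation steps (colors moved into the vertex data, transforms baked into positions, everything packed contiguously) can be sketched as below. This is a simplified illustration rather than the viewer's code: the mesh records and helper names are hypothetical, matrices are assumed to be column-major, meshes are non-indexed, and only positions are handled (normals would need their own transformation).

```javascript
// Applies a column-major 4x4 matrix to the point (x, y, z, 1).
function transformPoint(m, x, y, z) {
  return [
    m[0] * x + m[4] * y + m[8] * z + m[12],
    m[1] * x + m[5] * y + m[9] * z + m[13],
    m[2] * x + m[6] * y + m[10] * z + m[14],
  ];
}

// Merges several small meshes into one set of contiguous vertex arrays.
function consolidate(meshes) {
  const vertexCount = meshes.reduce((n, mesh) => n + mesh.positions.length / 3, 0);
  const positions = new Float32Array(vertexCount * 3);
  const colors = new Uint8Array(vertexCount * 3);

  let v = 0;
  for (const mesh of meshes) {
    for (let i = 0; i < mesh.positions.length; i += 3) {
      // Bake the world transform into the position.
      const p = transformPoint(
        mesh.transform,
        mesh.positions[i],
        mesh.positions[i + 1],
        mesh.positions[i + 2]
      );
      positions.set(p, v * 3);
      // Move the material color into per-vertex data.
      colors.set(mesh.color, v * 3);
      v++;
    }
  }
  return { positions, colors, vertexCount };
}

function translation(tx, ty, tz) {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, tx, ty, tz, 1];
}

const triangle = [0, 0, 0, 1, 0, 0, 0, 1, 0];
const merged = consolidate([
  { positions: triangle, color: [255, 0, 0], transform: translation(10, 0, 0) },
  { positions: triangle, color: [0, 0, 255], transform: translation(0, 5, 0) },
]);
console.log(merged.vertexCount, Array.from(merged.positions));
```

The merged arrays can then be drawn with a single material and the identity transform, which is the single draw call described above.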

Non-Photorealistic Rendering

Nagaraja: Until now, we've talked about technical optimizations, but sometimes there are non-technical optimizations that can really help us too. Sometimes it's better to change our assumptions and simplify the problem than to come up with marginally better solutions to a more complicated problem. In the beginning of the project, we tried to make things more and more photorealistic. However, in a photorealistic graphic style, shapes were not communicated clearly with all the shadows, highlights, and crazy materials.

This is an example of a more photorealistic rendering in the Forge Viewer. Here's an example of a building that has basic photorealistic renderings. And here is a non-photorealistic render of the same model. In this version, shapes are more visible, parts are outlined more clearly. Colors are more distinct, and there are no distracting specular reflections.

In many design representations, there's often an emphasis. Sometimes, color is used to call attention to a certain design element. Sometimes, only part of a design is displayed in detail, and the rest is left schematic. Transparency, cutaways, and other illustrative visualization styles are common. In the real world, designers often employ less photorealistic modes of expression, precisely because they don't want to give their audience the impression that a design is more final than it is. Often, clients will misinterpret conceptual designs if they look too real.

In addition to conveying design information more clearly, non-photorealistic rendering is usually faster as well, due to using simpler material and lighting systems. Our goals are data visualization and interactivity, not just making things look realistic.

Future Improvements

While we have made many improvements already, there's always more to do. One notable example of an upcoming improvement is related to instancing. Currently, we support instancing, which allows us to load and store identical models only once, but draw them to the screen multiple times. The modeling software and the user are responsible for authoring the data in a specific way and providing these instances to the viewer. In the future, we will automatically identify identical objects and shapes before they're presented to the viewer, and automatically encode them as multiple instances of the same object. We will even be able to identify and de-duplicate shapes that have different orientations, scales, and placements in the scene. This will significantly reduce file size, bandwidth, load times, and memory usage.
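To make the idea of instancing concrete, here is a deliberately naive sketch that stores each distinct geometry once and keeps a list of transforms for it. It only detects exactly identical vertex data, which is far simpler than the automatic detection of rotated or scaled duplicates described above; the function names and the string-based key are assumptions made for the example.

```javascript
// Builds a lookup key from the raw vertex data. Good enough for a sketch;
// a real implementation would more likely hash the bytes.
function geometryKey(positions) {
  return Array.from(positions).join(",");
}

// Groups objects that share identical geometry into instances.
function buildInstances(objects) {
  const shared = new Map(); // key -> { positions, transforms }
  for (const { positions, transform } of objects) {
    const key = geometryKey(positions);
    if (!shared.has(key)) {
      shared.set(key, { positions, transforms: [] });
    }
    shared.get(key).transforms.push(transform);
  }
  return [...shared.values()];
}

const chair = [0, 0, 0, 1, 0, 0, 0, 1, 0];
const lamp = [0, 0, 0, 0, 0, 2, 0, 1, 0];

const instances = buildInstances([
  { positions: chair, transform: "chair at desk 1" },
  { positions: lamp, transform: "lamp at desk 1" },
  { positions: chair.slice(), transform: "chair at desk 2" },
]);
console.log(instances.length, instances[0].transforms);
```

The geometry of the chair is kept once, and drawing it twice only requires its two transforms.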

Today we talked about a number of different optimization techniques. If you're making web graphics applications or something similar, you might be able to directly use the optimizations we've talked about. Even if not, maybe you will be able to use some similar approaches but adjusted for your own use cases. All of these optimizations enable people to more effectively design, share, view, and build complex structures that are so ubiquitous and important in the modern world. Feel free to reach out to us at these email IDs, if you think of any questions later. If you're interested in more general information, you can visit our website,

Questions & Answers

Participant 1: You mentioned transferables. I was wondering if there's an alternative to that, because it looks like that's been deprecated, I think a while ago. I was just wondering if there's anything you could recommend instead of that for transferring the data across with array buffers.

Rocha: For the preparation of this talk, I read the same thing, that they are being deprecated, but I didn't find any alternative. I suppose array buffers will still be transferable, even if it's through another mechanism. There is a new thing called SharedArrayBuffer, but we'll see.

Participant 2: You mentioned that you're using different bounding boxes for different pieces of the mesh, and continued to use them even after you put the mesh together. What's the cost of that and how do you implement it in general?

Rocha: We just take the union of the bounding boxes, so if you have two meshes with two bounding boxes, the result will be their union. That is what leaves more empty space inside and makes it less useful for culling.

Participant 2: Right, but it also requires less memory, versus your approach where you put everything together and now you have multiple bounding boxes.

Rocha: I don't know if we saved so much in those bounding boxes. The gain in this case is mostly about draw calls, so we accept using a little more memory if we can reduce those. One of the reasons we do this is we try to consolidate small meshes of a few vertices, instead of trying to consolidate huge ones, so that we can have a buffer. We can reduce the number of draw calls greatly and not use too much extra memory.

Participant 2: But do you use a separate buffer then, because now you need to distinguish which pieces of bounding boxes are within your single mesh rather than draw calls?

Rocha: Yes. In this case, we create another one.

Participant 2: You have a separate array to distinguish...?

Rocha: Yes. That array is calculated by the worker and is passed to the original.

Participant 3: What's the largest model you've been able to render using Forge?

Nagaraja: Actually, the model that you just saw, I think that one was about 1GB, just a little more than 1GB. Our customers use it for models even larger than 1GB, but when testing, I usually try to limit it to 1GB.

Rocha: They manage to crash the browser no matter what we do. They always export. The problem is unbounded; designs get bigger and bigger and it's always a constant fight to try to optimize.

Participant 4: You're trying to reduce RAM memory usage because it's limited. Do you try to use a GPU sometimes instead of RAM memory? Do you unload something when it's already on GPU memory, or you usually have copies in RAM and in GPU?

Rocha: We have different approaches, but one is to try to discard things that are not in the camera. I don't know if it's implemented yet, but there is another mechanism that you can store geometry in your local storage and get it back, like some sort of pagination. I know that they’re looking into these kinds of things.



Recorded at:

Aug 21, 2019