From Runtime Efficiency to Carbon Efficiency

Summary

Michal Dorko discusses Goldman Sachs’s proprietary language, Slang, a core technology responsible for booking trades, quoting prices and analysing risk, among other use cases.

Bio

Michal Dorko is a software engineer at Goldman Sachs working on the in-house proprietary programming language and runtime. His focus is on improving the maintainability and performance of the language.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Dorko: My name is Michal Dorko. I work as a software engineer for Goldman Sachs. I am part of the internal SecDb architecture team. We own an internal programming language called Slang, which will be our main topic. Before we dive deeper into Slang, let me introduce you to our SecDb ecosystem. SecDb is an ecosystem within Goldman Sachs which consists of an object-oriented database technology and a data synchronization mechanism. The other core components of SecDb are Slang, the scripting language we'll be discussing; an integrated development environment for Slang called SecView; various front office processes such as trade booking systems, trade model risk applications, and quoting applications; and various integrated controls which are required by our regulators.

What is Slang? (Security Language)

What is Slang? Before we dive deeper, let's first address a question which probably comes to mind: why do we have our own programming language? Slang was designed and developed in the early '90s for financial modeling, before other popular scripting languages such as Python were widely available. It was designed for mathematicians and financial modelers with no computer science background, to allow them to easily develop, implement, and deploy their models into production. It is based on C, because in those days, when Slang was created, C was the main language used for financial modeling. Therefore, the creators of Slang wanted the language to be as similar and as familiar as possible to the C that our financial modelers were already using. Slang features can be split into two main groups. Slang is a general-purpose language, but it also has features that are specific to financial modeling in a financial institution. The general-purpose scripting features include working with files, network connections, regular expressions, general data structures, and many others. Features which are quite specific to Slang include tight coupling and integration with the SecDb database for accessing and analyzing SecDb objects, and various built-in functionality for performing quantitative calculations and creating financial models.

Slang Language Features

Slang is a case-insensitive scripting language. It is dynamically typed and has a built-in graph framework for financial modeling, for expressing dependencies between financial instruments. It has a built-in rapid deployment SDLC process and various built-in controls which are designed for financial institutions and are required by our regulators. Other features that are important for this talk, and for understanding why we have been redesigning our runtime and why we decided to write our own virtual machine, are that the interpreter, the datatypes, and all the built-in functions are written in C++, and that everything is implemented as a function. Slang has no keywords. A small interesting thing about the Slang frontend is that Slang variable and function names can contain spaces. As I mentioned, all of the datatypes in Slang are implemented in C++. Our internal holders for these datatypes are DT_VALUEs. DT_VALUE is what is known as a discriminated union. It's a data structure which allows holding different datatypes. At each point during the runtime, this discriminated union, internally named DT_VALUE, knows what kind of data it's holding. This information is stored in a member datatype, as you can see here. The actual data is stored in a C-style union. If the DT_VALUE contains a number, it's stored directly on the DT_VALUE for optimization purposes. Any other datatypes simply store pointers to more complex datatypes, which are allocated on the heap.
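As a rough illustration of the discriminated-union idea, here is a minimal Python sketch. It is not the actual DT_VALUE definition: the names `DataType`, `DtValue`, and `describe` and the three tags are assumptions made for the example, and Python cannot model the inline-number versus heap-pointer distinction of a C union, so only the "tag plus payload, dispatch on the tag" shape is shown.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any


class DataType(Enum):
    """Hypothetical type tags; the real runtime has several hundred datatypes."""
    NUMBER = auto()
    STRING = auto()
    ARRAY = auto()


@dataclass
class DtValue:
    """Toy stand-in for DT_VALUE: a tag plus a single payload slot."""
    data_type: DataType   # which kind of data is currently held
    data: Any             # the payload; its meaning depends on the tag


def describe(value: DtValue) -> str:
    # Code operating on a discriminated union inspects the tag first.
    if value.data_type is DataType.NUMBER:
        return f"number {value.data}"
    if value.data_type is DataType.STRING:
        return f"string of length {len(value.data)}"
    if value.data_type is DataType.ARRAY:
        return f"array of {len(value.data)} values"
    raise TypeError(f"unknown tag {value.data_type}")


print(describe(DtValue(DataType.NUMBER, 42.0)))
print(describe(DtValue(DataType.ARRAY, [DtValue(DataType.STRING, "abc")])))
```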

Slang in Numbers - How Much Slang Do We Run?

Let's talk about the scale that we operate on when we talk about Slang. Currently, we have 200 million lines of code in production. We have more than half a million scripts. We have more than 4000 active developers on a daily basis. We spend more than 300 million compute hours per week running processes which are written in Slang. Slang itself is quite a complex language that has been evolving over the years since early '90s. As at this point in time, we have several hundred built-in datatypes, and more than 10,000 built-in functions.

Slang SDLC (Rapid Prod Deployment)

I would like to briefly talk about the SDLC for Slang, which is quite unique and allows users and developers to deploy their changes to production very quickly. Slang scripts themselves are stored as binary objects in a database. The processes which execute Slang connect to a database, load their scripts from the database, and execute them locally. For this purpose, our SDLC has been designed so that our users, the developers of Slang code, write their changes in what we call a user database. It's a development area. All the implementation and testing is done in the development user databases. Once a developer is happy with their change, they submit it through a review procedure using our internal review tooling. After the appropriate approvals are granted, the changes are committed to a version control system, which is used mainly for audit purposes. After the change is committed, it is pushed into a production database, which is then replicated across the globe. After being pushed to production, the replication happens within a few seconds. The change is then available for other developers, and for processes which are just starting or restarting, to pick up. This allows us to quickly fix production issues and address bugs, but also simply to iterate on the software that we are writing.

Current Slang Runtime (Tree-Walker Interpreter)

Let's now talk about the current Slang runtime, which is implemented as a tree-walker interpreter. Slang source code is parsed into an abstract syntax tree which is then directly executed, and each node in the tree knows how to evaluate itself and knows about its children. This is our internal representation of a SLANG_NODE. It's a C struct. As I mentioned, Slang was designed in the early '90s, and still has a lot of artifacts from early C days. As you can see, each SLANG_NODE knows its own type, which is an enum, and how many children it has. There is a function pointer to the implementation of the node, which handles the node's execution logic. Then it has the actual children of the node. We also store error info, which is essentially our source information, such as the script name and line number where this abstract syntax tree node appeared when we were parsing the script.

Let's take this simple example. We see at the top, variable var equals x plus y. This simple example is parsed into the abstract syntax tree which we see below. In our current tree-walker interpreter, we essentially start at the top level, which is the assignment operator, then walk to the left and execute the variable node, which has the appropriate FnVariable function that handles creation of the local variable. The next step in the interpreter is to step to the binary operator node for addition, operator plus. Operator plus takes two arguments, its left-hand side and right-hand side operands, which are represented as children of this node. So the interpreter will go to the node on the left-hand side, which is for variable x, and evaluate it. It will then go to variable y, the right-hand side node of the plus operator, and evaluate it. Once these nodes are evaluated, the interpreter will return the results of the evaluation back to the parent node, in this case the binary plus operator. The function for the binary operator will process the results of the children, apply any logic which is applicable to the binary operator, and propagate the result back to the assignment operator, which also takes the result of the local variable creation. The interpreter then finishes interpreting this simple expression at the top level.
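The walk described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the real interpreter: `Node`, `fn_variable`, `fn_add`, and `fn_assign` are hypothetical names, a plain dict stands in for the local scope, and the left-hand variable node is simplified to carrying the target name rather than being executed to create the variable.

```python
class Node:
    """Toy stand-in for SLANG_NODE: an evaluation function plus children."""

    def __init__(self, func, children=(), value=None):
        self.func = func                  # plays the role of the function pointer
        self.children = list(children)
        self.value = value                # variable name, if any


def fn_variable(node, env):
    # Evaluate a variable reference by looking it up in the local scope.
    return env[node.value]


def fn_add(node, env):
    # Evaluate the left child, then the right child, then combine the results.
    left = node.children[0].func(node.children[0], env)
    right = node.children[1].func(node.children[1], env)
    return left + right


def fn_assign(node, env):
    # The left child names the target; the right child produces the value.
    target = node.children[0].value
    result = node.children[1].func(node.children[1], env)
    env[target] = result                  # creates the local variable if needed
    return result                         # assignment is itself an expression


# AST for: var = x + y
tree = Node(fn_assign, [
    Node(fn_variable, value="var"),
    Node(fn_add, [Node(fn_variable, value="x"), Node(fn_variable, value="y")]),
])

env = {"x": 2, "y": 3}
print(tree.func(tree, env), env["var"])
```

Every node evaluation here is an indirect call through `func`, which is the per-node overhead that the bytecode approach later in the talk tries to avoid.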

Previous Attempts (Failure is Simply the Opportunity to Try Again)

What have we tried previously to improve our runtime? So far, we have tried a few things. We tried to lower our runtime into Lisp. We tried CSlang, which is our codename for compiled Slang. It was an attempt to compile Slang directly and emit assembly instructions. Our most recent attempt was TruffleSlang, which hosted Slang on GraalVM via the Truffle framework. However, the challenge we faced is that, as we discussed earlier, there are 200 million lines of Slang, several hundred built-in datatypes, and tens of thousands of built-in functions which have no real specification, don't follow any standard, and have been developed over the 25-year history of Slang. This makes it impossible for us to do a big bang migration to an alternative runtime. The main reason why we failed to migrate to and adopt alternative runtimes such as GraalVM was that hosting on Graal becomes prohibitively complicated and expensive, mainly when it comes to boundary crossings. The JVM and GraalVM have very good C interoperability. However, as we mentioned, most of the Slang runtime, all of the datatypes, all of the add-ins, and all of the functionality in the current interpreter are built in C++. GraalVM doesn't have good interoperability with C++, due to things such as virtual tables.

Why Can't We Use LLVM? (Universal Solution to All Our Problems)

Another obvious question is, why can't we use LLVM? There is a fundamental mismatch between the strengths of LLVM, which targets statically typed languages, and Slang, which is an extremely dynamic language. Types in Slang can be defined at runtime, and their behavior can be modified at runtime. Every time we operate on any variable in Slang, we need to dispatch calls via the implementation of its datatype. This makes it very difficult for us to map Slang onto LLVM semantics. It's the same reason why other open source scripting languages like Python, Ruby, and JavaScript, which are similarly dynamic in nature to Slang, don't use LLVM for their runtimes.

SlangVM (Semantic Bytecode)

This brings us to SlangVM, the internal virtual machine we've been working on. SlangVM is implemented as a stack-based bytecode interpreter. It shares a few aspects of its implementation with the current tree-walker interpreter: we share the type system, the variable representation, and the Slang stack frame representation. The compilation is a purely additive step. The SlangVM compiler does not discard, destroy, or modify the current abstract syntax tree. As a result, we can gracefully fall back to the tree-walker interpreter when we are unable to compile and evaluate expressions and code within the SlangVM runtime. This is what our current compilation pipeline looks like. We have Slang source code, which is parsed by the Slang parser into an abstract syntax tree, which can be directly interpreted and executed by the tree-walker interpreter. In addition to that, we have the SlangVM compiler, which emits SlangVM-compatible bytecode. This bytecode is then installed into the virtual machine that executes it. In cases where we are unable to compile the current abstract syntax tree, or we are unable to execute the bytecode, we have a simple graceful fallback where the VM calls back into the tree-walker interpreter, the tree-walker interpreter executes the abstract syntax tree, and it then hands execution back to the virtual machine.
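To make the "additive compilation with graceful fallback" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the tuple-based AST, the instruction names, and the `Unsupported` exception are hypothetical, and the fallback is done for a whole expression, whereas the pipeline described in the talk can hand individual parts of a script back and forth between the VM and the tree-walker.

```python
class Unsupported(Exception):
    """Raised by the toy compiler for AST nodes it cannot translate."""


def tree_walk(ast, env):
    # Reference semantics: recursively evaluate the tuple-based AST.
    kind = ast[0]
    if kind == "num":
        return ast[1]
    if kind == "var":
        return env[ast[1]]
    if kind == "add":
        return tree_walk(ast[1], env) + tree_walk(ast[2], env)
    if kind == "mul":
        return tree_walk(ast[1], env) * tree_walk(ast[2], env)
    raise ValueError(kind)


def compile_ast(ast, code):
    # Additive step: reads the AST and appends instructions, never mutating it.
    kind = ast[0]
    if kind == "num":
        code.append(("PUSH", ast[1]))
    elif kind == "var":
        code.append(("LOAD", ast[1]))
    elif kind == "add":
        compile_ast(ast[1], code)
        compile_ast(ast[2], code)
        code.append(("ADD", None))
    else:
        raise Unsupported(kind)      # "mul" is deliberately not handled here
    return code


def run_bytecode(code, env):
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
    return stack.pop()


def evaluate(ast, env):
    try:
        code = compile_ast(ast, [])
    except Unsupported:
        return tree_walk(ast, env), "tree-walker"   # AST is still intact
    return run_bytecode(code, env), "vm"


env = {"x": 2, "y": 3}
print(evaluate(("add", ("var", "x"), ("var", "y")), env))
print(evaluate(("mul", ("var", "x"), ("num", 10)), env))
```

Because the AST is never modified by compilation, the fallback path needs no reconstruction step; it simply evaluates the tree that was already there.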

SlangVM operates on bytecode. The bytecode is represented as an array of bytes, laid out in memory as a series of opcodes, each followed by zero or more arguments. Each opcode has its own argument handling. We support a few datatypes natively in the bytecode, such as integral types and opcodes, various addresses and jump offsets, and constant indexes. Any other datatypes are stored in a constant pool, which a constant index points into. Now let's have a look at a few examples on the right-hand side. For example, OP_ADD is the opcode for performing addition. It's a single opcode which takes no arguments, so it takes up 8 bits, or 1 byte. Because OP_ADD operates on the stack, it takes the two values from the top of the stack and adds them. It doesn't take any argument. Therefore, in memory, it will be immediately followed by another opcode. If you take another opcode, for example OP_JUMP_SHORT for performing short jumps, this opcode has one argument, so the memory layout would be 1 byte, 8 bits, for OP_JUMP_SHORT, followed by another 1 byte, which is the jump offset argument. The interpreter would first read the opcode, which it would interpret as OP_JUMP_SHORT, and it would automatically know that this opcode has one argument associated with it. The VM would therefore read another 1 byte and interpret that as the jump offset, essentially representing how far we want to jump in our bytecode.
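A small decoder makes this layout easier to see. The Python sketch below assumes made-up opcode numbers and a made-up argument-width table; only the opcode names OP_ADD, OP_JUMP_SHORT, and OP_READ_CONSTANT come from the talk.

```python
OP_ADD = 0x01             # hypothetical opcode numbers
OP_JUMP_SHORT = 0x02
OP_READ_CONSTANT = 0x03

# Number of argument bytes that follow each opcode in the stream.
ARG_BYTES = {OP_ADD: 0, OP_JUMP_SHORT: 1, OP_READ_CONSTANT: 1}
NAMES = {OP_ADD: "OP_ADD", OP_JUMP_SHORT: "OP_JUMP_SHORT",
         OP_READ_CONSTANT: "OP_READ_CONSTANT"}


def disassemble(code: bytes):
    """Decode a byte stream into (offset, name, args) tuples."""
    out = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        width = ARG_BYTES[opcode]                 # the opcode decides how much to read
        args = tuple(code[pc + 1: pc + 1 + width])
        out.append((pc, NAMES[opcode], args))
        pc += 1 + width                           # skip the opcode and its arguments
    return out


stream = bytes([OP_READ_CONSTANT, 0, OP_READ_CONSTANT, 1, OP_ADD, OP_JUMP_SHORT, 4])
for row in disassemble(stream):
    print(row)
```

The stream is 7 bytes long and decodes to four instructions: the two one-argument opcodes occupy 2 bytes each, and OP_ADD occupies a single byte.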

Let's take a look at the example we've seen previously, variable var equals x plus y. If we take this expression and compile it into SlangVM bytecode, we produce the bytecode which you can see in the left column. This is how it would be laid out in memory. Our first opcode is an opcode for reading a local variable. It consists of the opcode and one argument. The next opcode is OP_READ_VARIABLE, which reads the actual value of the variable that was pushed onto the stack. It consists only of the opcode, as it operates on the stack and doesn't take any argument. Therefore, it is immediately followed by another OP_READ_LOCAL_VARIABLE, which again takes two bytes, the opcode and an index into a local variable structure, and which is followed again by the single opcode for reading the variable. Then comes OP_ADD, which, as we've seen in the previous example, doesn't take any arguments as it operates on the stack, so it's a single opcode. Then we have the remaining three opcodes, of which OP_ENSURE_LOCAL_VARIABLE is again an opcode which takes a single argument. It's represented as an opcode followed by a single argument. The remaining two instructions are single opcodes; they don't take any arguments. This would be the layout in memory in a bytecode stream. For this example, the main benefit we have over the tree-walker interpreter is that the control flow never leaves the main interpreter loop. The main benefit we gain from the simple compilation into bytecode and locating the [inaudible 00:19:34] in bytecode is the compact representation and improved locality, so we benefit from CPU caching on modern CPU architectures.
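The compilation step for this example can be sketched as a post-order walk that appends bytes to a stream. The Python below is a toy under stated assumptions: the opcode numbers and the tuple AST are invented, the local-variable slot assignment is a guess at one reasonable scheme, and the names of the final two single-byte opcodes (taken here to be OP_ASSIGN_VAR and an implicit return) are inferred from the later stack walkthrough rather than stated for this slide.

```python
# Hypothetical opcode numbers; names follow the talk where it gives them.
OP_READ_LOCAL_VARIABLE = 1     # followed by 1 byte: local-variable index
OP_READ_VARIABLE = 2
OP_ADD = 3
OP_ENSURE_LOCAL_VARIABLE = 4   # followed by 1 byte: local-variable index
OP_ASSIGN_VAR = 5
OP_IMPLICIT_RETURN = 6


def compile_expr(ast, local_index, out):
    """Post-order walk of a tuple AST, appending bytes to the bytearray `out`."""
    kind = ast[0]
    if kind == "var":
        out.extend([OP_READ_LOCAL_VARIABLE, local_index(ast[1]), OP_READ_VARIABLE])
    elif kind == "add":
        compile_expr(ast[1], local_index, out)     # left operand first
        compile_expr(ast[2], local_index, out)     # then right operand
        out.append(OP_ADD)
    elif kind == "assign":
        compile_expr(ast[2], local_index, out)     # value is computed first
        out.extend([OP_ENSURE_LOCAL_VARIABLE, local_index(ast[1]), OP_ASSIGN_VAR])
    else:
        raise ValueError(f"unsupported node: {kind}")


def compile_script(ast):
    slots = {}

    def local_index(name):
        # Give each distinct name the next free slot on first use.
        return slots.setdefault(name, len(slots))

    out = bytearray()
    compile_expr(ast, local_index, out)
    out.append(OP_IMPLICIT_RETURN)
    return bytes(out), slots


# var = x + y
code, slots = compile_script(("assign", "var", ("add", ("var", "x"), ("var", "y"))))
print(list(code), slots)
```

The eight instructions for this expression fit in 11 contiguous bytes, which is the kind of compactness and locality the talk credits for better CPU cache behavior.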

Value Stack

SlangVM is a stack-based virtual machine. The value stack is a fixed-size array, and most of the stack manipulation is done as a side effect of an operation. We store DT_VALUEs, our discriminated unions which hold all our Slang internal datatypes, directly on the stack. Therefore, SlangVM can operate directly on the DT_VALUEs without any further translation, serialization, or deserialization. Let's take a look at another example. Now we have a slightly more complex program: we have x equals 42, then we have a variable y, which is x plus 1. These two simple statements are compiled into the bytecode which we can see on the screen. Now we're going to simulate the execution of the VM and the state of the stack. To start, when we execute OP_READ_CONSTANT, we simply read the constant value 42 and push it onto the value stack. The next instruction ensures that the local variable exists. If it doesn't exist, it creates it, and then it pushes a reference, like a pointer, to the variable onto the stack. The next opcode, OP_ASSIGN_VAR, consumes the values from the stack and performs the assignment: it takes the variable x, takes the value 42, assigns 42 to x, and pushes the result of this evaluation, which is the value 42, back onto the stack. All expressions in Slang return a value, so even assignment to a variable returns the value back. Therefore, we need to push 42 back onto the stack. Because we don't do anything with this value, we then simply pop it off the stack.

Next, we are on the second line of code, where the first instruction corresponding to that line reads the local variable x and pushes it onto the stack. Now we need to read the actual value of that variable. The interpreter looks at the variable and pushes its value, which is the 42 that we assigned on the previous line of code. The next opcode reads a constant, which corresponds to the value 1. It simply reads the value 1 and pushes it onto the stack, so now we have the value 1 and the value 42 on our stack. Now OP_ADD reads the two top values from the stack, performs the addition, and pushes the result back onto the stack. At this point, the state of our stack looks like this, where we have the value 43 at the top of the stack. Now we are going to perform the assignment of the result of this addition. The next opcode, for ensuring the local variable, again creates the local variable if it doesn't exist, and then pushes it onto the stack. Then we perform the assignment, so we take the variable y and the value 43 off the stack, and then put the result onto the stack. This is the state of the stack. Then comes the IMPLICIT_RETURN: everything in Slang, as I mentioned, has to return a value, so when we compile this small program, we implicitly return the last value. Because we don't do anything with it, we simply pop it off the value stack.
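The walkthrough above can be replayed with a toy stack machine. This Python sketch is an approximation under stated assumptions: opcode numbers, the `Variable` class, and the constant-pool and name-table layout are invented for the example, a Python list stands in for the fixed-size value stack, and plain Python values stand in for DT_VALUEs.

```python
(OP_READ_CONSTANT, OP_ENSURE_LOCAL_VARIABLE, OP_READ_LOCAL_VARIABLE,
 OP_READ_VARIABLE, OP_ASSIGN_VAR, OP_ADD, OP_POP, OP_IMPLICIT_RETURN) = range(8)

# Opcodes that are followed by a single argument byte.
HAS_ARG = {OP_READ_CONSTANT, OP_ENSURE_LOCAL_VARIABLE, OP_READ_LOCAL_VARIABLE}


class Variable:
    """A local variable slot; the stack holds references to these."""

    def __init__(self, name):
        self.name = name
        self.value = None


def run(code, constants, names):
    stack, local_vars, pc = [], {}, 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        arg = None
        if op in HAS_ARG:
            arg = code[pc]
            pc += 1
        if op == OP_READ_CONSTANT:
            stack.append(constants[arg])
        elif op == OP_ENSURE_LOCAL_VARIABLE:
            if arg not in local_vars:              # create on first use
                local_vars[arg] = Variable(names[arg])
            stack.append(local_vars[arg])          # push a reference
        elif op == OP_READ_LOCAL_VARIABLE:
            stack.append(local_vars[arg])
        elif op == OP_READ_VARIABLE:
            stack.append(stack.pop().value)        # replace reference with value
        elif op == OP_ASSIGN_VAR:
            variable, value = stack.pop(), stack.pop()
            variable.value = value
            stack.append(value)                    # assignment yields its value
        elif op == OP_ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == OP_POP:
            stack.pop()
        elif op == OP_IMPLICIT_RETURN:
            return stack.pop(), local_vars
    raise RuntimeError("bytecode ended without a return")


# x = 42
# y = x + 1
constants = [42, 1]
names = ["x", "y"]
code = bytes([
    OP_READ_CONSTANT, 0, OP_ENSURE_LOCAL_VARIABLE, 0, OP_ASSIGN_VAR, OP_POP,
    OP_READ_LOCAL_VARIABLE, 0, OP_READ_VARIABLE, OP_READ_CONSTANT, 1, OP_ADD,
    OP_ENSURE_LOCAL_VARIABLE, 1, OP_ASSIGN_VAR, OP_IMPLICIT_RETURN,
])
result, local_vars = run(code, constants, names)
print(result, local_vars[0].value, local_vars[1].value)
```

Running it leaves x at 42 and y at 43, and the implicit return yields the value of the last assignment.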

Virtual Machine

The core part of our virtual machine is a loop which simply goes through the bytecode, reads it opcode by opcode, and operates on each opcode. We have a huge switch statement which switches on the opcode, with handling for each one. The main challenge we were facing was integrating this with our tooling. As I mentioned, we have an integrated development environment for Slang called SecView. SecView has a built-in debugger as well as a profiler. The main challenge in migrating to SlangVM has been making sure that we are able to support all the debugging tooling, and all of the profiling and metrics, that the current tooling supports. So the main addition to the interpreter loop was implementing profiling and debugging. As you can see on the screen, it can be simplified as pseudocode. We first read the opcode. Then, if we are profiling, we need to collect profiling data. If we're debugging, we need to trigger the debugger and handle user interaction with the debugger. Then we interpret the opcode. If you're debugging, this check happens each time an opcode is read, and the same goes for profiling. To show how important and how complex the integration with existing tooling was: we have been investing heavily in it over the past few months. We spent more than half a year integrating SlangVM with the tooling, making sure we support all the debugging features which our users, our developers, are familiar with, and making sure that the profiling provides correct data, presented to our users in a format which they are familiar with.
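The loop described above (read the opcode, optionally profile, optionally enter the debugger, then dispatch) might look roughly like the following Python sketch. The three-opcode instruction set, the `profile` counter, and the `breakpoints` and `on_break` parameters are all assumptions for illustration; the real SecView integration is not described in enough detail to reproduce.

```python
from collections import Counter

OP_PUSH, OP_ADD, OP_HALT = range(3)      # hypothetical minimal opcode set
NAMES = ["OP_PUSH", "OP_ADD", "OP_HALT"]


def run(code, profile=None, breakpoints=(), on_break=None):
    """Main loop: read opcode, run optional tooling hooks, then dispatch."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        if profile is not None:
            profile[NAMES[op]] += 1            # collect profiling data
        if pc in breakpoints and on_break is not None:
            on_break(pc, list(stack))          # hand control to the debugger
        if op == OP_PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif op == OP_ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
            pc += 1
        elif op == OP_HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


code = [OP_PUSH, 40, OP_PUSH, 2, OP_ADD, OP_HALT]
profile = Counter()
seen = []
result = run(code, profile, breakpoints={4},
             on_break=lambda pc, stack: seen.append((pc, stack)))
print(result, dict(profile), seen)
```

Note that both checks sit on the hot path and run for every opcode, which is one reason tooling support is not free in an interpreter loop like this.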

Future Enhancements

One goal for future enhancements in this space is persisting bytecode to a database. As I mentioned, we currently store the scripts as binary data in a database, and the interpreter process loads the script from the database and interprets it. The goal would be to store the actual compiled bytecode, or some form of intermediate representation, in the database. That would allow us to remove the parsing and compilation steps at load time, which would improve performance, as the client would load the already compiled bytecode from the database and execute it directly. Another feature which we are planning to implement in the near future is a proper control-flow graph, to enable us to perform optimizations at compile time. We would also like to implement optimizations such as elimination of unreachable code, compression of the bytecode, and constant folding and propagation. Ultimately, what we would like to do is compile our semantic bytecode, our SlangVM bytecode, to a lower-level target, for example by directly emitting x86 assembly.
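Of the planned optimizations, constant folding is the simplest to illustrate. The Python sketch below works on a hypothetical tuple AST with only number, variable, and addition nodes; it is a generic textbook version, not the planned SlangVM pass. In a language as dynamic as Slang, a real pass would also have to be sure that the operands are built-in numbers whose addition cannot be redefined at runtime.

```python
def fold(ast):
    """Return an equivalent tuple AST with constant additions precomputed."""
    if ast[0] != "add":
        return ast                          # leaves: ("num", n) or ("var", name)
    left, right = fold(ast[1]), fold(ast[2])
    if left[0] == "num" and right[0] == "num":
        return ("num", left[1] + right[1])  # both sides known at compile time
    return ("add", left, right)             # keep the addition for runtime


# x + (1 + 2)  becomes  x + 3
expr = ("add", ("var", "x"), ("add", ("num", 1), ("num", 2)))
print(fold(expr))
```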

Benchmarks

How do we ensure that SlangVM is faster than the current tree-walker interpreter? We're running a series of benchmarks consisting of three primitive benchmarks, then a few microbenchmarks written purely in Slang, such as Spectral Norm, [inaudible 00:27:58], Merge Sort, Factorial, Fibonacci, Is Prime, and a few others. In addition, we run examples of real-world pricing applications and financial models from our users, to ensure that the current version of SlangVM outperforms our current tree-walker interpreter. We will then ensure that any further optimizations which we plan to make in the future improve the performance of SlangVM over the current tree-walker interpreter even further.

Expectations (Faster than Light)

Our expectation is that at the initial release of the VM, without any optimization, simply compiling the AST into bytecode, we will achieve at least a 10% improvement against the baseline. Per process, that would translate into 5% because, on average, we spend 50% of overall process runtime inside the interpreter loop. The remaining 50% is spent in native C++ code: natively implemented functions and add-ins, and various I/O, be it file system or network I/O. Further down the line, in a few years' time when we have implemented all our optimizations, we expect to be two times faster than our current tree-walker interpreter, and we expect that to translate into a 25% improvement for our average process. Once we get to the stage where we are able to JIT our bytecode into native assembly, we expect more than a 10 times speedup against baseline Slang. We currently spend more than 300 million compute hours per week running Slang processes. Therefore, our estimate is that once we hit the five-year mark, and all our optimizations and JITing to native assembly are in place, we will reduce our footprint by 135 million compute hours per week.
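The arithmetic behind these figures can be checked with a short Python calculation. It assumes, as stated in the talk, that 50% of process time is spent in the interpreter and that 300 million compute hours are spent per week; the function name `process_saving` is made up, and the JIT target is read as a tenfold interpreter speedup, which is the reading that reproduces the 135 million hour figure.

```python
def process_saving(interpreter_share, interpreter_speedup):
    """Fraction of total process time saved when only the interpreter speeds up."""
    return interpreter_share * (1 - 1 / interpreter_speedup)


share = 0.5                    # fraction of runtime spent in the interpreter loop
weekly_hours = 300_000_000     # compute hours per week running Slang

# A 10% improvement means interpreter time drops to 0.9 of baseline.
scenarios = [("10% faster", 1 / 0.9), ("2x faster", 2), ("10x faster", 10)]
for label, speedup in scenarios:
    saving = process_saving(share, speedup)
    hours = saving * weekly_hours / 1e6
    print(f"{label}: {saving:.0%} per process, about {hours:.0f}M hours per week")
```

The same formula also shows the ceiling: with half of the runtime in native code and I/O, no interpreter speedup alone can save more than 50% of the total.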

Recorded at:

Feb 09, 2024

Michal Dorko
