InfoQ Homepage Articles Transcrypt: Anatomy of a Python to JavaScript Compiler

Transcrypt: Anatomy of a Python to JavaScript Compiler

Mar 08, 2017 17 min read

Write for InfoQ

Feed your curiosity. Help 550k+ global
senior developers
each month stay ahead.Get in touch

Key Takeaways

Languages that run in the browser should precompile to JavaScript isomorphically for compactness, execution speed and development speed
For efficient cooperation on a large web application, module boundaries should coincide with team boundaries
Modules may have dynamic typing on the inside, but should have static typing on the outside
Having the same technology on the client and on the server promotes scalability
The future of Python in the browser is tied to the future of Python in general, not so much to a particular implementation

Featuring a diversity of programming languages, backend technology offers the right tool for any kind of job. At the frontend, however, it's one size fits all: JavaScript. Someone with only a hammer will have to treat anything like a nail. One attempt to break open this restricted world is represented by the growing set of source to source compilers that target JavaScript. Such compilers are available for languages as diverse as Scala, C++, Ruby, and Python. The Transcrypt Python to JavaScript compiler is a relatively new open source project, aiming at executing Python 3.6 at JavaScript speed, with comparable file sizes.

For a tool like this to offer an attractive alternative to everyday web development in JavaScript, at least the following three demands have to be met:

From a user point of view, web sites and web applications created with it should be indistinguishable with regard to look and feel, page load time, page startup time and sustained speed
From a developer point of view, it should allow seamless access to any JavaScript library, efficient debugging and the opportunity to capitalize on existing skills
From a business point of view, it should offer continuity, availability of a large pool of professionally trained developers, a good ratio of created functionality to invested hours and a resulting application open to changing needs

To be successful, all aspects of these three requirements have to be met. Different compilers strike a different balance between them, but no viable compiler for every day production use can neglect any of them. For Transcrypt, each of the above three points has led to certain design decisions.

Demand 1:

Look and feel of web sites and web applications are directly connected to the underlying JavaScript libraries used, so to have exactly the same look and feel, a site or application should use exactly the same libraries.

Although fast connections may hide the differences, achieving the same page load time, even on mobile devices running on public networks, mandates having roughly the same code size. This rules out downloading a compiler, virtual machine or large runtime at each new page load.

Achieving the same startup time as pages utilizing native JavaScript is only possible if the code is statically precompiled to JavaScript on the server. The larger the amount of code needed for a certain page, the more obvious the difference becomes.

To have the same sustained speed, the generated JavaScript must be efficient. Since JavaScript virtual machines are highly optimized for common coding patterns, the generated JavaScript should be similar to handwritten JavaScript, rather than emulating a stack machine or any other low level abstraction.

Demand 2:

To allow seamless access to any JavaScript library, Python and JavaScript have to use unified data formats, a unified calling model, and a unified object model. The latter requires the JavaScript prototype based single inheritance mechanism to somehow gel with Python’s class based multiple inheritance. Note that the recent addition of the keyword 'class' to JavaScript has no impact on the need to bridge this fundamental difference.

To enable efficient debugging, things like setting breakpoints and single stepping through code have to be done on the source level. In other words: source maps are necessary. Whenever a problem is encountered it must be possible to inspect and comprehend the generated JavaScript to pinpoint exactly what's going on. To this end, the generated JavaScript should be isomorphic to the Python source code.

The ability to capitalize on existing skills means that the source code has to be pure Python, not some syntactic variation. A robust way to achieve this is to use Python's native parser. The same holds for semantics, a requirement that poses practical problems and requires introduction of compiler directives to maintain runtime efficiency.

Demand 3:

Continuity is needed to protect investments in client side Python code, requiring continued availability of client side Python compilers with both good conformance and good performance. Striking the right balance between these two is the most critical part of designing a compiler.

Continued availability of trained Python developers is sufficiently warranted by the fact that Python has been the number 1 language taught in introductory computer science courses for three consecutive years now. On the backend it is used for every conceivable branch of computing. All these developers, used to designing large, long lived systems rather than insulated, short lived pieces of frontend script code, become available to browser programming if it is done in Python.

With regard to productivity, many developers that have made the switch from a different programming language to Python agree that it has significantly increased their output while retaining runtime performance. The latter is due to the fact that libraries used by Python applications for time critical operations like numerical processing and 3D graphics usually compile to native machine code.

The last point – openness to changed needs – means that modularity and flexibility have to be supported at every level. The presence, right from the start, of class-based OO with multiple inheritance and a sophisticated module and package mechanism has contributed to this. In addition, the possibility to use named and default parameters allows developers to change call signatures in a late stage without breaking existing code.

Conformance versus performance: language convergence to the rescue

Many Python constructs closely match JavaScript constructs, especially when translating to newer versions of JavaScript. There's a clear convergence between both languages. Specifically, more and more elements of Python make their way into JavaScript: for ... of ..., classes (in a limited form), modules, destructuring assignment and argument spreading. Since constructs like for ... of ... are highly optimized on modern JavaScript virtual machines, it's advantageous to translate such Python constructs to closely matching JavaScript constructs. Such isomorphic translation will result in code that can benefit from optimizations in the target language. It will also result in JavaScript code that is easy to read and debug.

Although with Transcrypt, through the presence of source maps, most debugging will take place stepping through Python rather than JavaScript code, a tool should not conceal but rather reveal the underlying technology, granting developer full access to 'what's actually going on'. This is even more desirable since native JavaScript code can be inserted at any point in the Python source, using a compiler directive.

The isomorphism between Python and the JavaScript code generated by Transcrypt is illustrated by the following fragment using multiple inheritance.

    class C (A, B):
        def __init__ (self, x, y):
            A.__init__ (self, x)
            B.__init__ (self, y)
            
        def show (self, label):
            A.show (self, label)
            B.show (self, label)

translates to:

    var C = __class__ ('C', [A, B], {
        get __init__ () {return __get__ (this, function (self, x, y) {
            A.__init__ (self, x);
            B.__init__ (self, y);
        });},
        get show () {return __get__ (this, function (self, label) {
            A.show (self, label);
            B.show (self, label);
        });}
    });

Striving for isomorphic translation has limitations, rooted in subtle, but sometimes hard to overcome differences between the two languages. Whereas Python allows lists to be concatenated with the + operator, isomorphic use of this operator in JavaScript result in both lists being converted to strings and then glued together. Of course a + b could be translated to __add__ (a, b), but since the type of a and b is determined at runtime, this would result in a function call and dynamic type inspection code being generated for something as simple as 1 + 1, resulting in bad performance for computations in inner loops. Another example is Python's interpretation of 'truthyness'. The boolean value of an empty list is True (or rather: true) in JavaScript and False in Python. Dealing with this globally in an application would require every if-statement to feature a conversion, since in the Python construct if a: it cannot be predicted whether a holds a boolean or somthing else like a list So if a: would have to be translated to if( __istrue__ (a)), again resulting in slow performance if used in inner loops.

In Transcrypt, compiler directives embedded in the code (pragmas) are used control compilation of such constructs locally. This enables writing matrix computations using standard mathematics notation like M4 = (M1 + M2) * M3, at the same time not generating any overhead for something like perimeter = 2 * pi * radius. Syntactically, pragma's are just calls to the __pragma__ function, executed compile time rather than run time. Importing a stub module containing def __pragma__ (directive, parameters): pass allows this code to run on CPython as well, without modification. Alternatively, pragmas can be placed in comments.

Unifying the type system while avoiding name clashes

Another fundamental design choice for Transcrypt was to unify the Python and the JavaScript type system, rather than have them live next to each other, converting between them on the fly. Data conversion costs time and increases target code size as well as memory use. It burdens the garbage collector and makes interaction between Python code and JavaScript libraries cumbersome.

So the decision was made to embrace the JavaScript world, rather than to create a parallel universe. A simple example of this is the following code using the Plotly.js library:

  __pragma__ ('jskeys')    # For convenience, allow JS style unquoted string literals as dictionary keys
    
    import random
    import math
    import itertools
    
    xValues = [2 * math.pi * step / 200 for step in range (201)]
    yValuesList = [
        [math.sin (xValue) + 0.5 * math.sin (xValue * 3 + 0.25 * math.sin (xValue * 5)) for xValue in xValues],
        [1 if xValue <= math.pi else -1 for xValue in xValues]
    ]
    kind = 'linear'
    Plotly.plot (
        kind,
        [
            {
                x: xValues,
                y: yValues
            }
            for yValues in yValuesList
        ],
        {
            title: kind,
            xaxis: {title: 'U (t) [V]'},
            yaxis: {title: 't [s]'}
        }
    )

Apart from the pragma allowing to leave out the quotes from dictionary keys, which is optional and only used for convenience, the code looks a lot like comparable JavaScript code. Note the (optional) use of list comprehensions, a facility JavaScript still lacks. The fact that Python dictionary literals are mapped to JavaScript object literals is of no concern to the developer; they can use the Plotly JavaScript documentation while writing Python code. No conversion is done behind the scenes. A Transcrypt dict IS a JavaScript object, in all cases.

In unifying the type systems, name clashes occur. Python and JavaScript strings both have a split (), but their semantics have important differences. There are many cases of such clashes and, since both Python and JavaScript are evolving, future clashes are to be expected.

To deal with these, Transcrypt supports the notion of aliases. Whenever in Python <string>.split is used, this is translated to <string>.py_split, a JavaScript function having Python split semantics. In native JavaScript code split will refer to the native JavaScript split function as it should. However, the JavaScript native split method can also be called from Python, where it is called js_split. While methods like these predefined aliases are available in Transcrypt, the developer can define new aliases and undefine existing ones. In this way any name clashes resulting from the unified type system can be resolved without run time penalty, since aliases do their work compile time.

Aliases also allow generation of any JavaScript identifier from a Python identifier. An example is the $ character, that is allowed as part of a name in JavaScript but forbidden in Python. Transcrypt strictly conforms Python syntax and is parsed by the native CPython parser, making its syntax identical. A piece of code using JQuery may look as follows:

__pragma__ ('alias', 'S', '$')
    
    def start ():
        def changeColors ():
            for div in S__divs:
                S (div) .css ({
                    'color': 'rgb({},{},{})'.format (* [int (256 * Math.random ()) for i in range (3)]),
                })
    
        S__divs = S ('div')
        changeColors ()
        window.setInterval (changeColors, 500)

Since Transcrypt uses compilation rather than interpretation, imports have to be decided upon compile time, to allow joined minification and shipment of all modules involved. To this end C-style conditional compilation is supported, as can be seen in the following code fragment:

__pragma__ ('ifdef', '__py3.6__') 
    import dashed_numbers_test          # Import only  for Python 3.6, that supports them
__pragma__ ('endif')

The same mechanism is used in the Transcrypt runtime to switch between JavaScript 5 and JavaScript 6 code:

    __pragma__ ('ifdef', '__esv6__')
                for (let aClass of classinfo) {
    __pragma__ ('else')
                for (var index = 0; index < classinfo.length; index++) {
                    var aClass = classinfo [index];
    __pragma__ ('endif')

In this way optimizations in newer JavaScript versions are taken into account, retaining backward compatibility. In some cases, the possibility for optimization is preferred over isomorphism:

    # Translate i += 1 to i++ and i -= 1 to i--
    if type (node.value) == ast.Num and node.value.n == 1:
        if type (node.op) == ast.Add:
            self.emit ('++')
            return
        elif type (node.op) == ast.Sub:
            self.emit ('--')
            return

Some optimizations are optional, such as the possibility to activate call caching, resulting in repeated calls to inherited methods being done directly, rather than through the prototype chain.

Static versus dynamic typing: Scripting languages growing mature

There has been a resurgence in appreciation of the benefits of static typing, with TypeScript being the best known example. In Python, as opposed to JavaScript, static typing syntax is an integral part of the language and supported by the native parser. Type checking itself, however, is left to third party tools, most notably mypy, a project from Jukka Lehtosalo with regular contributions of Python initiator Guido van Rossum. To enable efficient use of mypy in Transcrypt, the Transcrypt team contributed a lightweight API to the project, that makes it possible to activate mypy from another Python application without going through the operating system. Although mypy is still under development, it already catches an impressive amount of typing errors at compile time. Static type checking is optional and can be activated locally by inserting standard type annotations. A trivial example of the use of such annotations is the mypy in-process API itself:

def run(params: List[str]) -> Tuple[str, str, int]:
    sys.argv = [''] + params

    old_stdout = sys.stdout
    new_stdout = StringIO()
    sys.stdout = new_stdout

    old_stderr = sys.stderr
    new_stderr = StringIO()
    sys.stderr = new_stderr

    try:
        main(None)
        exit_status = 0
    except SystemExit as system_exit:
        exit_status = system_exit.code

    sys.stdout = old_stdout
    sys.stderr = old_stderr

    return new_stdout.getvalue(), new_stderr.getvalue(), exit_status

As illustrated by the example, static typing can be applied where appropriate, in this case in the signature of the run function, since that is the part of the API module that can be seen from the outside by other developers. If anyone misinterprets the parameter types or the return type of the API, mypy will generate a clear error message, referring to the file and line number where the mismatch occurs.

The concept of dynamic typing remains central to languages like Python and JavaScript, because it allows for flexible data structures and helps to reduce the amount of source code needed to perform a certain task. Source code size is important, because to understand and maintain source code, the first thing that has to happen is to read through it. In that sense, 100 kB of Python source code offers a direct advantage over 300 kB of C++ source that has the same functionality, but without the hard to read type definitions using templates, explicit type inspection and conversion code, overloaded constructors and other overloaded methods, abstract base classes to deal with polymorphic data structures and type dependent branching.

For small scripts well below 100kB source code and written by one person, dynamic typing seems to only have advantages. Very little planning and design are needed; everything just falls into place while programming. But when applications grow larger and are no longer built by individuals but by teams, the balance changes. For such applications, featuring more than roughly 200kB source code, the lack of compile time type checking has the following consequences:

Many errors are only caught at runtime, often late in the process, making remedies more expensive, since they influence more code already written.
Module interfaces tend to be open to several interpretations, due to the lack of type information they carry. This means that more development time is consumed by consultation between team members to establish the correct use of a module API.
Especially when working with a large team, dynamically typed interfaces can lead to unwanted coupling of design decisions taken in distinct modules. Thin, well specified interfaces become a necessity.

An interface featuring even one parameter that may refer to a complex, dynamically typed object structure, cannot be considered sufficiently stable to warrant separation of concerns. While this type of 'who did what, why and when' programming accounts for tremendous flexibility, it also accounts for design decisions being postponed to the very last, impacting large amounts of already written code, requiring extensive modifications.

The 'coupling and cohesion' paradigm applies. It's OK for modules to have strong coupling of design decisions on the inside. But between modules there should preferably be loose coupling, a design decision to change the inner workings of one module should not influence the others. In general, this leads to the following rules of the thumb for the choice between dynamic and static typing.

Inside a particular module design decisions are allowed to be coupled. Designing it as a cohesive entity will result in less source code to read through and ease experimentation with different implementations. Dynamic typing is an effective means to this end, imposing minimum design time overhead at maximum flexibility.
On the boundaries between modules, developers will have draw up stable 'contracts' on what information to exchange exactly. In this way they can work in parallel without constant deliberation and aim for a fixed, rather than a moving target. Static typing fits the bill here, allowing formal, machine validated agreement upon which information crosses the API.

So while the current surge in static typing may seem like a regression, it isn't. Dynamic typing has earned its place and it won't go away. The opposite is also true: even a traditionally statically typed language like C# has absorbed dynamic typing concepts. But with the complexity of applications written in languages like JavaScript and Python growing, effective modularization, cooperation and unit validation strategies gain importance. Scripting languages are coming of age.

Why choose Python over JavaScript on the client?

Due to the immense popularity of programming for the web, JavaScript has drawn lots of attention and investment. There are clear advantages in having the same language on the client and on the server. An important advantage is that it becomes possible to move code from server to client in a late stage, when an application is upscaled.

Another advantage is unity of concept, allowing developers to work both on the front end and the back and without constantly switching between technologies. The desirability of decreasing the conceptual distance between the client and server part of an application has resulted in the popularity of a platform like Node.js. But at the same time, it carries the risk of expanding the 'one size fits all' reality of current web client programming to the server. JavaScript is considered a good enough language by many. Recent versions finally start to support features like class based OO (be it in the form of a thin varnish over its prototyping guts), modules and namespaces. With the advent of TypeScript, the use of strict typing is possible, though incorporating it in the language standard is probably some years away.

But even with these features, JavaScript isn't going to be the one language to end all languages. A camel may resemble a horse designed by a committee, but it never becomes one. What the browser language market needs, in fact what any free market needs, is diversity. It means that the right tool can be picked for the job at hand. Hammers for nails, and screwdrivers for screws. Python was designed with clean, concise readability in mind right from the start. The value of that shouldn't be underestimated.

JavaScript will probably be the choice of the masses in programming the client for a long time to come. But for those who consider the alternative, what matters to continuity is the momentum behind a language, as opposed to an implementation of that language. So the most important choice is not which implementation to use, but which language to choose. In that light Python is an effective and safe choice. Python has a huge mindshare and there's a growing number of browser implementations for it, approaching the golden standard of CPython closer and closer while retaining performance.

While new implementations may supersede existing ones, this process is guided by a centrally guarded consensus over what the Python language should entail. Switching to another implementation will always be easier than switching to the next JavaScript library hype or preprocessor with proprietary syntax to deal with its shortcomings. Looking at the situation in the well-established server world, it is to be expected that multiple client side Python implementations will continue to exist side by side in healthy competition. The winner here is the language itself: Python in the browser is there to stay.

About the Author

Jacques de Hooge MSc is a C++/Python developer living in Rotterdam, the Netherlands. After graduating from the Delft University of Technology, department of Information Theory, he started his own company, GEATEC engineering, specializing in Realtime Controls, Scientific Computation, Oil and Gas Prospecting and Medical Imaging. He is a part-time teacher at the Rotterdam University of Applied Sciences, where he teaches C++, Python, Image Processing, Artificial Intelligence, Robotics, Realtime Embedded Systems and Linear Algebra. Currently he's developing cardiological research software for the Erasmus University in Rotterdam. Also he is the initiator and the lead designer of the Transcrypt open source project.

InfoQ Software Architects' Newsletter

Transcrypt: Anatomy of a Python to JavaScript Compiler

Write for InfoQ

Key Takeaways

Related Sponsors

About the Author

Rate this Article

This content is in the Web Development topic

Related Topics:

Related Editorial

Popular across InfoQ

The InfoQ Newsletter