How x86 to arm64 Translation Works in Rosetta 2

Along with its plan to transition the Macintosh line from Intel CPUs to its own CPUs, dubbed Apple Silicon, Apple announced Rosetta 2, a binary translation layer that aims to smooth out the process. Thanks to Rosetta 2, most x86 programs will be able to execute after an initial translation step.

Apple started to use binary translation technology for the first time in 2006, when it began switching from PowerPC CPUs to x86. Based on QuickTransit, a technology originally developed by Transitive Corporation, which was later acquired by IBM, the original Rosetta was mostly transparent to users. The only side effect users may perceive is their apps launching or running more slowly at times.

A slow launch mostly occurs the first time an app is run. This is when Rosetta usually kicks in, that is, when the OS detects that the binary only includes x86_64 instructions. According to Joe Rossignol, writing for MacRumors, Microsoft indicated that the first launch of any of its Mac apps took approximately 20 seconds, while subsequent launches were fast. To reduce the impact of this initial translation step, Rosetta 2 is also able to translate an app ahead of time (AOT) when it is installed. It is not yet clear for which apps AOT translation will be supported.
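As an illustration of what "only includes x86_64 instructions" can mean at the file level, the sketch below reads the architectures advertised by a Mach-O header. It is not how macOS or Rosetta makes the decision; the function names `macho_architectures` and `needs_translation` are hypothetical, while the magic numbers and CPU type constants are the ones defined in Apple's Mach-O headers.

```python
import struct

# CPU type constants as defined in <mach/machine.h>.
CPU_TYPE_X86_64 = 0x01000007
CPU_TYPE_ARM64 = 0x0100000C
CPU_NAMES = {CPU_TYPE_X86_64: "x86_64", CPU_TYPE_ARM64: "arm64"}

FAT_MAGIC = 0xCAFEBABE    # universal ("fat") binary; header fields are big-endian
MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O; little-endian on x86_64 and arm64


def macho_architectures(data):
    """Return the architecture names advertised at the start of a Mach-O file."""
    if len(data) < 8:
        return []
    (magic_be,) = struct.unpack_from(">I", data, 0)
    if magic_be == FAT_MAGIC:
        # Note: Java class files share this magic; this sketch does not tell them apart.
        (count,) = struct.unpack_from(">I", data, 4)
        names = []
        for i in range(count):
            offset = 8 + i * 20  # each fat_arch entry holds five 32-bit fields
            if offset + 4 > len(data):
                break
            (cputype,) = struct.unpack_from(">I", data, offset)
            names.append(CPU_NAMES.get(cputype, hex(cputype)))
        return names
    (magic_le,) = struct.unpack_from("<I", data, 0)
    if magic_le == MH_MAGIC_64:
        (cputype,) = struct.unpack_from("<I", data, 4)
        return [CPU_NAMES.get(cputype, hex(cputype))]
    return []


def needs_translation(data):
    """True when the header lists an x86_64 slice but no arm64 slice."""
    archs = macho_architectures(data)
    return "x86_64" in archs and "arm64" not in archs


if __name__ == "__main__":
    # Synthetic header of a thin x86_64 binary, built here for demonstration only.
    thin_x86 = struct.pack("<II", MH_MAGIC_64, CPU_TYPE_X86_64)
    print(macho_architectures(thin_x86), needs_translation(thin_x86))
```

On a real file, passing the first few kilobytes (for example `open(path, "rb").read(4096)`) is enough for the header fields this sketch inspects.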

While macOS will prefer running an app's arm64 instructions when they are available, the user can override this behaviour on an app-by-app basis. This could be required, for example, to run an app that has already been ported to Apple Silicon except for some legacy plugins or other kinds of binary extensions that the user depends on. In fact, one limitation of Rosetta is that it does not allow mixing x86_64 and arm64 instructions in the same process.

Based on past experience with Rosetta, Rosetta 2 will be of great help in smoothing the transition to Apple Silicon for both end users and developers. This does not rule out, though, that in a number of cases users will need to wait until the software they need is released with native support for arm64. In particular, Rosetta 2 will not be able to translate kernel extensions, nor will it support virtualization of x86_64 platforms. The latter implies that Rosetta 2 will not translate virtual machine apps such as VMware and VirtualBox, nor Docker.

One major concern with Rosetta is performance. With the transition from PPC to x86, one factor slowing down Rosetta was the different byte ordering used by the two platforms, with PowerPC being a big-endian architecture and x86 little-endian. While byte ordering is not a problem for the transition from x86 to ARM, another memory-related issue could hamper performance in this case: the memory consistency model. x86 guarantees total store ordering (TSO), which is stricter than ARM's default model, and translated code relies on it. To prevent this from becoming a bottleneck, Apple added support for x86 memory ordering to the M1 CPU, as Robert Graham noted on Twitter.
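To see why the ordering guarantee matters, consider the classic message-passing pattern: one thread writes `data` and then sets `flag`, while another reads `flag` and then `data`. The sketch below is a toy enumeration, not a model of real x86 or ARM hardware. It assumes a crude stand-in for a weaker model in which each thread's two independent accesses may be swapped, and it does not exercise the store-to-load reordering that TSO itself permits. All names are made up for the example.

```python
from itertools import permutations

# Thread 0 publishes a value, thread 1 consumes it.
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag"), ("load", "data")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule and return what the reader saw as (flag, data)."""
    memory = {"data": 0, "flag": 0}
    seen = {}
    for op in schedule:
        if op[0] == "store":
            memory[op[1]] = op[2]
        else:
            seen[op[1]] = memory[op[1]]
    return seen["flag"], seen["data"]


def outcomes(allow_reordering):
    """Collect all (flag, data) results; optionally let each thread swap its accesses."""
    writer_orders = list(permutations(WRITER)) if allow_reordering else [tuple(WRITER)]
    reader_orders = list(permutations(READER)) if allow_reordering else [tuple(READER)]
    results = set()
    for w in writer_orders:
        for r in reader_orders:
            for schedule in interleavings(list(w), list(r)):
                results.add(run(schedule))
    return results


if __name__ == "__main__":
    print("program order kept:", sorted(outcomes(False)))
    print("accesses may swap: ", sorted(outcomes(True)))
```

With program order kept, the reader never observes `flag == 1` together with `data == 0`; once the accesses may swap, that outcome appears. Code written for x86 can silently depend on the first behaviour, which is why a translator either has to insert barriers conservatively or, as on the M1, rely on hardware that can provide the stricter ordering.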

In addition to this, as Graham describes, Apple has implemented a number of other "tricks" to improve the performance of its chips, including faster JavaScript execution and faster retain and release operations for memory management.

One specific bit of Rosetta 2 that sparked the interest of several developers on Hacker News is its ability to also translate apps that contain just-in-time compilers. The exact mechanics of this are not publicly documented, but Apple might be using a page fault to detect when the code attempts to jump into a recently created code page, that is, roughly, a page that was set to writable mode, then switched to read-only and executable. When the page fault is handled, Rosetta will translate that page's content.
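The speculated mechanism amounts to lazy translation with invalidation on write. The toy model below only illustrates that idea under the article's guess; it says nothing about Rosetta's actual implementation. The class `TranslatedPages` and its `translate` callback are hypothetical.

```python
class TranslatedPages:
    """Toy model: translate a code page lazily, on the first jump into it."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical x86 -> arm64 translator
        self._source = {}            # page number -> code written by the JIT
        self._cache = {}             # page number -> translated code
        self.faults = 0              # how many simulated page faults occurred

    def write(self, page, code):
        # Writing to a page makes any earlier translation of it stale.
        self._source[page] = code
        self._cache.pop(page, None)

    def execute(self, page):
        if page not in self._cache:
            # Stands in for the page fault taken on the first jump into the page.
            self.faults += 1
            self._cache[page] = self._translate(self._source[page])
        return self._cache[page]


if __name__ == "__main__":
    # Upper-casing is a placeholder for real instruction translation.
    pages = TranslatedPages(lambda code: code.upper())
    pages.write(0, "mov eax, 1")
    pages.execute(0)  # first jump: simulated fault, page is translated
    pages.execute(0)  # second jump: cached translation is reused
    print(pages.faults)
```

The point of the design is that the cost of translation is paid once per generated page rather than on every execution, and that rewriting a page forces a fresh translation.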

As a final note, apps can detect whether they are running under Rosetta.
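Apple documents a `sysctl.proc_translated` key, readable through `sysctlbyname`, for this purpose. The sketch below queries it from Python via `ctypes`; the function name `running_under_rosetta` is made up for the example, and the code assumes that a missing key or a non-macOS platform should be treated as "not translated".

```python
import ctypes
import platform


def running_under_rosetta():
    """Return True if this process appears to be translated by Rosetta."""
    if platform.system() != "Darwin":
        return False  # assumption: Rosetta only exists on macOS
    libc = ctypes.CDLL(None)  # symbols already loaded into this process
    if not hasattr(libc, "sysctlbyname"):
        return False
    libc.sysctlbyname.argtypes = [
        ctypes.c_char_p,                  # key name
        ctypes.c_void_p,                  # output buffer
        ctypes.POINTER(ctypes.c_size_t),  # in/out buffer size
        ctypes.c_void_p,                  # new value (unused here)
        ctypes.c_size_t,                  # new value size
    ]
    libc.sysctlbyname.restype = ctypes.c_int
    value = ctypes.c_int(0)
    size = ctypes.c_size_t(ctypes.sizeof(value))
    rc = libc.sysctlbyname(
        b"sysctl.proc_translated",
        ctypes.byref(value),
        ctypes.byref(size),
        None,
        0,
    )
    if rc != 0:
        return False  # the key is absent on systems without Rosetta 2
    return value.value == 1


if __name__ == "__main__":
    print("translated" if running_under_rosetta() else "native or unknown")
```

An app might use such a check to warn users about reduced performance or to pick a code path that avoids features Rosetta does not support.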

Community comments

  • One mistake about the meaning of memory ordering

    by Ryan Wong,

    The issue that affected the PPC to x86 migration was called "byte ordering", namely the difference between big-endian and little-endian architectures. The issue affects memory reads and writes comprising multiple bytes. Software can be written so that it automatically detects the underlying byte ordering of the machine and performs the additional work, but most software was not written with that awareness.

    The issue affecting the x86 to ARM migration concerns the memory consistency model. One such model is called "total store ordering" (TSO), and this is the "memory ordering" that Rob Graham's tweets alluded to. In other words, the "memory ordering" issue mentioned in late 2020 has nothing to do with the byte-ordering issue that surfaced in 2006. Apple Silicon M1 contains a hardware switch that allows cores to adhere to the stricter memory consistency model found on x86, thus eliminating certain types of software malfunctions that may happen when x86 binaries (which are written with the assumption of x86 memory consistency behaviors) are machine-translated into ARM instructions.

    These issues cannot be addressed with static binary analysis alone, because multi-threaded programs that perform direct inter-thread communication via shared memory may do so using ordinary memory-access instructions. On x86, such "unmarked" memory-access instructions (without the "lock" instruction prefix) will still satisfy TSO, whereas on ARM this is not the case. Static binary analysis cannot conclusively detect whether an x86 instruction performs a memory read or write that is intended to be observed by another thread. This can only be detected at runtime as it happens, when the CPU detects that two different cores try to touch the same memory page at around the same time.

  • Re: One mistake about the meaning of memory ordering

    by Sergio De Simone,

    Hi Ryan, thanks for your interest and your thorough explanation!
    I have corrected my description based on that and added a couple of links.
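
The byte-ordering difference discussed in the comments can be made concrete with a short example. This is only an illustration of big-endian versus little-endian layout and of runtime detection, not of anything Rosetta does internally; the helper `swap32` is a made-up name.

```python
import struct
import sys

value = 0x12345678

big = struct.pack(">I", value)     # big-endian layout, as on PowerPC
little = struct.pack("<I", value)  # little-endian layout, as on x86 and arm64


def swap32(x):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return int.from_bytes(x.to_bytes(4, "big"), "little")


if __name__ == "__main__":
    print(big.hex())            # 12345678
    print(little.hex())         # 78563412
    print(hex(swap32(value)))   # 0x78563412
    # Software that is aware of byte ordering can check the host at runtime.
    print(sys.byteorder)
```

A translator moving code between architectures with different byte orders has to insert swaps like this around multi-byte memory accesses, which is the overhead the original Rosetta paid and Rosetta 2 avoids.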
