Key Takeaways
- Dynamic linking in Java involves loading native libraries at runtime, which can bypass the JVM's safety and performance guarantees, leading to potential security risks and memory safety issues.
- Porting native code to the JVM retains its benefits, including platform-independent distribution and runtime safety, but it requires significant effort to keep pace with upstream development.
- WebAssembly (Wasm) offers a portable and secure alternative, allowing native code to run safely within JVM applications.
- Using Chicory, developers can run Wasm-compiled code, like SQLite, in the JVM environment, benefiting from enhanced portability and security.
- Wasm's sandboxing and memory model provide strong security guarantees, preventing unauthorised access to system resources and host memory.
When working in a managed ecosystem like the JVM, we often need to execute native code. This usually happens if you need crypto, compression, database, or networking code written in C.
Take SQLite, for example: according to its authors, it is the most widely deployed database engine, and it is frequently used in JVM applications. But SQLite is written in C, so how does it run in our JVM applications?
Dynamic linking is the most common way we deal with this problem today. We’ve been doing this in all our programming languages for decades, and it works well. However, it creates a host of problems when used with the JVM. Until recently, the only alternative was porting the codebase to another programming language, which comes with its own challenges.
This article will explore the downsides of using native extensions in the JVM and briefly touch on the challenges of porting a code base. Further, we’ll present how embedding WebAssembly (Wasm) into our applications can help us restore the promise of safety and portability the JVM offers without rewriting all our extensions from scratch.
Problems with Dynamic Linking
To understand the problems with dynamic linking, it’s important to explain how it works. When we want to run some native code, we start by asking the system to load the native library (we’re using some Java Native Access (JNA) pseudocode here to simplify things):
interface LibSqlite extends Library {
    // Loads libsqlite3.dylib on my mac
    LibSqlite INSTANCE = Native.load("sqlite3", LibSqlite.class);
    int sqlite3_open(String filename, PointerByReference db);
    // ... other function definitions here
}
For an easy mental model, imagine this reads the native code for SQLite from disk and "appends" it to the native code of the JVM.
We can then get a handle to a native function and execute it:
int result = LibSqlite.INSTANCE.sqlite3_open("chinook.sqlite", ptr);
JNA helps by automatically mapping our Java types to C types and then doing the inverse with the return values.
When sqlite3_open is called, our CPU jumps to that native code. The native code exists outside the guarantees of the JVM but at the same level. It has all the capabilities of the process the JVM is running in. This brings us to the first problem with dynamic linking.
Runtime: Escaping the JVM
When we jump to the native code at runtime, we escape the JVM's safety and performance guarantees. The JVM can no longer help us with memory faults, segmentation faults, observability, etc. Also note that this code can see all the memory and has all the permissions and capabilities of the whole process. So, if a vulnerability or malicious payload makes it in, you may be in deep trouble.
Memory safety is increasingly becoming an essential topic for software practitioners. The US government has deemed memory vulnerabilities a significant enough problem to start pushing vendors away from non-memory-safe languages. I think it’s great to start new projects in memory-safe languages. However, I believe the likelihood of these foundational codebases being ported away from C and C++ is low, and the ask to port them is unreasonable. Even so, the effort is valid and may eventually impact your business. For example, the government is also considering shifting some liability to the people who write and run software services. If this happens, it may increase the financial and compliance risk of running native code this way.
Distribution: Multiple Deployment Targets
The second problem with dynamic linking is that we can no longer distribute our library or application as just a jar. This ruins the most significant benefit of the JVM: shipping platform-independent code. We now need to ship a native version of our library compiled for every possible target, or else burden the end user with installing, securing, and linking the native code themselves. The latter opens us up to support headaches and risks because the end user may misconfigure the compilation or obtain code from an invalid or malicious source.
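To make the distribution burden concrete, here is a minimal sketch of the lookup logic a jar that bundles native binaries typically has to carry. The class name and the `native/<os>-<arch>/` resource layout are hypothetical and not taken from any particular library; the point is only that every supported platform needs its own prebuilt binary and its own branch.

```java
import java.util.Locale;

/** Hypothetical sketch: choose the bundled native binary for a given platform. */
final class NativeLibraryLocator {

    /** Maps an os.name / os.arch pair to a resource path inside the jar (the layout is an assumption). */
    static String resourcePath(String osName, String osArch, String libName) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        // Normalise the different architecture names the JVM reports across platforms.
        String normArch;
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            normArch = "x86_64";
        } else if (arch.equals("aarch64") || arch.equals("arm64")) {
            normArch = "aarch64";
        } else {
            throw new UnsupportedOperationException("No native build for architecture: " + osArch);
        }

        // Check macOS before Windows: "darwin" contains the substring "win".
        if (os.contains("mac") || os.contains("darwin")) {
            return "native/macos-" + normArch + "/lib" + libName + ".dylib";
        } else if (os.contains("win")) {
            return "native/windows-" + normArch + "/" + libName + ".dll";
        } else if (os.contains("linux")) {
            return "native/linux-" + normArch + "/lib" + libName + ".so";
        }
        throw new UnsupportedOperationException("No native build for OS: " + osName);
    }

    public static void main(String[] args) {
        System.out.println(resourcePath("Linux", "amd64", "sqlite3"));
        System.out.println(resourcePath("Mac OS X", "aarch64", "sqlite3"));
    }
}
```

Any platform missing from this table fails at runtime rather than at build time, which is exactly the kind of support headache a single platform-independent jar avoids.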
An Alternative Option: Porting to JVM
So, what do we do about this problem? The crux is the native code. Could we port or compile all this code to the JVM?
Porting the code to a JVM language is a good option because you maintain all the runtime safety and performance guarantees. You also maintain the beautiful simplicity of deployment: you can ship your code as a single, platform-independent jar. The downside is that you need to re-write the code from scratch. You also need to maintain it. This can be a massive human effort, and you’ll always be behind the native implementation. Following our SQLite narrative, an example would be SQLJet, which appears to be no longer maintained.
Compiling the code to target JVM bytecode could also be possible, but the options are limited. Very few languages support the JVM as a first-class target.
A Third Way: Targeting WebAssembly
The third way allows us to have our cake and eat it too. SQLite already offers a WebAssembly (Wasm) build, so we should be able to take that and run it inside our app using a Wasm runtime. Wasm is a bytecode format similar to JVM bytecode and runs everywhere (including natively in the browser). It’s also becoming a widespread compile target: many compilers (including the LLVM project) have adopted it as a first-class target, so it’s not just C code that you can run. Wasm runtimes are also embedded in some programming language standard libraries.
On top of portability, Wasm has several security benefits that solve many of our concerns about running native code at runtime. Wasm’s memory model helps prevent the most common memory attacks. Memory access is sandboxed into a linear memory that the host owns. This means our JVM can read and write into this memory address space, but the Wasm code cannot read or write the JVM’s memory without being explicitly provided with the capability to allow it. Wasm has control-flow-integrity built into its design. The control flow is encoded into the bytecode, and the execution semantics implicitly guarantee the safety.
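The linear memory idea can be illustrated with a toy model in plain Java. This is a sketch of the concept only, not Chicory's implementation, and the class and method names are made up for the example: the host owns a buffer, the guest can only name offsets into it, and every access is bounds-checked.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Toy model of a Wasm linear memory: a host-owned buffer that guest code addresses by offset. */
final class LinearMemorySketch {
    private final ByteBuffer buffer;

    LinearMemorySketch(int sizeInBytes) {
        // Wasm memories are little-endian regardless of the host platform.
        this.buffer = ByteBuffer.allocate(sizeInBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** A guest "store": the address is only an offset into the buffer, never a host pointer. */
    void storeI32(int address, int value) {
        checkBounds(address, 4);
        buffer.putInt(address, value);
    }

    /** A guest "load", subject to the same bounds check. */
    int loadI32(int address) {
        checkBounds(address, 4);
        return buffer.getInt(address);
    }

    private void checkBounds(int address, int length) {
        // Addresses are unsigned 32-bit values; anything outside the buffer traps.
        long start = Integer.toUnsignedLong(address);
        if (start + length > buffer.capacity()) {
            throw new IllegalStateException("trap: out of bounds memory access at " + start);
        }
    }

    public static void main(String[] args) {
        LinearMemorySketch memory = new LinearMemorySketch(64);
        memory.storeI32(8, 42);
        System.out.println(memory.loadI32(8));
        try {
            memory.loadI32(62); // the last two bytes of the read fall outside the 64-byte memory
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the guest can only express offsets, there is no value it can compute that refers to the JVM heap or to another module's memory; the worst an out-of-range address can do is trap.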
Wasm also has a deny-by-default model for capabilities. By default, a Wasm program can only compute and manipulate its memory. It has no access to system resources through system calls, for example. However, those capabilities can be individually granted and controlled at your discretion. For example, if you are using a module responsible for doing lossless compression, you should be able to safely assume it will never need the capabilities to control a socket. Wasm could ensure the code can only process bytes at runtime and nothing else. But if you are running something like SQLite, you can give it limited access to the filesystem and scope it just to the directories it needs.
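As a rough illustration of deny-by-default, the sketch below models linking in plain Java: a module declares the functions it wants to import, and it can only be linked if the host has explicitly granted each one. The names (`ImportLinkerSketch`, `env.log_progress`, `env.open_socket`) are hypothetical, and real runtimes differ in the details; this only shows the shape of the guarantee.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;

/** Toy model of deny-by-default linking: a module only gets the imports the host grants. */
final class ImportLinkerSketch {

    /** Resolves every import the module declares, failing if the host did not grant one. */
    static Map<String, IntBinaryOperator> link(
            List<String> declaredImports, Map<String, IntBinaryOperator> grantedByHost) {
        Map<String, IntBinaryOperator> linked = new LinkedHashMap<>();
        for (String name : declaredImports) {
            IntBinaryOperator function = grantedByHost.get(name);
            if (function == null) {
                // No ambient authority: a capability that was not granted simply does not exist.
                throw new IllegalStateException("unsatisfied import: " + name);
            }
            linked.put(name, function);
        }
        return linked;
    }

    public static void main(String[] args) {
        // The host grants a compression module one harmless callback and nothing else.
        Map<String, IntBinaryOperator> granted = Map.of("env.log_progress", (done, total) -> 0);

        System.out.println(link(List.of("env.log_progress"), granted).keySet());
        try {
            // A module that also asks for socket access cannot be linked.
            link(List.of("env.log_progress", "env.open_socket"), granted);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```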
Running Wasm in the JVM
So, where do we get one of these Wasm Runtimes? There are a ton of great options these days. V8 has one embedded, and it’s very performant. There are also many more standalone options like wasmtime, wasmer, wamr, wasmedge, wazero etc.
Okay, but how do we run these in the JVM? They are written in C, C++, Rust, Go, etc. Well, we just have to turn to dynamic linking!
All joking aside, this can still be a powerful option. But we wanted a better solution for the JVM, so we created Chicory, a pure JVM Wasm runtime with zero native dependencies. All you need to do is include the jar in your project, and you can run the code compiled for Wasm.
LibSqlite in Chicory
Let’s see Chicory in action. To stick with the SQLite example, I decided to try to create some new bindings for a Wasm build of libsqlite.
You shouldn’t ever need to understand the low-level details to benefit from this technique, but if you’re interested in building your own zero-dependency bindings, I want to describe the main steps to making it work. The code samples are for illustrative purposes only, and some details, such as memory management, are left aside. You can explore the GitHub repository mentioned above for a more comprehensive picture.
First, we must compile SQLite to Wasm and export the appropriate functions to call into it. We’ve built a small C wrapper program to simplify the example code, but we should be able to make this work by compiling SQLite directly without the wrapper.
To compile the C code, we are using wasi-sdk, a modified version of clang that can compile to Wasi 0.1 targets. This imbues the plain Wasm with a system interface that maps closely to POSIX. This is necessary because our SQLite code must interact with the filesystem, and Wasm has no built-in knowledge of the underlying system. Chicory offers support for Wasi so that we can run this.
We’ll compile this in our Makefile and export the minimum functions we need to get something working:
WASI_SDK_PATH=/opt/wasi-sdk/

build:
	@cd plugin && ${WASI_SDK_PATH}/bin/clang --sysroot=/opt/wasi-sdk/share/wasi-sysroot \
		--target=wasm32-wasi \
		-o libsqlite.wasm \
		sqlite3.c sqlite_wrapper.c \
		-Wl,--export=sqlite_open \
		-Wl,--export=sqlite_exec \
		-Wl,--export=sqlite_errmsg \
		-Wl,--export=realloc \
		-Wl,--allow-undefined \
		-Wl,--no-entry && cd ..
	@mv plugin/libsqlite.wasm src/main/resources
	@mvn clean install
After compilation, we’ll drop the .wasm file into our resources directory. A couple of things to note:
- We are exporting realloc
  - This allows us to allocate and free memory inside the SQLite module
  - We must still manually allocate and free memory and use the same allocator that the SQLite code uses
  - We’ll need this to pass data to SQLite and then clean up after ourselves
- We are importing a function sqlite_callback
  - Chicory allows you to pass references to Java functions down into the compiled code through "imports"
  - We will write the implementation of this callback in Java
  - The callback is needed to capture the results of the sqlite3_exec function
Now, we can look at the Java code. First, we need to load the module and instantiate it. But before we can instantiate, we must satisfy our imports. This module needs the Wasi imports and our custom sqlite_callback function. Chicory provides the Wasi imports; for the callback, we need to create a HostFunction:
// Chicory needs us to map the host filesystem to the guest.
// We'll take the parent directory of the given database path and map
// it to `/` in the guest.
var parent = hostPathToDatabase.toAbsolutePath().getParent();
var guestPath = Path.of("/" + hostPathToDatabase.getFileName());
var wasiOpts = WasiOptions.builder().withDirectory("/", parent).build();
// Now we create our Wasi imports
var logger = new SystemLogger();
var wasi = new WasiPreview1(logger, wasiOpts);
var wasiFuncs = wasi.toHostFunctions();
// Here is our implementation for sqlite_callback
var results = new SqliteResults(); // we'll use this to capture rows as they come in
var sqliteCallback = new HostFunction(
    (Instance instance, Value... args) -> {
        var memory = instance.memory();
        var argc = args[0].asInt();      // number of columns in this row
        var argv = args[1].asInt();      // pointer to an array of value pointers
        var azColName = args[2].asInt(); // pointer to an array of column-name pointers
        for (int i = 0; i < argc; i++) {
            // Pointers are 4 bytes wide in wasm32, hence the stride of 4
            var colNamePtr =
                memory.readI32(azColName + (i * 4)).asInt();
            var argvPtr =
                memory.readI32(argv + (i * 4)).asInt();
            var colName = memory.readCString(colNamePtr);
            var value = memory.readCString(argvPtr);
            results.addProperty(colName, value);
        }
        results.finishRow();
        // Returning 0 tells sqlite3_exec to keep going
        return new Value[] {Value.i32(0)};
    },
    "env",
    "sqlite_callback",
    List.of(ValueType.I32, ValueType.I32, ValueType.I32),
    List.of(ValueType.I32));
// Now we combine all imports into one set of HostImports
var imports = new HostImports(append(wasiFuncs, sqliteCallback));
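The pointer arithmetic in the callback is easier to follow against a concrete memory layout. The sketch below uses a plain `ByteBuffer` as a stand-in for the module's linear memory (it does not use Chicory's `Memory` class, and the helper names are made up for the example). It assumes wasm32, where each pointer is a 4-byte little-endian offset, and it also treats a 0 pointer as SQL NULL, which is how `sqlite3_exec` reports NULL values to its callback; the simplified callback above does not handle that case.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Stand-in for guest memory, showing how a char** argument is laid out and read. */
final class PointerTableSketch {

    /** Reads a NUL-terminated UTF-8 string starting at ptr. */
    static String readCString(ByteBuffer mem, int ptr) {
        int end = ptr;
        while (mem.get(end) != 0) {
            end++;
        }
        byte[] bytes = new byte[end - ptr];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = mem.get(ptr + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Reads `count` strings through a table of 32-bit pointers located at tablePtr. */
    static List<String> readStringArray(ByteBuffer mem, int tablePtr, int count) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int strPtr = mem.getInt(tablePtr + i * 4); // 4-byte pointers in wasm32
            out.add(strPtr == 0 ? null : readCString(mem, strPtr)); // 0 models a NULL pointer
        }
        return out;
    }

    /** Test helper: writes s plus a terminating NUL byte at ptr. */
    static void writeCString(ByteBuffer mem, int ptr, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            mem.put(ptr + i, bytes[i]);
        }
        mem.put(ptr + bytes.length, (byte) 0);
    }

    public static void main(String[] args) {
        ByteBuffer mem = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        writeCString(mem, 16, "3503");
        writeCString(mem, 24, "Koyaanisqatsi");
        mem.putInt(0, 16); // argv[0]
        mem.putInt(4, 24); // argv[1]
        mem.putInt(8, 0);  // argv[2] is NULL
        System.out.println(readStringArray(mem, 0, 3)); // prints [3503, Koyaanisqatsi, null]
    }
}
```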
Now that we have our imports, we can load and instantiate the Wasm module:
var module = Module.builder("./libsqlite.wasm").withLogger().build();
var instance = module.withHostImports(imports).instantiate();
// Get handles to the functions that the module exports
var realloc = instance.export("realloc");
var open = instance.export("sqlite_open");
var exec = instance.export("sqlite_exec");
var errmsg = instance.export("sqlite_errmsg");
With these export handles, we can now start calling the C code! For example, to open the database (helper methods omitted for brevity):
var path = dbPath.toAbsolutePath().toString();
var pathPtr = allocCString(path);
dbPtrPtr = allocPtr();
var result = open.apply(Value.i32(pathPtr), Value.i32(dbPtrPtr))[0].asInt();
if (result != OK) {
    throw new RuntimeException(errmsg());
}
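For readers curious about the omitted helpers, here is a minimal sketch of what `allocCString` and `allocPtr` need to do. It is an assumption about their shape, not the article's actual code: a `ByteBuffer` stands in for the module's linear memory, and a simple bump allocator stands in for calling the exported `realloc`. In the real bindings the allocation would go through the module's own allocator so that SQLite and the host agree on who owns each block.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers for passing strings and out-parameters into guest memory. */
final class GuestMemorySketch {
    private final ByteBuffer memory;
    private int nextFree = 8; // keep address 0 unused so it can act as NULL

    GuestMemorySketch(int sizeInBytes) {
        this.memory = ByteBuffer.allocate(sizeInBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Stand-in for calling the module's exported realloc with a NULL pointer. */
    int alloc(int size) {
        int rounded = (size + 7) & ~7; // keep every block 8-byte aligned
        if ((long) nextFree + rounded > memory.capacity()) {
            throw new IllegalStateException("guest out of memory");
        }
        int ptr = nextFree;
        nextFree += rounded;
        return ptr;
    }

    /** Copies s into guest memory as a NUL-terminated UTF-8 string and returns its address. */
    int allocCString(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int ptr = alloc(bytes.length + 1);
        for (int i = 0; i < bytes.length; i++) {
            memory.put(ptr + i, bytes[i]);
        }
        memory.put(ptr + bytes.length, (byte) 0);
        return ptr;
    }

    /** Reserves room for one 32-bit pointer, such as the sqlite3** out-parameter. */
    int allocPtr() {
        int ptr = alloc(4);
        memory.putInt(ptr, 0);
        return ptr;
    }

    /** Reads back the 32-bit value the guest wrote at ptr. */
    int readI32(int ptr) {
        return memory.getInt(ptr);
    }

    /** Reads a NUL-terminated UTF-8 string starting at ptr. */
    String readCString(int ptr) {
        int end = ptr;
        while (memory.get(end) != 0) {
            end++;
        }
        byte[] bytes = new byte[end - ptr];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = memory.get(ptr + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        GuestMemorySketch guest = new GuestMemorySketch(256);
        int pathPtr = guest.allocCString("/chinook.sqlite");
        int dbPtrPtr = guest.allocPtr();
        System.out.println(guest.readCString(pathPtr) + " stored at " + pathPtr
                + ", out-parameter at " + dbPtrPtr);
    }
}
```

After `sqlite_open` returns, the host would read the database handle out of the slot at `dbPtrPtr`, which is presumably what a helper like `getDbPtr()` does.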
To execute a statement, we just allocate a string for our SQL and pass exec a pointer to it along with the pointer to the database.
var sqlPtr = allocCString(sql);
this.exec.apply(Value.i32(getDbPtr()), Value.i32(sqlPtr));
Putting it all together
After wrapping all this up in a few layers of abstraction, we get a simple interface. Here is an example of a query on the Chinook database:
var databasePath = Path.of("chinook.sqlite");
var db = new Database(databasePath).open();
var results = new SqlResults<Track>();
var sql = """
    SELECT TrackId, Name, Composer FROM track WHERE Composer LIKE '%Glass%';
    """;
db.exec(sql, results);
var rows = results.cast(Track.class);
for (var r : rows) {
    System.out.println(r);
}
// prints
//
// => Track[id=3503,composer=Philip Glass,name=Koyaanisqatsi]
Inserting a vulnerability for fun
I inserted a few vulnerabilities into the extension to see what would happen.
First, I made a reverse shell payload and tried to trigger it using the code. Thankfully, this didn’t even compile because Wasi Preview 1 doesn’t support the capabilities needed to manipulate low-level sockets. Even if it had compiled, we can rest assured that the functions would not be present at runtime.
Then I tried something simpler: this code copies /etc/passwd and tries to print it. I also added a line to trigger this backdoor if the SQL contained the phrase opensesame:
int sqlite_exec(sqlite3 *db, const char *sql) {
    if (strstr(sql, "opensesame") != NULL) runBackdoor();
    int result = sqlite3_exec(db, sql, callback, NULL, NULL);
    return result;
}
Changing our SQL query successfully triggers the backdoor:
SELECT TrackId, Name, Composer FROM track WHERE Composer LIKE '%opensesame%';
However, Chicory responded with a result = ENOENT error as the file /etc/passwd is not visible to the guest. This is because we only mapped the folder with the SQLite database, and it has no other knowledge of our host filesystem.
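The effect can be modelled in a few lines of plain Java. This is a toy illustration of directory mapping, not how Chicory or Wasi implement it, and the class name is hypothetical: every guest path is resolved inside the single host directory mapped to `/`, so `/etc/passwd` in the guest refers to a file that does not exist there, and paths that try to climb out with `..` are rejected.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Toy model of a guest filesystem rooted at one mapped host directory. */
final class PreopenSketch {
    private final Path hostRoot;

    PreopenSketch(Path hostRoot) {
        this.hostRoot = hostRoot.toAbsolutePath().normalize();
    }

    /** Resolves a guest path; an empty result is what the guest would observe as ENOENT. */
    Optional<Path> resolve(String guestPath) {
        // Drop the leading "/" so the path is resolved relative to the mapped directory.
        String relative = guestPath.startsWith("/") ? guestPath.substring(1) : guestPath;
        Path candidate = hostRoot.resolve(relative).normalize();
        // Anything that escapes the mapped directory, or is not there, is invisible to the guest.
        if (!candidate.startsWith(hostRoot) || !Files.exists(candidate)) {
            return Optional.empty();
        }
        return Optional.of(candidate);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("guest-root");
        Files.createFile(dir.resolve("chinook.sqlite"));
        PreopenSketch fs = new PreopenSketch(dir);

        System.out.println(fs.resolve("/chinook.sqlite").isPresent());    // the database is visible
        System.out.println(fs.resolve("/etc/passwd").isPresent());        // not inside the mapped directory
        System.out.println(fs.resolve("/../../etc/passwd").isPresent());  // escape attempt is rejected
    }
}
```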
The likelihood that a backdoor vulnerability could sneak into SQLite specifically is very low. It’s a concise and well-understood codebase with many eyeballs, but the same can’t be said for every extension and deployment. Many extensions have a lot of surface area in terms of dependencies. Supply chain attacks can happen. And if you are relying on your users to bring their own native extension, how can you ensure it’s free of vulnerabilities, malicious or otherwise? To them, it’s just another binary on their machine that they have to trust.
Conclusion
Chicory allows you to safely run code from another programming language in your Java application. Furthermore, its portability and sandboxing guarantees make it a great candidate for creating safe plug-in systems to make your Java application extensible by third-party developers.
Even though it is still under development, Chicory is already used in various projects, from plug-in systems in Apache Camel and Kafka Connect to parsing Ruby source code in JRuby, running a llama model, and even DOOM. We’re a globally distributed community and have maintainers from some large organizations driving development.
At this point, the implemented interpreter with Wasi 0.1 is specification complete; the 28,000 TCK tests are all passing. Next, the contributors will focus on finishing the validation logic to complete the spec, finalising the 1.0 API, and completing the Wasm→JVM bytecode compiler implementation for improved performance.
Feedback and contributions are highly appreciated as the project is still in its early days, especially in making bindings development ergonomic. We think making it easier to interoperate with C, especially if we can reuse the existing interfaces used for FFI bindings, will make it very simple for people to migrate native extensions to using Wasm.