Node.js VM - Isolating our Apps from the Core Engine

Introduction

As a JavaScript developer, you probably know that Node.js is built on top of Chrome’s V8 engine, which runs JavaScript very fast by compiling it to machine code. In a way, this gives V8 the role of an operating system running on top of your OS, similar to how you might imagine a virtual machine running.

If you look around in the Node.js documentation, you might encounter an interesting module called “VM (Executing JavaScript)”. It gives you access to APIs for compiling and running code within V8 Virtual Machine contexts. When we read about this at Rocket.Chat, it looked very interesting: it gave us the ability to create our own contexts, running within the same environment we had already set up for our server. Basically, it gave us direct access to the V8 engine, letting us separate a piece of code from the rest of the application and run it inside a sandboxed environment.

At Rocket.Chat, our goal has always been towards making a fully customizable communications platform for organizations with high standards of data protection. One of the unique advantages we have is the ability to self-host it in an air-gapped environment. This allows our users to really own their data by keeping it safe inside their own infrastructure.

However, as our user base grew, we realized their need for the diverse pool of integrations in the workspace was going to increase maintenance complexity - from developing, reviewing, testing and making sure it kept working. So to address this (and some other similar things), we set out to take a step further on our platform's flexibility, with the introduction of the Apps Engine - our implementation of the Node.js VM module.

Exploring the Node.js VM module

Since VM is a native module inside Node.js, there isn't much setup required to try it out. Best of all, it provides a very straightforward API, as you can see in the official Node.js VM documentation. One basically needs to call createContext and then execute the code with runInContext. Let us look at the example they provide:

const vm = require('vm');
 
const x = 1;
 
const newEnv = { x: 2 };
vm.createContext(newEnv); // Contextify the object.
 
const code = 'x += 40; var y = 17;';
// `x` and `y` are global variables in the context.
// Initially, x has the value 2 because that is the value of context.x.
vm.runInContext(code, newEnv);
 
console.log(newEnv.x); // 42
console.log(newEnv.y); // 17
 
console.log(x); // 1; y is not defined.

Here you can see how, inside a plain Node.js environment, the createContext() method turns the object newEnv into a separate context within the V8 engine. In the global scope x is set to 1, while inside newEnv it is set to 2.
When we run code in newEnv, it is evaluated against that context rather than the global scope, so the surrounding Node.js environment cannot see the variables it declares, and vice versa. Accordingly, the output shows that newEnv.x yields 42 and newEnv.y yields 17, while the outer x is still 1 and y is not even defined!

So far so good, but wait, what actually is this `Context`?

A context here is basically an alternate environment that we create with the vm module. It is little more than a bunch of key-value pairs that serve as the global scope for the code run in it, much like the environment of a shell, and each context runs directly on top of the V8 engine. It should then be clear from this example how the Node.js VM isolates the execution of the code it receives from the global scope.
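To make the "bunch of key-value pairs" idea concrete, here is a minimal sketch (the `sandbox` object and its keys are made up for illustration). It shows that only the keys placed on the contextified object are visible as globals, and that globals declared by the sandboxed code appear as new keys on that object.

```javascript
const vm = require('vm');

// A plain object becomes the global scope of a new V8 context.
const sandbox = { greeting: 'hello' };
vm.createContext(sandbox);

// Inside the context `greeting` is a global. Host globals such as `process`
// or `require` were never placed on the object, so they are absent.
const seen = vm.runInContext(
  '[typeof greeting, typeof process, typeof require]',
  sandbox
);
console.log(Array.from(seen)); // [ 'string', 'undefined', 'undefined' ]

// Globals declared with `var` by the sandboxed code show up as new keys.
vm.runInContext('var counter = 1;', sandbox);
console.log(Object.keys(sandbox)); // [ 'greeting', 'counter' ]
console.log(vm.isContext(sandbox)); // true
```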

How we use the VM Module - Implementing our Apps Engine

The Apps-Engine is the framework that allows for the development of RC Apps, i.e., custom plugins that extend the functionality of the chat server. These apps allow for a tighter integration with whatever workflow our users have, covering most, if not all, of the things that incoming/outgoing integrations and bots do. Interestingly, it has some tricks of its own, like adding interactive buttons to messages and opening modals (check out Poll for a showcase!), and much more!

While developing this, we wanted all our workspaces to have access to Apps without needing to leave the comfort of their intranet. This prompted us to follow the rather uncommon path of having those apps running inside a Rocket.Chat server.

With that premise as a starting point, we knew we'd have to concern ourselves with keeping each of those apps running in an isolated scope inside the server, i.e. having no direct access to the server's functions or to the resources of other apps running alongside them. From that, we identified the two important types of components we needed in place for this to work:

some bridges to safely connect apps to the core server's features, &
a way to sandbox each app so they have their own exclusive context.
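The two components can be sketched together as follows. This is only an illustration of the idea, not the Apps-Engine implementation: `serverState`, `buildMessageBridge` and `runApp` are hypothetical names invented for this example.

```javascript
const vm = require('vm');

// Hypothetical host-side state that apps must not touch directly.
const serverState = { messages: [] };

// Hypothetical bridge: the only server capability handed to an app.
function buildMessageBridge(appId) {
  return Object.freeze({
    send(text) {
      serverState.messages.push({ appId, text: String(text) });
      return serverState.messages.length;
    },
  });
}

// Each app runs in its own context, which contains nothing but its bridge.
function runApp(appId, source) {
  const context = vm.createContext({ bridge: buildMessageBridge(appId) });
  return vm.runInContext(source, context, { timeout: 1000 });
}

runApp('app-a', 'bridge.send("hi from A")');
const visibility = runApp('app-b', 'bridge.send("hi from B"); typeof serverState');

console.log(visibility); // 'undefined': the app cannot name the host variable
console.log(serverState.messages.length); // 2
```

The app reaches the server only through `bridge.send`, and each call to `runApp` gets a context of its own, so one app's globals are not visible to another.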

Given those points, the Node.js VM module sounded to us like the perfect approach for running any method of the App itself while isolating it from the server’s scope. Although it sounds obvious, there were a lot of moving parts involved, so this implementation had us figuring out various lifecycle methods for loading, running, extending configurations, etc.

Loading Apps

Rocket.Chat Apps are, at the end of the day, JavaScript classes. While loading an app, the first thing we need to do is to instantiate its class in the correct context. We do it like so:

public toSandBox(manager: AppManager, storage: IAppStorageItem, { files }: IParseAppPackageResult): ProxiedApp {
    if (typeof files[path.normalize(storage.info.classFile)] === 'undefined') {
        throw new Error(`Invalid App package for "${ storage.info.name }". ` +
            `Could not find the classFile (${ storage.info.classFile }) file.`);
    }
 
    const exports = {};
    const customRequire = Utilities.buildCustomRequire(files, storage.info.id);
    const context = Utilities.buildDefaultAppContext({ require: customRequire, exports, process: {}, console });
 
    const script = new vm.Script(files[path.normalize(storage.info.classFile)]);
    const result = script.runInContext(context);
 
    if (typeof result !== 'function') {
        // tslint:disable-next-line:max-line-length
        throw new Error(`The App's main class for ${ storage.info.name } is not valid ("${ storage.info.classFile }").`);
    }
 
    const appAccessors = new AppAccessors(manager, storage.info.id);
    const logger = new AppConsole(AppMethod._CONSTRUCTOR);
    const rl = vm.runInNewContext('new App(info, rcLogger, appAccessors);', Utilities.buildDefaultAppContext({
        rcLogger: logger,
        info: storage.info,
        App: result,
        process: {},
        appAccessors,
    }), { timeout: 1000, filename: `App_${ storage.info.nameSlug }.js` });
 
    /** code omitted for the sake of brevity */
}

Full code at https://github.com/RocketChat/Rocket.Chat.Apps-engine/blob/7af3781e5170a898030328b0502e75d9c990af8d/src/server/compiler/AppCompiler.ts#L26

The most interesting lines to look at here are:

```
const script = new vm.Script(files[path.normalize(storage.info.classFile)]);
```
Here, we instantiate a new vm.Script with the source code of the main class file of the App. Doing that simply compiles the JavaScript in that file, but does not execute anything.
```
const result = script.runInContext(context);
```
We then execute the code of the script mentioned above. Keep in mind that the source code of the main class file doesn't do anything per se; it only exports the main App class and includes any other supporting script it needs via require calls. The result variable now contains a reference to the App class (NOT an instance yet, mind you!)
```
const rl = vm.runInNewContext('new App(info, rcLogger, appAccessors);', Utilities.buildDefaultAppContext({rcLogger: logger, info: storage.info, App: result, process: {}, appAccessors }), { timeout: 1000, filename: `App_${ storage.info.nameSlug }.js` });
```
We finally use the vm module to create an instance of the App class and return it. The first parameter new App(info, rcLogger, appAccessors); is the code that the vm will run, which is simply a new expression. Notice how we use the name App as the identifier for the class, and in the context we assign the result mentioned above to global identifier App, App: result - this is what makes the vm aware of that name and not throw a ReferenceError: App is not defined.

After the instantiation, we then load the resources for the app using one of the lifecycle methods from the class: App.initialize.

private async initializeApp(storageItem: IAppStorageItem, app: ProxiedApp, saveToDb = true, silenceStatus = false): Promise<boolean> {
    // code omitted...
 
    try {
        await app.validateLicense();
 
        await app.call(AppMethod.INITIALIZE, configExtend, envRead);
        await app.setStatus(AppStatus.INITIALIZED, silenceStatus);
 
        result = true;
    } catch (e) {
        // ... error handling
    }
 
    // ... code omitted
}

Full code at https://github.com/RocketChat/Rocket.Chat.Apps-engine/blob/7af3781e5170a898030328b0502e75d9c990af8d/src/server/AppManager.ts#L810

The app developer can override this method in order to provide slash commands, settings, and other features that the app might use - so we need to properly run it in a separate context. This is done via the app.call method, which encapsulates the context preparation so we can easily reuse it whenever we need to communicate with the app directly. Now, once the app has been instantiated and initialized, it is time to actually run it!

public async call(method: AppMethod, ...args: Array<any>): Promise<any> {
    // code omitted...
 
    try {
        // tslint:disable-next-line:max-line-length
        result = await this.runInContext(`app.${method}.apply(app, args)`, this.makeContext({ app: this.app, args })) as Promise<any>;
        logger.debug(`'${method}' was successfully called! The result is:`, result);
    } catch (e) {
        logger.error(e);
        logger.debug(`'${method}' was unsuccessful.`);
 
        if (e instanceof AppsEngineException) {
            throw e;
        }
    }
 
    // code omitted...
 
}

Full code at https://github.com/RocketChat/Rocket.Chat.Apps-engine/blob/7af3781e5170a898030328b0502e75d9c990af8d/src/server/ProxiedApp.ts#L75

In the block above we prepare which method is going to be called and construct the calling code as the string app.${method}.apply(app, args). We then need to tell the new context of the VM what the names app and args are, and finally hand over control to this.runInContext, whose implementation we can see below:

public runInContext(codeToRun: string, context: vm.Context): any {
    return vm.runInContext(codeToRun, context, {
        timeout: 1000,
        filename: `${ ROCKETCHAT_APP_EXECUTION_PREFIX }_${ this.getName() }.ts`,
    });
}

Full code at https://github.com/RocketChat/Rocket.Chat.Apps-engine/blob/7af3781e5170a898030328b0502e75d9c990af8d/src/server/ProxiedApp.ts#L68

This is just a common place for us to set the timeout and filename properties every time we call the VM. It's interesting to note that this new context for the method call is different from the context we instantiated the App class in, which means they don't share any globals by default, i.e., the method call can't see any variable declared in the outer scope that is not a property of the App class, making the class a unique point of control for safety.
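The calling pattern and the effect of the timeout option can be reproduced with a small stand-in. This is a sketch only: `callAppMethod` and `demoApp` are hypothetical, and the 100 ms timeout is chosen just to keep the example quick.

```javascript
const vm = require('vm');

// Hypothetical helper mirroring the idea of `call`: the method name is spliced
// into a code string, while the app and its arguments travel via the context.
function callAppMethod(app, method, ...args) {
  const context = vm.createContext({ app, args });
  return vm.runInContext(`app.${method}.apply(app, args)`, context, {
    timeout: 100,
    filename: `App_${app.name}.js`,
  });
}

// Toy app with one well-behaved method and one that never returns.
const demoApp = {
  name: 'demo',
  greet(who) { return `hello ${who}`; },
  spin() { while (true) {} },
};

const greeting = callAppMethod(demoApp, 'greet', 'world');
console.log(greeting); // 'hello world'

let timeoutCode;
try {
  callAppMethod(demoApp, 'spin');
} catch (e) {
  timeoutCode = e.code;
}
console.log(timeoutCode); // 'ERR_SCRIPT_EXECUTION_TIMEOUT'
```

Note that the timeout bounds the synchronous part of the call. Work that an async method schedules for later (for example after an `await`) runs once `runInContext` has already returned, so it is not covered by this limit.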

Other uses in the framework's internals

There are other points in which we need to interface with an app's internal logic - that's what makes them useful after all. In those points, we call the app.runInContext method and build a context tailor-made for that specific scope. Here's a list of examples:

Our learnings around the Node.js VM Module

As mentioned before, our main motivation for choosing the Node.js VM module was isolating each app in its own scope, effectively preventing apps from interacting directly with the host Rocket.Chat server or with other apps, so they wouldn't be able to inject data and disrupt the system, or even leak data from other sources. Additionally, it spared us the complexity of spawning, controlling and managing communication with external processes to run that logic.

This was the foundation of the project and allowed us to focus our work on enabling apps to customize the experience inside our platform using a controlled API. And during the last 3 years of operating and maintaining it, we can say we've learned a thing or two about the vm module.

For the most part, we can say that the Node.js VM module delivered on what it promises. Every execution of the app's internal code has its own context, and this context is controlled by the Apps-Engine itself. Of course, we needed to adapt and iterate a lot, as our first assumptions of how this interaction with the module would work were not exactly right.

However, our implementation has some shortcomings. One of them is the fact that, by isolating the apps that much, we take away some key modules that are very familiar to Node.js devs, which makes our environment feel "alien" - it's still TypeScript, but the tools are now unknown.
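One way such a restriction can come about is a custom `require` that only resolves an allowlist of modules. The sketch below is an assumption for illustration: `buildAllowlistRequire` and the allowlist contents are hypothetical and are not taken from `Utilities.buildCustomRequire`.

```javascript
const vm = require('vm');

// Hypothetical allowlist; the actual Apps-Engine rules are not shown here.
const ALLOWED_MODULES = new Set(['path', 'url']);

// Returns a `require` replacement that refuses anything off the list.
function buildAllowlistRequire(allowed) {
  return function customRequire(name) {
    if (!allowed.has(name)) {
      throw new Error(`Module "${name}" is not available to apps`);
    }
    return require(name);
  };
}

const context = vm.createContext({
  require: buildAllowlistRequire(ALLOWED_MODULES),
});

const baseName = vm.runInContext('require("path").basename("/apps/poll.js")', context);
console.log(baseName); // 'poll.js'

let blockedMessage;
try {
  vm.runInContext('require("fs")', context);
} catch (e) {
  blockedMessage = e.message;
}
console.log(blockedMessage); // 'Module "fs" is not available to apps'
```

From the app developer's point of view, familiar calls such as `require("fs")` simply fail, which is the "alien" feeling described above.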

Another difficulty that arose is troubleshooting errors that happen while executing the app's code. The stack trace of error messages, for example, points to several evalmachine.<anonymous> references, because the code executed is not really attached to any file known to Node and is instead compiled and run directly in the VM. Those aren't that helpful when developing an app, and can't indicate which app has an error when maintaining a server with many of them.
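A minimal sketch of where those references come from: when no `filename` option is passed, Node.js labels the code `evalmachine.<anonymous>`, and passing one changes what appears in the trace. The helper `stackFor` and the name `App_poll.js` are made up for this example.

```javascript
const vm = require('vm');

// Runs code that throws inside a new context and returns the error's stack.
function stackFor(options) {
  try {
    vm.runInNewContext('throw new Error("boom")', {}, options);
  } catch (e) {
    return e.stack;
  }
  return '';
}

const anonymousStack = stackFor({});
const namedStack = stackFor({ filename: 'App_poll.js' });

console.log(anonymousStack.includes('evalmachine.<anonymous>')); // true
console.log(namedStack.includes('App_poll.js')); // true
```

A per-app filename at least tells you which app a frame belongs to, although the line numbers still refer to the evaluated string rather than to the app's original source files.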

Finally, the hardest lesson was that the vm module does not prevent ALL kinds of interactions with the outside world, and it is still possible to escape the sandbox if you really want to. This is now stated in the Node.js docs themselves (glad they added it!). We mitigate this in our Marketplace with the approval process, as we evaluate the source code of apps before allowing them to be published, but as we increase the support for NPM modules this might become unfeasible.
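A widely known illustration of such an escape fits in one expression. It is shown here only to make the limitation tangible, and it is just one of several possible routes: any object created by the host and handed to the sandbox carries a prototype chain that leads back to the host's `Function` constructor.

```javascript
const vm = require('vm');

// The sandbox object `{}` is created by the host, so `this.constructor` is the
// host's `Object` and `.constructor` of that is the host's `Function`.
// A function built with it runs against the host's globals.
const escaped = vm.runInNewContext(
  'this.constructor.constructor("return process")()',
  {}
);

console.log(escaped === process); // true
```

Once sandboxed code holds the host's `process` object, the isolation provided by the context no longer applies, which is why the module alone is not a security mechanism.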

Hence, there are some tough choices we have to make moving forward.

The Future of the Node.js VM in the Apps-Engine

From our experience experimenting with different apps on Rocket.Chat, it's clear that we need to improve our approach to isolating apps in the future. This doesn't mean just improving the security aspect of the product, but also increasing the flexibility and capabilities of the Apps Engine as a whole. However, we're not sure we'll be able to move forward with our current use of the Node.js VM. There are some alternatives we've been discussing:

The vm2 NPM package, whose premise is to provide a secure sandbox for running untrusted code with an API similar to the native Node.js module. We still need to investigate it further, but it sounds like it would allow for a simpler transition in the framework's source code; basically changing `vm` to `vm2` 🤔
Running the Apps-Engine as a totally separate process, by using Node.js's own native modules to fork an isolated process. This would require a much bigger effort to achieve, as we would essentially need to create a process manager to control resources and the communication between the Rocket.Chat process and the Apps' processes. But this would mean achieving true isolation.
Another option we're evaluating is to make the framework act more like a serverless function platform. This is kind of a specialization of the alternative above, which would also include an overhaul of the Apps-Engine API for it to be more similar to those of Function-as-a-Service offerings. Arguably the API would be more familiar and idiomatic to today's code patterns.
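To give a feel for the separate-process alternative, here is a deliberately simplified sketch. It assumes a hypothetical `runInChildProcess` helper and uses the synchronous `child_process.spawnSync` with JSON over stdin/stdout so the example stays short. A real process manager would more likely use `child_process.fork` with asynchronous message passing, since a synchronous call blocks the server while the app runs.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical runner: executes app code in a separate Node.js process and
// exchanges JSON with it through stdin and stdout.
function runInChildProcess(source, input) {
  const child = spawnSync(process.execPath, ['-e', source], {
    input: JSON.stringify(input),
    encoding: 'utf8',
    timeout: 5000, // the child is killed if it runs longer than this
  });
  if (child.error || child.status !== 0) {
    throw child.error || new Error(child.stderr || 'app process failed');
  }
  return JSON.parse(child.stdout);
}

// Toy "app": reads its input from stdin and replies on stdout.
const appSource = `
  const input = JSON.parse(require('fs').readFileSync(0, 'utf8'));
  const reply = { pid: process.pid, sum: input.a + input.b };
  process.stdout.write(JSON.stringify(reply));
`;

const reply = runInChildProcess(appSource, { a: 2, b: 3 });
console.log(reply.sum); // 5
console.log(reply.pid !== process.pid); // true: it ran in another process
```

The app shares no memory with the host here, so everything it needs has to be serialized across the process boundary. That is the communication and resource-management effort the second alternative refers to.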

Being open source means the community often recommends solutions to us, which adds to the advantage of having multiple eyes on the project! This discussion still needs to mature a lot, and we will potentially run a few proofs of concept before reaching a final decision (contributors are welcome!). Introducing this new "layer" between the framework and the host server will likely add overhead, both at runtime and in maintenance, and we'll need some experimenting to make a well-informed choice.


Douglas is a Tech Lead at Rocket.Chat. When he is not coding and worrying about developer experience and code semantics, he is studying random (human) languages or playing video-games with his kid (or by himself)

Douglas Gubert
