Turning a Node.js Monolith into a Monorepo without Disrupting the Team

Key Takeaways

  • To avoid git conflicts or a long code freeze period, develop a migration script.
  • Add a CI job to check that the build and tests still work after the migration.
  • Use Node’s conditional exports, so internal dependencies are resolved according to the environment: TS files during development, JS files at runtime.
  • Extract common TypeScript, ESLint and Prettier configuration as packages, then extend them.
  • Set up Turborepo to orchestrate dev workflows and optimize build time.

Splitting monoliths into services creates complexity in maintaining multiple repositories (one per service) with separate (yet interdependent) build processes and versioning history. Monorepos have become a popular solution to reduce that complexity.

Despite what monorepo tool makers sometimes suggest, setting up a monorepo in an existing codebase, especially in a monolithic one, is not easy. And more importantly, migrating to a monorepo can be very disruptive for the developers of that codebase. For instance, it requires moving most files into subdirectories, which causes conflicts with other changes currently being made by the team.

Let’s discuss ways to smoothly turn a monolithic Node.js codebase into a monorepo, while minimizing disruptions and risks.

Introducing: a monolithic codebase

Let’s consider a repository that contains two Node.js API servers: `api-server` and `back-for-front-server`. They are written in TypeScript and transpiled into JavaScript for their execution in production. These servers share a common set of development tools (for checking, testing, building and deploying servers) and npm dependencies. They are also bundled together using a common Dockerfile, and the API server to run is selected by specifying a different entrypoint.

File structure - before migrating:

├─ .github
│  └─ workflows
│     └─ ci.yml
├─ .yarn
│  └─ ...
├─ node_modules
│  └─ ...
├─ scripts
│  ├─ e2e-tests
│  │  └─ e2e-test-setup.sh
│  └─ ...
├─ src
│  ├─ api-server
│  │  └─ ...
│  ├─ back-for-front-server
│  │  └─ ...
│  └─ common-utils
│     └─ ...
├─ .dockerignore
├─ .eslintrc.js
├─ .prettierrc.js
├─ .yarnrc.yml
├─ docker-compose.yml
├─ Dockerfile
├─ package.json
├─ README.md
├─ tsconfig.json
└─ yarn.lock

(Simplified) Dockerfile - before migrating:

FROM node:16.16-alpine
WORKDIR /backend
COPY . .
COPY .yarnrc.yml .
COPY .yarn/releases/ .yarn/releases/
RUN yarn install
RUN yarn build
RUN chown node /backend
USER node
CMD exec node dist/api-server/start.js
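
The back-for-front server can be run from the same image by overriding that default command at deployment time, for instance (an illustrative command, not part of the original setup):

$ docker run backend node dist/back-for-front-server/start.js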

Having several servers maintained together in a shared repository presents several advantages:

  • the configuration of development tools (typescript, eslint, prettier…) and the deployment process are shared, so maintenance is reduced and the practices of all contributing teams remain aligned.
  • it’s easy for developers to reuse modules across servers, e.g. logging module, database client, wrappers to external APIs…
  • versioning is simple because there is just one shared range of versions used by all servers, i.e. any update on any server results in a new version of the Docker image, which includes all servers.
  • it’s also easy to write end-to-end tests that cover more than one server, and include them in the repository, because everything is in the same place.

Unfortunately, the source code of these servers is monolithic, in the sense that there is no separation between the code of each server. Code that was written for one of them (e.g. SQL adapters) ends up being imported by the other servers too. Hence it’s complicated to prevent a change to the code of server A from also impacting server B, which can result in unexpected regressions, and in code that becomes more and more coupled over time, making it more fragile and harder to maintain.

The “monorepo” structure is an interesting compromise: sharing a repository while splitting the codebase into packages. This separation makes the interfaces more explicit, and therefore allows us to make conscious choices about dependencies between packages. It also enables several workflow optimizations, e.g. building and running tests only on packages that changed.

Migrating a monolithic codebase to a monorepo quickly gets difficult and iterative if the codebase is large and integrated with a lot of tooling (e.g. linting, transpilation, bundling, automated testing, continuous integration, Docker-based deployments…). Also, because of the structural changes required in the repository, migrating will cause conflicts with any git branches being worked on during the migration. Let’s go over the steps needed to turn our codebase into a monorepo, while keeping disruptions to a minimum.

Overview of changes to make

Migrating our codebase to a monorepo consists of the following steps:

  • File structure: initially, we have to create a single package that contains our whole source code, so all files will be moved.
  • Configuration of Node.js’ module resolution: we will use Yarn Workspaces to allow packages to import one another.
  • Configuration of the Node.js project and dependencies: package.json (including npm/yarn scripts) will be split: the main one at the root directory, plus one per package.
  • Configuration of development tools: tsconfig.json, .eslintrc.js, .prettierrc.js and jest.config.js will also be split into two: a “base” one, and one that will extend it, for each package.
  • Configuration of our continuous integration workflow: .github/workflows/ci.yml will need several adjustments, e.g. to make sure that steps are run for each package, and that metrics (e.g. test coverage) are consolidated across packages.
  • Configuration of our building and deployment process: Dockerfile can be optimized to only include the files and dependencies required by the server being built.
  • Configuration of cross-package scripts: use of Turborepo to orchestrate the execution of npm scripts that impact several packages (e.g. build, test, lint…).

File structure - after migrating:

├─ .github
│  └─ workflows
│     └─ ci.yml
├─ .yarn
│  └─ ...
├─ node_modules
│  └─ ...
├─ packages
│  └─ common-utils
│     └─ src
│        └─ ...
├─ servers
│  └─ monolith
│     ├─ src
│     │  ├─ api-server
│     │  │  └─ ...
│     │  └─ back-for-front-server
│     │     └─ ...
│     ├─ scripts
│     │  ├─ e2e-tests
│     │  │  └─ e2e-test-setup.sh
│     │  └─ ...
│     ├─ .eslintrc.js
│     ├─ .prettierrc.js
│     ├─ package.json
│     └─ tsconfig.json
├─ .dockerignore
├─ .yarnrc.yml
├─ docker-compose.yml
├─ Dockerfile
├─ package.json
├─ README.md
├─ turbo.json
└─ yarn.lock

The flexibility of Node.js and its ecosystem of tools makes it complicated to share a one-size-fits-all recipe, so keep in mind that a lot of fine-tuning iterations will be required to keep the developer experience at least as good as it was before migrating.

Planning for low team disruption

Fortunately, even though the fine-tuning iterations may take several weeks to get right, the most disruptive step is the first one: changing the file structure.

If your team uses git branches to work concurrently on the source code, that step will cause these branches to conflict, making them very complicated to resolve and merge into the repository’s main branch.

So our recommendation is threefold, especially if the entire team needs convincing and/or reassuring about migrating to a monorepo:

  • Plan a (short) code freeze in advance: define a date and time by which all branches must have been merged, so the migration can run without causing conflicts. Announce it ahead of time so developers can accommodate it. But don’t pick the date until you have a working migration plan.
  • Write the most critical parts of the migration plan as a bash script, so you can make sure that development tools work before and after migrating, including on the continuous integration pipeline. This should reassure the skeptics, and give more flexibility on the actual date and time of the code freeze.
  • With the help of your team, list all the tools, commands and workflows (including features of your IDE such as code navigation, linting and autocompletion) that they need to do their everyday work properly. This list of requirements (or acceptance criteria) will help us check our progress on migrating the developer experience over to the monorepo setup, and make sure that we don’t forget to migrate anything important.

Here’s the list of requirements we decided to comply with:

  • yarn install still installs dependencies
  • all automated tests still run and pass
  • yarn lint still finds coding style violations, if any
  • eslint errors (if any) are still reported in our IDE
  • prettier still reformats files when saving in our IDE
  • our IDE still finds broken imports and/or violations, if any, of TypeScript rules expressed in tsconfig.json files
  • our IDE still suggests the right module to import, when using a symbol exposed by an internal package, given it was declared as a dependency
  • the resulting Docker image still starts and works as expected, when deployed
  • the resulting Docker image still has the same size (approximately)
  • the whole CI workflow passes, and does not take more time to run
  • our 3rd-party code analysis integrations (SonarCloud) still work as expected

Here’s an example of a migration script:

#!/bin/bash
# This script turns the repository into a monorepo,
# using Yarn Workspaces and Turborepo

set -e -o pipefail # stop in case of error, including for piped commands

NEW_MONOLITH_DIR="servers/monolith" # path of our first workspace: "monolith"

# Clean up temporary directories, i.e. the ones that are not stored in git
rm -rf ${NEW_MONOLITH_DIR} dist

# Create the target directory
mkdir -p ${NEW_MONOLITH_DIR}

# Move files and directories from root to the ${NEW_MONOLITH_DIR} directory,
# ... except the ones tied to Yarn and to Docker (for now)
mv -f \
    .eslintrc.js \
    .prettierrc.js \
    README.md \
    package.json \
    src \
    scripts \
    tsconfig.json \
    ${NEW_MONOLITH_DIR}

# Copy new files to root level
cp -a migration-files/. . # includes turbo.json, package.json, Dockerfile,
                          # and servers/monolith/tsconfig.json

# Update paths
sed -i.bak 's,docker\-compose\.yml,\.\./\.\./docker\-compose\.yml,g' \
  ${NEW_MONOLITH_DIR}/scripts/e2e-tests/e2e-test-setup.sh
find . -name "*.bak" -type f -delete # delete .bak files created by sed

unset CI # to let yarn modify the yarn.lock file, when script is run on CI
yarn add --dev turbo  # installs Turborepo
rm -rf migration-files/
echo "✅ You can now delete this script"

We add a job to our continuous integration workflow (GitHub Actions), to check that our requirements (e.g. tests and other usual yarn scripts) are still working after applying the migration:

jobs:
  monorepo-migration:
    timeout-minutes: 15
    name: Test Monorepo migration
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./migrate-to-monorepo.sh
        env:
          YARN_ENABLE_IMMUTABLE_INSTALLS: "false" # let yarn.lock change
      - run: yarn lint
      - run: yarn test:unit
      - run: docker build --tag "backend" .
      - run: yarn test:e2e
Turn the monolith’s source code into a first package

Let’s see what our single package.json file looks like, before migrating:

{
  "name": "backend",
  "version": "0.0.0",
  "private": true,
  "scripts": {
    /* all npm/yarn scripts ... */
  },
  "dependencies": {
    /* all runtime dependencies ... */
  },
  "devDependencies": {
    /* all development dependencies ... */
  }
}

And an excerpt of the tsconfig.json file used to configure TypeScript, still before migrating:

{
    "compilerOptions": {
        "target": "es2020",
        "module": "commonjs",
        "lib": ["es2020"],
        "moduleResolution": "node",
        "esModuleInterop": true,
        /* ... and several rules to make TypeScript more strict */
    },
    "include": ["src/**/*.ts"],
    "exclude": ["node_modules", "dist", "migration-files"]
}

When splitting a monolith into packages, we have to:

  • tell our package manager (yarn, in our case) that our codebase contains multiple packages;
  • and be more explicit about where these packages can be found.

To allow packages to be imported as dependencies of other packages (a.k.a. workspaces), we recommend using Yarn 3 or another package manager that supports workspaces.

So we added "packageManager": "yarn@3.2.0" to package.json, and created a .yarnrc.yml file next to it:

nodeLinker: node-modules
yarnPath: .yarn/releases/yarn-3.2.0.cjs

As suggested in Yarn’s migration path:

  • we commit the .yarn/releases/yarn-3.2.0.cjs file;
  • and we stick to using node_modules directories, at least for now.

After moving the monolith codebase (including package.json and tsconfig.json) to servers/monolith/, we create a new package.json file in the root directory, whose workspaces property lists the directories where workspaces can be found:

{
  "name": "@myorg/backend",
  "version": "0.0.0",
  "private": true,
  "packageManager": "yarn@3.2.0",
  "workspaces": [
    "servers/*"
  ]
}

From now on, each workspace must have its own package.json file, to specify its package name and dependencies.

So far, the only workspace we have is “monolith”. We make it clear that it’s now a Yarn workspace by prefixing its name with our organization’s scope, in servers/monolith/package.json:

{
  "name": "@myorg/monolith",
  /* ... */
}

After running yarn install and fixing a few paths:

  • yarn build and other npm scripts (when run from servers/monolith/) should still work;
  • the Dockerfile should still produce a working build;
  • all CI checks should still pass.
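
To double-check that Yarn now recognizes both the root project and the monolith workspace, we can list the workspaces it detects (output shown is indicative):

$ yarn workspaces list
➤ YN0000: .
➤ YN0000: servers/monolith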

Extracting a first package: common-utils

So far, we have a monorepo that defines only one “monolith” workspace. Its presence in the servers directory conveys that its modules are not meant to be imported by other workspaces.

Let’s define a package that can be imported by those servers. To better convey this difference, we introduce a packages directory, next to the servers directory. The common-utils directory (from servers/monolith/common-utils) is a good first candidate to be extracted into a package, because its modules are used by several servers from the “monolith” workspace. When we reach the point where each server is defined in its own workspace, the common-utils package will be declared as a dependency of both servers.

For now, we move the common-utils directory from servers/monolith/, to our new packages/ directory.

To turn it into a package, we create the packages/common-utils/package.json file, with its required dependencies and build script(s):

{
  "name": "@myorg/common-utils",
  "version": "0.0.0",
  "private": true,
  "scripts": {
    "build": "swc src --out-dir dist --config module.type=commonjs --config env.targets.node=16",
    /* other scripts ... */
  },
  "dependencies": {
    /* dependencies of common-utils ... */
  }
}

Note: we use swc to transpile TypeScript into JavaScript, but this should work similarly with tsc. Also, we made sure that its command-line configuration is aligned with the one from servers/monolith/package.json.
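
For reference, a tsc-based equivalent of that build script could look like this (a sketch, assuming the package has its own tsconfig.json):

{
  "scripts": {
    "build": "tsc --project tsconfig.json --outDir dist"
  }
}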

Let’s make sure that the package builds as expected:

$ cd packages/common-utils/
$ yarn
$ yarn build
$ ls dist/ # should contain the .js build of all the files from src/

Then, we update the root package.json file to declare that all subdirectories of packages/ (including common-utils) are also workspaces:

{
  "name": "@myorg/backend",
  "version": "0.0.0",
  "private": true,
  "packageManager": "yarn@3.2.0",
  "workspaces": [
    "packages/*",
    "servers/*"
  ],
  /* ... */
}

And add common-utils as a dependency of our monolith server package:

$ yarn workspace @myorg/monolith add @myorg/common-utils

You may notice that Yarn created node_modules/@myorg/common-utils as a symbolic link to packages/common-utils/, where its source code is held.
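
A quick look at the scope directory confirms it (output trimmed; the exact link target may differ):

$ ls -l node_modules/@myorg/
common-utils -> ../../packages/common-utils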

With the dependency added, we must fix all broken imports of common-utils. A low-diff way to achieve that is to re-introduce a common-utils directory in servers/monolith/, with a file that re-exports functions from our new @myorg/common-utils package:

export { hasOwnProperty } from "@myorg/common-utils/src/index"

Let’s not forget to update the servers’ Dockerfile, so the packages are built and included in the image:

# Build from project root, with:
# $ docker build -t backend -f servers/monolith/Dockerfile .

FROM node:16.16-alpine

WORKDIR /backend
COPY . .
COPY .yarnrc.yml .
COPY .yarn/releases/ .yarn/releases/
RUN yarn install

WORKDIR /backend/packages/common-utils
RUN yarn build

WORKDIR /backend/servers/monolith
RUN yarn build

WORKDIR /backend
RUN chown node /backend
USER node
CMD exec node servers/monolith/dist/api-server/start.js

This Dockerfile must be built from the root directory, so it can access the Yarn environment and the files located there.

Note: you can strip development dependencies from the Docker image by replacing yarn install with yarn workspaces focus --production in the Dockerfile, thanks to the plugin-workspace-tools plugin, as explained in “Orchestrating and dockerizing a monorepo with Yarn 3 and Turborepo” by Ismayil Khayredinov.
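
A sketch of that substitution, assuming the plugin was imported beforehand with yarn plugin import workspace-tools:

- RUN yarn install
+ RUN yarn workspaces focus --production @myorg/monolith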

At this point, we have successfully extracted an importable package from our monolith, but:

  • the production build fails to run, because of Cannot find module errors;
  • and the import path to common-utils is more verbose than necessary.

Fix module resolution for development and production

The way we import functions from @myorg/common-utils is problematic because Node.js looks for modules in the src/ subdirectory, even though they were transpiled into the dist/ subdirectory.

We would rather import functions in a way that is agnostic to the subdirectory:

import { hasOwnProperty } from "@myorg/common-utils"

Specifying "main": "src/index.ts" in the package.json file of that package would not help: the path would still break when running the transpiled build.

Node’s conditional exports come to the rescue: they let the package’s entrypoint adapt to the runtime context:

 {
    "name": "@myorg/common-utils",
    "main": "src/index.ts",
+   "exports": {
+     ".": {
+       "transpiled": "./dist/index.js",
+       "default": "./src/index.ts"
+     }
+   },
    /* ... */
  }

In a nutshell, we add an exports entry that associates two entrypoints with the package’s root directory:

  • the default condition specifies ./src/index.ts as the package’s entrypoint;
  • the transpiled condition specifies ./dist/index.js as the package’s entrypoint.

As specified in Node’s documentation, the default condition should always come last in that list. The transpiled condition is custom, so you can give it the name you want.

For this package to work in a transpiled runtime context, we change the corresponding node commands to specify the custom condition. For instance, in our Dockerfile:

- CMD exec node servers/monolith/dist/api-server/start.js
+ CMD exec node --conditions=transpiled servers/monolith/dist/api-server/start.js
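
To check which entrypoint Node resolves in each context, we can ask it directly (an illustrative check; the absolute paths depend on where the monorepo is checked out):

$ node -e "console.log(require.resolve('@myorg/common-utils'))"
/backend/packages/common-utils/src/index.ts

$ node --conditions=transpiled -e "console.log(require.resolve('@myorg/common-utils'))"
/backend/packages/common-utils/dist/index.js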

Make sure that development workflows work as before

At this point, we have a monorepo made of two workspaces that can import modules from one another, build and run.

But it still requires us to update our Dockerfile every time a workspace is added, because the yarn build command must be run manually for each workspace.

That’s where a monorepo orchestrator like Turborepo comes in handy: we can ask it to build packages recursively, based on declared dependencies.

After adding Turborepo as a development dependency of the monorepo (command: $ yarn add turbo --dev), we can define a build pipeline in turbo.json:

{
    "pipeline": {
        "build": {
            "dependsOn": ["^build"]
        }
    }
}

This pipeline definition means that, for any package, $ yarn turbo build will start by building the packages it depends on, recursively.
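
For instance, building just the monolith server and the packages it depends on could be done with Turborepo’s filtering (assuming a Turborepo version that supports the --filter option):

$ yarn turbo build --filter=@myorg/monolith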

This allows us to simplify our Dockerfile:

# Build from project root, with:
# $ docker build -t backend -f servers/monolith/Dockerfile .

FROM node:16.16-alpine
WORKDIR /backend
COPY . .
COPY .yarnrc.yml .
COPY .yarn/releases/ .yarn/releases/
RUN yarn install
RUN yarn turbo build # builds packages recursively
RUN chown node /backend
USER node
CMD exec node --conditions=transpiled servers/monolith/dist/api-server/start.js

Note: it’s possible to optimize the build time and size by using Docker stages and turbo prune, but the resulting yarn.lock file was not compatible with Yarn 3 when this article was being written (see this pull request for recent progress on this issue).

Thanks to Turborepo, we can also run the unit tests of all our packages in one command, yarn turbo test:unit, after defining a pipeline for it, as we did for build.
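
The resulting turbo.json could look like this (a sketch; since our tests run on the TypeScript sources directly, thanks to the default condition, test:unit does not need to depend on build here):

{
    "pipeline": {
        "build": {
            "dependsOn": ["^build"]
        },
        "test:unit": {}
    }
}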

That said, most developer workflows rely on dependencies and configuration files that were moved to servers/monolith/, so most of them don’t work anymore.

We could leave these dependencies and files at the root level, so they are shared across all packages. Or worse: duplicate them in every package. There is a better way.

Extract and extend common configuration as packages

Now that our most critical build and development workflows work, let’s make our test runner, linter and formatter work consistently across packages, while leaving room for customization.

One way to achieve that is to create packages that hold base configuration, and let other packages extend them.

Similarly to what we did for common-utils, let’s create the following packages:

├─ packages
│  ├─ config-eslint
│  │  ├─ .eslintrc.js
│  │  └─ package.json
│  ├─ config-jest
│  │  ├─ jest.config.js
│  │  └─ package.json
│  ├─ config-prettier
│  │  ├─ .prettierrc.js
│  │  └─ package.json
│  └─ config-typescript
│     ├─ package.json
│     └─ tsconfig.json
├─ ...
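
Each of these configuration packages just needs a minimal package.json, so it can be declared as a dependency of other packages. For instance (a sketch; the main field is an assumption):

{
  "name": "@myorg/config-eslint",
  "version": "0.0.0",
  "private": true,
  "main": ".eslintrc.js"
}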

Then, in each package that contains source code, we add those as dependencies, and create configuration files that extend them:

packages/*/.eslintrc.js:

module.exports = {
    extends: ["@myorg/config-eslint/.eslintrc"],
    /* ... */
}

packages/*/jest.config.js:

module.exports = {
    ...require("@myorg/config-jest/jest.config"),
    /* ... */
}

packages/*/.prettierrc.js:

module.exports = {
    ...require("@myorg/config-prettier/.prettierrc.js"),
    /* ... */
}

packages/*/tsconfig.json:

{
    "extends": "@myorg/config-typescript/tsconfig.json",
    "compilerOptions": {
        "baseUrl": ".",
        "outDir": "dist",
        "rootDir": "."
    },
    "include": ["src/**/*.ts"],
    /* ... */
}

To make it easier and quicker to set up new packages with these configuration files, feel free to use a boilerplate generator, e.g. plop.
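
For instance, a minimal plopfile.js could scaffold a new package from a couple of templates (a sketch; the generator name and template paths are hypothetical):

module.exports = function (plop) {
  // defines a "package" generator: $ plop package
  plop.setGenerator("package", {
    description: "scaffold a new package with shared configuration",
    prompts: [
      { type: "input", name: "name", message: "Package name (without scope)?" },
    ],
    actions: [
      {
        type: "add",
        path: "packages/{{name}}/package.json",
        templateFile: "templates/package.json.hbs",
      },
      {
        type: "add",
        path: "packages/{{name}}/tsconfig.json",
        templateFile: "templates/tsconfig.json.hbs",
      },
    ],
  });
};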

Next step: one package per server

Now that we have checked off all the requirements listed in the “Planning for low team disruption” section, it’s a good time to actually freeze code contributions, run the migration script, then commit the changes to the source code repository.

From now on, the repository can officially be referred to as a “monorepo”! All developers should be able to create their own packages, and to import them from the monolith, instead of adding new code directly to it. And the foundations are solid enough to start splitting the monolith into packages, like we did for common-utils.

We are not going to cover precise steps on how to achieve that, but here are some recommendations on how to prepare for that splitting:

  • start by extracting small utility packages, e.g. type libraries, logging, error reporting, API wrappers, etc…
  • then, extract other parts of the code that are meant to be shared across all servers;
  • finally, duplicate the parts that are not meant to be shared, but are still relied upon by more than one server.

The goal of these recommendations is to decouple servers from each other, progressively. Once this is done, extracting one package per server should be almost as simple as extracting common-utils.

Also, during that process, you should be able to optimize the duration of several build, development and deployment workflows, by leveraging Turborepo features such as caching and turbo prune.

Conclusion

We have turned a monolithic Node.js backend into a monorepo while keeping team disruptions and risks to a minimum:

  • by splitting the monolith into multiple decoupled packages that can depend on each other;
  • by sharing common TypeScript, ESLint, Prettier and Jest configuration across packages;
  • and by setting up Turborepo to optimize development and build workflows.

Using a migration script allowed us to avoid git conflicts and a long code freeze while preparing and testing the migration. And by adding a CI job, we made sure that the migration script did not break the build and development tools.

I would like to thank Renaud Chaput (Co-Founder, CTO at Notos), Vivien Nolot (Software Engineer at Choose) and Alexis Le Texier (Software Engineer at Choose) for their collaboration on this migration.
