In Defence of the Monolith, Part 2

Key Takeaways

  • The requirement for scalability and the need to handle intrinsic domain complexity are two important forces to consider when selecting an architecture for a system.
  • Modular monoliths can suffer from JAR hell, but build tools can help tame the problem. 
  • Modules within monoliths (like microservices) should handle their own data, but a naïve mapping of modules to an RDBMS will result in a database that’s hard to maintain. A number of patterns can help keep things under control. 
  • For a modular monolith, the underlying technical platform should handle as many cross-cutting concerns as possible, leaving the developer to concentrate on the complexities of the business domain. Apache Isis is one such technical platform particularly suited to this task, enabling the hexagonal architecture and implementing the naked objects pattern. 
  • The open source Estatio application (built on Apache Isis) is a good example of a modular monolith. Use it to help gauge whether your own context might be best served by a monolith (or “monolith first”) approach.

In part 1 of this article, we explored the pros and cons of monoliths – or, more precisely, modular monoliths – as compared to the microservices architecture.  Along the way, we discussed maintainability, transactionality, complexity, scalability, flexibility of implementation and developer productivity.

What we concluded from that discussion was that the architecture you should choose depends, of course, on context.  Two of the most important considerations are shown in figure 1.

Figure 1: Scalability vs Domain Complexity

If your domain is (relatively) simple but you need to achieve “internet-scale” volumes, then a microservices architecture may well suit.  You must be confident enough in the domain to decide up-front the responsibilities and interfaces of each microservice.

If your domain is complex but the expected volumes are bounded (e.g. for use just within an enterprise), then a modular monolith makes more sense.  A monolith will let you more easily refactor the responsibilities of the modules as your understanding of the domain deepens over time.

And for the tricky high complexity/high volume quadrant, I would argue that it’s wrong to optimize for scalability first.  Instead, build a modular monolith to tackle the domain complexity, then refactor to a microservices architecture as and when higher volumes are achieved.  This approach also lets you defer the higher implementation costs of a microservices architecture until such time that your volumes (and presumably revenue) justify the business case to spend the extra money.

Implementing a microservices architecture correctly can be challenging, but building a modular monolith also needs to be tackled thoughtfully.  In part 1, we identified a number of potential issues:

  • A modular monolith must consist of, well, modules.  However, this can result in accidental cyclic dependencies.  It can also give rise to JAR hell, which we’ll explore here in part 2.
  • While every module should be responsible for its own data, monoliths can “tactically” exploit the fact that many modules may persist to the same, single, transactional data store.  Care is needed though to ensure the resultant database doesn’t become a “big ball of mud”.
  • Guaranteed synchronous calls between modules can provide a better user experience.  However, these modules must be decoupled to allow them to evolve independently.  Slowly evolving modules should not depend on modules that are often changed.
  • In order to allow the development team to stay focused on the domain, a platform/framework is required to handle as many cross-cutting concerns as possible.  Even so, it’s still rather common for business logic to “leak” from the domain layer into the adjacent presentation or persistence layers.

Here in part 2 of the article, we’re going to explore how to tackle these issues, and we’ll look at an example of a real-world modular monolith on the JVM that leverages a powerful open source framework to manage cross-cutting concerns.

Acyclic Dependencies and JAR hell

With a modular monolith, we need some way to delineate the boundaries of each module.

Our first option is to use language features – such as packages (Java) or namespaces (.NET) – to group together the module’s functionality, but it isn’t otherwise distinguished from the rest of the application.  There are however no guarantees that there won’t be cycles between those packages/namespaces; if you only use this option, you’re very likely to end up with a non-modular monolith, a big ball of mud.

Instead, we need a bit more structure, allowing build tools to enforce the acyclic dependencies we require between those modules.  Implementing this on the Java platform could be done using a Maven multi-module project; for .NET it would be a single Visual Studio solution with multiple C# or F# projects within.  All this code is recompiled together, but the build tooling (Maven or Visual Studio) will ensure that there are no cyclic dependencies between those modules.
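
For example, a minimal Maven parent pom along these lines (module names illustrative) declares the modules; Maven’s reactor will then refuse to build if the modules’ declared dependencies form a cycle:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany</groupId>
  <artifactId>monolith-parent</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <modules>
    <module>addresses</module>
    <module>customers</module>
    <module>invoicing</module>
    <module>webapp</module>
  </modules>
</project>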

One downside with this second option is that, because all the code is held in a single code repo and is all (re)compiled together, it also must all be (re)tested and it all gets the same version number.  This option doesn’t exploit the fact that, in reality, different modules evolve at different speeds.  Why continually rebuild/retest code that changes only slowly over time?

A third option is therefore to move modules out into their own code repos, and version each separately.  On the .NET platform we can package each module up as a NuGet package, while on Java we can publish each module as a Maven artifact.  From the context of the main application that consumes them, these modules are indistinguishable from a third-party dependency.

However, this is also where we need to take care because it’s possible to end up with cyclic dependencies.  For example, suppose that a customers v1.0 module depends upon an addresses v1.0 module.  If a developer creates a new version addresses v1.1 that references customers v1.0, then we seemingly have the customers and addresses modules mutually dependent upon each other; a cyclic dependency.  This is, of course, a Bad Thing™.

To solve this, we need to decide which direction the dependencies are meant to flow in: is the customers module meant to depend on addresses, or vice versa?  The heuristic here is the stable dependencies principle: unstable (frequently changing) modules should depend on stable (infrequently changing) modules.  In our example, the question becomes: which concept is the more volatile, customers or addresses?  If the direction of the dependency is incorrect, then the dependency inversion principle can be used to refactor.
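
As a sketch of such a refactoring (type names illustrative): rather than the addresses module referencing customers directly, it can declare an interface that the customers module – which already depends on addresses – then implements:

public interface AddressOwner {    // declared in the addresses module
    void notifyAddressChanged(Address address);
}

public class Customer implements AddressOwner {    // in the customers module
    @Override
    public void notifyAddressChanged(Address address) {
        // react to the change, e.g. flag the customer record for review
    }
}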

Figuring this out can be quite straightforward.  Some modules may just hold reference data, for example tax rate tables or currency.  Other modules that are almost but not quite reference data include counterparties, and fixedassets, or maybe (financial) instruments.  Another good example is “filing-cabinets” which just store stuff: for example, documents or communications.  In all these cases, other modules will depend on these modules, not the other way around.

We could also take a more scientific approach and turn to our version control history, measuring the relative amount of churn in each module.

Modules that are stable are good candidates to move out of the application’s code repository and into their own repositories.  And once you have moved out modules into their own repo, then they can start being reused in other applications too.

Actually, all we require is that the interface defined by a module is stable.  Whether or not the implementation behind the interface is stable is unimportant.  In fact, it can be a good move to also move modules out whose implementation is still in flux, because it removes some of the code churn from the main repo.  Exploiting this fact does though require that the module’s interface is formally, and not implicitly, defined.

The above is all well and good, but what we also need is an early warning when a cyclic dependency does accidentally get introduced, ideally within our build or CI.  This is achievable.

Let’s go back to the example above: customers v1.0 → addresses v1.0, while addresses v1.1 → customers v1.0.  The application itself will link to the latest version of each module, which gives us customers v1.0 and addresses v1.1 in a cyclic dependency.

This is a dependency convergence problem, more commonly called “JAR (or DLL) hell”.  Figure 2 shows a more common example, where an application uses two libraries that in turn use conflicting versions of some common base library.

Figure 2: Dependency Convergence Conflicts

If running on the JVM, then this would manifest at runtime with linkage errors; under normal circumstances the JVM only loads one version of a class at a time.

To fix this, Maven’s Enforcer plugin can be configured to flag any dependency convergence issues, if necessary failing the build.  The developer can then use the <dependencyManagement> section within the pom.xml (or sometimes dependency <exclusions>) to decide which version of any given common library to run with.  The use of semantic versioning by open source libraries is increasingly common, so if the version difference is only minor (v2.3 vs v2.4) then most likely the higher version can be used without issue.
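
For example, a plugin configuration along these lines (the plugin version is illustrative) will fail the build whenever two parts of the dependency graph pull in conflicting versions of the same library:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>1.4.1</version>
  <executions>
    <execution>
      <id>enforce-convergence</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- fail the build if transitive dependency versions diverge -->
          <dependencyConvergence/>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>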

If using NuGet 3.x, then a similar effect can be achieved by virtue of the “Nearest wins” dependency resolution rule.

That said, some projects, such as Guava, release major versions quite regularly and do delete deprecated APIs; there’s a chance that it might not even be possible to run the monolith shown in figure 2.  In such a case, you should first look to fix the conflict by upgrading the library that depends on the older version.  If that’s not an option, you might be able to shade (repackage) the dependency.  And if that isn’t possible either, you’ll just have to rework your code somehow to remove the conflict, or maybe even the dependency itself.

For the sake of completeness, we should note that OSGi applications (on the JVM) avoid this problem because each module (bundle in OSGi parlance) can be arranged to load in a different classloader.  However, while OSGi has its fans, it’s the exception rather than the rule, and may well lose ground when Java 9 ships with the Jigsaw module loading system.  Jigsaw is no silver bullet though: it very deliberately does not attempt to tackle the dependency convergence issue, instead leaving it as a problem for build tools such as Maven to handle.

To summarize: (on the JVM at least) use Maven’s Enforcer plugin to detect dependency convergence issues, and where there are conflicts, resolve them explicitly with <dependencyManagement> sections and if necessary <exclusions>.  Keep these under close review – I’ve started putting mine into an always-active <profile> called “resolving-conflicts” so they are more obvious – and always be looking to reduce these exceptions over time.
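
Such a profile might look something like the following sketch (the Guava pin is just an illustration of a resolved conflict):

<profiles>
  <profile>
    <!-- always active; exists purely to make resolved version conflicts easy to spot -->
    <id>resolving-conflicts</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <dependencyManagement>
      <dependencies>
        <dependency>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
          <version>20.0</version>
        </dependency>
      </dependencies>
    </dependencyManagement>
  </profile>
</profiles>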

Data

Just as in a microservices architecture, in a modular monolith, each module is responsible for persisting its own data.  In most cases, these modules will all be using a relational database to store their entities: relational databases still (rightly) rule the roost for many enterprise webapps.  This then provides the “tactical” opportunity to co-locate those tables on a single RDBMS, and thus take advantage of transactions.

When mapping a module’s entities to an RDBMS, the module’s namespace/package should be reflected in the schema names of the tables to which those entities are mapped.  The module/schema should also be used as the value of any discriminator columns for super-type tables (i.e. when mapping inheritance hierarchies).
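
A minimal sketch of the idea using JPA annotations (names illustrative; the same can be expressed with JDO annotations):

import javax.persistence.*;

@Entity
@Table(name = "Customer", schema = "customers")    // schema named after the module
@DiscriminatorValue("customers.Customer")          // module-qualified value, used if this entity
                                                   // participates in a mapped inheritance hierarchy
public class Customer {
    @Id
    private Long id;
    ...
}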

One of the key differences between a domain object model and a relational database is the means by which relationships between entities are represented; in memory, there’s an object pointer, whereas in the database there’s a foreign key attribute.  As figure 3 shows, a naïve mapping of the classes (on the left) to the tables (on the right) can result in the direction of dependencies in effect being the opposite in the database to that of the code.

Figure 3: Class vs Table Relationships

The state of the Customer entity lives both in the Customers table and in the Addresses.customer_id column (because that foreign key corresponds to the Customer.addresses field).  Even if the codebase is nicely organized as a set of layered modules with acyclic dependencies, when we look at the RDBMS we have our big ball of mud.

The problem can be fixed though.  To keep all the Customer information in the same schema, we should move the foreign key out of the Addresses table and into a link table, as shown in figure 4.  The performance hit will be negligible.

Figure 4: Link table

I would argue that relationships for the tables of entities within the same module don’t need this treatment... but I also wouldn’t argue too hard against you if you wanted to always introduce a link table for all associations.
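
As a sketch, such a link entity might be mapped like this (JPA, names illustrative); because it lives in the customers module, its table – and the foreign keys it holds – sit in the customers schema:

import javax.persistence.*;

@Entity
@Table(name = "CustomerAddress", schema = "customers")
public class CustomerAddress {
    @Id
    private Long id;

    @ManyToOne
    private Customer customer;    // FK within the customers schema

    @OneToOne
    private Address address;      // FK out to the addresses schema
}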

More involved are polymorphic associations between objects.  For example, we might want to be able to attach Documents to all domain objects.  As shown in figure 5, we can introduce the concept of Paperclip (an interface) and use concrete implementations to act as the link table.

Figure 5: Polymorphic associations

Each individual Paperclip will be mapped to two tables, one in the documents schema, and one in the schema specific to its implementation, for example PaperclipsForCustomer.  The Paperclips.discriminator column indicates the concrete subtype.

What’s nice about this mapping is we can still leverage referential integrity between all the tables in the database, while in the code we have a natural use of the Paperclip interface.
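
In code, the pattern boils down to something like this sketch (persistence annotations elided; type names follow figure 5 but are illustrative):

public interface Paperclip {           // defined in the documents module
    Document getDocument();
    Object getAttachedTo();
}

// in the customers module; its table holds a genuine foreign key to Customers
public class PaperclipForCustomer implements Paperclip {
    private Document document;
    private Customer customer;

    public Document getDocument() { return document; }
    public Object getAttachedTo() { return customer; }
}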

The patterns described above show that there are techniques to tackle structural decoupling of the database, but this doesn’t necessarily address behavioural coupling.  In part 1, we identified the problem that a developer working in module A could write a SELECT statement directly querying the tables owned by module B.  How should this be tackled?

The solution used on the monoliths I work on is to make the ORM the only way in which database interactions are performed; ad-hoc SELECT statements are verboten.  On the .NET monolith I work on, we use Entity Framework, and each module corresponds to a separate DbContext.  This also handles structural issues; EF only manages foreign keys within the module/DbContext, and we use the polymorphic link pattern described above to handle relationships between modules.  For the Java monolith, we use DataNucleus (which implements the JDO and JPA APIs); again, each module has its own persistence context.

You may well ask: what of those use cases where an ORM doesn’t work?  The glib answer is that it’s worth investing the time learning to use the ORM effectively: chances are that it does work, actually.  That said, in both monoliths, we handle special cases – typically where large volumes of data are required from two or more modules - using views which JOIN the tables from the relevant modules.  The ORM neither knows nor cares that the entity is mapped to a view rather than a table.  This is a performance optimization: the view effectively co-locates the business processing with the data.   The view definitions are also trackable as code artefacts in their own right: we can see where we’ve deliberately chosen to subvert module boundaries in order to meet some user goal.
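
For example, such a view-backed entity might look like the following sketch (view and column names are illustrative; the JOINed view itself is defined in DDL and version-controlled alongside the code):

import javax.persistence.*;

// read-only entity mapped to a database view that JOINs invoicing and customers tables
@Entity
@Table(name = "InvoiceSummary", schema = "invoicing")
public class InvoiceSummary {
    @Id
    @Column(name = "invoice_id")
    private Long invoiceId;

    @Column(name = "customer_name")     // sourced from the customers module’s tables
    private String customerName;

    @Column(name = "gross_amount")
    private java.math.BigDecimal grossAmount;
}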

Transactionality (& synchronicity)

It’s common for a business operation to result in a change of state in two or more modules.  For example, consider an invoicing application where we want to perform an invoice run.  This will mostly modify state only in the invoicing module, creating new Invoice and InvoiceItem objects.  However, if some customers want their invoices to be sent out by email, then it might as a side-effect create Document objects (in the documents module), and Communication objects (in the communications module). 

In a microservice architecture we have no transactions across services, which in general means we must use messages to coordinate such changes.  The system therefore has only eventual consistency, and compensating actions are used to “back out” the change if something goes wrong.  In some systems, this eventually-consistent behaviour can be confusing to the end-user, and to the developer too.  For example, in the CQRS pattern that separates out writes from reads, a change written against one service will not immediately be available to read from another.

For a monolith though, if the backing data stores for the invoicing, documents and communications modules are all co-located in the same RDBMS, then we can simply rely on the RDBMS transaction to ensure that all the state is changed atomically.  From an end-user perspective, everything remains consistent; there are no potentially confusing interim states or compensating actions to worry about.  And developers can rely on writes committed to the database being immediately available to read.

Synchronous behaviour can improve the user experience in other ways too. Imagine that each Customer has a collection of associated EmailAddresses, and that one of these EmailAddresses is nominated as the one to send invoices to.   Suppose now that the end-user wants to delete that particular EmailAddress.  In this case, we want the invoicing module to veto the deletion, because that email address is “in use”.  Basically, we want to enforce a referential integrity constraint across modules.

While supporting this use case in a microservice can be complicated, in a monolith we can easily handle the requirement.  One design is to use an internal event bus, whereby the customer module broadcasts the intention to delete the EmailAddress, and allows subscribers in other co-located modules to veto the change:

public class Customer {
    ...
    @Action(domainEvent = EmailAddressDeletedEvent.class)
    public void delete(EmailAddress ea) {
        ...
    }
}

Listing 1: Customer action to delete email address, emitting an event

with a subscriber:

public class InvoicingSubscriptions {
    @Subscribe
    public void on(Customer.EmailAddressDeletedEvent ev) {
        EmailAddress ea = (EmailAddress)ev.getArg(0);
        if(inUse(ea)) {
            ev.veto("Email address in use by invoicing");
        }
    }
    ...
}

Listing 2: Invoicing subscriber of the delete email address event

The underlying technical platform would automatically emit the EmailAddressDeletedEvent onto the internal event bus, prior to invoking the delete.  Any subscriber can then veto the interaction if the email address in question is in use.

A different, more explicit, design is for the customer module to declare a service provider interface (SPI) and then allow other modules to implement that SPI:

public class Customer {
    ...
    public void delete(EmailAddress ea) {
        ...
    }
    public String validateDelete(EmailAddress ea) {
        return deleteAdvisors.stream()
                       .map(advisor -> advisor.cannotDelete(ea))
                       .filter(reason -> reason != null)
                       .findFirst().orElse(null);
    }

    public interface DeleteEmailAddressAdvisor {
        String cannotDelete(EmailAddress ea);
    }

    @Inject
    List<DeleteEmailAddressAdvisor> deleteAdvisors;
}

Listing 3: Customer action to delete email address, with validation and an “advisor” SPI

with an advisor class implementing the SPI:

public class Invoicing implements Customer.DeleteEmailAddressAdvisor {
    public String cannotDelete(EmailAddress ea) {
        if(inUse(ea)) {
            return "Email address in use by invoicing";
        }
        return null;
    }
    ...
}

Listing 4: Invoicing module implementation of the “advisor” SPI

Here the validateDelete method is a guard called before the delete method; it is used to determine if the delete may be performed for this particular email address.  Its implementation iterates over all injected advisors; a non-null return value is interpreted as the reason that the EmailAddress cannot be deleted.

Here’s another use case. In figure 5 we saw how different modules might provide the ability to attach Documents to their respective entities by way of Paperclip implementations. One can imagine that the documents module might contribute an “attach” action that would allow Documents to be attached, but this action should only be made available in the UI for those entities for which a Paperclip implementation exists.  Again, the documents module could discover which entities expose the “attach” action either by emitting events on an internal event bus, or through an SPI service.

For example:

@Mixin
public class Object_attach {
    private final Object context;
    public Object_attach(Object ctx) { this.context = ctx; }

    public Object attach(Blob blob) {
        Document doc = asDocument(blob);
        paperclipFactory().attach(context, doc);
        return context;    // redisplay the object just attached to
    }
    public boolean hideAttach() {
        return paperclipFactory() == null;
    }

    public interface PaperclipFactory {
        boolean canAttachTo(Object o);
        void attach(Object o, Document d);
    }
    PaperclipFactory paperclipFactory() {
        return paperclipFactories.stream()
                                 .filter(pf -> pf.canAttachTo(context))
                                 .findFirst().orElse(null);
    }

    @Inject
    List<PaperclipFactory> paperclipFactories;
}

Listing 5: Mixin to attach Documents to arbitrary objects

The idea here is that the Object_attach class acts like a mixin or trait, contributing the attach action to all objects.  However, (via the hideAttach method) this action is not shown in the UI if there is no PaperclipFactory able to actually attach a document to the particular domain object acting as the context of the mixin.

Platform Choices

Whether you build yourself a monolith or a microservices system, you’ll need some sort of platform or framework on which to run it.

For microservice architectures the platform is mostly focused on the network: it needs to allow services to interact with each other (protocols, message encodings, sync/async, service discovery, circuit breakers, routers, etc.) and to be able to run up the system in its entirety (Docker Compose, etc.).  The language to implement any given individual service is less important, so long as it can be packaged, e.g. as a Docker container.  (Of course, the project team must have the appropriate skills in that language for initial development and ongoing maintenance/support).

For monoliths, too, a common platform is required, but here the focus is more on the language and supporting ecosystem.  At a very minimum this will be the technology platform such as Java or .NET.  On top of this you’ll probably also adopt some framework, JEE and Spring being common choices.

Because a monolith’s strength is dealing with complex domains, the underlying platform should pick up as many technical/cross-cutting concerns as possible: security, transactionality and persistence are the obvious ones (there are others, as we’ll see).  Moreover, business modules should not depend on the technical modules; we want to get as close to the hexagonal architecture as possible.

It’s also important for a monolith’s platform to provide tools allowing business modules to be decoupled from each other.  A solution to this for a monolith is remarkably similar to that of a microservice: use an event bus.  The difference is that with a monolith, this event bus is intra-process and is also transactional.
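
As a minimal sketch of the idea (here using Guava’s EventBus purely for illustration; a full platform such as Apache Isis additionally ties event publication into the current transaction):

import com.google.common.eventbus.EventBus;
import com.google.common.eventbus.Subscribe;

public class EventBusDemo {

    static class EmailAddressDeletedEvent {
        final String emailAddress;
        EmailAddressDeletedEvent(String emailAddress) { this.emailAddress = emailAddress; }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        // a subscriber in another module registers itself...
        bus.register(new Object() {
            @Subscribe
            public void on(EmailAddressDeletedEvent ev) {
                System.out.println("invoicing notified of: " + ev.emailAddress);
            }
        });
        // ...and is invoked synchronously, in-process, when the event is posted
        bus.post(new EmailAddressDeletedEvent("joe@example.com"));
    }
}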

A (Modular) Monolith Example

To help make the case for a modular monolith, we end part 2 of this article with a real-world example.

The application in question is called Estatio, an invoicing system for Eurocommercial Properties, a real-estate company that owns and operates (at the time of writing) 34 shopping centres in three European countries.  The source code for Estatio can be found on GitHub.

Figure 6: Estatio Screenshot

The underlying technology platform/framework for Estatio is Apache Isis, a full-stack framework for the JVM that handles all the usual cross-cutting concerns such as security, transactionality and persistence.  However, it goes further than this in also automatically rendering domain objects either through a web UI or through a REST API, following the naked objects pattern.  In the same way that an ORM automatically maps/marshals a domain object into a persistence layer, you can think of Apache Isis as mapping that domain object into the presentation layer.

Because the UI is generic, it can be steadily improved/enhanced with no changes to the domain object model.  For example, in a previous release, the Apache Isis viewer was improved to use Bootstrap for styling.  Every application that updated to this release was then “magically upgraded” with the improved viewer.  When capabilities such as maps, calendars or Excel exports have been added, they too are rendered automatically in the UI everywhere that the framework can infer that they apply.

Because interactions with the business domain objects go “through” the generic UI provided by Apache Isis, a whole bunch of other cross-cutting concerns can also be tackled.  For example, Apache Isis automatically creates a command memento (serializable to XML) for every action invocation or property edit, and this can then be published to an event bus such as Apache Camel as the transaction completes.  It also correlates this command with an audit trail, providing full cause-and-effect traceability of every change made to every domain object.

The framework works by building an internal metamodel (similar to how ORMs work), and this metamodel can be exploited for other purposes than just the generic UI and REST API.  For example, a Swagger interface file can be exported to allow custom UIs to be built against the REST API, while the powerful security module defines roles and permissions with respect to the properties and actions of the domain object types.  The metamodel is also used to generate gettext “.po” files to be translated for i18n.  It’s also possible to define metamodel validators to enforce architectural standards, for example: that every entity in a given module is mapped to the correct database schema.

With the framework handling so many of the technical concerns, the developer is able to focus on the domain, ensuring that it is properly modularized for long-term maintainability.  To help modules stay fully decoupled, the framework supports the concept of mixins, whereby the rendering of a given domain object can include state and behaviour from several modules without there actually being any coupling of the business modules themselves.   The ability to attach Documents to arbitrary objects is a good example; the code in listing 5 above is very similar to the Apache Isis programming model.

Equally important is the provision of an internal event bus.  Rather than have one module directly call another, it can just emit an event which other modules can then subscribe to.  The code listings 1 and 2 are once again examples of how Apache Isis supports this.

Persistence patterns such as support for polymorphic associations (figure 5) are also important.  These are implemented by various open source modules in the Incode Catalog to support generic subdomains such as documents, notes, aliases, classifications, and communications. 

A further extensive set of modules can be found at Isis Add-ons.  These tackle technical concerns such as security, auditing, and event publishing.  The extensions to the Apache Isis viewer (maps, calendars, PDF, etc.) are also to be found here.

To make both the generic business subdomains and technical add-ons easy to reuse, each is supported by its own demo app and integration tests.  The would-be consumer of these apps can therefore check them out easily to see if they fit requirements.

So much for Apache Isis and its supporting ecosystem; the proof of the pudding is in the eating.  What the technical platform should enable is the ability for the development team to concentrate on the core domain, with that domain broken up into modules.  And so, if you inspect the Estatio codebase you will indeed see that it consists of a number of separate modules.  Figure 7 shows how these depend on each other (diagram generated using Structure101).

Figure 7: Estatio Modules

In the diagram on the left-hand side of figure 7, each box represents a separate Maven module, and the lines represent dependencies between the modules.

Towards the bottom are utility modules (domsettings, numerator) or modules that contain strictly reference data (country, currency, index, tax, charge). 

Moving into the middle we see the agreement, party, financial, asset, assetfinancial and bankmandate modules: neither the structure of these modules nor the data within them changes that often.  By the time we get to budgeting, invoice and in particular lease, we are at the heart of the system; these are the modules that depend most on the other submodules.

The diagram on the right-hand side of figure 7 is almost the same, however the lease module has been expanded into its sub-packages.  Here we can start to see some bidirectional dependencies, suggesting that this code could perhaps be improved.  There are certainly a lot of outbound dependencies, so the module is probably doing too much.  No software is perfect.  Then again, while lease is the largest module in the system, it’s still conceptually small enough for us to work on (“a lease is an agreement between two parties – a tenant and landlord – that calculates invoices”).

Estatio is now almost 5 years old as an application, with its scope set to continue to expand to support further use cases.  But its code base may shrink even as its scope expands: the majority of the modules in Isis Add-ons and the Incode Catalog were factored out of Estatio, and we expect to factor out further modules in the future.  And if you cloned its repo today to take a look, you might find it has moved on from the above diagrams.  That’s to be expected; this software is intended to have a long shelf life, and will continue to evolve.

Conclusions

In part 1 of this article we compared the modular monolith with the microservices architectures, exploring the benefits and weaknesses of both. 

We also asked the question: “which architecture should you go for, microservices or monoliths?”  And we answered by asking a different question: “what is it you are trying to optimise for?”  If on balance you’ve decided that the risk of domain complexity outweighs the risk of not being able to scale, then a modular monolith is the way to go.  Hopefully the various techniques and patterns we’ve described here in part 2 will assist.

Technical platforms are important whatever the architecture; there’s no point in reinventing the wheel.  A framework such as Apache Isis will allow you to channel your energies into tackling the complexities of the domain, helping you explore the module boundaries, while mopping up almost all of the technical cross-cutting concerns (including the presentation layer). 

We also looked at a substantial open source application, Estatio, that uses Apache Isis as its underlying platform, showing what a modular monolith looks like “in the flesh”.

Neither monoliths nor microservices is a silver bullet; the answer to “which should I go for?” is always “it depends”, and anyone who tells you otherwise is selling you snake oil.  Consider where your system fits with respect to scalability vs. domain complexity, and take it from there.

About the Author

Dan Haywood is an independent consultant best known for his work on domain-driven design and the naked objects pattern. He is a committer for Apache Isis, a Java framework for building backend line-of-business applications, which implements the naked objects pattern. Dan has a 13+ year ongoing involvement as technical advisor for the Irish Government's strategic Naked Objects system on .NET, now used to administer the majority of the department's social welfare benefits. He also has 5 years of ongoing involvement with Eurocommercial Properties co-developing Estatio, an open source estate management application, implemented on Apache Isis. You can follow Dan on Twitter and on his GitHub profile.

Community comments

  • Data ownership

    by Sam Siddiqi,

    What a wonderful set of articles!

    One thing that has always given me grief is dealing with data dependencies that span modules.

    For example, with respect to the Customer and Addresses relationship, it's hard to ascribe the persistence logic to just one module. In this particular example, the ORM would probably cascade a save applied to the Customer object to the Address, but that technically spans modules...

    It also means that there is a data coupling between the module containing Customer and the one containing Address. In heavily interconnected object models, this kind of coupling pretty much means one module (transitively) depends on a whole swath of others just to satisfy these sort of dependencies. Independent deployability of modules seems to be defeated with this manner of coupling.

    I've dealt with the problem by adopting DDD patterns pertaining to aggregate structures, where I have aggregate structures owned by a single module, and references to aggregate structures in different modules done through non-ORM mapped aggregate-root sysids/UUIDs. This dilutes the value of the ORM, (especially those like Hibernate which implement the Unit of Work pattern), but it gives a degree of isolation to the modules.

    I'd love to hear your thoughts on the matter...

    Cheers!

  • Re: Data ownership

    by Dan Haywood,

    Hi Sam,

    Glad you've enjoyed the articles.

    It's certainly common in DDD circles to use sysids/UUIDs as a way of associating aggregates. However, this moves all the responsibility for maintaining referential integrity into the application code. While that's unavoidable for microservice architectures that rely on eventual consistency, for modular monoliths it's always seemed to me to be casually discarding one of the main strengths of an RDBMS: a case of throwing out the baby with the bathwater.

    But the problem, as you point out, is that using an ORM to naively declare associations between entities in different modules can result in the database schema becoming an unmanageable ball of mud.

    The two patterns I've described in the "Data" section of this article go a long way to addressing this issue. The polymorphic association pattern in particular (the "paperclip" example) lets us have RDBMS-enforced referential integrity across modules, without declaring that association to the ORM. Instead, the association itself is an entity, whose contract is defined by one module but implemented by another. The use of an event bus or SPI services lets the code in the modules manage these links (in a decoupled fashion), but still lets the RDBMS ensure there is referential integrity between the underlying tables.

    One thing I always try to remember: the data in enterprise systems almost always outlives the application code that consumes it. Even though the enterprise apps I work on are intended to last for decades, they will eventually be replaced. We should therefore do our utmost to ensure that the underlying data is as "clean" as possible. An RDBMS's ability to enforce type, entity and referential integrity constraints is a powerful tool to enable this.

    Cheers
    Dan

  • Thanks

    by BA Papa Alassane,

    Excellent article!!
    We are working on a financial solution that we plan to offer in white label to clients, and we plan to use this modular monolith approach to be able to make custom builds for each client according to the services they have subscribed to. We've started a POC and the main issue we have right now is the boundary of the modules on the model layer, but we are dealing with this; we often merge some modules to avoid dependency issues, and usually it makes sense.
    Thanks again for the article. Excuse my English.

  • Re: Thanks

    by Dan Haywood,

    Glad you enjoyed the article.

    The best way to ensure you have the boundaries right between modules is to validate the POC with real-world usage. Establishing a feedback loop is vital: no plan survives contact with the enemy.

    As well as paying attention to the boundaries between modules within the domain layer, do also pay attention to the boundaries between the domain layer and the UI/presentation layer. If you aren't using a framework such as Apache Isis, then you'll need some other mechanism to ensure that business/domain rules don't end up accidentally being implemented in your UI code.

    Cheers
    Dan

  • If you have a hammer

    by Mark Struberg,

    ...everything seems to be a nail.

    That's exactly what happens when any new technology gets hyped. Be it XML, SOA, Rails, BigData, ... and now Microservices.

    The real point here is that each of those technologies added another useful hammer to our toolbox. Each of them is good at solving certain problems and really bad at others. The important part is to understand the pros and cons of each, and to know when to choose which for a certain aspect of the business need.

    IT journals, marketing and certain conferences seem to work against this and instead declare the immediate death of the previously used approach. That seems like actively spreading 'fake news' (hehe). But the real explanation might be much more trivial: they must bring new stuff for their readers every time. So from their pov they of course follow - or even lead - the hype cycle without looking left or right.
    This might be self-evident for tech guys like us, but managers seem to blindly buy into any of those sales arguments without watching for the downsides.

    I've been using the modular-monolith approach since the early 2000s, even having separate maven parents for parent/fe, parent/be and parent/api, which get referenced from the separate business modules in maven as <parent>../../parent/be/pom.xml</parent>. That way I can have a common maven setup for all my api modules, backend (be) modules, frontend (fe) modules, etc. This reduces the pom.xml for e.g. a certain backend to usually just containing the few api dependencies.

    We also usually don't do a single monolith for the whole customer. Rather we do 'Maxi-Services', or a bunch of Self Contained Systems (SCS), which interact with each other over the network. Contrary to microservices, such SCS contain the whole business logic + db handling + UI for a whole business aspect. Like all the customer data handling, a separate document archive, separate medical management (think of an insurance corp or pension funds), etc. So the single parts are really fat and certainly not 'micro'. But each of them is modular on the inside, exactly as you explained!
    Of course those SCS use the same techniques to interact with each other as microservices do: bulkheads, circuit breakers, etc. After all, the fallacies of distributed computing do not depend on how big your distributed parts are...

    One more argument I'd like to stress is API compatibility. Or rather, the lack of it in microservices.
    If you have a modular SCS/monolith then you just need to compile the whole app and it immediately becomes clear whether any API was broken or not. If someone changed an API and didn't fix all usages then your app simply would not compile anymore! Now try to detect such a case in a fat microservice application - good luck ;)
    It will randomly blow up at runtime, without much chance to detect this upfront short of excessive manual testing.

    Oh, just one more: dead code detection. How do you detect that a certain functionality isn't used at all? For classic apps it's easy. Sonar will tell you. And for more complex cases you can run code coverage. And if some parts are not covered then there is a good chance that one either didn't test properly or it's indeed not needed.

    Even when microservices have been in production for only a short period, they still collect dust very quickly. And it's by far not as easy to detect dead functionality as with monoliths. You need to have extremely good logging, and even then you cannot be sure. Maybe the code in question only gets used at the end of a quarter, called by some foreign part?
    So in my experience it's much more painful to get rid of those obsolete parts. Most people I've seen just add new functionality without ever getting rid of old parts, because they fear breaking foreign microservices. Such an app quickly becomes unmaintainable and, more importantly, inoperable.

    LieGrue,
    strub

    PS: great article, congratulation Dan!

  • Re: Data ownership

    by Erik Gollot,

    1000% agree, data is our treasure, it outlives the processes.
    Most of the time we talk about a "possible" independence of processes, services, ... but we forget the data. And if you have dependencies between the data, your services will have dependencies, whatever the technical solution. That's life!

    Thanks again

  • Fighting big ball of mud

    by jerome angibaud,

    I don't agree to the following statement :

    "Our first option is to use language features – such as packages (Java) or namespaces (.NET) – to group together the module’s functionality, but it isn’t otherwise distinguished from the rest of the application. There are however no guarantees that there won’t be cycles between those packages/namespaces; if you only use this option, you’re very likely to end up with a non-modular monolith, a big ball of mud."

    You can enforce zero cycles using packages only, with a technique such as the one described here: djeang.wordpress.com/.

    I think that maven modules are overkill for modularity purposes. They make development heavier (longer and more complex builds, more difficulty navigating the code, ...)

  • Reality check in 2019?

    by Peter Verkest,

    I remember I liked this article back in 2017. Now, almost 3 years later, I analyzed Estatio again with Structure101 and what I found is a big ball of mud :-(
    Is it possible to explain what happened? Was it a deliberate choice to let go of the acyclic dependencies or was it something that just happened?

  • Remember also software is developed by people.

    by Dan Haywood,

    It's a fair point; the reason this happened is that we decided to prioritize team happiness over application structure.

    Estatio itself was developed by a stable team of 4 developers, though not everyone is dedicated full-time to the app, so it's probably 2 full-time equivalents, perhaps even less. Although we know that the codebase is ugly in places, the team knows that code intimately, and the lack of structure in places hasn't impacted the team's ability to deliver new features, in large part because of the comprehensive test suite. We also know that we have tools (mixins, the event bus, etc.) to decouple if necessary.

    At one stage I did break out the codebase into separate Maven modules in order to have Maven enforce the acyclic dependencies, but - long story short - that made the team less happy, so I ended up reverting that change and moved stuff back together.

    Looking forward, it's possible that the development of the codebase may end up being insourced by Eurocommercial Properties; at any rate we want to make that available as an option. So I do expect that some more structure will be reintroduced.

    Bottom line: yes, structure matters, but it must be understood in the context of the team that owns the codebase, and where the size of that team is one important metric. I probably didn't understand that when I wrote the article as well as I do now. At any rate, with a very small team (2 FTEs) you can get away with fewer architectural constraints than you could with a large team of 10 or more, say.

  • Re: Fighting big ball of mud

    by Dan Haywood,

    Just replying to Peter Verkest's comment from 2 years after yours (see below).

    The point you make about maven modules being overkill was also the sentiment made by the team, and was the reason we reverted the change, ie put everything back into one Maven module. But as Peter has noticed, that's resulted in less structure.

    It's important for the team that owns the codebase to be happy, and when the team is very small (2 FTEs) then their individual preferences probably do trump any other architectural concern. But there comes a point when the team size will be such that some externally enforced architectural constraints are required. One could put Structure101 into the CI pipeline to enforce appropriate acyclic dependencies, but the feedback loop is slow, not to mention the license fees. So I still think I prefer Maven modules, because the feedback loop is immediate - the code won't compile if the architectural constraints have been violated.

  • Re: Reality check in 2019?

    by Dan Haywood,

    it's a fair question; see my reply below: "Remember also software is developed by people."

  • Re: Remember also software is developed by people.

    by Dan Haywood,

    And an update several years on... the team has evolved, still very small, but the preference changed again, and as of today Estatio is now a set of discrete maven modules - over 100 of them, in fact. And we now also use ArchUnit to help maintain various architectural constraints, as well as the directed acyclic dependency graph enforced on us by Maven.
