Frans Bouma Argues Code First O/R Mapping is “Silly”

by Jonathan Allen on Dec 18, 2013

When building a database-backed project using an ORM, developers can choose between starting with database tables, classes, or abstract models. To open the debate, we offer you Frans Bouma’s argument that, “Code-first O/R mapping is actually rather silly”.

Starting with code in the form of entity classes is equally odd as starting with a table: they both require reverse engineering to the abstract entity definition to create the element 'on the other side': reverse engineer the class to the abstract entity definition to create a table and the mappings is equal to reverse engineering a table to a class and create the mappings. What's the core issue is that if you start with a class or a table, you start with the end result of a projection of an abstract entity definition: the class didn't fall out of the sky, it was created after one determined the domain had such a type: e.g. 'Customer', with a given set of fields: Id, CompanyName, an address etc.

He continues,

I know the whole idea of 'code first' comes from the fact developers want to write code and think in code and want to persist objects to the database, but that's not what happens: you don't persist objects to a database, you persist their contents, which are the entity instances. It might very well be that an entity class instance (so an object) contains more data than the entity instance, so storing 'the object' then doesn't cover it. Serializing an object is a good metaphor here: with serialization, the object isn't serialized, but a subset of its data, to a form which might not match its source. When deserializing the data into e.g. javascript objects are we then still talking about the original .NET object? No of course not, it's about the data inside the object which lives on elsewhere.

Isn't it then rather odd that when serializing 'objects' to JSON, the overall consensus is that the data is serialized, but when the same object is serialized to a table row, it's actually persisted as a whole? If you are still convinced O/R mapping is about persisting objects, what happens with 'your object', persisted to a table row, if that object is read by a different application, which targets the same database, and which doesn't use an O/R mapper? That application, written in an entirely different language even, can perfectly fine read and consume the entity instance stored in the table row, without even knowing you considered it a persisted .NET object. Because, surprise, the contents of the table row isn't a persisted object, it's a persisted entity instance, an instance of an abstract entity definition, not an instance of a class definition.

Reddit user remy_porter has a different take on the issue,

I think the problem is that Model First in EF is terrible. I hate the GUI tool that it forces you into.

My favorite way is to use CodeFirst to do a meet-in-the-middle approach- I write my object model in the way that makes the most sense, I write my database model in the way that makes the most sense, and then I use the FluentAPI to make them match.

Although, I do admit, I mostly use it to blindly throw object graphs at the database because I told management we should be taking a NoSQL approach to this application, but they ignored me (the data model is about storing documents with variable structures, but they demanded SQL Server).

Nishruu prefers it for testing,

Yeah, I usually use the Fluent API to map the existing database, which I design first.

The only place when code-first is actually useful is for quick integration/unit tests with either in memory SQLite DB or some kind of LocalDB - then you can roughly and quickly re-create the database structure for testing.

It's especially nice in NHibernate, where it works really well with in-memory SQLite DB. With EF - not so much, I'm using LocalDB and just re-creating the schema from MDF file...
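As a rough illustration of the testing workflow described above (not code from the thread), the sketch below derives a table from an entity class and builds a throwaway in-memory SQLite database from it. It uses Python's standard library rather than NHibernate or EF, and the `Customer` class, the `SQL_TYPES` mapping and the helper names are all assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass, fields


# Hypothetical entity class, used only for this illustration.
@dataclass
class Customer:
    id: int
    company_name: str
    city: str


# Assumed, deliberately small mapping from Python types to SQLite column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


def create_table_sql(entity_cls):
    """Derive a CREATE TABLE statement from a dataclass (the code-first direction)."""
    columns = []
    for f in fields(entity_cls):
        column = f"{f.name} {SQL_TYPES[f.type]}"
        if f.name == "id":  # convention assumed here: a field named 'id' is the key
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {entity_cls.__name__.lower()} ({', '.join(columns)})"


def fresh_test_db():
    """Create an empty in-memory database whose schema mirrors the classes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_table_sql(Customer))
    return conn


# Each test can start from a clean schema that exists only for its own lifetime.
conn = fresh_test_db()
conn.execute(
    "INSERT INTO customer (id, company_name, city) VALUES (?, ?, ?)",
    (1, "Acme", "Utrecht"),
)
row = conn.execute("SELECT company_name FROM customer WHERE id = 1").fetchone()
```

The schema is rebuilt on every call to `fresh_test_db()`, which is what makes the approach attractive for quick tests. It says nothing about how well the generated schema would serve a production database.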

InfoQ invites you to read Frans Bouma’s full argument and then tell us your opinion.

Community comments

Which domain first by Duraid Duraid

The whole idea of code first is to start with the problem domain and not the solution domain.

Re: Which domain first by Jonathan Allen

Could you elaborate on that thought?

ORM's were over-hyped by peter lin

Having worked on ORM's and used a variety of them, the truth is the hype around ORM's set an unrealistic expectation. In reality, you can't design your object model without thinking about how it affects the database tables. You also can't design your object model without taking into consideration data access patterns. Lazy loading is nice for trivial toy scenarios, but once you get into real world applications that need views of several objects, "code first" creates problems.

In some cases creating materialized views and mapping them to "view objects" is the right choice. In other cases using a DTO (data transfer object) to provide a view is acceptable. There is no single solution and developers still have to understand what they're doing. Lots of people mis-use ORM's and don't realize the concrete impact it has on performance.

Take for example an object model with deep inheritance structure. I've seen this in multiple places. Basically, there's a DomainObject at the top that provides common attributes like timestamp, and version. Then all the classes extend the base class. If the model has 2 thousand classes and you do a select on DomainObject, you end up asking for the whole database. Polymorphic queries are useful, but more often than not they get abused.

The main benefit of ORM regardless of the product is it reduces boilerplate code. In some cases that is a big win, but in performance sensitive use cases it becomes a huge bottleneck. The question isn't code first or not. The question is "do you take time to understand how to properly use an ORM?"
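The DomainObject scenario in the comment above can be sketched in a few lines. The example assumes a mapping in which every class gets its own table, and the class names and the `tables_for_polymorphic_query` helper are hypothetical. It only counts the tables a polymorphic query would have to visit; it does not model any particular ORM.

```python
class DomainObject:
    """Hypothetical base class carrying common attributes (timestamp, version)."""


class Party(DomainObject):
    pass


class Customer(Party):
    pass


class Supplier(Party):
    pass


class Order(DomainObject):
    pass


def tables_for_polymorphic_query(cls):
    """List the tables a query on cls must visit: its own plus those of all subclasses."""
    tables = [cls.__name__.lower()]
    for sub in cls.__subclasses__():
        tables.extend(tables_for_polymorphic_query(sub))
    return tables


# Querying a leaf class touches one table; querying the root touches all of them.
leaf = tables_for_polymorphic_query(Customer)
root = tables_for_polymorphic_query(DomainObject)
```

With five classes the difference is one table versus five. With the two thousand classes mentioned in the comment, a select on the root would reach every table in the model.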

Re: Which domain first by Duraid Duraid

Software is a solution to a problem in a certain domain. The domain model should model the problem. Everything else other than the domain model like the database, the data access layer, service layer, UI represents the implementation of the solution. And it is a specific implementation that can change as new tools and libraries are available while the domain model can remain unchanged given that the domain problem hasn't changed.

Given this view, starting with the problem domain (domain model) and not the solution domain (database) is an obvious choice.

This is the idea and in reality you have tools that can be inadequate and specific libraries (EF, NH) that are problematic, performance problems... etc.

Re: Which domain first by Frans Bouma

That doesn't say you should start with code, which is a projection result of a model (likely in your head). I argued in my article that the model you should start with is the model that's actually also the source of the model used to write the classes. However instead of keeping that in the memory of the developers, one should actively model that abstract model and use that model to create the classes and tables as those are actually derivatives (projections) from that abstract model.

Example: Customer, Order, OrderLine, Product. I can define these abstract entities with their fields, their identifying fields, their relationships and not write a single line of code nor table definition. Then I can use that model to create the classes using rules defined for that. Examples of these rules are the ones defined by Halpin and Nijssen in their NIAM rules for translating an abstract entity model in NIAM to table definitions.

I can also define rules like that (e.g. use the same or slightly altered ones) to translate the model to classes. The advantage is that I now have a proper, verifiable model which is the theoretical base for both sides. One of the main advantages of this is that I can make changes in one place, which are again verifiable, and let these changes ripple through to both sides, following the rules defined for these projections.

That's not where it stops though. I can create projections of that model to other models (which thus are actually defined by rules again, so I can get the changes made to the core model applied to my sub models without effort doing myself) and create code from these models as well. As everything is related to each other and originates from the core abstract entity model, I have a single place where I have to model the domain, in such a way that it isn't polluted with code constructs, language limitations or other code related aspects, it's a pure model of the domain.

Starting from code doesn't have that. To be able to do anything with the code, the orm has to reverse engineer the code to that abstract entity model first, however there's a difference: you can't reach that source of which the code model actually originates from, so you're doomed to make changes in the result of a projection, not the source of it. Like you change your C# code by altering the compiled form through altering IL, instead of changing the C#.
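A minimal sketch of the idea in the comment above: one abstract entity definition, projected by rules to both a table and a class. The dictionary layout, the `TO_SQL` and `TO_CLASS` rule tables and the function names are assumptions made for illustration. They are toy rules, not the NIAM rules the comment refers to.

```python
from dataclasses import fields, make_dataclass

# Hypothetical abstract entity definition: neither a class nor a table.
CUSTOMER = {
    "name": "Customer",
    "fields": [
        {"name": "Id", "kind": "integer", "identifying": True},
        {"name": "CompanyName", "kind": "text", "identifying": False},
    ],
}

# Assumed projection rules, one set per target side.
TO_SQL = {"integer": "INTEGER", "text": "VARCHAR(50)"}
TO_CLASS = {"integer": int, "text": str}


def project_to_table(entity):
    """Project the abstract entity to a table definition."""
    columns = []
    for f in entity["fields"]:
        column = f"{f['name']} {TO_SQL[f['kind']]}"
        if f["identifying"]:
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {entity['name']} ({', '.join(columns)})"


def project_to_class(entity):
    """Project the same abstract entity to a class."""
    members = [(f["name"], TO_CLASS[f["kind"]]) for f in entity["fields"]]
    return make_dataclass(entity["name"], members)


ddl = project_to_table(CUSTOMER)
Customer = project_to_class(CUSTOMER)
customer = Customer(Id=1, CompanyName="Acme")
```

Adding a field to `CUSTOMER` changes both projections at once, which is the "change in one place" property the comment argues for. Whether such a model stays manageable at scale is exactly what the rest of the thread debates.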

Re: Which domain first by Duraid Duraid

I see that we both agree about model driven and the argument is what to use to create that model, code or entity model.

If that's true, I argue that code is, so far, the best tool available for domain modelling. We've heard it all before: write your model with UML and then generate your code, and you can replace UML with anything you want, including an entity model. None of that has proved to be successful.

Domain model cannot be generated, it has to be hand crafted by code to remain an accurate representation of the domain problem.

To me personally, I have a litmus test; if the domain model does not lend itself to unit testing then it's not good. Because with unit testing you ensure that domain rules are verified and things like dependencies that cause maintenance problems down the road are taken care of.

Also languages, with their limitations, offer way more power than any other tool (including an entity model) that can be used to express a sophisticated domain model.

Re: Which domain first by peter lin

Having used UML, worked at IBM Ascential, built a few ORMs and used several different Java ORM's, it still comes down to the developer. Whether one does model first in a schema modeler or codes it by hand, the reality is the developer still has to understand the tool and know how to use it. When I was part of the metadata server team at IBM Ascential, we took the UML approach and code generated the classes and SQL.

It's not a perfect solution and often leads to a higher level model that actually doesn't translate well to code. Take for example bi-directional references and UML stereotypes. Just because you've modeled a relationship between two classes as bi-directional, it doesn't mean the generated code is going to be simple. What it ends up doing is making your ORM more complicated and the code more sensitive to bad queries.

The "art" of domain modeling is hard. Often people who promote model first haven't seen real world schemas with 5K+ classes and had to build an app with them. From my experience, people who say model first have only seen schemas in the range of hundreds of classes. That's not to say model first isn't useful. At the end of the day, the ORM largely doesn't matter. It's the developer using the tool.

Re: Which domain first by Duraid Duraid

But when it comes to testability, modelling with code wins hands down.

When you think about it, testability is a big factor in this discussion. In order to write testable code, developers ditched EJBs for POJOs (POCOs for C#), and they are the same developers asking for code-first ORM mapping in order to carefully craft their domain model and then hook it up to the database via ORM.

Re: Which domain first by peter lin

That's not always true. I'll give you an example. At Ascential, we had change impact analysis for the model. When changes are made to the model, the developer/architect can annotate the change and the framework would calculate the "impact" of the change. From that, metadata server could perform automated data migration and produce the necessary unit tests to ensure the data migrates successfully.

Once you dig deep into the theory and practice of ORM's, the problem becomes much more subtle and complex. I've learned these lessons the hard way as a user of ORM and developer of ORM.

Re: Which domain first by Duraid Duraid

I'm not sure I understand the point that you're trying to make. What is not always true? Are you saying that a testable domain model can be obtained from generated code and not from a code-first approach?

If so then I'm not convinced and maybe practices like those exist but the success of them is not proven for sure.

The dominating practice remains to use a code-first approach in order to produce testable domain models.

P.S. who looks at IBM for best practices anyway? seriously.

Re: Which domain first by peter lin

For example, say I code the model in Java first and I do a good job of keeping it just POJO's. Over time, the model changes. How do you manage that change if there's no higher level definition of the model and track the changes? I've seen people try to do it manually, but it is often messy. That's not to say it can't be done if the developer is diligent. In practice though, the developer that designed the model might have left and the person that took over might not be as diligent.

My point though is that no single approach, model first or code first, is foolproof. It still comes down to the developers. The tool is only as good as the person using it.

Re: Which domain first by Duraid Duraid

Sorry but that's not a valid argument.

Of course it depends on many other factors, but given all other factors are the same, code first is the better approach.

Re: Which domain first by Roger Johansson

I'm 100% with Frans here.

The code is not the model.

For example, in the real world, you don't talk about int32, bool, decimal, float etc.
Those are mere compromises done when implementing the code.
Properties of those types do not reflect the real world.
The same goes for strings:
I have yet to find a domain expert in any domain that says that "a customer has a name, which is a sequence of characters with maxlength 20 and character encoding UTF8"

Maybe a domain expert in Active Directory would say something in that manner, but never in a LoB domain.

That is you as a developer making that compromise when you design your code.
The model is an abstract representation of the domain at hand.
ints, floats and strings do not live there.

This does not however mean that code first is not a pragmatic approach that is "good enough" in many cases.
But don't mistake the code for the real model ;)

Re: Which domain first by Frans Bouma

Why do you need code to test your model? I can perfectly verify and validate a model, just define rules and check whether the model obeys them. (I'm not mentioning tools here nor in my article because of intended bias towards my own work).

Re: Which domain first by Frans Bouma

I wrote the article and I have seen large models (2000+ entities). I agree that with large models one has to find a way to represent the model properly and efficiently otherwise the whole thing comes down hard, as it's impossible to understand a single visual diagram with 1000s of entities, like it's impossible to understand a code base of 1000s of entity classes when you look at them all at once in one space.

Being able to work with large models has influenced my work in this field and has been one of the cornerstones of what I built. Most tools I've seen made by others however indeed focus on one diagram first, which is not the way to go if you want to capture a large domain.

Re: Which domain first by Frans Bouma

> P.S. who looks at IBM for best practices anyway? seriously.

Do you want to be taken seriously or are you just ranting a bit for the fun of it?

Re: Which domain first by Duraid Duraid

I think we have reached a conclusion here. To me it's clear that the people who are demanding code-first are the ones who believe in unit testing and want to write domain models with high test coverage.

If you want to argue against the merits of unit testing and high test coverage of domain models because they're definitely not achievable with the method you're advocating, then that's another discussion.

Re: Which domain first by peter lin

@Duraid - if you code the model first, how do you compare different versions of the model? Assume we're using Java, reflection can only do so much. How do you take into account bi-directional references or circular references in an object graph? I've tried to use reflection to do that in the past and it gets you maybe 80% at best. Having a higher level model, be it UML, XML schema, Visio or RelaxNG, gives you the ability to annotate the details of the model and changes between versions. My question to you is this. How would you figure out the delta between 3 different versions of a model with just POJO's?

Having said that, not every model has thousands of classes with messy relationships. If you want to know what kind of model I'm thinking of, go look at the ACORD standard for property and casualty insurance. Modeling something like auto policy is extremely messy and a code first approach for that generally leads to a morass of ugliness. This isn't my opinion, it's first hand experience. Don't take my word for it, go ask anyone that has implemented an auto policy application and they will tell you the same thing.

For a small simple webapp with less than 50 classes, I would probably use code first. That's my biased opinion and not everyone is going to agree. For me, a simple app doesn't need a model first approach.

Re: Which domain first by peter lin

I would recommend people take time to look at what Ascential metadata server did and how it tackles the issue of modeling and change impact analysis. IBM does a lot of things poorly and generally I don't like IBM software. I especially don't like WebSphere.

They've changed the name of the product several times, so I don't know what it is called any more. I'll give you one example where Metadata server is still much more advanced than existing ORMs.

Say we have a deep model where all classes extend a base DomainObject. If you want to execute a polymorphic query at the second or third level, the number of queries could explode. In the early versions of the product, we saw that first hand. One could argue, why model things that way? Ignoring "why model it that way" for now, how do you make polymorphic queries efficient for this use case? Also ignore caching for now.

What metadata server did was to analyze the model and produce materialized views. What would have been 200 individual queries using "table per class" got converted to a fraction of the number of tables. If a user queried for 2 classes, it went against those specific tables. When the user ran a polymorphic query, it would use the views instead. The real world performance gain was over 10x on DB2, Oracle and SQL Server. The model compiler was smart enough that if your inheritance fit "table per hierarchy" it would use 1 table.

Obviously, going code first a developer can do the same exact thing. I would argue your typical developer hasn't spent 5 studying modeling, or ORMs. I've seen senior developers make these mistakes because they thought "ORM's are easy". Like all things, once you dig deep into the topic, it is a lot more subtle and complex.

Re: Which domain first by peter lin

@Duraid - I don't buy that argument. At another job (not IBM) we built a modeling tool that generated the SQL, C# classes, unit tests and WCF endpoints. We also manually wrote unit tests to make sure we could regression test the data access component.

To me code first or model first isn't some kind of religion. My advice to people is always "be practical". Both models work and have "sweet spots". Ultimately the biggest factor of code first or model first is the development team. It doesn't make sense to force developers that prefer code first to do model first. That's like asking a PERL guy who hates OO to use C#, it's just asking for trouble. You prefer code first and that's totally valid. Other people prefer model first and that's valid too. In both cases though, it's important to keep an open mind and learn from approaches.

Re: Which domain first by Duraid Duraid

This is not an argument, it's a fact; generated code is not unit testable. I mean effectively unit tested. For sure you can generate a lot of unit tests testing marginal cases like null arguments, etc. But core business behaviour can only be tested with written unit tests.

I'm happy that you can generate most of your application. I have not seen or heard this working in practice, so I cannot comment.

Re: Which domain first by Duraid Duraid

In code-first ORM mapping versioning is handled with a version control system just like the rest of the code.

Re: Which domain first by peter lin

@Duraid - you didn't answer the question of "how do you calculate the delta" so that you can effectively apply those changes to your database and then run unit tests to make sure the conversion was successful. I'm guessing you do it by hand. I've used both methods of "figuring out what changed and what changes to the data are required". For small models and small databases, it's not too bad. For large models and databases, manually doing it is much more painful. By large, I mean databases above 500GB with tens of millions of rows of data.
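To make the "delta" question above concrete, here is a small sketch that compares two versions of a model held as plain data. The `{entity: {field: type}}` layout, the `model_delta` function and the sample versions are assumptions for illustration, not a description of any tool named in the thread.

```python
def model_delta(old, new):
    """Compare two model versions given as {entity: {field: type}}."""
    delta = []
    for entity in sorted(set(old) | set(new)):
        old_fields = old.get(entity)
        new_fields = new.get(entity)
        if old_fields is None:
            delta.append(("add entity", entity))
        elif new_fields is None:
            delta.append(("drop entity", entity))
        else:
            for field in sorted(set(old_fields) | set(new_fields)):
                if field not in old_fields:
                    delta.append(("add field", f"{entity}.{field}"))
                elif field not in new_fields:
                    delta.append(("drop field", f"{entity}.{field}"))
                elif old_fields[field] != new_fields[field]:
                    delta.append(("change type", f"{entity}.{field}"))
    return delta


# Hypothetical versions: V2 moves the phone number into its own entity.
V1 = {"Customer": {"Id": "integer", "Name": "text", "Phone": "text"}}
V2 = {
    "Customer": {"Id": "integer", "Name": "text"},
    "PhoneNumber": {"Id": "integer", "CustomerId": "integer", "Number": "text"},
}

changes = model_delta(V1, V2)
```

The structural diff reports a dropped field and a new entity. It cannot tell that the data was moved rather than deleted, which is the kind of intent the comment suggests recording as an annotation on the model.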

Re: Which domain first by peter lin

Just because you haven't seen model first work in practice first hand, that doesn't make it a "fact". I've seen code first fail miserably first hand. I don't understand the statement "generated code is not unit testable." What does "effectively unit tested" mean? Do you mean full coverage for positive and all possible negative tests? In the interest of education and exploration.

The .NET tool we built generated all CRUD tests. Our tool also had a rule engine, so we could also generate unit tests that used the business rules. That allowed us to test queries a use case would perform. For example, creating a policy, adding vehicle to a policy, adding driver to a policy and adding coverage to a policy. We could also run business rules that created a claim, closed a claim and updated a claim. All of the business rules performed dozens of queries with a mix of inserts, updates and selects.

I understand it's easy to dismiss model first because you've never seen it work first hand. But to claim that is "fact" is a bit aggressive. I've seen both approaches fail as I've stated before. Developer is ultimately the person who makes it a success.

Re: Which domain first by Duraid Duraid

Why do you need to calculate the delta?

I'm afraid you're taking this discussion into tangents that distract from the core issue, which is code-first vs model generated. My point is that code-first is better for unit testing, which still has not been contradicted.

Re: Which domain first by Duraid Duraid

Agreed. I'm not claiming that model first does not work. All I'm saying is that code first is better for unit testing.

Maybe you can generate tests for CRUD, but how do you test core business behaviour, like for example a Customer not being allowed a certain action based on certain conditions?

In my opinion generated unit tests do not have much value and all they tell you is that your code generator is working fine. I'm not saying they're totally useless but they're only an extra measure of assurance but not a substitute for real unit tests.
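For readers unfamiliar with the kind of test being discussed, the sketch below shows a hand-written unit test for a business rule of the "customer is not allowed a certain action" kind. The credit-limit rule, the `Customer` class and its method names are invented for the example and do not come from the thread.

```python
import unittest


class CreditLimitExceeded(Exception):
    """Raised when an order would push a customer over the credit limit."""


class Customer:
    """Hypothetical domain class with one business rule."""

    def __init__(self, credit_limit):
        self.credit_limit = credit_limit
        self.outstanding = 0

    def place_order(self, amount):
        # Business rule assumed for this example: total outstanding
        # amount may never exceed the customer's credit limit.
        if self.outstanding + amount > self.credit_limit:
            raise CreditLimitExceeded(f"limit is {self.credit_limit}")
        self.outstanding += amount


class CustomerRulesTest(unittest.TestCase):
    def test_order_within_limit_is_accepted(self):
        customer = Customer(credit_limit=100)
        customer.place_order(60)
        self.assertEqual(customer.outstanding, 60)

    def test_order_over_limit_is_rejected(self):
        customer = Customer(credit_limit=100)
        customer.place_order(60)
        with self.assertRaises(CreditLimitExceeded):
            customer.place_order(50)
        # The rejected order must not change the customer's state.
        self.assertEqual(customer.outstanding, 60)
```

Nothing in the test touches a database or a mapping, so it runs the same whether the `Customer` class was written by hand or produced by a generator and then extended.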

Re: Which domain first by peter lin

@Duraid - let me try to explain it again, hopefully this time I'll do a better job. In my mind, part of unit testing is "testing the systems when the model changes." To do that, you need to know how the model changed, which tables need to be changed and how do I change the data. To me it isn't a tangent at all. Unit testing for me isn't just testing queries (insert, update, delete). I realize some people divide testing into unit tests, integration tests, stress test, QA tests and UAT tests.

For me, the entire testing process "should" be as fluid as "practical". To me that means class level unit tests, integration tests and stress tests. For example, say I make a change to the model by extracting phone number out to a separate table. When I go to unit test the updated class, DAO, and ORM mapping, I can either create a clean database and populate it with fake data or I can take a snapshot of an existing database with real data.

The first case is easy, I just write some new unit tests for the new model and run them. The downside I see is this. Real data often looks nothing like fake data and often you won't find bugs with fake data. To get better coverage, I prefer to use a snapshot of an existing database.

To do the second case, you'd need to automate the process of converting the snapshot to the new schema before you run the test suite. If the modeling tool provides annotations for "change impact analysis", it can also update any existing CRUD unit tests and business rules. In the case where you're building an ORM, you want to test multiple models to make sure lazy loading works, projections and polymorphic queries work as expected.

In the case where there's a large suite of unit tests, do you want to update them by hand? I've done that in the past when there were model changes. Manually changing 2K+ unit tests sucks. With code first, the only option you have is to review all the unit tests and make the necessary changes. The larger the test suite, the more time it takes. Take a step back and look at the general problem holistically. I know I sound like a broken record, but it isn't code first or model first. The real question is "what's right for my project?" Forcing code first or model first starts to look a lot like a religion.
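The phone number example in the comment above can be sketched as a scripted migration that runs against a snapshot before the tests do. This version uses an in-memory SQLite database from Python's standard library as a stand-in for the snapshot, and the table names, column names and sample rows are assumptions made for the example.

```python
import sqlite3


def load_snapshot():
    """Stand-in for restoring a snapshot that still uses the old schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer (id, name, phone) VALUES (?, ?, ?)",
        [(1, "Acme", "555-0100"), (2, "Globex", None)],
    )
    return conn


def migrate_phone_to_table(conn):
    """Move customer.phone into a separate phone_number table."""
    conn.execute(
        "CREATE TABLE phone_number ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, number TEXT NOT NULL)"
    )
    # Copy only the rows that actually have a phone number.
    conn.execute(
        "INSERT INTO phone_number (customer_id, number) "
        "SELECT id, phone FROM customer WHERE phone IS NOT NULL"
    )
    # Rebuild customer without the phone column; this avoids relying on
    # ALTER TABLE ... DROP COLUMN, which older SQLite versions lack.
    conn.execute("CREATE TABLE customer_new (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customer_new (id, name) SELECT id, name FROM customer")
    conn.execute("DROP TABLE customer")
    conn.execute("ALTER TABLE customer_new RENAME TO customer")


conn = load_snapshot()
migrate_phone_to_table(conn)
phones = conn.execute("SELECT customer_id, number FROM phone_number").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

Once the migration is a function, the test suite can call it on every fresh snapshot. Whether that function is written by hand or derived from an annotated model is the point on which the commenters disagree.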

Re: Which domain first by peter lin

I see the disconnect here. Your definition of "unit test" is different. As I said in a previous response, in one of the systems I worked on, we had business rules as part of the system. Therefore we could generate unit tests that performed business operations like "create a new claim, update a claim, close a claim". Just because some ORM's suck and don't provide the ability to test business use cases, that doesn't mean "code first" is the best solution. It just means product "X" blows. I could point out where most ORM's suck, but I'm not going to bother. Even though I've spent more than 10 years gaining deep experience with ORM's, I'll be the first to admit there's still a lot more subtlety that I don't understand. I've done some crazy stuff with ORM's that's absolutely not recommended, but that's a completely different topic.

Re: Which domain first by Duraid Duraid

But you still have not answered the question: which approach is better for unit testing, code-first or generated code?

Re: Which domain first by peter lin

My biased answer is neither. The system we built with .NET provided a lot of value and gave us the ability to generate unit tests that used business rules. If I had a small personal website with 10 classes, it would be "too heavy" to go model first. To me the question isn't "code-first" or "generated-code". To me that's a "false dichotomy". The question isn't "code-first" or "model-first" either.

The question is "what is the right tool for the development team?" Give the wrong tool to the wrong person and it usually produces garbage. Give the "right" tool to the right person and they can get the job done. I've seen far too many projects fail because some executive said "we must use X" and the dev team fails miserably.

Given people's definition of "unit testing" varies, we always have to quantify and qualify what "unit test" means in each environment. At IBM, QA and UAT tends to be a "throw it over the wall" mentality. At other places, the dev and qa team are tightly knit and work smoothly together. Yes, you can write unit tests by hand with code first. Yes, you can generate crud tests based on the model. "best" or "better" is totally subjective.

Some people prefer to apply modeling theory and formal logic to validate the schema, generate the SQL, classes for multiple languages and unit tests. Is that "better" for everyone? I'd say no. Just because I've spent 10 years learning this stuff, it doesn't mean it's better.

Re: Which domain first by Duraid Duraid

This is not pertinent. Given that you have the right developers for both cases, code-first is the better approach for unit testing. Do you agree?

Re: Which domain first by Frans Bouma

Where do I say I'm against unit testing? I just said one can validate models in a better form than code. In fact, one can generate tests from the model as well as domain classes, including validation tests for field lengths and the like. No extra work needed.

Re: Which domain first by Frans Bouma

> But core business behaviour can only be tested with written unit tests

and as core business behavior is code written by hand which consumes generated domain classes, where is the difference? The only thing different is that you have to test your hand-written domain classes and mappings and a model-first developer doesn't have to as a machine does that for her/him. If you say you can test better than a program which uses strict rules, I am happy for you, but I doubt it would be true.

Re: Which domain first by peter lin

Given the "right" developer for each method, I would determine "which is better" based on the project's needs. There's no such thing as "better for all cases." I've worked on large healthcare projects with 2K+ classes in the model. The architects modeled the schema in a fancy but shitty modeling tool, I won't say which one, but I'm sure people can guess.

The developers were given PDF snapshots of the classes for their use case and hand coded them in Java. Each team then wrote their Hibernate mapping, DAO's, unit tests, integration tests and UI.

If the modeling tool used by the architects could produce POJO, it would have saved us a ton of time. The constraints of the project and environment play a big factor. As long as the factors fit the method, then I would use it. If not, it ends up being a huge failure. Far too many developers force their favorite "tool/approach" and ignore the importance of the environment and constraints.
