Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ


Choose your language

InfoQ Homepage Articles ORM with DrySQL and ActiveRecord

ORM with DrySQL and ActiveRecord

Most object-relational mapping frameworks introduce redundancy into your software (no, not the good kind). Columns that are named and typed on your database are renamed and re-typed as instance variables in your application code. If your ORM framework offers facilities for modeling referential constraints, these are again defined both on your DB and in your application code. Many ORM frameworks also have a mapping layer that specifies the mappings between a class and a database table, which means that many of your database artifacts end up being specified in 3 separate locations: on a database, in a model class, and in a mapping layer or configuration file.

As suggested by proponents of the DRY principle (Don't Repeat Yourself), when elements that are logically related don't change together uniformly, keeping them in sync can be problematic.

"It's easy to duplicate knowledge in the specifications, processes, and programs that we develop, and when we do so, we invite a maintenance nightmare" - Hunt & Thomas, The Pragmatic Programmer

You could argue that a database schema doesn't change very often, but the fact is that it can and does change. A former professor of mine used the drill the idea into my head that "change creates opportunity", and until someone manages to disprove his theory I'll continue to believe that removing barriers to change is a good thing.

Enter ActiveRecord

ActiveRecord, the object-relational mapping layer for Rails, takes great strides towards reducing this sort of redundancy. Rather than mapping columns to instance variables in application code or a mapping file, ActiveRecord generates them dynamically at runtime based on your database's information schema. As well, class-to-table mappings, and special columns like keys and counters are identified using naming conventions.

Although ActiveRecord has set the bar pretty high for ORM frameworks, the opportunity still exists to extend it, to further reduce redundancy, and to make it even more dynamic. Thanks to the flexibility of the Ruby language, and the effort that people have put into projects such as RubyForge and RubyGems, extensions to existing Ruby features are quite simple to produce and to consume.

So What's Missing?

ActiveRecord falls a bit short of complying with the DRY principle. If you have tables or columns that aren't compatible with the naming conventions, then you need to redefine some of your database artifacts in application code. As well, ActiveRecord associations and validations are redundant specifications of database constraints. I want to be able to change my database schema without needing to re-wire the ORM layer(s) of my dependent applications. I want to avoid (as much as possible) the situation where I'm averse to making a change to my database schema because of the effort that would be required to sync up dependent applications.

ActiveRecord uses the database's information schema to dynamically retrieve column metadata. Can this strategy be applied to all database artifacts?

Introducing DrySQL

DrySQL is an extension to ActiveRecord that aims to completely free it from redundancy, following the DRY principle. In short it aims to model your entire database schema dynamically, querying its information schema and eliminating the need for object-relational mapping in your application code.

So how does it work?

Before examining what DrySQL does, it's important to consider when it does it. In order to model your database's schema, DrySQL needs to query your database, which requires a database connection and results in temporary performance degradation. The current strategy for maximizing performance and convenience is to retrieve metadata for a particular table the first time an instance of the class that maps to that table is instantiated. For example, DrySQL will retrieve metadata for your Employees table the first time an Employee object is instantiated. This information is then cached on your Employee class in memory and is available for all subsequent instances of Employee. The result is that the performance hit is a one-time occurrence (per model class), a database connection is not required until you instantiate a model object, and metadata is only retrieved for tables that your application actually references. If your application never instantiates an Employee object, no metadata is retrieved for your Employees table. This strategy also means that programmers don't need to change the way they use ActiveRecord in order to get the benefits of DrySQL. As long as DrySQL is "required" in an appropriate location, programmers don't need to "do" anything in order to use its features.

Mapping classes to tables

If your table name is not compatible with ActiveRecord naming conventions, then you need to explicitly map it using set_table_name inside your model class definition. The only way around this is to enhance the naming convention logic, which would still leave your application dependent on assumptions.

If your table name is compatible with the naming conventions, then you don't even need to define a model class to map it to.

If my reference to Employee generates a NameError (i.e. the Employee class is not defined), DrySQL will catch this and search for a matching table using ActiveRecord naming conventions. If a matching table is found, DrySQL will automatically generate a model class for it. If the standard features of ActiveRecord::Base are all that you need for a particular model class and your table is compatible with the ActiveRecord naming conventions, then you don't need to define a model class for your table at all.

Identifying Keys

ActiveRecord uses primary and foreign keys for a variety of tasks, and both are defined as constraints on your database. DrySQL retrieves this information from your database's information schema, so regardless of the conventions you use to name your key columns in your database, you don't need to re-specify any of them in your application code. Here are a couple common examples of key usage.

The find method will query your Employees table for the record whose primary key is 18. Rather than rely on assumptions about what your primary key column is named, or a redundant specification in your Employee class (eg. set_primary_key 'X'), DrySQL retrieves the primary key specification for your Employees table from your database's information schema. You can name your primary key whatever you want to on your database, and you don't need to specify it in your application code. The same is true of foreign keys, like the one specified in the belongs_to association above. In fact, DrySQL generates your associations automatically, foreign keys and all.

Generating Associations & Validations

DrySQL retrieves your table's constraint definitions, and uses them to automatically generate associations for your model class. In most cases, generating associations based on referential constraints is very straightforward, assuming that your database schema is defined using accepted best practices. Through associations are an exception. If A has_many B through=> [X, Y, Z], then it becomes problematic to determine which association should be generated. Each association from A to B will overwrite the previous one, so the association generated from A to B will depend on the order in which the constraints are processed. Though still imperfect, DrySQL mitigates this problem (and others) by refusing to overwrite an existing association with a through association. If a programmer wishes to address the A has_many B through=> [X, Y, Z] issue, they can specify the desired through association in their model class definition, and DrySQL will honour this. DrySQL also gives direct associations (i.e. non-through associations) preference over through associations, so a through association will never overwrite a direct association such as belongs_to or has_many.

Many ActiveRecord validations directly correspond with database constraints, and DrySQL generates such validations automatically. DrySQL endeavors to exactly mirror the behavior of database constraints, and introduces validates_nullability_of as an example of this. This new validation will enforce NOT NULL column constraints by rejecting null values, but only in situations where your database would reject a null value. For example, if your column is auto-generated, or has a default value specified for it, validates_nullability_of will not reject a null value because your database wouldn't reject a null in these situations either.

In Summary

ActiveRecord is innovative in its use of a database's information schema to dynamically generate table and column mappings. DrySQL applies this strategy to the other artifacts in your database, the ultimate goal being that all database artifacts can be defined in one place only: the database itself. For more information about DrySQL, please visit the project homepage on RubyForge.

About the Author

Bryan Evans is a software developer for a large financial organization based in Toronto, ON and his expertise is primarily in software infrastructure. Bryan is the author of DrySQL, and is a long time Smalltalker who was attracted to Ruby by ActiveRecord.

Rate this Article