
Using DRY: Between Code Duplication and High-Coupling

DRY reduces duplication and the maintenance problems that come with it, but misusing it leads to high coupling and reduced readability. The lesson: a software development principle should be applied with other relevant principles, patterns and practices in mind.

DRY stands for Don’t Repeat Yourself, and it is a software development principle first mentioned by Andy Hunt and Dave Thomas in their book The Pragmatic Programmer: From Journeyman to Master. The principle states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

On the Portland Pattern Repository's Wiki, Hunt emphasized the negative impact of duplication and, consequently, the importance of applying DRY:

Duplication (inadvertent or purposeful duplication) can lead to maintenance nightmares, poor factoring, and logical contradictions.

Duplication, and the strong possibility of eventual contradiction, can arise anywhere: in architecture, requirements, code, or documentation. The effects can range from mis-implemented code and developer confusion to complete system failure.

One could argue that most of the difficulty in Y2K remediation is due to the lack of a single date abstraction within any given system; the knowledge of dates and date-handling is widely spread.

Although DRY seems like a must-apply principle of software engineering, Anders Munch noted that there are exceptions:

There is a pattern to the exceptions to the principle. It is ok to have more than one representation of a piece of knowledge provided an effective mechanism for ensuring consistency between them is engaged.

  • Definitions and declarations of C functions: these are usually in sync because the compiler flags inconsistencies, forcing the programmer to take action.
  • Unit tests: inconsistency means the tests will fail, again forcing someone to take action.
  • Auto-generated stuff: periodic regeneration ensures consistency.

These exceptions actually reinforce the rationale behind DRY. But a question arises: aren’t programmers taking DRY to extremes? Isn’t DRY sometimes misunderstood and misused?

Dave Thomas remarked early on that “Most people take DRY to mean you shouldn't duplicate code. That's not its intention. The idea behind DRY is far grander than that,” expanding the DRY principle to an entire software system:

DRY says that every piece of system knowledge should have one authoritative, unambiguous representation. Every piece of knowledge in the development of something should have a single representation. A system's knowledge is far broader than just its code. It refers to database schemas, test plans, the build system, even documentation.

Given all this knowledge, why should you find one way to represent each feature? The obvious answer is, if you have more than one way to express the same thing, at some point the two or three different representations will most likely fall out of step with each other. Even if they don't, you're guaranteeing yourself the headache of maintaining them in parallel whenever a change occurs. And change will occur. DRY is important if you want flexible and maintainable software.

The problem is: how do you represent all these different pieces of knowledge only once? If it's just code, then you can obviously organize your code so you don't repeat things, with the help of methods and subroutines. But how do you handle things like database schemas? This is where you get into other techniques in the book, like using code generation tools, automatic build systems, and scripting languages. These let you have single, authoritative representations that then generate non-authoritative work products, like code or DDLs (data description languages).
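The code generation approach Thomas describes can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: the PEOPLE_SCHEMA hash, its table and column names, and the ddl_for and struct_for helpers are all hypothetical and do not come from the book. A single authoritative description is kept in one place, and both the DDL and a Ruby value class are derived from it as non-authoritative work products.

```ruby
# Hypothetical single authoritative description of a table.
# The table and column names are invented for illustration.
PEOPLE_SCHEMA = {
  table: "people",
  columns: {
    id: "INTEGER PRIMARY KEY",
    first_name: "VARCHAR(100)",
    last_name: "VARCHAR(100)"
  }
}.freeze

# Derive the DDL (a non-authoritative work product) from the schema.
def ddl_for(schema)
  columns = schema[:columns].map { |name, type| "  #{name} #{type}" }
  "CREATE TABLE #{schema[:table]} (\n#{columns.join(",\n")}\n);"
end

# Derive a Ruby value class from the same schema, so the list of
# attributes is never written down a second time.
def struct_for(schema)
  Struct.new(*schema[:columns].keys, keyword_init: true)
end

Person = struct_for(PEOPLE_SCHEMA)

puts ddl_for(PEOPLE_SCHEMA)
puts Person.members.inspect
```

Adding a column to PEOPLE_SCHEMA changes both the generated DDL and the Person class, which is the kind of consistency the single representation is meant to provide.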

While the issues around using DRY seemed to have been settled long ago, the principle was raised again at QCon London 2012 by several speakers, including Greg Young and Dan North, who drew attention to its possible misuse. InfoQ followed the trail to find out more about DRY. When InfoQ asked what problems he sees with DRY, Young said:

The basic argument against following DRY is that there is another side of things. When following "DRY", it is quite common that people start building coupling and complexity into their software. One side of the trade off is very easy to measure (number of face plants per hour when needing to fix bugs multiple places) while the other is rather difficult (coupling and complexity built into software in the name of DRY).

One can argue that if DRY is followed "properly" that there will never be coupling and complexity built into the software. This is even anecdotally visible, I can write you a code base that perfectly follows DRY while not introducing coupling and complexity. This however assumes that I have perfect knowledge.

We also talked to David Chelimsky, author and lead developer of RSpec. He said he has seen DRY being taken “to the line level, and it's not always appropriate there (though it can be sometimes).” He provided the following example:

describe "Person#full_name" do
  it "concats the first and last names" do
    first_name = "John"
    last_name = "Doe"
    person = Person.new(:first_name => first_name, :last_name => last_name)
    person.full_name.should eq "#{first_name} #{last_name}"
  end
end

Although this piece of code avoids duplication and might seem like a good implementation of DRY, Chelimsky said he prefers the following, more readable piece of code:

describe "Person#full_name" do
  it "concats the first and last names" do
    person = Person.new(:first_name => "John", :last_name => "Doe")
    person.full_name.should eq "John Doe"
  end
end

adding:

To a person who doesn't really understand DRY, but thinks it's the bees knees, seeing "John" and "Doe" twice each in that example is like nails on chalkboard. For me, it's quite the opposite. I find it easier to see the relationship between the first and last names, and the outcome of full_name.

Chelimsky also pointed to another piece of code, from the Objectify framework, that he recently stumbled upon. The series of resolver.add calls in the following method

def request_resolver
  klass = Objectify::NamedValueResolverLocator
  @request_resolver ||= klass.new.tap do |resolver|
    resolver.add(:controller, self)
    resolver.add(:params, params)
    resolver.add(:session, session)
    resolver.add(:cookies, cookies)
    resolver.add(:request, request)
    resolver.add(:response, response)
    resolver.add(:flash, flash)
    resolver.add(:renderer, Renderer.new(self))
  end
end

was replaced with this:

{ :controller => self, :params => params, :session => session, :cookies => cookies,
  :request => request, :response => response, :flash => flash,
  :renderer => Renderer.new(self) }.each do |key, value|
  resolver.add(key, value)
end

Chelimsky commented on this change: “This is taking DRY too far IMO. This part is more maintainable (easier to read and modify) before this change.”
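To make the comparison concrete, here is a minimal, runnable sketch of the two forms side by side. FakeResolver is a hypothetical stand-in written for this illustration, not Objectify's actual class, and the sample values are invented. It suggests that both forms register the same entries, so the choice between them comes down to readability and ease of modification rather than behavior.

```ruby
# FakeResolver is a hypothetical stand-in, not Objectify's real class.
class FakeResolver
  attr_reader :entries

  def initialize
    @entries = {}
  end

  def add(key, value)
    @entries[key] = value
  end
end

# Explicit form: one add call per entry.
explicit = FakeResolver.new.tap do |resolver|
  resolver.add(:params, { "id" => "1" })
  resolver.add(:session, { "user" => "jd" })
end

# Hash-iteration form: the same entries, registered in a loop.
looped = FakeResolver.new.tap do |resolver|
  { :params => { "id" => "1" }, :session => { "user" => "jd" } }.each do |key, value|
    resolver.add(key, value)
  end
end

puts explicit.entries == looped.entries # prints true
```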

One of the main problems seen by Chelimsky is that “The words "Don't Repeat Yourself" are intended to be a memory device, but "DRY" has become a memory device for "Don't Repeat Yourself" and that ends up being _the principle_ in many minds,” noting that this approach may lead to another side of the problem: “when we reduce duplication, we increase coupling”:

When two methods on the same object do some of the same work, we typically extract a third method to which they both delegate.  Both of the original methods are coupled to the extracted method and, indirectly, to each other.  This seems perfectly logical and harmless in the context of a single object, but how about when we recognize similar behavior across two objects? To reduce that duplication we need to either introduce a new object they both depend on or, far worse and sadly too often, we have one object depend on the other. The latter approach often leads to dependencies between objects that are unrelated, reducing their ability to evolve over time.  Introducing a new object increases the overall surface of the system, and requires thought and care that it doesn't always receive when introduced when refactoring.
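The cross-object case Chelimsky describes can be sketched as follows. The Invoice and Payslip classes and the MoneyFormat helper are hypothetical names invented for this illustration. Two otherwise unrelated classes once formatted amounts on their own; after the duplication is removed, both depend on a new shared object.

```ruby
# Shared helper extracted to remove the duplication. Both classes below
# now depend on it, so a change here changes both of them.
module MoneyFormat
  def self.call(cents)
    format("$%.2f", cents / 100.0)
  end
end

class Invoice
  def initialize(total_cents)
    @total_cents = total_cents
  end

  def total_line
    "Total: #{MoneyFormat.call(@total_cents)}"
  end
end

class Payslip
  def initialize(net_cents)
    @net_cents = net_cents
  end

  def net_line
    "Net pay: #{MoneyFormat.call(@net_cents)}"
  end
end

puts Invoice.new(1999).total_line
puts Payslip.new(250000).net_line
```

The duplication is gone, but if invoices later need a different currency format, that change has to be made without breaking payslips. This is the coupling, and the larger system surface, that the quote warns about.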

In order to avoid taking DRY to extremes, Chelimsky proposes balancing it with other development principles:

DRY is important, but so are Uncle Bob's SOLID principles, for example, or broader concepts like low coupling and high cohesion. It's not good enough to simply apply one principle all the time - you have to take them all into account and weigh their relative value in each situation. It's sort of like knowing which seasoning to put on fish, and which to put on a steak. Some do really well in both cases, some not so much.

DRY is an important principle, but abusing it can cause problems such as increased coupling and reduced readability. The lesson here is that no matter how great a principle is, it should not be applied without regard for other good programming practices.
