Cut off wrong dependencies in your .NET code

Posted by Patrick Smacchia on Jul 16, 2012

The best advice I could give to a team of .NET developers to keep their code maintainable in the long term is: Treat each namespace in your application as a component, and make sure there are no dependency cycles between your components. By abiding by this simple tenet, the structure of a large application can’t degenerate into the monolithic block of spaghetti code that seems to be the rule more than the exception in professional enterprise development.

Namespaces as Components

Since the inception of .NET a decade ago, the Visual Studio tooling implicitly defined a component through a VS project (hence an assembly). This has been, and still is, a major problem because a component is a logical artifact to structure code, while an assembly is a physical artifact to package code. Again it is the rule more than the exception to see enterprise applications made of hundreds of VS projects.

This is why I encourage the use of the lightweight notion of namespace to define component boundaries. Benefits include:

  • Lighter organization: having more namespaces and fewer assemblies leads to fewer VS solutions and VS projects.
  • Optimized compilation time: each VS project introduces a performance overhead at compilation time. Concretely, this can lead to a compilation process that takes minutes but could take seconds instead if the number of VS projects were drastically reduced.
  • Lighter deployment: better to deploy a dozen assemblies than a thousand.
  • Better startup time for our applications: each assembly introduces a small performance overhead when the CLR loads it. Dozens or hundreds of assemblies loaded introduce a noticeable performance overhead, measured in seconds.
  • Facilities for hierarchical components: namespaces can be nested, assemblies cannot.
  • Facilities for more finely-grained components: having 1000 namespaces is not a problem, having 1000 assemblies is. The choice of having some very small components shouldn’t be impaired by the burden of creating a dedicated VS project.

Dependency Cycles Harmful

Dependency cycles between components lead to what is commonly called spaghetti code or tangled code. If component A depends on B, which depends on C, which depends on A, then component A can’t be developed and tested independently of B and C. A, B and C form an indivisible unit, a kind of super-component. This super-component has a higher cost than the sum of the costs of A, B and C because of the diseconomy of scale phenomenon (well documented in Software Estimation: Demystifying the Black Art by Steve McConnell). Basically, it states that the cost of developing an indivisible piece of code grows faster than linearly with its size. This suggests that developing and maintaining 1000 LOC (Lines Of Code) will likely cost three or four times more than developing and maintaining 500 LOC, unless it can be split into two independent lumps of 500 LOC each. Hence the comparison with spaghetti: tangled code can’t be maintained. In order to rationalize the architecture, one must ensure there are no dependency cycles between components, but also check that the size of each component is acceptable (500 to 1000 LOC).
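The 500 versus 1000 LOC figures can be turned into a rough order of magnitude. This is only a sketch: the power-law form and the exponent k are assumptions made for illustration, not a formula quoted from McConnell’s book.

```latex
% Assumption: the cost of an indivisible piece of n LOC follows a power law.
\[
  \mathrm{cost}(n) \propto n^{k}, \qquad
  \frac{\mathrm{cost}(1000)}{\mathrm{cost}(500)} = 2^{k}
\]
% A ratio of 3 to 4 corresponds to k between log_2 3 (about 1.58) and 2.
\[
  3 \le 2^{k} \le 4 \iff 1.58 \lesssim k \le 2
\]
% Two independent halves cost 2 cost(500), so splitting saves a factor of
\[
  \frac{\mathrm{cost}(1000)}{2\,\mathrm{cost}(500)} = 2^{k-1} \in [1.5,\ 2]
\]
```

Under that assumption, cutting the cycle that glues two 500 LOC components together would make the pair roughly 1.5 to 2 times cheaper to develop and maintain.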

Fighting against design erosion

Version 4 of NDepend, released in May, introduces new capabilities to fight against dependency cycles, and I’d like to discuss the practical aspects a bit.

Now that we can write code rules based on LINQ queries (what we call CQLinq), we can use the tremendous flexibility of LINQ to develop smart rules. One of them, which I co-authored, is a code rule that reports namespace dependency cycles. For example, if we analyze the code of the .NET Framework v4.5, we can see below that the assembly System.Core.dll comes with two namespace dependency cycles. Both of these cycles are made of 7 namespaces. The code rule indexes each cycle found with one of the involved namespaces (chosen randomly) and exhibits the cycle. Left-click the cycle to see the list of namespaces involved:

(Screenshot: result of the code rule on System.Core.dll, listing two namespace dependency cycles)

Right-clicking the list of namespaces or the cycle itself opens a menu that proposes to export them to the dependency graph or dependency matrix. The screenshot below shows the 7 namespaces completely entangled. It doesn’t look like the typical image of a circular cycle. What matters is that, given any two of these namespaces A and B, A can be reached from B and vice versa. Clearly, such entangled code isn’t easy to maintain.

(Screenshot: dependency graph of the 7 entangled namespaces)

Let’s have a look at the body of the CQLinq code rule Avoid namespaces dependency cycles. We can see it starts with a lot of comments describing how to use it. This is a good opportunity to communicate with the user through comments and C# code. I have no doubt that, thanks to the upcoming Roslyn compiler-as-a-service, proposing short C# code excerpts instead of DLLs or VS projects will become increasingly popular.

// <Name>Avoid namespaces dependency cycles</Name>
warnif count > 0
// This query lists all application namespace dependency cycles.
// Each row shows a different cycle, prefixed with a namespace entangled in the cycle.
//
// To browse a cycle on the dependency graph or the dependency matrix, right click
// a cycle cell and export the matched namespaces to the dependency graph or matrix!
//
// In the matrix, dependency cycles are represented with red squares and black cells.
// To easily browse dependency cycles, the Matrix comes with an option:
// --> Display Direct and Indirect Dependencies
//
// Read our white books relative to partitioning code,
// to know more about namespace dependency cycles, and why avoiding them
// is a simple but efficient solution to architecture for your code base.
// http://www.ndepend.com/WhiteBooks.aspx


// Optimization: restrict the set of application assemblies
// If some namespaces are mutually dependent
// - They must be declared in the same assembly
// - The parent assembly must ContainsNamespaceDependencyCycle
from assembly in Application.Assemblies
                 .Where(a => a.ContainsNamespaceDependencyCycle != null &&
                           a.ContainsNamespaceDependencyCycle.Value)

// Optimization: restrict the set of namespaces
// A namespace involved in a cycle necessarily has a null Level.
let namespacesSuspect = assembly.ChildNamespaces.Where(n => n.Level == null)

// hashset is used to avoid iterating again on namespaces already caught in a cycle.
let hashset = new HashSet<INamespace>()


from suspect in namespacesSuspect
  // By commenting out this line, the query matches all namespaces involved in a cycle.
  where !hashset.Contains(suspect)

  // Define 2 code metrics:
  // - Depth at which each namespace is using the suspect namespace, directly or indirectly.
  // - Depth at which each namespace is used by the suspect namespace, directly or indirectly.
  // Note: for direct usage the depth is equal to 1.
  let namespacesUserDepth = namespacesSuspect.DepthOfIsUsing(suspect)
  let namespacesUsedDepth = namespacesSuspect.DepthOfIsUsedBy(suspect)

  // Select namespaces that are both using and used by namespaceSuspect
  let usersAndUsed = from n in namespacesSuspect where
                       namespacesUserDepth[n] > 0 &&
                       namespacesUsedDepth[n] > 0
                     select n

  where usersAndUsed.Count() > 0

  // Here we've found namespace(s) both using and used by the suspect namespace.
  // A cycle involving the suspect namespace is found!
  let cycle = usersAndUsed.Append(suspect)

  // Fill hashset with namespaces in the cycle.
  // .ToArray() is needed to force the iterating process.
  let unused1 = (from n in cycle let unused2 = hashset.Add(n) select n).ToArray()
select new { suspect, cycle }

The code rule body contains several areas:

  • First, we eliminate as many assemblies and namespaces as possible thanks to the properties IAssembly.ContainsNamespaceDependencyCycle and IUser.Level. Thus, for each assembly that contains namespace dependency cycle(s), we keep only what we call the set of suspect namespaces.
  • The range variable hashset is defined and used to avoid showing N times a cycle made of N namespaces. Commenting out the line where !hashset.Contains(suspect) shows such a cycle N times.
  • The kernel of the query is the two calls to the extension methods DepthOfIsUsing() and DepthOfIsUsedBy(). These two methods are pretty powerful since they each create an ICodeMetric<INamespace,ushort> object. Basically, if A depends on B, which depends on C, then DepthOfIsUsing(C)[A] equals 2, and DepthOfIsUsedBy(A)[C] equals 2. A dependency cycle involving the suspect namespace A is detected if there exist one or several suspect namespaces B where DepthOfIsUsing(A)[B] and DepthOfIsUsedBy(A)[B] are both non-null and positive.
  • Then we just need to build the set of namespaces B, and append the namespace A to it, to get the complete cycle involving A.
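To make the "both using and used by" test concrete outside of NDepend, here is a minimal sketch in plain C# over a dictionary of namespace names. It is not how NDepend implements DepthOfIsUsing() or DepthOfIsUsedBy(); the class CycleSketch and its methods Depths, Reverse and CycleOf are hypothetical names introduced only for this illustration, and a breadth-first search stands in for the two code metrics.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the NDepend API.
public static class CycleSketch
{
    // Breadth-first search: depth of every node reachable from 'start'
    // by following 'edges'. A direct neighbor has depth 1. 'start' itself
    // only appears in the result if some path leads back to it.
    public static Dictionary<string, int> Depths(
        IReadOnlyDictionary<string, string[]> edges, string start)
    {
        var depth = new Dictionary<string, int>();
        var queue = new Queue<(string Node, int Depth)>();
        queue.Enqueue((start, 0));
        while (queue.Count > 0)
        {
            var (node, d) = queue.Dequeue();
            if (!edges.TryGetValue(node, out var next)) continue;
            foreach (var n in next)
            {
                if (depth.ContainsKey(n)) continue; // already reached at a lower depth
                depth[n] = d + 1;
                queue.Enqueue((n, d + 1));
            }
        }
        return depth;
    }

    // Reverse every edge so the same search answers "who is using me?".
    public static Dictionary<string, string[]> Reverse(
        IReadOnlyDictionary<string, string[]> uses)
    {
        return uses
            .SelectMany(kv => kv.Value.Select(used => (User: kv.Key, Used: used)))
            .GroupBy(e => e.Used)
            .ToDictionary(g => g.Key, g => g.Select(e => e.User).ToArray());
    }

    // Namespaces that are both used by and using 'suspect' (directly or
    // indirectly), plus the suspect itself. Empty when there is no cycle.
    public static HashSet<string> CycleOf(
        IReadOnlyDictionary<string, string[]> uses, string suspect)
    {
        var usedDepth = Depths(uses, suspect);          // suspect -> ... -> n
        var userDepth = Depths(Reverse(uses), suspect); // n -> ... -> suspect
        var both = new HashSet<string>(usedDepth.Keys.Where(userDepth.ContainsKey));
        if (both.Count > 0) both.Add(suspect);
        return both;
    }
}
```

With uses = { A: [B, D], B: [C], C: [A] }, CycleSketch.CycleOf(uses, "A") yields A, B and C, while D is left out because nothing it uses leads back to it.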

Cutting off the Cycles

While we have a powerful way to detect and visualize namespace dependency cycles, we are still stuck when it comes to defining exactly which dependency must be cut off to get a layered code structure. If we look at the graph screenshot above, we can see that dependency cycles are mostly the result of pairs of namespaces being mutually dependent (represented by double-headed arrows in the graph). The first thing to do when one wishes to get a layered code structure is to make sure there are no mutually dependent pairs of components.

This is why we’ve developed a CQLinq code rule named Avoid namespaces mutually dependent. Not only does this code rule exhibit all such pairs, but for each one it gives a hint about which direction of the bi-directional dependency should be cut off. This hint is inferred from the number of types used. If A is using 20 types of B and B is using 5 types of A, odds are B shouldn’t use A. That B is using 5 types of A is most likely an accident caused by a developer who didn’t know the code base well. This gets to the root of code structure erosion.

Empirically, when A and B are mutually dependent, you’ll very often see that there is a natural direction to cut off. This is because the number of accidental dependencies created hopefully remains low. Nevertheless, letting the number of such minor accidents grow without fixing them leads to the typical spaghetti code base we see in most enterprise applications.
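The direction hint boils down to comparing two counts. The sketch below isolates that comparison in plain C#; MutualDependencyHint and Suggest are hypothetical names used only for illustration, and the counts are passed in directly rather than computed from a real code model.

```csharp
using System;

// Hypothetical helper, not part of the NDepend API.
public static class MutualDependencyHint
{
    // Given how many types each namespace uses from the other, return the
    // pair ordered as (First, ShouldntUse): 'First' is the namespace that
    // uses fewer types of the other one, so its dependency is presumably
    // the accidental one and the cheaper one to cut.
    // On a tie the arguments keep their order (strict '>' comparison).
    public static (string First, string ShouldntUse) Suggest(
        string nA, int typesOfBUsedByA,
        string nB, int typesOfAUsedByB)
    {
        return typesOfBUsedByA > typesOfAUsedByB ? (nB, nA) : (nA, nB);
    }
}
```

For the example from the text, MutualDependencyHint.Suggest("A", 20, "B", 5) returns First = "B" and ShouldntUse = "A": B shouldn’t use A. The hint is a heuristic, so a lopsided count is a suggestion to investigate, not proof of which direction is wrong.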

Concretely, here is the result of our code rule applied to System.Core.dll. We see this assembly contains 16 pairs of mutually dependent namespaces. We can also verify what we’ve stated above: most pairs present an asymmetrical ratio of typesOfFirstUsedBySecond and typesOfSecondUsedByFirst:

(Screenshot: result of the code rule on System.Core.dll, listing 16 pairs of mutually dependent namespaces)

The body of the CQLinq code rule is shown below. There are similarities to the code rule presented above. If you’ve followed the explanation of the previous code query and have some notion of C# syntax, understanding the code of this rule is straightforward.

// <Name>Avoid namespaces mutually dependent</Name>
warnif count > 0
// For each pair of mutually dependent namespaces, this rule lists the pair.
// The pair { first, second } is formatted to show first namespace shouldn't use the second namespace.
// The first/second order is inferred from the number of types used by each other.
// The first namespace is using fewer types of the second.
// It means the first namespace is certainly at a lower level in the architecture than the second.
//
// To explore the coupling between two namespaces mutually dependent:
// 1) export the first namespace to the vertical header of the dependency matrix
// 2) export the second namespace to the horizontal header of the dependency matrix
// 3) double-click the black cell
// 4) in the matrix command bar, click the button: Remove empty Row(s) en Column(s)
// At this point, the dependency matrix shows types involved into the coupling.
//
// Following this rule is useful to avoid namespaces dependency cycles.
// More on this in our white books relative to partitioning code.
// http://www.ndepend.com/WhiteBooks.aspx


// Optimization: restrict the set of application assemblies
// If some namespaces are mutually dependent
// - They must be declared in the same assembly
// - The parent assembly must ContainsNamespaceDependencyCycle
from assembly in Application.Assemblies.Where(a => a.ContainsNamespaceDependencyCycle != null && a.ContainsNamespaceDependencyCycle.Value)

// hashset is used to avoid reporting both A <-> B and B <-> A
let hashset = new HashSet<INamespace>()

// Optimization: restrict the set of namespaces
// If a namespace doesn't have a Level value, it must be in a dependency cycle
// or it must be using directly or indirectly a dependency cycle.
let namespacesSuspect = assembly.ChildNamespaces.Where(n => n.Level == null)

from nA in namespacesSuspect

// Select namespaces mutually dependent with nA
let unused = hashset.Add(nA) // Populate hashset
let namespacesMutuallyDependentWith_nA = nA.NamespacesUsed.Using(nA)
      .Except(hashset) // <-- avoid reporting both A <-> B and B <-> A
where namespacesMutuallyDependentWith_nA.Count() > 0

from nB in namespacesMutuallyDependentWith_nA

// nA and nB are mutually dependent
// First select the one that shouldn't use the other.
// The first namespace is inferred from the fact that it is using fewer types of the second.
let typesOfBUsedByA = nB.ChildTypes.UsedBy(nA)
let typesOfAUsedByB = nA.ChildTypes.UsedBy(nB)
let first = (typesOfBUsedByA.Count() > typesOfAUsedByB.Count()) ? nB : nA
let second = (first == nA) ? nB : nA
let typesOfFirstUsedBySecond = (first == nA) ? typesOfAUsedByB : typesOfBUsedByA
let typesOfSecondUsedByFirst = (first == nA) ? typesOfBUsedByA : typesOfAUsedByB
select new { first, shouldntUse = second, typesOfFirstUsedBySecond, typesOfSecondUsedByFirst }

Once you have eliminated all pairs of mutually dependent namespaces, chances are the first code rule still reports dependency cycles. Here you’ll face cycles made of at least 3 namespaces entangled in a cyclic relationship where A depends on B, which depends on C, which depends on A. This sounds painful, but in practice such cycles are often easy to break. Indeed, when 3 or more components are involved in such a cyclic relationship, it is generally trivial to determine which one is at the lowest level. This will tell you which dependency to cut off.
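To illustrate what "lowest level" means once a cycle is cut, here is a minimal sketch that assigns levels to namespaces in plain C#. It uses a simplified notion of level assumed for this example (0 when a namespace uses nothing, otherwise 1 plus the highest level it uses, and no level at all while a cycle is involved); LevelSketch and Levels are hypothetical names, and this is not NDepend’s own computation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the NDepend API.
public static class LevelSketch
{
    // Simplified level: 0 for a namespace that uses nothing, otherwise
    // 1 + the highest level among the namespaces it uses. Namespaces that
    // are in a cycle, or that depend on one, never get a level (null).
    public static Dictionary<string, int?> Levels(
        IReadOnlyDictionary<string, string[]> uses)
    {
        var all = uses.Keys.Concat(uses.Values.SelectMany(v => v)).Distinct().ToList();
        var level = all.ToDictionary(name => name, name => (int?)null);
        bool progress = true;
        while (progress) // repeat until no new level can be assigned
        {
            progress = false;
            foreach (var n in all.Where(m => level[m] == null))
            {
                var used = uses.TryGetValue(n, out var u) ? u : Array.Empty<string>();
                if (used.Any(x => level[x] == null)) continue; // not resolvable yet
                level[n] = used.Length == 0 ? 0 : used.Max(x => level[x].Value) + 1;
                progress = true;
            }
        }
        return level;
    }
}
```

With A using B, B using C and C using A, every level stays null. Removing the dependency from C to A gives C level 0, B level 1 and A level 2, which is the layered structure the rules aim for.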

Conclusion

  • First, it is exciting to have these two powerful code rules to detect namespace dependency cycles and to get hints about how to break them.
  • Second, and this is what I really enjoy, we’ve added these powerful features through two single textual C# code excerpts, easy to read, write, share and tweak. NDepend does the job of compiling them and executing them instantly, and presents the result in a browsable and interactive way. Technically speaking, we can now add a brand new feature that a user is asking for in a few minutes (we already propose 200 such CQLinq code rules). And, even better, the user can develop their own features!

About the Author

Patrick Smacchia is a French Visual C# MVP involved in software development for more than two decades. After graduating in mathematics and computer science, he has worked on software in a variety of fields including stock exchange at Société Générale, an airline ticket reservation system at Amadeus as well as a satellite base station at Alcatel. He also authored Practical .NET 2 and C# 2, a book about the .NET platform conceived from real world experience. He started developing the tool NDepend in April 2004, to help .NET developers detect and fix all sorts of problems in their code. He's currently the lead developer of NDepend and sometimes finds a time slot to enjoy diving into the wild areas that the world still offers.

Sorry, this reads like an infomercial by Stephen Anderson

Don't get me wrong, I love NDepend, and I think there is some value in what you're saying, but you're proposing an approach that is not an absolute good, but a bodge to work around deficiencies in the .NET tool chain. This bodge in turn robs you of one of the advantages of assemblies during development, which is that circular dependencies are obvious (without any extra tool support). But that's OK, because you have a solution, it's called NDepend.

If Visual Studio's dependency management was more sane, then down to a certain size of assembly, multiple assemblies should in general make compilation faster, not slower.

Re: Sorry, this reads like an infomercial by Patrick Smacchia

>If Visual Studio's dependency management was more sane

But it is not, and it is not even close.

And it is not only about Visual Studio slowness (compilation, solution loading time...), but also about the number of assemblies. The code base of NDepend has 306 namespaces. If Visual Studio compilation on many projects was fast, would we be happy to deliver 306 .dll assemblies? Certainly not! The deployment process would be highly error prone.

Of course we could still use tools like ILMerge. Or, we could also embed numerous assemblies as resources in a main assembly. Such a solution should be combined with post build processing (signing, obfuscation, packaging...).

Any palliative solution comes with its own disadvantages. The strength of the solution proposed in the article is that it works on a "logical" level (namespaces) while all other solutions work on a "physical" level (assemblies, file, merging...).

InfoQ.com and all content copyright © 2006-2013 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.