Storing Code in Queryable Data Structures?

Is today’s mainstream use of flat files the optimal way to represent code? Rick Minerich’s post advocating a move away from this paradigm sparked several discussions in the blogosphere.

He argues that representing code in flat files does not allow it to be structured in the most appropriate way. Both the order of functions or classes in a file and the folder organization of a program depend on arbitrary choices made by programmers and reflect their own ideas about structuring code and expressing its meaning. However, “no two programmers think identically alike”, and as soon as source code involves several contributors, the structure is likely to be modified and to lose coherence at each level (the code structure within files and the folder structure of the program) and between the two.

Even though solutions exist to reduce these risks (e.g. separating things out into as many files as possible, or marking out regions of code), Rick Minerich believes that they offer only a partial response to the issues he raises because “they are anchored to flat files”.

Moreover, in some cases it may be useful to have “a different ordering/meaning […] for a particular task”, but reorganizing code stored in flat files for each separate task is hardly practical.

To respond to these issues, Rick advocates for a different approach to code representation:

If you can treat the reflected code from a programming language like an abstract data structure, why can’t you just keep the source itself in a similarly abstracted data structure? Isn’t the structure of a program more similar to a graph, than a list?


If we kept our code in queryable data structures it would be easy to lay our environment in any way we chose. […] You could also, for instance, show a method and everything which references it. The possibilities for code visualization are limitless.


The real boon of moving on is the power and understanding we will gain from being able to visualize the structure of our programs in any way we choose.
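
To make the idea of “a method and everything which references it” concrete, here is a minimal sketch that is not taken from Rick Minerich's post. It assumes Python source, uses the standard `ast` module to turn the code into a small index, and only tracks direct calls by simple name. The sample functions and the helper names `build_reference_index` and `references_to` are hypothetical.

```python
import ast
from collections import defaultdict

# Hypothetical sample program, kept as a string so the sketch is self-contained.
SOURCE = '''
def parse(text):
    return text.split()

def count(text):
    return len(parse(text))

def report(text):
    print(count(text), parse(text))
'''

def build_reference_index(source):
    """Map each called name to the set of functions whose bodies call it."""
    tree = ast.parse(source)
    referenced_by = defaultdict(set)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Walk the function body and record every call made by simple name.
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                referenced_by[node.func.id].add(func.name)
    return referenced_by

def references_to(index, name):
    """Query: which functions reference `name`?"""
    return sorted(index.get(name, ()))

index = build_reference_index(SOURCE)
print(references_to(index, "parse"))  # ['count', 'report']
```

The same index could back other views (callers of callers, unused functions), which is the kind of task-specific ordering that a fixed file layout cannot easily provide.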

Rick’s post triggered many reactions. Steve Hawley shares his viewpoint and suggests using LINQ to support a query-based approach to code:

It strikes me that the process of figuring out which variables you're touching when you're compiling a line of code is really a database query.  Scoping and the semantics of scoping are part of the query (as well as how the database has been built).

Further, the actual link of a completed compile (whether or not it's being done at build time or run time), is another query.

The process of compilation should really be the process of building up a database.
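
Steve Hawley's suggestion is LINQ; purely as an illustration of the same idea, the sketch below uses Python's built-in `sqlite3` module instead. It assumes a toy symbol table with lexical scoping, where resolving a name means finding the nearest enclosing scope that declares it. The table layout, the sample scopes, and the `resolve` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scopes (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
CREATE TABLE symbols (scope_id INTEGER, name TEXT, kind TEXT);
INSERT INTO scopes VALUES
    (1, NULL, 'global'), (2, 1, 'function f'), (3, 2, 'block in f');
INSERT INTO symbols VALUES
    (1, 'x', 'global variable'), (2, 'x', 'parameter'), (1, 'y', 'global variable');
""")

# Walk from the current scope out to the global scope, then pick the
# innermost scope that declares the requested name.
RESOLVE = """
WITH RECURSIVE chain(id, parent, depth) AS (
    SELECT id, parent, 0 FROM scopes WHERE id = :scope
    UNION ALL
    SELECT s.id, s.parent, chain.depth + 1
    FROM scopes AS s JOIN chain ON s.id = chain.parent
)
SELECT sc.name, sy.kind
FROM chain
JOIN symbols AS sy ON sy.scope_id = chain.id
JOIN scopes AS sc ON sc.id = chain.id
WHERE sy.name = :name
ORDER BY chain.depth
LIMIT 1
"""

def resolve(name, scope_id):
    """Return (declaring scope, kind) for `name` as seen from `scope_id`, or None."""
    return conn.execute(RESOLVE, {"name": name, "scope": scope_id}).fetchone()

print(resolve("x", 3))  # ('function f', 'parameter'): the parameter shadows the global
print(resolve("y", 3))  # ('global', 'global variable')
print(resolve("z", 3))  # None: undeclared name
```

Here the scoping rule lives entirely in the query: changing how shadowing works would mean changing the `ORDER BY`, not the stored data.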

Several commentators, however, pointed out that such an approach to code already exists. Keith Braithwaite argues, for instance, that “the logical conclusion of what [Rick Minerich is] talking about is an image-based environment”, which exists in Smalltalk and Lisp. Along with Smalltalk, another commentator gives the example of the VisualAge suite, where “all code source is stored in a code database […], and you can query it any which way you want”.

However, Steve Hawley and other bloggers stress that the advantages of flat files should not be dismissed. Flat files allow efficient navigation through code, since humans “are very aware of space and spatial layout of things and this translates naturally into flat files” so that people “develop a familiarity with the layout of a file and can navigate very efficiently to the right location within it via muscle memory”.

In the discussion on Reddit, one commentator argues that what Rick Minerich considers a flaw of flat files, i.e. their arbitrary structure, may in fact be a benefit, because this flexibility in defining the structure is used to express meaning:

Things like the number of spaces between operators can be used for nice stuff like laying out bits of consecutive lines that have parallel meaning so that they line up. Ordering of functions can be chosen so as to tell a narrative. People have grown quite creative in using the tools they have to write expressive code. If you're going to take this away, I expect to see a good reason to believe that it can be replaced by something equally effective.

Keith Braithwaite points out that moving away from flat file representation would also mean saying goodbye to text editors and version control tools, and he believes programmers are not ready to pay this price. Another commentator, JSJ, points to an even larger set of tools that would not be usable with image-based formats without being written into the system, and stresses that the ability to “build a toolset and use it for ALL (or most) of your programming languages” is “a huge win for flat files”.

The issue of tooling was also raised by Rick Minerich himself, who argues that one of the reasons flat files are still used is that all the tools have been built for code structured as flat files. Almost all compilers, for instance, require a complete program. He believes that “a language which is not tied to traditional compiling and linking would be ideal for research into keeping code in abstracted data structures” and suggests a first step towards supporting query-based code:

A good first step would be an IDE/Editor that can manage all of the code in a database and allow the programmer to dynamically construct queries to build views and otherwise manipulate the code. The environment could then generate flat files in order to be compatible with current compilers.
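
A rough sketch of that first step, under stated assumptions, might look like the following. It is not an existing tool: the `definitions` table, its `module` and `tag` columns, the sample function sources, and the helpers `view_by_tag` and `export_module` are all hypothetical. Definitions are stored as rows in an in-memory SQLite database, views are built by querying, and flat-file text is generated on demand for a conventional compiler or interpreter.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE definitions (name TEXT PRIMARY KEY, module TEXT, tag TEXT, source TEXT)"
)
db.executemany(
    "INSERT INTO definitions VALUES (?, ?, ?, ?)",
    [
        ("load", "io_utils", "io", "def load(path):\n    return open(path).read()\n"),
        ("save", "io_utils", "io", "def save(path, text):\n    open(path, 'w').write(text)\n"),
        ("tokenize", "parser", "parsing", "def tokenize(text):\n    return text.split()\n"),
    ],
)

def view_by_tag(tag):
    """Build a task-specific view: names of all definitions carrying a tag."""
    rows = db.execute(
        "SELECT name FROM definitions WHERE tag = ? ORDER BY name", (tag,)
    )
    return [name for (name,) in rows]

def export_module(module):
    """Generate flat-file text for one module so existing tools can consume it."""
    rows = db.execute(
        "SELECT source FROM definitions WHERE module = ? ORDER BY name", (module,)
    )
    return "\n".join(source for (source,) in rows)

print(view_by_tag("io"))
flat_file = export_module("io_utils")
compile(flat_file, "io_utils.py", "exec")  # the generated text is valid Python source
print(flat_file)
```

In this arrangement the database is the source of truth and the flat file is a build artifact, which leaves open the version control and editor concerns raised by the other commentators.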
