5 Configuration Management Best Practices

There has been a lot of conversation going on around the configuration of applications and how you manage it. Tom Sulston, a colleague of mine at ThoughtWorks, and I started the ESCAPE project as one way to address configuration management from outside of the application space. It provides configuration for multiple applications across many environments as a REST service. Although there has been no real activity on ESCAPE recently, it's not dead or forgotten; our real jobs have just gotten in the way (again).

Today I’d like to have a look at things people can do from within their code to make their lives, and the lives of anyone else who has to administer or maintain their application, easier. These are patterns that I (and others) have used a number of times on ThoughtWorks projects that have proven their worth.

Single Configuration Source

In far too many applications I've seen configuration accessed in an implementation-specific way from all over the code base. This not only causes confusion about where specific aspects of the application are configured, but that confusion is frequently compounded by the same configuration parameter name (say database.host) having a different meaning depending on its location. Additional side effects include:

  • Difficulty identifying which configuration options are actually used and which are unnecessary
  • Different parts of the code using different mechanisms to access the same configuration source
  • Using different configuration sources for the same value

From an operations perspective, one of the worst side effects of such a system is that the different sources of configuration often then have different formats such as XML in some files and key/value pairs in others. All this accidental complexity also makes the applications very hard to deploy into new environments.

In systems such as this we also often find that there are either no tests around this configuration code, or the tests that do exist are brittle and/or not realistic.

Therefore:

Encapsulate the actual mechanism for getting at stored configuration in a provider, and just inject this provider wherever values are needed. This allows the use of test-specific implementations of the configuration provider, and it lets the way the configuration is stored change easily as the system evolves. For example, you could start out with hard-coded strings, then move to a file, and finally move some values to a repository of some sort.

For example, take this simple Python class that acts as a dictionary of hard coded values:

class ConfigProvider(dict):
    def __init__(self):
        self['name'] = 'Chris'   

This simple class can then be used as follows:

from ConfigProvider import ConfigProvider 

class ConfigProviderUser:
    def __init__(self, cfg):
        self.cfg = cfg
        print "Hello, my name is %s" % self.cfg["name"]

if __name__ == "__main__":
    ConfigProviderUser(ConfigProvider())

But then we decide that we must stop hard coding, and choose to read from a .properties file instead. The ConfigProvider then becomes the only bit of code that needs changing, and ends up looking like this:

import re

class ConfigProvider(dict):
    src = None
    prop = re.compile(r"([\w. ]+)\s*=\s*(.*)")
    
    def __init__(self, source = "config.properties"):
        self.src = source
        self.loadConfig()
     
    def loadFileData(self):
        data = ""
        try:
            input = open(self.src, 'r')
            data = input.read()
            input.close()
        except IOError:
            pass

        return data

    def loadConfig(self):
        for (key, val) in self.prop.findall(self.loadFileData()):
            self[key.strip()] = val 

Consider the work that would now need to be done if it was decided that .properties files were not good enough, and that we wanted to switch to .yaml files. Once again, the only code you need to change is the ConfigProvider itself. Here it is in a transition state where it will happily deal with both formats (based on file extension):

import re
import yaml

class ConfigProvider(dict):
    src = None
    prop = re.compile(r"([\w. ]+)\s*=\s*(.*)")

    def __init__(self, source = "config.properties"):
        self.src = source
        self.loadConfig()

    def loadConfig(self):
        if self.src.endswith(".properties"):
            self.loadPropertiesConfig()
        elif self.src.endswith(".yaml"):
            self.loadYamlConfig()

    def loadFileData(self):
        data = ""
        try:
            input = open(self.src, 'r')
            data = input.read()
            input.close()
        except IOError:
            pass

        return data

    def loadPropertiesConfig(self):
        for (key, val) in self.prop.findall(self.loadFileData()):
            self[key.strip()] = val 

    def loadYamlConfig(self):
        entries = yaml.load(self.loadFileData())
        if entries:
            self.update(entries)

Single Configuration Ruleset

Too many applications, even small, simple ones that don't have an external configuration file and take all their configuration from the command line, fail to adequately inform users about their configuration rules. These rules may include (but are not restricted to):

  • What are all the configuration properties that can be set?
  • Which configuration properties are required and which are optional?
  • Is it possible to check that the value provided for a property is valid?
  • Are there default values, and if so, where are they?

Often this is because these rules are simply implicit side effects of code behaviour. The application will usually start up and appear to be functioning normally, but when a user tries to exercise functionality that requires a missing or invalid configuration value, you get unexpected results. Verifying that the deployment of such an application has been successful is time consuming and error prone.

Therefore:

Define a single ruleset that captures all of the points mentioned above. This single source of truth can then be used to generate configuration templates, if appropriate for your application. This works well for formats that support schema validation (such as XML), but it can still be applied to systems as simple as properties files, where you simply generate a tokenised example file.

A Single Configuration Ruleset can then be used as part of a Deployment Configuration Smoke Test. If required configuration elements are missing at application initialisation, then fail fast and fail loudly. Don't wait until the application tries to read that value. If we know how to check that a provided value is valid (it's easy to test whether a value is an integer, a file that exists, or a hostname and port we can open a socket to), then test this here too.
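As a rough sketch of the kind of value checks such a smoke test might run, here are a few stand-alone helpers. The names and signatures are purely illustrative and are not part of the ConfigRuleset example shown later:

import os
import socket

# Illustrative value checks for a Deployment Configuration Smoke Test.
# These helpers are hypothetical; they simply show the sort of fail-fast
# validation the text describes.
def is_int(value):
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        return False

def is_existing_file(value):
    return os.path.isfile(value)

def is_reachable_endpoint(value, timeout=2):
    # expects "host:port" and tries to open (then immediately close) a socket
    try:
        host, port = value.split(":")
        socket.create_connection((host, int(port)), timeout).close()
        return True
    except (ValueError, socket.error):
        return False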

This provider must be unit tested. These tests should also run against any template that’s used by external people and/or systems to configure your application. The Deployment Configuration Smoke Tests should be used as early as developer unit tests. If there is a new configuration option added, there should be a unit test for that option. When someone updates their codebase, if they’ve not defined that value on their workstation they’ll have a test that fails, loudly telling them “I expected configuration entry sheep to be defined, but it was not!”
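A minimal sketch of such a developer-level test, assuming the ConfigProvider from the previous pattern and using sheep as the example key from above:

import unittest

# Assumes the ConfigProvider from the Single Configuration Source example
# lives in ConfigProvider.py; 'sheep' is just the example key from the text.
from ConfigProvider import ConfigProvider

class ConfigSmokeTest(unittest.TestCase):
    def test_sheep_is_defined(self):
        cfg = ConfigProvider()   # loads config.properties by default
        self.assertTrue('sheep' in cfg,
            "I expected configuration entry sheep to be defined, but it was not!")

if __name__ == "__main__":
    unittest.main()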

Although the Single Configuration Ruleset must make heavy use of the Single Configuration Source, remember that they are separate concerns. Care needs to be taken to prevent leakage between the two.

Carrying on with the Python example we started with in the previous pattern, we would now have a Configuration Ruleset that looked like this:

class ConfigRuleset(dict):
    defaults = { 
        'name': 'no name',
    }   

    required = [ 
        'name',
    ]   

    def __init__(self):
        self.update(self.defaults)

    def validate(self):
        missing = []
        for key in self.required:
            if key not in self:
                missing.append(key)

        if missing:
            raise KeyError("The following required config keys are missing: %s" % missing)

The Configuration Provider would again be the only code that changes. It would now look something like this:

class ConfigProvider(ConfigRuleset):
    src = None
    prop = re.compile(r"([\w. ]+)\s*=\s*(.*)")
        
    def __init__(self, source = "config.properties"):
        ConfigRuleset.__init__(self)
        self.src = source
        self.loadConfig()
        self.validate()

    ….

The defaults and required structures in the ConfigRuleset then become the single source of truth in our code for what our default values are and which keys are required.
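Because defaults and required are plain data, they can also drive the template generation mentioned earlier. Here is a minimal sketch, using a hypothetical helper (not from the original example) that writes a tokenised .properties template from the ruleset:

# Hypothetical helper that turns the ruleset into a tokenised .properties
# template for the people deploying the application.
def write_properties_template(ruleset, path="config.properties.example"):
    template = open(path, 'w')
    for key in sorted(ruleset.required):
        # use the known default if there is one, otherwise leave a token
        value = ruleset.defaults.get(key, "@%s@" % key.upper())
        template.write("%s = %s\n" % (key, value))
    template.close()

Calling write_properties_template(ConfigRuleset()) would then produce a config.properties.example containing name = no name.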

Configuration View

When trying to diagnose problems on a running application you usually need to see what the current running configuration values are. Simply looking at the current configuration source does not provide you with accurate information as it may have changed since the application last loaded it.

Therefore:

Provide an easy, well known way for anyone to find out where a running system loaded its configuration from, and what the values it loaded were. This may be as simple as printing out the configuration tree (and source location(s)) at startup, although this can be lost quickly in long running systems. A more robust approach is for there to be some kind of Web Page/About Page/Remote Procedure Call that returns the current run time configuration, along with where these values were loaded from (ESPECIALLY if there are multiple possible sources of configuration).

It is also often very useful for this view to provide version/build/release information for the system. More information on the value of this is available in my previous article on Self Identifying Software.

In the Python example we've been working with so far, implementing this could be as simple as returning the string representation of our ConfigProvider.
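For instance, a minimal sketch of such a view, assuming the ConfigProvider from the earlier examples (the describe method is a hypothetical addition, not part of the code above):

# Hypothetical extension of the earlier ConfigProvider that reports both the
# loaded values and where they came from; the output could be printed at
# startup or returned from an "about" page or remote procedure call.
class ViewableConfigProvider(ConfigProvider):
    def describe(self):
        lines = ["Configuration loaded from: %s" % self.src]
        for key in sorted(self.keys()):
            lines.append("  %s = %s" % (key, self[key]))
        return "\n".join(lines)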

DNS Service Names

It is now generally accepted that it is bad practice to use raw IP addresses when configuring service endpoints. The use of DNS names is now almost (although unfortunately not totally) universal. People still have problems managing such systems, though, when these DNS entries point to specific server host names. This may work out fine when initially deploying the application, but what happens if you need to perform a hardware upgrade for one of your services? Consider the following simplistic scenario:

You've got a fairly busy web site that uses a central database for client information. This database is also used by the marketing team in some of their applications, as well as by some other reporting tools. Business is going well, and this server (let's call it db02) is now having some performance issues and needs to be replaced with a nice new shiny server (let's call it db04). This becomes a long and painful process because you need to find all the applications that use this database and figure out how to reconfigure them to use the new server when the cutover happens.

Therefore:

Use DNS Service Names for all your services. The simplest solution is to use DNS CNAME records for your service endpoints. In the above example, we would create a CNAME record called clientdb that pointed to db02, and all applications that used that database would be configured to use clientdb as the service endpoint hostname. When it came time to move the database to a new server, the final step in the cutover plan would simply be to update that CNAME entry to point to db04. This not only removes the need for changing configuration on the dependent applications, it also provides a handy backout strategy: if for some reason there's a problem with the new db04 server, simply point the CNAME back at db02 until the problem is resolved.
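In a BIND-style zone file for example.com, the cutover then amounts to a one-line change (the records below are illustrative, not taken from the article):

; alias used by all applications as the service endpoint
clientdb    IN  CNAME   db02.example.com.
; at cutover time, repoint the alias (keeping the old line handy as a backout):
; clientdb  IN  CNAME   db04.example.com.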

DNS Based Environment Determination

Using DNS Service Names as described above can have a slight side effect. If you had a number of different client databases for development and testing, you could end up with a large number of CNAME entries that look only slightly different. One example might be:

  • clientdb.example.com for Production
  • clientdb-perf.example.com for Performance Testing
  • clientdb-qa.example.com for QA
  • clientdb-dev.example.com for Development

This confusion then often leads to a proliferation of configuration files, one for each environment.

Therefore:

Use DNS Based Environment Determination for your servers. Do this by initially splitting your top level domain into a number of sub-domains depending on their function, and then creating DNS Service Names in each of the sub-domains, pointing to the relevant server for that service. Based on the list above, we would then have:

  • clientdb.prod.example.com for Production
  • clientdb.perf.example.com for Performance Testing
  • clientdb.qa.example.com for QA
  • clientdb.dev.example.com for Development

Servers then resolve entries in their relevant sub-domain by function. That is, all QA servers would resolve entries in qa.example.com first, and if that lookup failed they would then try example.com. This allows you to have a single configuration entry for your client database hostname (clientdb) that resolves correctly in all environments. This technique has the added advantage of still allowing global services to be defined in the common top level domain.
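The usual way to get that lookup order is the resolver's search list. A minimal sketch of what a QA host's resolver configuration might contain (illustrative values, not from the article):

# /etc/resolv.conf on a QA server (illustrative)
# unqualified names such as "clientdb" are tried against qa.example.com
# first, then against example.com
search qa.example.com example.com
nameserver 192.0.2.53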
