BT

Natural Course of Refactoring – a Refactoring Workflow

Posted by Mariusz Sieraczkiewicz on Nov 30, 2014 |

 

.so!changesIntroDuCinG!maYBe,softwaRe,TosO,calLEdprOgRessIve,and.however.cHanGes!Modifies.usuallY,sTRucTureOFmaYbe,hoWEVeRtHecoDE,And!wHaTmaYbecumulatEdhowevEr.anD,mAkes.AnD,,and,the.LeSs!rEAdAblE,aNd.cOdeMAybe.ANd,!and,!Thenumber!of,sO,HoWEvErsodepeNdeNCiES,And.MAybeintErACtIOns!betWeenHoWevEr!differEntsyStem.moDules!inCreasESoThat,iTsO!morE.is.diffiCuLt,To,AnDuNdErstandmodIFy

I bet you find it very difficult to decrypt what is really going on. It is a great metaphor what is it like to deal with messy code.

Current state

Refactoring is not a new technique yet still quite a hot topic. It became an indispensable tool in a programmer toolbox. At least in theory. In practice surprisingly I still see that this practice is abandoned in many teams. “We don’t have time”. “We are not allowed”. I think that one of the problems is a false dichotomy thinking here. Refactoring is not a zero-one decision: do it or not. This is why I differentiate two types of refactoring: everyday refactoring and strategic refactoring.

Everyday refactoring is within a reach of every programmer, can be done in minutes, these are mostly safe, IDE-base automatic refactorings, is done for local health of the code, it is a part of programming practice and there is no excuse for not doing it.

Strategic refactoring is a team longer term effort, requires aligning with iteration planning, generates items in backlog, is a risky activity that requires intensive testing (including good tests suite), is difficult and time-consuming. Check carefully if refactoring gives you enough value (eg. challenging with Feather’s Quadrant).

A tool that can be helpful to evaluate the potential value of refactoring is Feather’s Quadrant (I am not sure if Michael knows that it was named after him).

In order to analyse your code you should consider two dimensions:

  • code complexity (cyclomatic complexity metric);
  • frequency of changes in repository.

Having so your code will be divided into four parts:

High complexity – seldom changes – in most cases it is messy but stable code of the core of the system, so don’t touch it.

Low complexity – seldom changes – simple code not changing much, such as utils or not that often used functionality. Also no need to touch unless you want to do some safe to fail experiments.

Low complexity – frequent changes – if you have such code then you are in heaven, this is what you would desire – quite simple code that you deal with often.

High complexity – frequent changes – this is a place where evil lives, big classes, big methods that are changed often. There parts of code scream for refactoring.

But this kind of clustering has one drawback – it shows you the past, how it was in the past, so it doesn’t guarantee that your guess will be correct. This is why you should challenge results of this heuristic with the business strategic vision – to find out what is likely to be changed in the future.

This is great mental tool but I haven’t seen any production tool supporting this heuristic. On the other hand it shouldn’t be difficult to write script getting this data for your environment.

Natural Course of Refactoring

It is time to get to the point. What is Natural Course of Refactoring? I am sure you are familiar with TDD process: red-green-refactor. It is a very simple process. You are right – practice is a little bit more difficult. But the process itself is easy to understand and use. The same is with Natural Course of Refactoring.

I have been working for years dealing with legacy code and doing “Refactoring to design patterns” trainings. During one of such consultancy projects I realized that there is some kind of pattern occurring very often, a process pattern. What you mostly do while doing refactoring: you try to understand the code, split methods into smaller pieces (smaller methods), then you move them around to find a better place for it using single responsibility response, you introduce patterns when some kind of flexibility is needed and evolve your architecture.

First solution

Let’s assume that you work with code that is a mess. Spaghetti. Big ball of mud. You try to do something with one of those big methods. And this is the place where we start.

To illustrate the process we will use the following example (this is the code that produced the beginning example “.so!changesIntroDuCinG!maYBe,softwaRe,TosO…”).

public class TextManager { 

    private TextManagerHelper hlp = new TextManagerHelper();
    private Random rnd = new Random(); 

    public String convert(String text) {
        String result = "";
        List<TextPart> pts = hlp.convert(text); 

        for (int i = 0; i < pts.size(); i++) {
            if (pts.get(i).getType().equals(TextPartType.WORD)
                    && rnd.nextDouble() < 0.2) {
                if (i + 2 < pts.size()) {
                    TextPart t = pts.get(i);
                    pts.set(i, pts.get(i + 2));
                    pts.set(i + 2, t);
                }
            }
        } 

        for (int i1 = 0; i1 < pts.size(); i1++) {
            if (rnd.nextDouble() < 0.2) {
                pts.add(i1, new TextPart(" ",
TextPartType.NONWORD));
                i1++;
                String[] wds = new String[] { "and", "so",
                        "however", "maybe" };
                int ind = rnd.nextInt(wds.length);
                pts.add(i1, new TextPart(wds[ind],
                            TextPartType.WORD));
                i1++;
                pts.add(i1, new TextPart(" ",
TextPartType.NONWORD));
            }
        } 

        String result2 = ""; 

        for (TextPart prt : pts) {
            result2 += prt.getContents();
        } 

        result = result2;
        result = result.replaceAll("[\\?!-\\.,:;'\\(\\)]", ""); 

        String result1 = ""; 

        for (int i2 = 0; i2 < result.split(" ").length - 1; i2++) {
        if (rnd.nextDouble() < 0.5) {
                char[] chrs = new char[] { '.', ',', '!' };
                int j = rnd.nextInt(chrs.length);
                result1 += result.split(" ")[i2] + chrs[j];
            } else {
                result1 += result.split(" ")[i2] + " ";
            }
        }
        result1 += result.split(" ")[result.split(" ").length - 1];
        result = result1; 

        String result3 = "";
        char[] textArray = result.toCharArray(); 

        for (char ch : textArray) {
            if (rnd.nextDouble() < 0.3) {
                char nch;
                if (Character.isLowerCase(ch)) {
                    nch = Character.toUpperCase(ch);
                } else {
                    nch = Character.toLowerCase(ch);
                } 

                result3 += nch;
            } else {
                result3 += ch;
            }
        } 

        result = result3.replaceAll(" ", ""); 

        return result;
    } 

}

Step 0. Understand it

First do something to understand it. It is many times the most difficult step in the process. Especially when dealing with legacy code. How to understand a five hundred lines method having cyclomatic complexity higher 50? No matter what you do it is always pain. How to relieve this pain? Here you can find some tips.

1. Find somebody who wrote the code or is familiar with it and ask for help.

This would be the recommended solution but usually not doable. Sometime even the author is not able to explain the code.

2. Start reading and comment the code putting some extra tags.

Reading a large piece of code is exhausting and tedious task. You have to process a lot of information and it is easy to get lost. You can find some helpful task tags which can be used for the knowledge gathered from the understanding process. Beware: these comments are temporal and should be removed right after refactoring is finished.

// SMELL a smell code description 

// REFACTOR an idea for refactoring 

// NOTE a comment explaining a part of the code 

A NOTE comments are particularly important in further steps of the Natural Course of Refactoring and are used to extract steps of algorithm.

3. Do some sort of scratch refactoring – a technique described by Michael Feathers in his book “Working with legacy code”. It is a rough refactoring done just for better understanding the code and to be thrown away afterwards.

4. Rename variables and name the conditional to express the intent

Naming is a big power and it usually is the most common source of code smells. You can improve readability of code just changing names.

Our example code may look this way after this step.

// REFACTOR compose method
// SMELL "manager" class name
public class TextManager { 

    // SMELL helper antipattern
    private TextManagerHelper hlp = new TextManagerHelper();
    // SMELL short name
    private Random rnd = new Random(); 

    // SMELL too little specific method name
    public String convert(String text) {
        String result = "";
        // NOTE break text into parts
        List<TextPart> pts = hlp.convert(text); 

        // NOTE swap neighbour words randomly
        // SMELL terrible naming
        for (int i = 0; i < pts.size(); i++) {
            // SMELL broken Demeter law
            // SMELL difficult to read
            if (pts.get(i).getType().equals(TextPartType.WORD)
                    && rnd.nextDouble() < 0.2) {
                if (i + 2 < pts.size()) {
                    TextPart t = pts.get(i);
                    pts.set(i, pts.get(i + 2));
                    pts.set(i + 2, t);
                }
            }
        } 

        // NOTE insert meaningless words randomly
        for (int i1 = 0; i1 < pts.size(); i1++) {
            if (rnd.nextDouble() < 0.2) {
                pts.add(i1, new TextPart(" ", 
TextPartType.NONWORD));
                i1++;
                String[] wds = new String[] { "and", "so",
                        "however", "maybe" };
                int ind = rnd.nextInt(wds.length);
                pts.add(i1, new TextPart(wds[ind],
                        TextPartType.WORD));
                i1++;
                pts.add(i1, new TextPart(" ", 
TextPartType.NONWORD));
            }
        } 

        String result2 = ""; 

        // NOTE concatenate everything
        for (TextPart prt : pts) {
            result2 += prt.getContents();
        } 

        result = result2;
        // NOTE remove punctuaction marks
        result = result.replaceAll("[\\?!-\\.,:;'\\(\\)]", ""); 

        String result1 = ""; 

        // NOTE insert punctuation marks randomly
        for (int i2 = 0; i2 < result.split(" ").length - 1; i2++) {
            if (rnd.nextDouble() < 0.5) {
                char[] chrs = new char[] { '.', ',', '!' };
                int j = rnd.nextInt(chrs.length);
                result1 += result.split(" ")[i2] + chrs[j];
            } else {
                result1 += result.split(" ")[i2] + " ";
            }
        }
        result1 += result.split(" ")[result.split(" ").length - 1];
        result = result1; 

        String result3 = "";
        // NOTE replace upper and lowercase randomly
        char[] textArray = result.toCharArray(); 

        for (char ch : textArray) {
            if (rnd.nextDouble() < 0.3) {
                char nch;
                if (Character.isLowerCase(ch)) {
                    nch = Character.toUpperCase(ch);
                } else {
                    nch = Character.toLowerCase(ch);
                } 

                result3 += nch;
            } else {
                result3 += ch;
            }
        } 

        // NOTE remove spaces
        result = result3.replaceAll(" ", ""); 

        return result;
    }
}

Step 1. Express algorithm

Every method should express its algorithm. And this algorithm is usually not expressed explicitly in badly written legacy code. Considering you have understood the method, now you should be able to split it into algorithm steps using mostly the Extract method refactoring in case of simpler methods and Introduce Method Object refactoring for more messy methods. Your aim is to get the code that speaks to you.

We can visualize this step this way:

This way we have algorithm expressed in the code. Just one look at the code and you can see the steps of the algorithm. Steps should be at the same level of abstraction thus we also extracted removeSpaces() method although it is one-liner. The state of the code we are aiming to is called Composed method. In our case it would look like this.

public class TextObfuscator { 

    private static final char[] PUNCTATION_CHARS = new char[] { '.', ',', '!' };
    private static final String[] MEANINGLESS_WORDS = new String[] { "and", "so",
        "however", "maybe" }; 

    private Random random = new Random(); 

    public String obfuscate(String text) {
        List<TextPart> parts = TextPart.from(text); 

        swapNeighbourWordsRandomly(parts); 

        insertMeaninglessWordsRandomly(parts); 

        String result = TextPart.toString(parts); 

        result = removePunctuationChars(result);
        result = insertPunctuationRandomly(result); 

        result = swapUpperAndLowercaseRandomly(result); 

        result = removeSpaces(result); 

        return result;
    } 

    private String removeSpaces(String result) {
        return result.replaceAll(" ", "");
    } 

    private String swapUpperAndLowercaseRandomly(String text) {
        String result = "";
        char[] textArray = text.toCharArray(); 

        for (char currentCharacter : textArray) {
            if (random.nextDouble() < 0.3) {
                char newCharacter;
                if (Character.isLowerCase(currentCharacter)) {
                    newCharacter = 
Character.toUpperCase(currentCharacter);
                } else {
                    newCharacter = 
Character.toLowerCase(currentCharacter);
                } 

                result += newCharacter;
            } else {
                result += currentCharacter;
            }
        }
        return result;
    } 

    //..
    private String removePunctuationChars(String result) {
        return result.replaceAll("[\\?!-\\.,:;'\\(\\)]", "");
    }
    //.. 

    private boolean shouldBeChangedWith(double probability) {
        return random.nextDouble() < 0.2;
    }
}

I know that the code above is not a perfect one but is more clean than the one before.

Step 2. Extract responsibilities

After extracting many small methods it is quite likely that you have many of them in your class. It is quite likely, especially in legacy code, that some extracted methods are out of class responsibility. It is time to use Single Responsibity Principle here. Move those method to other classes or extract a new one if needed. In this step are aim is to adjust responsibilities and cut the code a little bit.

You may consider following strategies in this step:

  • if there is a private method worth testing (ie. complicated enough) it is likely there is a better place for it in another class. In our case a good example is swapUpperAndLowercaseRandomly() method. Then move it or extract a new class for the method.
  • check if the method name and the class name match. In our example method named isWord() doesn’t match the class named TextObfuscator. It is better fit for TextPart class.
  • use metrics indicating detecting class responsibility violations from LCOM family to find candidate methods to move outside.

After step two we should have more balanced responsibilities in our classes.

And our code would look like this:

public class TextObfuscator { 

    private Obfuscations methods;
    public TextObfuscator(Obfuscations methods) {
        this.methods = methods;
    }    
    public String obfuscate(String text) {
        List<TextPart> parts = TextPart.from(text); 

        methods.swapNeighbourWordsRandomly(parts); 

        methods.insertMeaninglessWordsRandomly(parts); 

        String result = TextPart.toString(parts); 

        result = methods.removePunctuationChars(result);
        result = methods.insertPunctuationRandomly(result); 

        result = methods.swapUpperAndLowercaseRandomly(result); 

        result = methods.removeSpaces(result); 

        return result;
    }
} 

public class Obfuscations { 

    private static final char[] PUNCTATION_CHARS = new char[] { '.', ',', '!' };
    private static final String[] MEANINGLESS_WORDS = new String[] { "and", "so",
        "however", "maybe" }; 

    private Randomness random;
    public Obfuscations(Randomness randomness) {
        this.random = randomness;
    } 

    public String removeSpaces(String result) {
        return result.replaceAll(" ", "");
    } 

    public String swapUpperAndLowercaseRandomly(String text) {
        String result = "";
        char[] textArray = text.toCharArray(); 

        for (char currentCharacter : textArray) {
            double probability = 0.3;
            if (random.shouldBeChangedWith(probability)) {
                char newCharacter;
                if (Character.isLowerCase(currentCharacter)) {
                    newCharacter = 
Character.toUpperCase(currentCharacter);
                } else {
                    newCharacter = 
Character.toLowerCase(currentCharacter);
                } 

                result += newCharacter;
            } else {
                result += currentCharacter;
            }
        }
        return result;
    } 

    // ...
} 

public class TextPart { 

    // ... 

    static public String toString(List<TextPart> parts) {
        String result = "";
        for (TextPart part : parts) {
            result += part.getContents();
        }
        return result;
    } 

    static public  boolean isWord(TextPart part) {
        return part.getType().equals(TextPartType.WORD);
    } 

    public static List<TextPart> from(String t) {
        List<TextPart> prts = new ArrayList<TextPart>();
        Pattern pattern = Pattern.compile("(\\p{L}+|\\P{L}+)", Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(t);
           // ... 

    } 

} 

public class Randomness {
    private Random random = new Random(); 

    public boolean shouldBeChangedWith(double probability) {
        return random.nextDouble() < probability;
    } 

    public int nextInt(int maxInt) {
        return random.nextInt(maxInt);
    }
}

Refactorings that may be helpful in this step: Move method, Extract class, Introduce Domain Object, Introduce Value Object.

Step 3. Introduce flexibility

Just doing steps 0-2 you would heal most codebases on this planet. But of course sometimes you should go further when flexibility is needed. Here are some tips indicating that you may introduce the design pattern:

  • you have implemented many times similar algorithm
  • your requirements state that something may be done in a variety of ways (for example you have to support different file formats, data formats, protocols, third party systems etc.)
  • after you more understand the bigger picture, you can recognize better a nature of object interactions and patterns emerge

It is very handy to know what kind you may be looking for:

  • data are built in many steps (Builder)
  • you domain object has its lifecycle and you can define some sort of state machine (State)
  • you need some form of caching data (Flyweight)
  • you need to flexibly enhance some behaviour (Decorator)
  • you want to hide the real object for some reasons (Proxy)
  • you have different variations on the same algorithm (Strategy)
  • and so on…

In most cases after this step your code would ready for extending (Open-Close principle embedded in many patterns). But beware. There are many situations when you can introduce pattern but there is no rationale to do it. Our example is the case. You can find Builder pattern introduced as an illustration of the next step of the process but let’s think for a while. Do we really need it this kind of flexibility in this context? Builder pattern gives us means to build data in different ways but in our example there is only one way to obfuscate the text and we have no more hints that we have to be flexible here. Then don’t do it. Flexibility has its cost. Don’t pay for something you don’t need.

The most important question here is: “Do I really need this flexibility?” and you should have very strong reasons.

public class TextObfuscator { 

    public String obfuscate(String text) {
        ObfuscationBuilder builder = new ObfuscationBuilder(text); 

        builder.swapNeighbourWordsRandomly(); 

        builder.insertMeaninglessWordsRandomly(); 

        builder.removePunctuationChars();
        builder.insertPunctuationRandomly(); 

        builder.swapUpperAndLowercaseRandomly(); 

        builder.removeSpaces(); 

        return builder.toString();
    }
    // ... reasonable if there are different ways of obfuscating
} 

public class ObfuscationBuilder { 

    private static final char[] PUNCTATION_CHARS = new char[] { '.', ',', '!' };
    private static final String[] MEANINGLESS_WORDS = new String[] { "and", "so",
        "however", "maybe" };
    // ..
    private Random randomness;
    private String text;
    // ..
    public void removeSpaces() {
        this.text.replaceAll(" ", "");
    }
    // ..
}

Mental tools you use in this step: S.O.L.I.D., Design patterns, refactoring to patterns.

Step 4. Evolve architecture

After some time some higher level patterns emerge:

  • You may notice that state machine is better suited for your solution (or its part)
  • You may notice that introducing event style of communication will give you expected system flexibility
  • You may want to extract a read-only subdomain according to CQRS so that expensive queries would be much faster
  • You may introduce or remove layer in you layered architecture
  • You may want to apply some Domain-Driven Design concepts
  • And so on…

Implementing these decisions require a lot of effort and usually cannot be done adhoc. They require longer term planning and are very risky. Thus this step doesn’t occur too often but is needed in order to keep system architecture healthy.

Our example is too small to show this step. It just doesn’t apply there.

This way you should have your architecture balanced just in time. It may require changes in future but now it is reasonably ok. Of course it is a little simplified view of things because evolving architecture is usually takes a long time and balancing architecture is more like a process than a particular state.

The whole process can be depicted this way with the border between everyday and strategic refactoring:

(Click on the image to enlarge it)

Practical hints

Natural Course of Refactoring is just the model. It is simplified view on how to organize refactoring process and what is the least possible step. It tells us how to do it in small steps and so you won’t rush.

Here are some practical hints for using it in real life:

  • It doesn’t also mean that you should always follow the NCR steps in this order. Sometimes you can do two steps at once, you can omit a particular step.
  • Most of your everyday refactoring fall in steps 0-2. It will significantly change readability of your codebase.
  • The further the step is the less frequently it is performed. You do steps 3 and 4 less frequently (and this is so called strategic refactoring).
  • The border between everyday and strategic refactoring is not strict and it is more a matter of effort needed to do refactoring.
  • Legacy code is usually inconsistent so you can find different parts in different NCR phases.

About the Author

Mariusz Sieraczkiewicz has been a software professional for more than 10 years. Passionate about learning from experience especially when dealing with real-life software projects. As a trainer and consultant he woredk with top teams on technical leadership, process agility and clean code. Co-conducted in-depth researches on critical success factors in software development. At top of that he published dozens of articles in Software Developer Journal and Polish “The Programist” magazine. Mariusz has recently presented a talk on the topic of Natural Course of Refactoring

Rate this Article

Relevance
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Excellent one! by Ajay Jadhav

Many people trying to adopt refactoring for a big-messy code wonder how to start. This is great a article providing a good roadmap.

Fantastic article by Richard Clayton

Thank you for the practical examples and diagrams. Printing the last one and posting on my wall.

One of the best explanation of refactoring by Ashkrit Sharma

This is one the best explanation , i am going to share it with my team.

Doubt with unit testing by Juan Pablo Araya

Excellent article, but I have a doubt.

In Michael Feathers book he points that Test Coverings is one of the first thigns to do in order to refactor. That is: first cover all the messy algorithm and assure that your refactoring mantains the same expected result. In your example you have refactored a single method that does not have any dependency. In this case the steps 0-2 are safe enough to refactor without those test coverings? What if the programmer make a mistake in the refactoring phases?

Greetings!

Excellent Article! Just what i've been trying to say. by Leighton Braddock

I love it when other people eloquently put into words what I have been struggling to communicate.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

5 Discuss
General Feedback
Bugs
Advertising
Editorial
Marketing
InfoQ.com and all content copyright © 2006-2016 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT

We notice you’re using an ad blocker

We understand why you use ad blockers. However to keep InfoQ free we need your support. InfoQ will not provide your data to third parties without individual opt-in consent. We only work with advertisers relevant to our readers. Please consider whitelisting us.