I can’t count the number of times I think I’ve been in this situation. You get that sinking feeling in your stomach. Your neck muscles tense. You clutch at the last shreds of hair you have and rub your forehead with your palms. You look up, and bearing down on you is a mountain. You know the peak well, you know it’s piled high, higher than the clouds you can see. Welcome to spagetti code mountain.
Never ending loops, endless switch statements, functions calling infinite functions, cryptic variables, duplication after duplication, premature optimization, you name it, it’s there.
I’ve been in this situation before, but everytime I come to it, I find myself starting from scratch how to deal with this problem. The feeling of hopelessness when opening up your editor to find a x-thousand line file is overwhelming. My first reaction is almost instinctive to every programmer, control-a, delete and start again. However, this approach is flawed. When looking at code you have to remember, that the person who wrote it, was doing his best in the circumstances he was in, with the knowledge he had. Maybe the specification changed alot, maybe he was going through a tough divorce, maybe he hadn’t been trained in the particular skills need to tackle this problem. Whatever the circumstance, the fact is that there are lessons and behaviours in that messy code that you want to keep and it’s your job to tease them out and clean them up.
The last two weeks, i’ve been getting my hands dirty again, and I though I would share my methodology for refactoring legacy code. So how do you start?
First, you’ll never grasp everything at the beginning, but it’s good to get an idea of the depth of the piece of code you have, how many tentacles it has into different parts of the codebase, which of those tentacles in turn have tentacles? Imagine you were hired to clean a house, you would first go into the house and look in each room. While doing this cursory check you would look for the messiest rooms, the biggest rooms, rooms that come off other rooms, which rooms you could clean easily versus which would take a large effort for a minimal gain. Make a note in a pad with a pen, draw some scratchy boxes with the names of important functions and how they link together, it helps to record your thoughts before you get stuck into the chaos of someone else’s code.
Next, try to take a behaviour and understand it, preferably the most important or most broken one. Ultimately you should approach it from the user’s point of view, ‘As a user when I do X, then Y should happen.’ so start from X and trace it back, what does X do, what does it fire, how does it get there?
This stage is just a spike, so start experimenting. If I comment out this line, does it stop doing the thing I expect it to? Don’t spend too long on it, just to get a feel that you can and have some sort of control of the codebase, and even though it’s still monstrous you can begin to make it make some sense and get a feel for how it’s put together.
So, I just want to say now YOU ARE NOT READY TO REWRITE ANY CODE YET. DON’T EVEN THINK OF PERMANENTLY MODIFYING the code base.
When you’ve got a feel for the path through the code you are going to focus on, then you can prepare for the next stage. If you have made any experimental changes, REVERT THEM NOW. As they will only serve to confuse you later on.
The next stage is tests. Every time you modify the codebase, it’s like walking a tightrope. Below the tightrope is a deep chasm with spikes and rivers with hungry snapping crocodiles in, which if you fall into you will struggle to climb out of. Every automated test you write is like creating a string in a net designed to catch you when you fall, allowing you to climb out faster.
Your first test should be acceptance tests. Take the user flow you’ve just spiked, go get a QA and a Designer/IA, sit down to hammer out some acceptance criteria and write a test in Jbehave or Cucumber or one of the many other BDD testing solutions available to the modern programmer, check it works and add it to your automated build. There, you’ve just created your basic safety net, it’s not indestructible and is pretty low down, but it saves you from the crocodiles.
Now personally, opinions differ on what to do next, but my preference is at this stage to do some tidy up.
Now what I mean by tidy up, is very specific, it’s changes to the code that don’t affect the actual execution of the code. So formatting basically, fixing whitespace issues, line breaks, tab indents for example are a good thing to cover at this stage. Removing unhelpful comments is also good, and finally, I like to go through, method by method, variable by variable and check for dead code that’s never called (a good editor is vital at this stage to make sure you don’t remove something referenced in anther file in your project). You can also start to mark things that you think are worth refactoring. It’s a bit like going into the room you are about to clean and removing all the objects that don’t belong there and will be thrown away, as there’s no point in cleaning something that’s going in the rubbish bin.
When that’s done, run the tests, if you’ve not changed anything that affects the execution of the code, everything should be fine, behave how you expect it to and pass so you can now commit your tidied code.
Finally we can start to think about making some changes, but not so fast my tightrope walking friends. You forgot your old buddy the unit test. For every method you touch, write a unit test to test the existing functionality. Your unit tests are another safety net under the tighrope, but this net is much closer to the tightrope, so if you fall off you can litterally just jump back up onto the rope and carry on, not like the acceptance tests, where you have to climb back up the wall of the chasm.
Then once your methods test is in place, then you can think about breaking it and refactoring that method. First thing I like to do is go through and look at all the variable names, do they make sense, can I by reading the method understand exactly what it’s doing? Is it good english? Does it make sense? Having a pair is vital at this stage to bounce language off and make sure what you write is understandable for someone else.
Now, your finally ready to do the majority of the damage, and you can begin to decimate your method and reduce it to the bare essentials (still readable obviously) eliminating all the code detritus and reducing the entropy. Once your happy, run both your set of tests, commit it and move to the next method. Congratulations you made it over your first pass of the tightrope and you didn’t end up lost in the chasm of code chaos.