Safe Refactoring
Refactoring in the land of messy and missing tests is hardly an easy proposition. If your suite isn't tested, you can't refactor, and write good tests without refactoring the code. Seems like you're doomed isn't it? Let's return to our mess of TDD:
server = http.createServer(function(request, response) {
if (request.url === "/thingthatwontexist") {
response.writeHead(404);
response.end();
} else if (request.url === "/") {
redirect(response, '/users/');
} else {
if (request.method === 'GET') {
if (request.url === "/users/") {
response.writeHead(200, {
'content-type': 'html'
});
var responseText = '';
models.User.findAll()
.then(function(users) {
var responseText = '';
users.forEach(function(user) {
responseText += '<a href="/users/' + user.id + '/tasks">';
responseText += user.username;
responseText += '</a>';
});
response.write(responseText);
response.end();
});
} else {
self.findAllTasks(request, response);
}
} else {
var parts = request.url.split('/');
var data = '';
request.on('data', function(chunk) {
data += chunk;
});
request.on('end', function() {
var json, params;
try {
json = JSON.parse(data.toString());
} catch(e) {
params = querystring.parse(data.toString());
}
Holy crap that's a big pile of awful, and it's not even the entire program. Your job right now is to write a test that ensures we return a 404 correctly for all paths - not just thingthatwontexist
. You've written Your First test so you don't have to get over that hurdle, but now you've got a refactoring catch-22. What's the solution? This is usually when people quit and write some crummy code just to get "done".
Of course it's much simpler to refactor if you have a good suite of tests but it's not impossible to break this Catch-22, especially if you stick to a limited subset of refactorings.
Cleanliness
You won't this in Fowler's Refactoring or listed on the Portland Patterns Repository, but we've all seen code that looks something like this:
models.Task.findAll({where: {UserId: parts[2]}})
.then(function(tasks) {
response.writeHead(200, {
'content-type': 'html'
});
var responseText = '/users/' + parts[2] + '/tasks';
tasks.forEach(function(task) { console.log(task.id); responseText += '<p>' + task.title + '</p>\n'; });
Amazingly some developers won't change it lest they hurt somebody's feelings, but while this code probably works and doesn't have unnecessary complexity it is nearly impossible to read. Remember we write code for humans, not computers, otherwise we'd all be writing assembly code. Let's look at the quick fixes you can do with or without tests.
Inconsistent indentation
Format your code consistently! I don't care if it doesn't match my preferred style, but make the same everywhere. When in doubt let the editor do it. I use Vim as my preferred editor, so it takes me about two second to highlight the entire file and fix the indentation.
models.Task.findAll({where: {UserId: parts[2]}})
.then(function(tasks) {
response.writeHead(200, {
'content-type': 'html'
});
var responseText = '/users/' + parts[2] + '/tasks';
tasks.forEach(function(task) { console.log(task.id); responseText += '<p>' + task.title + '</p>\n'; });
response.write(responseText);
response.end();
});
Getting better. I can delete any silly whitespace too. You can run the app to validate you haven't broken this change, and quickly too. Now there is one caveat - don't get into format wars. If your company has a standard, use the standard, and configure your editor to use it.
Scrolling to the Right
Some developers won't go past 80 characters, others 120. I tend to stay somewhere around 100 as my limit to the right. This usually leaves me enough room to have two split windows on a modern monitor. The important thing to do is avoid scrolling right, and to do so without causing the code to have such bizarre formatting nobody can follow it. In this case the tasks.forEach line has an inilne function all one one line. Let's fix that:
models.Task.findAll({where: {UserId: parts[2]}})
.then(function(tasks) {
response.writeHead(200, {
'content-type': 'html'
});
var responseText = '/users/' + parts[2] + '/tasks';
tasks.forEach(function(task) {
console.log(task.id);
responseText += '<p>' + task.title + '</p>\n';
});
response.write(responseText);
response.end();
});
Even better. This actually reveals a console.log
call ths is probably not needed.
Suddenly this code just looks better. Of course it isn't actually better because it's the same code and we haven't done any real refactoring, but we've made it far simpler to understand. This only took minutes and is extremely unlikely to break anything, even without a test.
Remove Unused Code
I've been experimenting with the Go language, and one of the nicest things about it is how it makes unused imports and variables a compiler error. Oh I hated this at first, but it's wonderful to know that there's not a single Go program with an unused variable or dependency. You should purge the unused code like it's illegal. Unused code should never make a test fail unless it's an unneeded test. This includes unused variables, imports, and funtions. Even entire modules.
Delete Commented Code With Fire
I'm not a huge fan of comments but this book is not meant to take a side on that, other than to say that good tests beat comments any day of the week. On the other hand commented code is lazy, ugly and stupid. Destroy it with extreme prejudice. You may need it someday? That's what source control is for my friend.
Start Linting or Hinting
Finally get yourself a linter. JSLint and JSHint are both extremely popular for JavaScript and you should pick one and use it. I have JSHint hooked into vim so that every time I save the file I get immediate feedback from the linter. Let's take a look at our app in my editor:
See those arrows on the left? JSHint (in combination with Vim) is telling me there is something wrong with line 20. The error is at the bottom of the screen:
You can see I've got a mistake I've used an ES6 feature in ES5 code. I can fix that - again without perfect tests - because I have a different test for it. The linter. It's true that sometimes you'll get a false positive from a linter, but the gains far outweigh the occasional inconvenience. Ignoring linting because of the occasional false positive is like turning off spellcheck because it can't spell refactoring. It's not throwing the baby out with the bathwater, it's throwing out the baby, the bathwater, the bathtub and never taking a shower ever again.
The Big No No
A few years ago I switched from C++ to Ruby. Ruby uses two spaces for indentation as its default. When I returned to C++ four spaces looked GIGANTIC so I switched to two spaces everywhere, regardless of language. Well that's fine as long the rest of the team is also using two spaces, but indentation wars across the team are counter-productive and pointless. When I work on teams that use four spaces, I use four spaces.
Be considerate to other people. There's a difference between code that's ugly and code that is a style you don't like. Teams sometimes have a written coding standard but they always have an informal collective standard. Stick within the guidelines of both and you'll be fine.
Actual Refactorings
The previous changes can be done without tests, because you haven't changed behavior. The next changes require some tests athough hardy perfect ones. These are refactorings meant to simultaneously improve readability and testability.
Make Methods Class Methods
This can be done without tests when your code is compiled or if you have a static analysis tool (like JSHint) that will tell you when a method doesn't exist is called. That is taking a method that doesn't use any instance variables and making it a class (or static) level method. So from:
var PsuedoRandomNumberGenerator = function() {
};
PsuedoRandomNumberGenerator.prototype.random = function() {
};
to:
var PsuedoRandomNumberGenerator = {
Random = function() {
};
};
JSHint will tell you where you called random
and it didn't exist, so you can replace it with Random
. This shows the mechanics but why might you want to do this? Well let's look at where findAllTasks is called in app.js:
if (request.method === 'GET') {
if (request.url === "/users/") {
response.writeHead(200, {
'content-type': 'html'
});
var responseText = '';
models.User.findAll()
.then(function(users) {
var responseText = '';
users.forEach(function(user) {
responseText += '<a href="/users/' + user.id + '/tasks">';
responseText += user.username;
responseText += '</a>';
});
response.write(responseText);
response.end();
});
} else {
self.findAllTasks(request, response); // HERE I AM!
}
} else {
What if you wanted to change this method? How would you test it? Well you could to through that jumble of if statements but for a small change that isn't worthwhile, but if you look at the findAllTasks
method you'll see this:
findAllTasks: function(request, response) {
var parts = request.url.split('/');
models.Task.findAll({where: {UserId: parts[2]}})
.then(function(tasks) {
response.writeHead(200, {
'content-type': 'html'
});
var responseText = '/users/' + parts[2] + '/tasks';
tasks.forEach(function(task) { console.log(task); responseText += '<p>' + task.title + '</p>\n'; });
response.write(responseText);
response.end();
});
},
If you look closely you'll see that no part of that method relies on data in the object. There's no reference to self
. So why have it be a method on this object? It's pretty clearly not designed to be, it's just sort of there like an unpleasant mole. It probably belongs elsewhere, in an object that doesn't exist, but until you've created that object you can make a pretty simple change to make it more testable. Change where it's called:
...
response.end();
});
} else {
findAllTasks(request, response);
}
...
Now change how it's defined by creating a var at the top of the file:
var findAllTasks = function(request, response) {
var parts = request.url.split('/');
models.Task.findAll({where: {UserId: parts[2]}})
.then(function(tasks) {
response.writeHead(200, {
'content-type': 'html'
});
var responseText = '/users/' + parts[2] + '/tasks';
tasks.forEach(function(task) { console.log(task); responseText += '<p>' + task.title + '</p>\n'; });
response.write(responseText);
response.end();
});
};
And in order to make that method testable add it to the exports:
module.exports.findAllTasks = findAllTasks;
Now that method is publically exposed and is testable. If you're worried about encapsulation keep in mind this method used no data in the object. In other words it didn't encapsulate anything!
Is this the final state of the method? Probably not. Is anything in it's final state ever? Not in software. So you've made this method just a little bit easier to test, you've made it a little bit more accurate since it was actually static anyway, and you've made it easier to move into a new object when appropriate.
Rename
Renaming things is the single most important refactoring you can do. You can extract methods and classes until the cows come home but until you become comfortable renaming things to fit a new context it's not going to matter. The bad news is that this is the hardest one to get right, indeed no variable has one and only one right name. The good news is that this refactoring can be done over and over again, continually refining and retrying new names until everything makes sense.
Talkin' While You Code
There really is a weird trick you can use to improve your variable names. It might annoy your neighbors, but that's why we have wonderful soundproof cubes right? Let's take an example from a blog post I wrote a while back.
it("is the owning object context in a callback", function(done) {
setTimeout(a.thingToDo, 1);
setTimeout(function() {
assert.equal(a, "thing");
done();
}, 5);
This code was written as a form of flashcard or quiz. Not a separate flashcard program mind you, but a flashcard for me so that I could practice using JavaScript's infamous this
keyword. This test, and a few around it, help me practice. When I get confused or forget something I can quickly make these tests pass or check them for reference. The key point is that there isn't any real application behind this example, it's code for the sake of code.
A few weeks later I revisited this code, and since it has been a few weeks since I wrote it I had to re-figure it out. This isn't particularly surprising or a sign that I'm a terrible coder. Despite our best practices and techniques we all ship our first drafts in this profession, and as you add test after test to a piece of code you start internalizing information that could be expressed in the code itself. Later you come back and realize you don't remember all that stuff.
Let's ignore the description "is the owning object..." for a moment and try to understand this test. Read the test and as you read it start describing to yourself what it does. OUT LOUD. Here's what I did:
Okay so first a set a timeout on thingToDo to happen in one second. What is the thing to do? Why do I set a timeout? Then I set another timeout for five seconds ...oh wait milliseconds ... from now where I check that a equals the string "thing". Why would
a
be "thing" now? Okay wait these callbacks do something like this = 'a' right?
When I talk to myself I say okay a lot. The questions prompted me to look at the code under test.
var a = {
thingToDo: function() {
this.a = "thing";
}
};
Ah okay so this might be set to the global window object, or it might be the object 'a'. I get why I named this variable something generic but surely I could have come up with a better name than a. Let's call this objectUsingThis because that's what I'm testing for. What does thingToDo do - well it sets the property 'a' - well let's call that property - to the string "thing" ugh that's awful. That's an arbitrary value for the property so let's call it property value.
Now that I've talked through the code I understand what it does, and this is where many developers stop. Don't. Embed that information in the test so the next person (probably you) will have to spend less time talking and can spend more time working. To finish this example let's turn to TodoTDD.
server = http.createServer(function(request, response) {
if (request.url === "/thingthatwontexist") {
response.writeHead(404);
response.end();
} else if (request.url === "/") {
redirect(response, '/users/');
} else {
if (request.method === 'GET') {
if (request.url === "/users/") {
response.writeHead(200, {
'content-type': 'html'
});
var responseText = '';
models.User.findAll()
.then(function(users) {
var responseText = '';
users.forEach(function(user) {
responseText += '<a href="/users/' + user.id + '/tasks">';
responseText += user.username;
responseText += '</a>';
});
response.write(responseText);
response.end();
});
This is some pretty gnarly stuff. I'll talk my way through it:
Okay.. the first line creates an http server. I could switch that callback to a promise I bet, but I don't wanna take that on yet. The first first bit checks for 'thingthatwontexist' and explicitly returns a 404, so our 404 logic isn't correct. The next line checks if the first line is the root path and redirects to the users path. Next we check if we have a GET request - there's an extraneous else block there. If it's the users link you write a 200 with ..nothing. Otherwise you ..oh that's not an otherwise... it's just adding the response body. You find all the users, then write every response in a link to the specific user as a straight up anchor tag.
Now I've called out a few methods, a bug, and a few other issues, but I don't feel safe extracting methods yet. Let's just highlight some names I said:
- 404
- thingthatwontexist
- root path
- users path
- GET request
- 200
- response body
- anchor tag
Now let's rewrite that section with the improved naming.
var FILE_NOT_FOUND = 404;
var SUCCESS = 200;
var ROOT_PATH = "/";
var USERS_PATH = "/users/";
var GET_METHOD = "GET";
var TASKS_PATH = "/tasks";
module.exports = {
start: function(port) {
var self = this;
server = http.createServer(function(request, response) {
var FOUR_OR_FOUR_URL = "/thingthatwontexist";
if (request.url === FOUR_OR_FOUR_URL) {
response.writeHead(FILE_NOT_FOUND);
response.end();
} else if (request.url === ROOT_PATH) {
redirect(response, USERS_PATH);
} else {
if (request.method === GET_METHOD) {
if (request.url === USERS_PATH) {
response.writeHead(SUCCESS, {
'content-type': 'html'
});
models.User.findAll()
.then(function(users) {
var responseBody = '';
users.forEach(function(user) {
responseBody += "<a href=\"" + USERS_PATH + user.id + TASKS_PATH + "\">";
responseBody += user.username;
responseBody += "</a>";
});
response.write(responseBody);
response.end();
Better, I beet you can think of more. This bit will make more sense the next time I use it and I can see more opportunities for refactoring.
Advantages of Renaming
Renaming adds clarity to the code by removing a translation layer, the part where you translate the code from what's on the page to an understanding in your head. That part where I said things out loud? That's the translating. It also changes exactly 0 behavior, and can be done with inadequate testing. Indeed in a compiled language you don't need tests at all to to this refactoring! It's a refactoring with virtually 0 risk. Provided you don't go around renaming everything, it's unlikely to hurt anyone. Remember there are only two hard problems in Software Development: cache invalidation, naming and off by one errors. Your first names will be wrong, your second names will also be wrong, but they are likely better.
The Disadvantage - With A Fix
Some developers do not like having their variables renamed on principle. They think of the code as "theirs" and don't want it changed. Furthermore by renaming you're making the code easier for you to understand but you may be making it harder for another developer to understand. What makes sense to me does not always make sense to others.
The general solution to this is to work together on the rename. If you're going through code with somebody who doesn't like it when you change "their" stuff, go over to them and work on a solution together. Perhaps you can come up with names that help BOTH of you understand. Don't be critical. Don't say "your code is shit I don't get it" but come to them for help. Ask them to help walk you through the code and propose new names as you go. Listen. It's really not that hard.
Some people are just jerks, but most aren't. You can work this out.
Two No Nos
Don't rename things in code you're not actively working in. Wandering around the entire codebase renaming all the variables is a not-so-subtle way of telling your co-workers that they are stupid. It wastes your time, your coworker's time and eventually your bosses time as he or she has to mediate childish disputes. The purpose of this is to work faster because you can understand the code you are in better, not to perfect the names of the entire system.
Don't get touchy when people rename your variables. What's good for the geese is good for the gander.
Safe Refactoring
You can refactor with poor or non-existent tests. You can also start refactoring with a passing test that isn't generic enough, like the one we have here. Remember we started these refactorings with this test passing:
it ("returns a 404 for non-existent paths", function(done) {
http.get("http://localhost:8888/thingthatwontexist", function(res) {
expect(res.statusCode).to.equal(404);
done();
});
});
Using this code:
var FOUR_OR_FOUR_URL = "/thingthatwontexist";
if (request.url === FOUR_OR_FOUR_URL) {
response.writeHead(FILE_NOT_FOUND);
response.end();
}
Now that we have the test passing, and we've cleaned up some of the nearby code, we can do a better job of cleaning our new feature. First let's write another assertion to triangulate our test, and force our solution to be more robust.
it ("returns a 404 for non-existent path #{}", function(done) {
http.get("http://localhost:8888/thingthatwontexist", function(res) {
expect(res.statusCode).to.equal(404);
http.get("http://localhost:8888/thingthatalsowontexist", function(res) {
expect(res.statusCode).to.equal(404);
done();
});
});
});
You should see this test fail. I'm beginning to see other tests fail (flaky tests). We'll have to ignore those for the moment. You may notice we're nesting get requests in a test here, and that it's pretty nasty. It is - it's a smell that we shouldn't be doing our testing through http requests. You probably knew that, and we'll get to fixes for that problem soon.
Now let's change our answer to be more robust:
var CREATE_USERS_PATH = "/users";
var GET_METHOD = "GET";
var TASKS_PATH = "/tasks";
module.exports = {
start: function(port) {
var self = this;
server = http.createServer(function(request, response) {
var prefixValid = false;
if (request.url === ROOT_PATH ||
request.url.slice(0, CREATE_USERS_PATH.length) === CREATE_USERS_PATH) {
prefixValid = true;
}
if (!prefixValid) {
response.writeHead(FILE_NOT_FOUND);
response.end();
} else if (request.url === ROOT_PATH) {
Note how renaming the variables made it easier to creating strings that match the criteria of valid URLs vs invalidURLs. Next we'll extract a function:
var isValidPrefix = function (url) {
if (url === ROOT_PATH ||
url.slice(0, CREATE_USERS_PATH.length) === CREATE_USERS_PATH) {
return true;
}
return false;
};
module.exports = {
start: function(port) {
var self = this;
server = http.createServer(function(request, response) {
if (!isValidPrefix(request.url)) {
response.writeHead(FILE_NOT_FOUND);
response.end();
We'll keep this a function outside of the object, at least for now, because this object still doesn't have any cohesive properties to it. You'll see that this isn't perfect at all, but it's better than before. In Real World TDD we have to make incremental improvements as we go.
Before we continue refactoring app.js we've got a new feature to implement, and that's gonna force us to deal with the database.