On Clarity
Thursday, January 4, 2007
As a software developer, I’ve gained the privilege of looking at countless lines of code, some my own and some inherited from others. Many times, I’m not just browsing for pleasure, but either debugging or doing further development on these fragments of logic, and sometimes the message being conveyed to the compiler is not as clear to a mere human reader.
I’m in no way implying that I’m perfect, and that whatever this article shapes up to be should be the definition, or law of clear code, but the guidelines here are ones that I use, and so far have proven useful to me and those who have chosen to adopt them.
I will subdivide this article into headings describing issues I see as being problematic areas and either proposing a solution, or commenting on a situation where a solution may not be so clear cut.
Paragraphing
Much like in English, blocks of code are not monolithic entities, in several cases a single function or method does several things to accomplish it’s goal, and many times readability increases dramatically if these things are separated into logical sections by whitespace. For a totally out of context example consider the following block:
SomeClass scObject = new SomeClass();
scObject->gatherInformation();
if(!scObject->infoValid()) {
delete csObject;
throw new someException();
}
RandomStaticClass::transformObject(scObject);
for(scObject::someIterator scIter = scObject->begin(); scIter != scObject->end(); scIter++) {
arbitraryFunction(*scIter);
}
return scObject;
Despite the hilarious naming, this simple block of code is pretty simple to understand. It creates an object, calls a method, verifies the information, performs some transformation then iterates something before returning the object to the caller. But in the reading of the code, there needs to be some concentration on behalf of the reader to separate these individual sections mentally before fully understanding their functions within the given context. In my opinion, and further in practice I find code in such close concentration more often than I care to count, and due to my already present mental breaking down of the sections I’ve chosen to adapt a practice of physically separating these sections into paragraph-like blocks based on function, transforming the above code, into the below code:
SomeClass scObject = new SomeClass();
scObject->gatherInformation();
if(!scObject->infoValid()) {
delete csObject;
throw new someException();
}
RandomStaticClass::transformObject(scObject);
for(scObject::someIterator scIter = scObject->begin(); scIter != scObject->end(); scIter++) {
arbitraryFunction(*scIter);
}
return scObject;
The extra whitespace does nothing functionally, but it forces the separation of the logical sections for, in my opinion, easier reading.
Clever Only Commenting
The drive in programming has been for verbose commenting, and although I support this in most cases, such as documentation of methods and very complex sections, I don’t agree with the blanket statement, as the resulting code is far less readable than it could have been if comments were removed all together. I base my theory on the idea that people who write code are familiar with the language in which it was written, as well as with the science of software development in general. Like many people familiar with their field for developers many patterns are recalled instantly, and many algorithms are easily identified by their signature within a construct and any additional verbosity is simply noise surrounding signal.
To illustrate my point in psuedocode once more:
//Declare an integer
int i = 0; //It will be initialized to zero
//Loop 10 times
//Re-init i for style
for(i = 0; i < 10; i++) {
doSomething(i); //Call doSomething() function with the current value of i
} //End of for(0..10) loop on i
//Create a new char pointer of size 10
char *cpBuffer = new char[10];
//Clear the section we just allocated
bzero(cpBuffer, 10);
//Pass the buffer to another method of a static class
//SomeStaticClass will use this buffer for subsequent static calls.
SomeStaticClass::setBuffer(cpBuffer);
//Return the buffer to the caller
return cpBuffer;
Although the volume and content of the comments seems almost comical, this example is not far removed from reality, mostly appearing in examples, but rarely making it to production code as well. My concern though, is not trivial even given these limitations, for if people are learning through such verbose examples there is a chance that it will considered kosher and used in projects outside the realm of hobby.
What I ask, is that the above code be compared to the code below by someone familiar with even the basics of the language:
int i = 0;
for(i = 0; i < 10; i++) {
doSomething(i);
}
char *cpBuffer = new char[10];
bzero(cpBuffer, 10);
//SomeStaticClass will use this buffer for subsequent static calls.
SomeStaticClass::setBuffer(cpBuffer);
return cpBuffer;
Although the code is of trivial complexity, the verbose comments made very little impact on readability. In fact, I’m of the opinion that the latter example is much simpler to understand, and reads much faster than its comment-heavy brother.
The real point is that the second block is not stripped of all comments. The section passing the buffer to a static class is still prefaced with a comment explaining why the action is being taken. In this fictional example, that section of code stands out because its function is not as apparent as its surroundings, and as such it is explained with a simple statement.
This rule is harder to apply on a broader scale for several reasons. The use of “clever” blocks is relative to the skill and project familiarity of the team. Where seasoned developers can easily spot and track complex recursion, it may be considered clever by those who have not dealt with it constantly in a previous project, so the application of this rule may need to based on arbitrary set of standards which set the bar for what behavior is considered “clever” and deserving explanation, and by no means do I advocate the absence of full documentation on the project, including documentation of method actions, returns, and conditions. The statements I make here are touching only on comments made inline to explain small sections of code.
Variable Naming
A classic point, made in every single style guide I’ve ever seen, and yet still seems to be an elusive dream for the majority of functioning code on this planet.
As developers, we are not bound to the rules of algebra and as such are free to use variable names that exceed the scope of single letters. But simply giving a descriptive name does not solve the problem. As seen in earlier examples here, I’m a fan of hungarian notation, but even that can get far out of hand. Although a great idea, the concept of 10-odd letter prefixes are just as frightening as none at all, and as all good things should be used in moderation:
for(unsigned int uiTempIndexVariable = 0; uiTempIndexVariable < 10; uiTempIndexVariable++) {
SomeStaticClass::updateDataIndex(uiTempIndexVariable);
}
As demonstrated, the abuse of this suggestion is more harmful than simply its omission, further the use of Hungarian notation is subject to taste, but my personal reasons revolve mostly around seldom visited regions of code. Given a block of code in which variable declarations are either at class scope, or are off-screen during an edit, type prefixes as well as descriptive names are a very useful reminder as to what can, and should be done with data, and despite the fact that you may be the original author, several months from creation the specifics of code drift from memory.
Yet as to most, there is an exception which I’ve demonstrated with the loop above. I see no point for short-scoped loop indexes, or return stores to have names more descriptive than ‘ret’ or the familiar set of ‘i, j, k, l’ and so forth. If a variable lives no longer than it’s declaration, or simply used in a tight scope the name is often times glossed over in reading, and this process should not be disturbed as it would aide in nothing but the degradation of reading speed with no benefit to clarity.
Useful Commit Messages
As a developer, one easily falls into a deep respect for source control. It will allow you to fix your mistakes, and to work on a team without the utter destruction of sanity, but even that wonderful system can be reduced to a useless heap of data with careless human intervention. As you should be familiar with, every commit to any source control system requires a “Commit Message” or a short blurb describing what changes have been made that made the commit worth happening. And as often the case, shortcuts are taken. Yet unlike all other areas of laziness, commit message shortcuts aid in no way to the reduction of work. Quite conversely, they add work to later cases.
Most reduced commit messages read with the finesse of statements such as “stuff” or “some changes” or even more often, yet equally useless “”, the blank string. But by using these non-informational descriptions work is created further down the line, when it comes time to decipher what “stuff” was changed, or what one was thinking when one changed the code to the tune of “”. And as most humans would agree, a one line message such as “Fixed typo in variable names” is much easier, and not to mention faster to read than a diff.