
Source Code Leaks

28. January 2010 by Martin Rue 2 Comments

Why do we have systems with millions of lines of code?

Every so often you’ll hear about someone who starts their new job only to discover that the system they’re now maintaining consists of many millions of lines of code. Millions. Now, are you telling me that these systems need a million lines of code to do what they do? It’s unlikely. The more likely theory is that these systems were never properly designed or refactored and grew through a series of endless hacks.

To be clear, there are systems that are huge through good design and do warrant so much code (operating systems, platforms etc) but I’m not talking about those here.

It helps to think of it like this: Every line of code in a system is a debt. Every line of code, while facilitating some overall functionality, has some implications:

  • Every line adds to the perceived complexity of the unit of code.
  • Every line plays a part in hiding the intent of the unit of code.

When you write a line of code, if that line is not directly conveying the intention of the unit then it is meta and it’s a source code leak. It’s equivalent to going off on a tangent in a conversation – it makes the conversation harder to understand. The programming language plays a big role in this issue too. What does the language or runtime provide to help you state your intention in less code?

Here’s an example. We have a collection of user objects and we’d like to know how many of them were created today. We could do this:
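The listing that originally followed this sentence is missing from this copy. Below is a minimal sketch of the kind of code being described, assuming C# (the language the comments refer to). The `User` class, its `Created` property and the method name `GetUsersCreatedToday` are hypothetical stand-ins, not the author's original names. The method body is the five-line loop with a running total:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration; the post does not show the real user class.
public class User
{
    public DateTime Created { get; set; }
}

public static class UserStats
{
    // Counts the users created today using an explicit running total and loop.
    public static int GetUsersCreatedToday(IList<User> users)
    {
        int total = 0;
        foreach (User user in users)
            if (user.Created.Date == DateTime.Today)
                total++;
        return total;
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample data: one user created today, one created yesterday.
        var users = new List<User>
        {
            new User { Created = DateTime.Today },
            new User { Created = DateTime.Today.AddDays(-1) },
        };
        Console.WriteLine(UserStats.GetUsersCreatedToday(users));
    }
}
```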

The problem is that we’ve got 5 lines of code that combine to produce the desired functionality. Every one of the 5 lines needs to be analysed by the reader to understand the purpose of the code. Having to declare a variable to hold the running total and construct a loop which iterates over each element of the list is just background noise. We shouldn’t have to write those things – the source code is leaking.

What if we wrote it like this:
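The listing that belonged here is missing from this copy. Below is a sketch of what a one-line version might look like, assuming C# with LINQ's `Count` extension method taking a predicate. The `User` class, its `Created` property and the method name are hypothetical, not the author's original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration; the post does not show the real user class.
public class User
{
    public DateTime Created { get; set; }
}

public static class UserStats
{
    // States the intent directly: count the users whose creation date is today.
    public static int GetUsersCreatedToday(IList<User> users)
    {
        return users.Count(user => user.Created.Date == DateTime.Today);
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample data: one user created today, one created yesterday.
        var users = new List<User>
        {
            new User { Created = DateTime.Today },
            new User { Created = DateTime.Today.AddDays(-1) },
        };
        Console.WriteLine(UserStats.GetUsersCreatedToday(users));
    }
}
```

The lambda parameter is named `user` rather than a single letter, which keeps the predicate readable for someone unfamiliar with LINQ.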

There’s no question that the intention of this code is clearer. I don’t have to take a mental note that there’s a total variable that will be modified in some way. I don’t have to work out what is happening inside a loop to understand what the purpose of the loop is. I don’t even need to know there is going to be an iteration.

Often it’s just not possible to express your intent in a single line, but it’s important to remember that every single line of code is a line you’re going to have to read in 6 months’ time and work out why it exists. Every line is a debt. The more debt a system has, the more expensive it is to maintain.

So, my advice is this: Be constantly aware of how much your code is leaking its intention and try to minimise it where you can. Make use of language facilities that let you write less code. Keep asking yourself “If I got handed this codebase on the first day I started, what would my reaction be?” Who knows, maybe you’ll avoid the next multi-million SLOC project.

Comments

Paul Norman (United Kingdom) said:

First up, nice article, well written and informative, but I just want to take a step back from this and consider another point of view, that of a less experienced coder or business owner and additionally in my case a PHP developer (where things are less organised in general!).

Your first code block was immediately clear to me as to its purpose (although I might have called the method numAccountsCreatedToday), and I can't even use C#... The second one however relies upon a coder's knowledge of an IList object and how the Count method can be applied to it. Even staring at it (knowing its purpose), I'm not really sure what is going on... I don't even know what 'c' is, perhaps a more descriptive variable name would help? (not saying that it isn't obvious to a C# developer of course).

Now the real issue. If you have a codebase large enough to really worry about this sort of issue you likely have many developers, probably outsourced or contracted, and therefore cannot guarantee their skill level - because let's face it, businesses balance developer cost with profit (guess which wins?) and if you have never worked with an Indian software warehouse take it from me, it's an 'experience.' With that in mind, is there really a business need (not a coding one) to shorten this code? Would any coder be at any point unclear of the original function's purpose? What if you had crammed a more complex statement, carefully worked out, onto one or two lines? Will you really remember its intricacies in 6 months, or will coder X understand them? I am very aware that this was a trivial example to illustrate the point, but the question remains: does verbose code actually 'leak' (never heard that before!) and does it actually make the code harder to understand, or is this simply a pedantic programming point, applicable only in small cliquey dev teams? In my mind if something is even remotely unclear there should be a comment anyway.

All too often I hear people debate the right way to do something at great length, and if they had just got on with it they would likely have had no problems and made (their company) some money at the same time... To be clear, I'm not advocating terrible code (although I can't claim to be a great coder), but I think there comes a point where this kind of argument goes too far. From someone who hires coders, if I got the slightest inclination that someone was wasting my money by trying to produce the shortest possible (likely less readable) code they would be out of the door soooo fast!

So to sum up, I am of course aware that the argument you make in your article is all about intention, clarity and perceived complexity of code, but my question is: does this end up as unclear intention, with actual complexity and no quantifiable benefit to be found?

Martin Rue (United Kingdom) said:

My basic point is that you should make use of your language/libraries to write the fewest lines of code possible, while being careful not to be cryptic of course.

Though very trivial, the first example contains more code than it needs to. Excess code means more room for bugs and more time spent writing it.

The second example makes good use of the support provided by the selected technology to reduce the amount of work I have to do as a programmer. I think this is ultimately a good thing for everyone. If it takes less time to write it, less time to debug it and it reduces the surface area of bugs, cost can only be reduced.

I very much agree that there's a fine line between succinct and cryptic, and professional programmers should know where this line is. My point may have been clearer had I given an initial example of a linked list implementation - allowing the second example to demonstrate that this can be done quicker and with less code (with more readability) if you make use of the library's linked list.

You're raising the issue of experience here too. If you're hiring professional developers, it's their duty to know how to use their language in the most effective way possible. However, if experience is an issue within your team, then you certainly want to use the subset of your technology that allows your entire team to work effectively - even if this means writing more code.

But to answer your question, I think that less code does have a quantifiable benefit, and can be achieved without introducing complexity or obfuscating the intention of the code. Languages evolve to abstract us away from dealing with unnecessary details, as is likely to happen over the next few years with the handling of concurrency, for example. I'm sure people debated if statements and for loops when we moved from cmp and jnz instructions too - but we're a hell of a lot more productive because of it.

I'm not suggesting developers should spend company time experimenting in the hope of removing a few lines of code, but developers should know what options are available to them and pick the most suitable one. The option that is equally readable but 50% less code won't waste your money, it will save it.

You're right about the use of 'c', I've changed it to 'user' to make more sense. Thanks for the thoughtful response :)
