Why do we have systems with millions of lines of code?
Every so often you’ll hear about someone who starts their new job only to discover that the system they’re now maintaining consists of many millions of lines of code. Millions. Now, are you telling me that these systems need a million lines of code to do what they do? It’s unlikely. The more likely theory is that these systems were never properly designed or refactored and grew through of series of endless hacks.
To be clear, there are systems that are huge through good design and do warrant so much code (operating systems, platforms etc) but I’m not talking about those here.
It helps to think of it like this: Every line of code in a system is a debt. Every line of code, while facilitating some overall functionality, has some implications:
When you write a line of code, if that line is not directly conveying the intention of the unit then it is meta and it’s a source code leak. It’s equivalent to going off on a tangent in a conversation – it makes the conversation harder to understand. The programming language plays a big role in this issue too. What does the language or runtime provide to help you state your intention in less code?
Here’s an example. We have a collection of user objects and we’d like to know how many of them were created today. We could do this:
The problem is that we’ve got 5 lines of code that combine to produce the desired functionality. Every one of the 5 lines needs to be analysed by the reader to understand the purpose of the code. Having to declare a variable to hold the running total and construct a loop which iterates over each element of the list is just background noise. We shouldn’t have to write those things – the source code is leaking.
What if we wrote it like this:
There’s no question that the intention of this code is clearer. I don’t have to take a mental note that there’s a total variable that will be modified in some way. I don’t have to work out what is happening inside a loop to understand what the purpose of the loop is. I don’t even need to know there is going to be an iteration.
Often it’s just not possible to express your intent in a single line, but it’s important to remember that every single line of code is a line of code that you’re going to have to read in 6 months time and understand why it exists. Every line is a debt. The more debt a system has, the more expensive it is to maintain it.
So, my advice is this: Be constantly aware of how much your code is leaking its intention and try to minimise it where you can. Make use of language facilities that let you write less code. Keep asking yourself “If I got handed this codebase on the first day I started, what would my reaction be?” Who knows, maybe you’ll avoid the next multi million SLOC project.