Extra! Extra! TDD Doubles LOC and No One Cares!

Test Driven Development more than doubles the lines of code you have to write. With all that extra code to write, where will we ever find the time?! We have deadlines!

Lines of code has always been a bad metric; why bring it up now? Error-free robots, programming at a constant rate, might have to be concerned. But people are neither error-free nor constant-rate programmers. Does it matter that the LOC count doubles? The time-consuming parts of programming are thinking, problem solving, and confirming solutions.

In my article The Physics of TDD I compare two programming techniques: Debug Later Programming (DLP) and Test Driven Development (TDD). I model how TDD shortens the time to mistake discovery to near zero. When you are immediately notified of a mistake, you can fix it immediately. If that same defect went undetected, as it does in DLP, it would lie dormant, waiting to cause trouble.

DLP has less code (about half) and more bugs, while TDD has more code (about 2x) and fewer bugs. When I do TDD I find that I make subtle mistakes quite regularly, many times an hour, but I find them immediately. Each of those mistakes would have cost me, or someone else, future find-and-fix time.

Let’s try a little thought experiment. I’ll write production code for a couple of hours without TDD. Let’s go easy on me and say I only mess up 5 times per hour. After two hours of code writing, I manually test the new production code (which takes time), and fix almost all the problems. For the fun of it, say that one mistake every two hours goes undetected, and is later reported as a bug.

What does it cost to find and fix that buried mistake? Well that depends on the mistake. How long does it take you to find and fix a bug? Tough question to answer.

According to a defect study at Hewlett Packard (described in this paper), defect repair times have a distribution as described in this table:

Defect %    Time to find and fix    Time %
25%         2 hours/defect          8%
50%         5 hours/defect          40%
20%         10 hours/defect         32%
4%          20 hours/defect         12%
1%          50 hours/defect         8%
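
The Time % column follows from the other two: weight each bucket's repair time by its share of defects, then divide by the total. Here is a minimal Python sketch of that arithmetic. It uses only the figures in the table; the names BUCKETS, expected_hours_per_defect, and time_shares are mine, for illustration.

```python
# Defect repair-time distribution from the table above:
# (share of defects, hours to find and fix one defect)
BUCKETS = [
    (0.25, 2),
    (0.50, 5),
    (0.20, 10),
    (0.04, 20),
    (0.01, 50),
]


def expected_hours_per_defect(buckets):
    """Average find-and-fix time, weighted by how common each bucket is."""
    return sum(share * hours for share, hours in buckets)


def time_shares(buckets):
    """Fraction of total repair time consumed by each bucket."""
    total = expected_hours_per_defect(buckets)
    return [share * hours / total for share, hours in buckets]


if __name__ == "__main__":
    print(f"average: {expected_hours_per_defect(BUCKETS):.1f} hours/defect")
    for (share, hours), t in zip(BUCKETS, time_shares(BUCKETS)):
        print(f"{share:>4.0%} of defects at {hours:>2} h each -> {t:.0%} of repair time")
```

The weighted average works out to about 6.3 hours per defect, and the computed shares land within a point of the Time % column. The small differences look like rounding in the published table.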

It’s just one study, but let’s roll with it. Being very conservative, let’s say the bugs left behind are the easy ones, and only take 2 hours to find and fix. If that 2 hours of effort could have been prevented by writing two hours of TDD-style unit tests, I’m even! Actually, I am ahead of the game because I would not have spent that manual test time. If any of the defects were more difficult to find and fix (like the other 75% of the defects), I am way ahead of the game!
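
To make the break-even concrete, here is a rough Python sketch of the thought experiment. Treat it as an illustration under stated assumptions, not a model of any real project: the half hour of manual testing is a number I made up (the text above only says it takes time), the function names are hypothetical, and it assumes TDD catches the mistake that DLP lets escape. The 6.3-hour figure is the defect-weighted average of the table: 0.25 x 2 + 0.50 x 5 + 0.20 x 10 + 0.04 x 20 + 0.01 x 50.

```python
# Break-even sketch for the thought experiment above. All times are in hours.
CODING_HOURS = 2.0       # two hours of production coding (from the article)
TDD_TEST_HOURS = 2.0     # extra time writing TDD-style tests (from the article)
MANUAL_TEST_HOURS = 0.5  # assumption: the article only says manual testing takes time
ESCAPED_DEFECTS = 1      # one mistake per two hours goes undetected (from the article)


def dlp_hours(hours_per_defect, manual_test=MANUAL_TEST_HOURS):
    """Debug Later Programming: code, test by hand, then repair escaped defects."""
    return CODING_HOURS + manual_test + ESCAPED_DEFECTS * hours_per_defect


def tdd_hours():
    """TDD: code plus tests. Simplification: the escaped defect is caught at once."""
    return CODING_HOURS + TDD_TEST_HOURS


if __name__ == "__main__":
    for label, repair in [("easy defect", 2.0), ("table average", 6.3)]:
        print(f"{label}: DLP {dlp_hours(repair):.1f} h vs TDD {tdd_hours():.1f} h")
```

With zero manual test time and a 2-hour defect, both approaches cost 4 hours, which is the "I'm even" case. Any manual test time, or any defect harder than the easiest bucket, tips the balance toward TDD.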

Doubling the lines of code does not slow me down.

TDD will slow you down while you are learning it. Also, I am not saying TDD will prevent all bugs. You will still have bugs, but fewer of them.

You experienced TDD’ers, what is your experience? Tell me about it, and go take my poll in the lower right corner of my website.

6 Responses to “Extra! Extra! TDD Doubles LOC and No One Cares!”

  1. Something like this observation has been with us since at least as early as Royce’s 1970 paper. That is the one where he introduces the idea of a “waterfall” process only to explain why it does not work and why you should not do that (shame the Software Engineering textbook authors didn’t read past page 1).

    He says that if you have a process that puts testing at the end, then when—not if—a test fails (a good thing in itself: no-one ever learned anything from a passing test) an unknown amount of unplanned rework has been injected into the project. Unknown effort to diagnose the failure, unknown effort to fix the defect, and (by induction) unknown effort to test all over again.

    His recommendation is to start testing as early as possible and run testing in parallel with development (in parallel with design, in parallel with analysis, in parallel with …)

  2. Scott Duncan says:

    Yeah Keith! Sometimes I think we need to grind on what Royce actually said every chance we get. It may be the (or at least one of the) most misunderstood bits of software engineering lore. (Another one being the use of a manufacturing engineering & quality model for software, thereby tarnishing the interest in engineering & quality ideas from a design perspective.)

    A nice paper on early appreciation for iterative and incremental approaches to development is the Larman and Basili paper, “Iterative and Incremental Development: A Brief History,” from IEEE Computer 36:6:47-56, June 2003. Larman’s book, Agile & Iterative Development: A Manager’s Guide, has a lot that amplifies that article, as well.

  3. I guess this means that you get a full set of regression tests & documentation of expected behaviour at no extra cost? As a Schedule Owner I would have thought that “something for nothing” is a very desirable property :-)

  4. Jeff Langr says:

    Hi James–

    Does TDD really double lines? Well, usually I find that I have maybe a bit more lines of test code than production code when doing TDD, so on the surface it looks like the lines are doubled.

    But every (Java or C++) system I’ve encountered that wasn’t built using TDD has been overly bloated. I’ve seen a couple of systems where the production code shrank to a third of the original size, and a couple more that shrank to about half of the original size, after programmers added lots of tests and started refactoring. And just looking at the bulk of systems out there, I’d take the challenge that I could reduce their line count easily by 25% and more often by 50%.

    So it seems like almost a wash to me: system without TDD, twice the amount of code it needs. System with TDD, half the production code plus a comparable amount of test code. Well, of course, that compares someone who knows TDD with a bunch of programmers who didn’t give a hoot. I figure someone who was good with TDD could probably produce a much cleaner design, even without tests.

    I’m working on a smaller codebase now of about 30,000 lines. After about 40 hours investment, I’ve eliminated perhaps 3,000, and there’s no end in sight of potential cleanup. I’m having a blast.

    Jeff

  5. Mark T says:

    I agree that lines of code (LOC) has always been a bad metric, but many organizations still use it for bidding purposes. Maybe some day “story points” will be the way to bid, but they are still yet another fuzzy widget. When using LOC, the real metric should be WDLOC (working deliverable lines of code), with no claims of “error free”. The cost metric is the time to produce a WDLOC. For embedded systems, I would say a TDD approach would come out cheaper overall. Pay me now or pay me later. The “unit” test code itself should never be counted as LOC; that is just a cost of producing the working deliverable code, along with debugging and SW/HW integration.

  6. Wisang Eom says:

    Quite an interesting topic. I found out that, in some organizations, the average LOC a developer writes per day is around 10. The time for writing 10 lines of code is trivial. Even if it is doubled, it is still trivial. Most of the time is spent on debugging and finding a safe way to fix the bug.

    TDD doubles LOC. That’s true. But the test code does not get into the target, so there is no physical problem. Also, the time for doubling the code is trivial, but the code quality improves dramatically, so total time can be saved. LOC is doubled, but it is at least free, and you may get considerable profit from it if you are skilled with TDD.
