Software Testing Categorization

July 14th, 2009 · 14 Comments

You hear people talking about small/medium/large/unit/integration/functional/scenario tests but do most of us really know what is meant by that? Here is how I think about tests.

Unit/Small

Let’s start with the unit test. The best definition I can find is that it is a test which runs super-fast (under 1 ms) and, when it fails, you don’t need a debugger to figure out what is wrong. Now this has some implications. Under 1 ms means that your test cannot do any I/O. The reason this is important is that you want to run ALL (thousands) of your unit tests every time you modify anything, preferably on each save. My patience is two seconds max. In two seconds I want to make sure that all of my unit tests ran and nothing broke. This is a great world to be in, since if tests go red you just hit Ctrl-Z a few times to undo what you have done and try again. The immediate feedback is addictive. Not needing a debugger implies that the test is localized (hence the word unit, as in single class).
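To make the “no I/O” point concrete, here is a minimal sketch of how a test can stay in memory. LineSource, LineCounter and LineCounterTest are hypothetical names invented for this illustration (they are not part of Testability Explorer), and the test is written as a plain static method rather than with JUnit so that the snippet stands alone.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical seam: production code would read the lines from disk,
// while the unit test supplies them from memory.
interface LineSource {
    List<String> lines();
}

// Hypothetical class under test: counts the non-blank lines of a source.
class LineCounter {
    private final LineSource source;

    LineCounter(LineSource source) {
        this.source = source;
    }

    int countNonBlank() {
        int count = 0;
        for (String line : source.lines()) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }
}

class LineCounterTest {
    // Returns true when blank and whitespace-only lines are skipped.
    // No file is opened, so nothing here waits on I/O.
    static boolean blankLinesAreIgnored() {
        LineSource inMemory = () -> Arrays.asList("a", "", "  ", "b");
        return new LineCounter(inMemory).countNonBlank() == 2;
    }
}
```

Because the only collaborator is an in-memory fake, a failure can only come from LineCounter itself, which is what makes the debugger unnecessary.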

The purpose of the unit test is to check the conditional logic in your code, your ‘ifs’ and ‘loops’. This is where the majority of your bugs come from (see theory of bugs), which is why, if you do no other testing, unit tests are the best bang for your buck! Unit tests also make sure that you have testable code. If you have unit-testable code, then all other testing levels will be testable as well.
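As a rough illustration of tests aimed at ‘ifs’ and ‘loops’, the sketch below checks one loop and one branch, including the boundary of the branch. CartPricer and its discount rule are made up for this example, and the tests are plain static methods named after scenarios so that the snippet runs without JUnit.

```java
// Hypothetical class under test: one loop and one conditional.
class CartPricer {
    static final int DISCOUNT_THRESHOLD = 100;

    // Sums the item prices; members get 10% off totals above the threshold.
    static int total(int[] prices, boolean isMember) {
        int sum = 0;
        for (int price : prices) {
            sum += price;
        }
        if (isMember && sum > DISCOUNT_THRESHOLD) {
            return sum - sum / 10;
        }
        return sum;
    }
}

class CartPricerTest {
    // Fails with the scenario name, so a red test says what broke.
    static void check(boolean condition, String scenario) {
        if (!condition) {
            throw new AssertionError(scenario);
        }
    }

    static void emptyCartCostsNothing() {
        check(CartPricer.total(new int[] {}, true) == 0, "empty cart");
    }

    static void memberGetsDiscountAboveThreshold() {
        check(CartPricer.total(new int[] {150, 50}, true) == 180, "member discount");
    }

    static void memberPaysFullPriceAtExactThreshold() {
        check(CartPricer.total(new int[] {60, 40}, true) == 100, "threshold boundary");
    }

    static void nonMemberNeverGetsDiscount() {
        check(CartPricer.total(new int[] {150, 50}, false) == 200, "non-member");
    }

    static void runAll() {
        emptyCartCostsNothing();
        memberGetsDiscountAboveThreshold();
        memberPaysFullPriceAtExactThreshold();
        nonMemberNeverGetsDiscount();
    }
}
```

Each scenario pins down one path through the conditional logic, so a failure points at a single branch rather than at the class as a whole.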

KeyedMultiStackTest.java from Testability Explorer is what I would consider a great unit test example. Notice how each test tells a story. It is not testMethodA, testMethodB, etc.; rather, each test is a scenario. Notice how the tests at the beginning are normal operations you would expect, but as you get to the bottom of the file the tests become a little stranger. That is because those are weird corner cases which I discovered later. Now the funny thing about KeyedMultiStack.java is that I had to rewrite this class three times, since I could not get it to work under all of the test cases. One of the tests was always failing, until I realized that my algorithm was fundamentally flawed. By this time I had most of the project working, and this is a key class for the byte-code analysis process. How would you feel about ripping something so fundamental out of your system and rewriting it from scratch? It took me two days to rewrite it until all of my tests passed again. After the rewrite the overall application still worked. This is where you have an Aha! moment, when you realize just how amazing unit tests are.

Does each class need a unit test? A qualified no. Many classes get tested indirectly when testing something else. Usually simple value objects do not have tests of their own. But don’t confuse not having tests with not having test coverage. All classes/methods should have test coverage. If you TDD, then this is automatic.

Medium/Functional

So you proved that each class works individually, but how do you know that they work together? For this we need to wire related classes together just as they would be in production and exercise some basic execution paths through them. The question we are trying to answer here is not whether the ‘ifs’ and ‘loops’ work (we have already answered that), but whether the interfaces between classes abide by their contracts. A great example of a functional test is MetricComputerTest.java. Notice how the input of each test is an inner class in the test file and the output is ClassCost.java. To get the output, several classes have to collaborate to parse byte-codes, analyze code paths, and compute costs until the final cost numbers are asserted.
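The shape of such a test can be sketched with three toy collaborators. Tokenizer, BranchAnalyzer and CostComputer are hypothetical stand-ins invented here, not the real Testability Explorer classes, and the cost rule (one plus the number of branch keywords) is an assumption made only to have something to assert.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical first stage: splits source text into whitespace-separated tokens.
class Tokenizer {
    List<String> tokenize(String source) {
        return Arrays.asList(source.trim().split("\\s+"));
    }
}

// Hypothetical second stage: counts the tokens that introduce a branch.
class BranchAnalyzer {
    int countBranches(List<String> tokens) {
        int branches = 0;
        for (String token : tokens) {
            if (token.equals("if") || token.equals("while")) {
                branches++;
            }
        }
        return branches;
    }
}

// Hypothetical final stage: cost is one plus the number of branches.
class CostComputer {
    private final Tokenizer tokenizer;
    private final BranchAnalyzer analyzer;

    CostComputer(Tokenizer tokenizer, BranchAnalyzer analyzer) {
        this.tokenizer = tokenizer;
        this.analyzer = analyzer;
    }

    int cost(String source) {
        return 1 + analyzer.countBranches(tokenizer.tokenize(source));
    }
}

class CostComputerFunctionalTest {
    // Wires the real collaborators together, as production would, with no fakes.
    static int costOf(String source) {
        return new CostComputer(new Tokenizer(), new BranchAnalyzer()).cost(source);
    }

    static boolean straightLineCodeCostsOne() {
        return costOf("return x") == 1;
    }

    static boolean eachBranchAddsOne() {
        return costOf("if a while b return c") == 3;
    }
}
```

The assertions only look at the final number, so they check that the output of one class is acceptable input to the next rather than re-testing each class's internal logic.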

Many of the classes are tested twice: once directly through unit tests as described above, and once indirectly through the functional tests. If you removed the unit tests I would still have high confidence that the functional tests would catch most changes which would break things, but I would have no idea where to look for a fix, since the mistake could be in any class involved in the execution path. The no-debugger-needed rule is broken here. When a functional test fails (and there are no unit tests failing) I am forced to take out my debugger. When I find the problem, I retroactively add a unit test to 1) prove to myself that I understand the bug and 2) prevent this bug from happening again. The retroactive unit tests are the reason why the tests at the end of the KeyedMultiStackTest.java file are “strange”, for lack of a better word. They cover things which I did not think of when I wrote the unit tests, but discovered when I wrote functional tests and, after a lot of hours behind the debugger, tracked down to the KeyedMultiStack.java class as the culprit.

Now computing metrics is just a small part of what Testability Explorer does (it also does reports and suggestions), but those are not tested in this functional test (there are other functional tests for that). You can think of a functional test as exercising a set of related classes which form a cohesive functional unit of the overall application. Here are some of the functional areas in Testability Explorer: Java byte-code parsing, Java source parsing, C++ parsing, cost analysis, 3 different kinds of reports, and the suggestion engine. Each of these has its own set of related classes which work together and need to be tested together, but for the most part the areas are independent of each other.

Large/End-to-End/Scenario

We have proved that the ‘ifs’ and ‘loops’ work and that the contracts are compatible. What else can we test? There is still one class of mistake we can make: you can wire the whole thing wrong. For example, passing in null instead of a report, not configuring the location of the jar file for parsing, and so on. These are not logical bugs, but wiring bugs. Luckily, wiring bugs have this nice property that they fail consistently and usually spectacularly with an exception. Here is an example of an end-to-end test: TestabilityRunnerTest.java. Notice how these tests exercise the whole application and do not assert much. What is there to assert? We have already proven that everything works; we just want to make sure that it is wired properly.
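A minimal sketch of that idea follows, assuming a hypothetical App entry point and Report class that are invented for this illustration. The test drives the application the way a user would and asserts almost nothing; a wiring mistake such as a null report would surface as an exception before the assertion is even reached.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical collaborator that formats the final result.
class Report {
    String render(int cost) {
        return "cost=" + cost;
    }
}

// Hypothetical application entry point: wires collaborators as production would.
class App {
    static void run(String[] args, PrintStream out) {
        Report report = new Report(); // a wiring bug would be passing null here
        int cost = args.length;       // stand-in for the real analysis
        out.println(report.render(cost));
    }
}

class AppEndToEndTest {
    // Runs the whole application and captures what it prints.
    static String runAndCapture(String... args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer);
        App.run(args, out);
        out.flush();
        return buffer.toString();
    }

    // Deliberately weak assertion: it ran without throwing and printed something.
    static boolean applicationRunsAndProducesOutput() {
        return !runAndCapture("Foo.class").isEmpty();
    }
}
```

The output is captured in memory here only to keep the sketch self-contained; the point is the thin assertion, not the capture mechanism.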

14 responses so far ↓

  • misko // Jul 14, 2009 at 1:49 pm

    If you would like to work for a company which takes testing seriously, or if you like what you read, why not send me your resume? We are always looking for sharp, energetic people. misko@google.com

  • David Burns // Jul 15, 2009 at 12:35 am

    Very good break down of the different tests. I have been trying to find an explanation of the Google small/medium/large testing for a while to show my colleagues after hearing about them at GTAC 2008.

  • Peter J. // Jul 15, 2009 at 5:49 am

    How do you configure your IDE to run only your unit tests for Testability Explorer, since they’re intermingled with your functional and scenario tests? There doesn’t appear to be any naming or path distinction, and your previous post doesn’t indicate that you’re setting up your Eclipse environment to include or exclude files (which I’d think would be a pain to track manually and difficult to transfer to another developer).

  • misko // Jul 15, 2009 at 8:05 am

    @Peter,

    I have a very simple solution for that! I make sure that all my tests are fast, and I just run them all, all the time. When your large tests become slow is when you stop running them on every change.

  • Peter J. // Jul 15, 2009 at 9:34 am

    Oh, your two-second rule applies to the entire test suite and not just the unit tests? Wow… we run up against a local 30-second limit all the time, and that’s without any “large” tests.

  • Sarthak // Jul 15, 2009 at 9:48 am

    I think there should still be a way for test categorization and running only tests from a particular category.

    For functional tests, there might be huge systems and it may take around 5-7 seconds per unit test (we have a system which makes external service calls which calls another service, etc.). For unit tests we mock this behavior, but for functional tests, we still rely on the external service.

    So if we have such test cases, I feel we should still be able to run only the tests of a particular category. TestNG provides such a feature (never used it, but have read about it), whereas the JUnit framework doesn’t have such features. However, a custom JUnit runner can solve this problem.

    The best practice would still be around 2 secs max. That would be cool …

  • misko // Jul 15, 2009 at 9:59 am

    @Peter,

    On a small project you can get away with everything running in 2 secs. On a large project you cannot. You need to break it into S/M/L and then run L/M separately, as they tend to be slower. I always strive to make things fast and I run as many tests as I can fit in the two second rule.

  • misko // Jul 15, 2009 at 10:01 am

    @ Sarthak,
    your medium tests should have external services mocked out; otherwise you are testing too much, it becomes slow and flaky, and it is a large test. I almost feel like there should be a rule that you should not have more than 20 large tests. :-)

  • Josh // Jul 16, 2009 at 9:00 am

    How much patience you have is subjective. If you modify variables you are performing I/O, so I guess he really meant disk I/O.

    Why go through all the theatrics of trying to categorize tests into “unit”, “functional”, “end-to-end” when it obviously depends on the app and the developer’s opinion of “fast” vs “slow”? What if you have 4 tiers of tests progressively getting slower, what’s the 4th word that needs to be made up for the new tier?

    How about just “automated tests”? Then everyone can just categorize them into tiers based on their specific development lifecycle.

  • misko // Jul 16, 2009 at 9:03 am

    @Josh,

    These are not theatrics. These tests are fundamentally different. I write them in different ways and I write them for different reasons and they find different kinds of bugs.

    – Misko

  • Sarthak // Jul 16, 2009 at 1:45 pm

    @Misko,

    Shouldn’t the rule depend on the size of the application? I agree, though, that there can’t be so many functional tests.

    Here is probably one criterion to separate that:

    –> Only test the wiring using functional test. Only if the wiring is done in a different way for some use case, then write a separate functional test for that. Otherwise, it should be safe to assume that the wiring works. Basically, cover all the wiring cases using different functional tests.

    Does that sound like a feasible solution?

  • Scott David Daniels // Jul 21, 2009 at 9:16 am

    When I was developing automated tests for the microcode on a large departmental printer (almost all functional, some verging on unit), it was a rare test that took under 5 seconds. Things had to come up to temperature and speed, and lots of mechanical action happened on every test. The longest (non-endurance) tests took more like twenty minutes.

  • Arnold Strasser // Aug 14, 2009 at 3:05 am

    You mentioned that testability-explorer provides c++ parsing … I can’t find any information on this topic on the official testability-homepage … are you sure that this is correct?

  • misko // Aug 14, 2009 at 10:26 am

    @Arnold,

    we said we were working on it, but it never got finished. :-( The person who was working on it left the project.

    – Misko
