Sunday, 14 September 2014

Achieving Low Defect Rates


Software defects, from null dereferences to out-of-bounds array accesses and concurrency errors, are a serious issue, especially in security-critical software. As such, minimising defects is a stated goal of many projects.

I am currently writing a library (OTP-Java) which provides several one-time password systems to Java applications, and this is a brief run-down of some of the steps I am taking to try to ensure that the library is as defect-free as possible. The library is not even alpha as yet. It still has several failing tests, but hopefully will be "completed" relatively soon.


Specifications

Many products go forward without a specification. In many cases this is not a "bad thing" per se, but it can make testing more difficult.

When surprising or unexpected behaviour is found, it should be classified either as a defect or as a user's incomplete understanding of the specification. With no specification, there can be no such classification. The best that can be done is to assess the behaviour and determine whether it is "wanted".

As an example, I have seen a system where one part expected a user to have access to some data, and forwarded them on to it. The system for retrieving the data had more stringent requirements, and threw a security exception. Without a clear, unambiguous specification, there's no way of telling which part of the system is in error, and hence, no immediate way to tell which part of the system should be corrected.

I would like to make it clear that I am not advocating that every system have a large, unambiguous specification. If the product is security- or safety-critical, I would argue that it is an absolute must, and many share this view. For most other systems, a specification is an additional burden that prevents a product from getting to market. If a failure of your system will sink your business or kill people, then a specification is wise. Otherwise, just listen to your users.


Defects and Failures

Given a specification, a defect is often defined simply as a deviation from that specification. Many defects are benign, and will not cause any issues in production. However, some subset of defects will lead to failures -- actual problems encountered by users: exception messages, lost data, incorrect data, data integrity violations and so on.

It is often seen to be most cost-effective to find and eliminate defects before deployment. Sometimes, especially in systems that do not have an unambiguous specification, this is extremely difficult to do, and in a sufficiently large system, this is often nearly impossible.

For a large enough system, it's likely that the system will interact with itself, causing emergent behaviour in the end product. These odd interactions are what make the product versatile, but also what make eliminating surprising behaviour nearly impossible, and it may even be undesired for certain products.

Static Analysis

Tools that can provide feedback on the system without running it are often invaluable. Safety critical systems are expected to go through a battery of these tools, and to have no warnings or errors.

I am using FindBugs on my OTP-Java project to try to eliminate any performance or security issues. I have found that it provides valuable feedback on my code, pointing out some potential issues.

There are also tools which will rate the cyclomatic complexity (CC) of any methods that I write. I believe that Cobertura will do this for me. This will be important, as a high CC is correlated with a high defect rate, and is also expected to make reading the code more difficult.


Randomised Testing

Fortunately, generating and verifying one-time passwords (OTPs) is a problem space with clear measures of success: if I generate a valid OTP, I must be able to validate it; and if I generate an OTP and then modify it, the result should not be valid.

This lends itself to randomised testing, where random "secrets" can be produced, and used to generate OTPs. These can then be verified or modified at will.

Other properties can also be validated: for example, requesting a 6-digit OATH OTP should actually produce a 6-digit string, composed entirely of digits.

For the OTP-Java project, I am using the Java implementation of QuickCheck, driven by JUnit.
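The round-trip and well-formedness properties above can be sketched as a plain randomised loop over freshly generated secrets. This is a minimal illustration, not OTP-Java's actual test suite: the compact `hotp` method here is an RFC 4226-style implementation included only to make the sketch self-contained.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of randomised ("property-based") testing: for random secrets and
// counters, a generated OTP must have the requested number of digits,
// consist only of digits, and re-generate to the same value.
public class RandomisedOtpTest {

    // Compact RFC 4226-style HOTP, included only to keep the sketch
    // self-contained; the real library would supply this.
    static String hotp(byte[] secret, long counter, int digits) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret, "HmacSHA1"));
        byte[] h = mac.doFinal(ByteBuffer.allocate(8).putLong(counter).array());
        int offset = h[h.length - 1] & 0x0f;
        int bin = ((h[offset] & 0x7f) << 24) | ((h[offset + 1] & 0xff) << 16)
                | ((h[offset + 2] & 0xff) << 8) | (h[offset + 3] & 0xff);
        return String.format("%0" + digits + "d", bin % (int) Math.pow(10, digits));
    }

    public static void main(String[] args) throws Exception {
        SecureRandom rng = new SecureRandom();
        for (int i = 0; i < 1000; i++) {
            byte[] secret = new byte[20];
            rng.nextBytes(secret);
            long counter = rng.nextLong() >>> 1;  // keep the counter non-negative
            String otp = hotp(secret, counter, 6);

            // Property 1: requesting 6 digits yields a string of 6 digits.
            if (!otp.matches("[0-9]{6}"))
                throw new AssertionError("malformed OTP: " + otp);
            // Property 2: generation is deterministic, so validation
            // (re-generate and compare) must accept the OTP.
            if (!otp.equals(hotp(secret, counter, 6)))
                throw new AssertionError("round trip failed");
        }
        System.out.println("all properties held");
    }
}
```

A dedicated tool like QuickCheck adds value on top of this by generating structured inputs and shrinking failing cases, but the essential idea is the same loop.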

Unit Testing

In OTP-Java, I've augmented the randomised testing with some test vectors extracted from the relevant RFCs. These test vectors, along with the randomised tests, should provide further confidence that the code meets the specification.
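RFC 4226 publishes HOTP test vectors in its Appendix D (the ASCII secret "12345678901234567890" with counters 0 through 9), and a vector-driven test can be sketched as below. The `hotp` method is again a compact illustrative implementation, not OTP-Java's API.

```java
import java.nio.ByteBuffer;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Checking an HOTP implementation against the RFC 4226 Appendix D
// test vectors: secret "12345678901234567890", counters 0-9.
public class Rfc4226VectorTest {

    static String hotp(byte[] secret, long counter, int digits) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret, "HmacSHA1"));
        byte[] h = mac.doFinal(ByteBuffer.allocate(8).putLong(counter).array());
        int offset = h[h.length - 1] & 0x0f;
        int bin = ((h[offset] & 0x7f) << 24) | ((h[offset + 1] & 0xff) << 16)
                | ((h[offset + 2] & 0xff) << 8) | (h[offset + 3] & 0xff);
        return String.format("%0" + digits + "d", bin % (int) Math.pow(10, digits));
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "12345678901234567890".getBytes("US-ASCII");
        String[] expected = { "755224", "287082", "359152", "969429", "338314",
                              "254676", "287922", "162583", "399871", "520489" };
        for (int counter = 0; counter < expected.length; counter++) {
            String otp = hotp(secret, counter, 6);
            if (!otp.equals(expected[counter]))
                throw new AssertionError("counter " + counter + ": " + otp);
        }
        System.out.println("all RFC 4226 vectors matched");
    }
}
```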

Usually, unit testing only involves testing a single component or class. However, with such a small library, and with such well-defined behaviour, it makes sense to test the behaviour of several parts of the system at the same time.

In my opinion, these tests are probably too big to be called unit tests, and too small to be called integration tests, so I've just lumped them together under "unit tests". Please don't quarrel over the naming. If there's an actual name for this style of testing, I'd like to know it.

Assertive Programming

I have tried to ensure that, as far as possible, the code's invariants are annotated using assertions.

That way, when an invariant is violated under testing, the programmer (me!) is notified as soon as possible. This should help with debugging, and will hopefully remove any doubt, when a test fails, as to whether it was a fluke (a hardware or JVM failure, say) or a genuine failure on my part.
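As a minimal sketch of this style, Java's `assert` statement can annotate pre- and post-conditions; the checks run when the JVM is started with `-ea` (as it should be under testing) and compile to no-ops otherwise. The method name here is illustrative, not OTP-Java's API.

```java
// Sketch: annotating invariants with Java's assert statement. Enable
// with `java -ea ...` during testing so violations surface immediately.
public class Truncation {

    // Dynamic truncation of an HMAC-SHA1 digest (RFC 4226 style).
    static int truncate(byte[] hmac) {
        // Pre-condition: a 20-byte HMAC-SHA1 digest is expected.
        assert hmac != null && hmac.length == 20
                : "expected a 20-byte HMAC-SHA1 digest";
        int offset = hmac[19] & 0x0f;
        int bin = ((hmac[offset] & 0x7f) << 24) | ((hmac[offset + 1] & 0xff) << 16)
                | ((hmac[offset + 2] & 0xff) << 8) | (hmac[offset + 3] & 0xff);
        // Post-condition: the top bit is masked off, so the result is
        // always non-negative.
        assert bin >= 0 : "truncated value must be non-negative";
        return bin;
    }

    public static void main(String[] args) {
        int value = truncate(new byte[20]);  // all-zero digest
        System.out.println(value);           // prints 0
    }
}
```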

This has, so far, been of a lot of use in randomised testing: when there have been test failures, the assertions have shown exactly why.

It also helps users of the library. If my input validation is not good enough, and the users subject the library to tests as part of their testing, they will also, hopefully, be helped by the assertions, as they may help explain the intent of the code.

Type System

While Java's type system does leave a lot to be desired, it is quite effective at communicating exactly what is required by, and may be returned from, specific methods.

Unlike the reference implementation in the RFC (mmm, "String key", "String crypto", and so on), I have tried to use appropriate types in my code: requiring a SecureRandom instead of just a byte[] or even a Random conveys that this is a security-critical piece of code, and that one shouldn't use "any old value", as has often happened with OpenSSL's API (see also: predictable IVs), which has led to real vulnerabilities in real products.

Shielding the user from common mistakes through a "sane" or "obvious" API is as much my job as it is the final user's. The security of any product which relies on the library depends on the library's correct specification and implementation, as well as its correct use. Encouraging and supporting both is very important.
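The idea can be sketched as below; the class and method names are hypothetical, not OTP-Java's actual API. The point is that the signature itself rules out weak randomness at compile time.

```java
import java.security.SecureRandom;

// Sketch: letting the type system steer callers toward safe usage.
// Requiring a SecureRandom (not java.util.Random, and not a
// caller-supplied byte[]) makes it hard to seed an OTP secret from a
// weak source. Names here are illustrative, not OTP-Java's API.
public class SharedSecrets {

    // Passing `new Random()` here is a compile-time error: the
    // signature only admits a cryptographically strong source.
    static byte[] newSharedSecret(SecureRandom rng, int lengthBytes) {
        if (lengthBytes < 16)
            throw new IllegalArgumentException("secret too short: " + lengthBytes);
        byte[] secret = new byte[lengthBytes];
        rng.nextBytes(secret);
        return secret;
    }

    public static void main(String[] args) {
        byte[] secret = newSharedSecret(new SecureRandom(), 20);
        System.out.println(secret.length);  // prints 20
    }
}
```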

Code Coverage

Code coverage is often a yardstick for how effective your testing is. Coverage of 100% is rarely possible. For instance, if you use Mac.getInstance("HmacSHA1"), it's nearly impossible to trigger the NoSuchAlgorithmException.
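That unreachable branch looks something like the following sketch. Every conforming Java platform is required to provide HmacSHA1, so the catch clause exists only to satisfy the checked exception and no test on a standard JRE will ever enter it.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Mac;

// Sketch of the coverage problem: conforming Java platforms must
// provide HmacSHA1, so the catch branch below is effectively
// unreachable and 100% branch coverage cannot be achieved here.
public class MacFactory {

    static Mac hmacSha1() {
        try {
            return Mac.getInstance("HmacSHA1");
        } catch (NoSuchAlgorithmException e) {
            // Dead on conforming JREs; kept only because the
            // exception is checked.
            throw new AssertionError("HmacSHA1 is mandated by the platform", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hmacSha1().getAlgorithm());  // prints HmacSHA1
    }
}
```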

However, many tools provide branch coverage as well as line coverage. Achieving high coverage can boost your confidence, but with complex decision conditions (for example, if (a && b || !(c && (d || e)))), it's very hard to be sure that you've covered all the ways of "entering" a block.

Cyclomatic complexity (CC) should help here. As a rough guide, if a method has a CC of 2, you should have at least 2 tests for it. Although this is still just a rule of thumb, it does help me feel more confident that, to a reasonable level, all eventualities are accounted for.
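As a small illustration of that rule of thumb (the method and its name are hypothetical, and `String.repeat` needs Java 11+): a single `if` gives a CC of 2, and each of the two paths gets its own test.

```java
// Sketch: a method with cyclomatic complexity 2 (one branch) needs at
// least two tests, one per path through the method.
public class Padding {

    // CC = 2: the single `if` creates two paths.
    static String pad(String otp, int digits) {
        if (otp.length() < digits) {
            return "0".repeat(digits - otp.length()) + otp;  // path 1: pad
        }
        return otp;                                          // path 2: unchanged
    }

    public static void main(String[] args) {
        // One check per path, matching the CC-derived minimum of two.
        System.out.println(pad("42", 6));      // prints 000042
        System.out.println(pad("123456", 6));  // prints 123456
    }
}
```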


Conclusion

Many products don't have a specification, which can make reducing surprising behaviour difficult. Similarly, not all defects lead to failures.

However, even without a specification, some of the techniques listed in this post can be applied to try to lower defect rates. I've personally found that these increase my confidence when developing, but maybe that just increases my appetite for risk.

I am by no means saying that all of the above tools and techniques must be used. Similarly, I will also not say that the above techniques will ensure that your code is defect free. All you can do is try to ensure that your defect rate is lowered. For me, I feel that the techniques and tools in this post help me to achieve that goal.
