Why this post?

I am writing this as a direct response to Miklos’s blog post on the same theme. Miklos argues against the current setup for regression testing that all our import libraries use. I do not believe his approach would be substantially better than the current one, and I will try to summarize my thoughts in the text below. I admit, however, that the current setup is not perfect, and I can envision some improvements…

How the current regression test suites work

For every import library, there is a separate repository that contains the regression test suite. It consists of sample documents and pre-generated output files in several formats, produced by the command line conversion tools that every library provides. The most important of these is the so-called “raw” format: it is simply a serialization of the librevenge API calls. Additional output formats include ODF and, for the graphics libraries, SVG.
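
For illustration, the raw output of a trivial text document is essentially a trace of the librevenge interface calls made by the parser. The call names below are real librevenge API, but the exact formatting produced by the raw generators may differ; schematically it looks something like this:

    startDocument()
    openPageSpan(...)
    openParagraph(...)
    openSpan(...)
    insertText(Hello, world!)
    closeSpan()
    closeParagraph()
    closePageSpan()
    endDocument()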

The test suite is driven by two Perl scripts: regression.pl checks that the current output matches the saved output and writes a diff file for any difference; regenerate_raw.pl regenerates the saved output files. These scripts are copied from test suite to test suite and adapted for each library (e.g., which formats are checked, the location of the test directories, etc.).
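
Schematically, the check that regression.pl performs looks roughly like the following. This is a minimal sketch only: the real script is parameterized per library, and the tool name and directory layout here are made up.

    #!/usr/bin/perl
    # Minimal sketch of the regression check; the tool name and the
    # directory layout are illustrative, not any library's real setup.
    use strict;
    use warnings;
    use File::Basename;

    my $tool = 'doc2raw';    # hypothetical conversion tool

    for my $doc (glob 'testdocs/*') {
        my $base  = basename($doc);
        my $saved = "raw/$base.raw";    # pre-generated reference output
        my $new   = "$saved.new";

        system("$tool '$doc' > '$new'") == 0
            or die "conversion of $doc failed\n";

        if (system("cmp -s '$saved' '$new'") == 0) {
            unlink $new;    # output unchanged
        } else {
            # record the difference for case-by-case evaluation
            system("diff -u '$saved' '$new' > '$saved.diff'");
            print "FAIL: $base\n";
        }
    }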

Better way? Or maybe not…

This section discusses the pros and cons of Miklos’s approach in the context of the DLP import libraries. The quoted passages are from Miklos’s blog post.

Better focused checks

“Being automatically generated, you have no control over what part of the output is important and what part is not — both parts are recorded and when some part changes, you have to carefully evaluate on a case by case basis if the change is OK or not.”

This is not as big a deal as it might seem, especially if the changes are checked regularly and the test repository is kept up to date. Usually the changes are quite localized and easy to verify.

Single-point failure

“… from time to time you just end up regenerating your reference testsuite and till the maintainer doesn’t do that, everyone can only ignore the test results — so it doesn’t really scale.”

The test suite is in gerrit, next to the main repository. If someone submits a fix for review, he can submit an update to the test suite too. I admit that the two changes would not be linked in any way, but we do not get so many contributions that this would be a problem. And it would be possible to make the test suite a submodule of the main repository, which would fix this.

No way to forget to run the tests

“Provided that make distcheck is ran before committing, you can’t forget to clone and run the tests.”

As the de-facto release engineer for the majority of DLP’s libraries, I have a checklist of things to do before a new release. Running the regression tests is just one item on that list.

Less prone to unrelated changes

“Writing explicit assertions means that it’s rarely needed to adjust existing tests.”

On the other hand, it is extra work to write them and, more importantly, to keep them in sync with the code, so that they cover everything that is necessary. With the current approach, any change in the output is immediately visible.

Possible to commit code change + test in a single commit

“Having testcase + code change in the same commit is one step closer to the dream …”

Not my dream, though. I prefer to push test cases as separate commits anyway…

To be fair, the current approach does make it rather difficult to run the test suite for an older checkout, because there is no association with a particular commit in the test repository. But I do not think I have ever needed this, so I do not see it as a problem.

Big increase in size of the main repository

LibreOffice’s code base is huge: 20 MB of test files would be about 1% of its size. This is not true for the libraries we are talking about. Their size is several MB at most, so the addition of a number of data files immediately shows up in the repository size. It also shows up in the size of the release tarballs, which is an even more important point.

Let me show an anecdotal example: the current size of the unpacked tarball of libetonyek is 3 MB. The cumulative size of the test documents in its test repository is 24 MB. And these documents only cover the Keynote 5 format…

Testing of multiple versions of a format induces copy-paste

We typically have tests for multiple versions of the same file format, and these often have approximately the same content across all versions. I assume that, when adding a new test file that is based on a similar file produced by a different version of the application, the test case would most probably be copied from the test case for that other file. That means that if a change is needed later (e.g., to add a new check), it has to be duplicated in several places. This increases the risk that some of the test cases will not be updated.

Possible improvements

Diff is not always good enough

If there is a change in the output, regression.pl generates a diff. This, however, is not always the best way to show the changes. In some cases a word diff (e.g., one generated by dwdiff) would be much better.
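
In the sketch shown earlier, this would be a small change; something like the following, assuming dwdiff is installed (the per-format switch is, of course, made up):

    # hypothetical switch between line diff and word diff,
    # e.g., selected per output format
    my $use_word_diff = 1;
    my $diff = $use_word_diff ? 'dwdiff' : 'diff -u';
    system("$diff '$saved' '$new' > '$saved.diff'");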

Dependency on other libraries

All the output generators are implemented in external libraries. This is not a problem for the “raw” output, as that is not expected to change. But the ODF output is often affected by changes in libodfgen. Unfortunately, this also means that the tests only work with a specific version of libodfgen, typically the current master. This is a problem, and I think that our decision to test ODF conversion in the libraries’ test suites was wrong and counter-productive. IMHO the output generators should be tested in the libraries that implement them.

This is already partly done for libodfgen, as we have test code that generates various ODF documents programmatically. But the output is just saved to files that must be examined manually; there is no automated check of the output. IMHO Miklos’s approach would be really beneficial here.
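
As a sketch of what such a check could look like, here is an assertion on generated ODF content using XML::LibXML. The file name, the expected values, and the idea of driving the check from Perl are all just illustrative assumptions, not how the libodfgen test code actually works:

    #!/usr/bin/perl
    # Illustrative assertion-style check on a generated flat ODF file;
    # the file name and the expected values are made up.
    use strict;
    use warnings;
    use XML::LibXML;

    my $doc = XML::LibXML->load_xml(location => 'out/two-paragraphs.fodt');
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(text => 'urn:oasis:names:tc:opendocument:xmlns:text:1.0');

    # explicit assertions on the parts of the output we care about
    my @paras = $xpc->findnodes('//text:p');
    die 'expected 2 paragraphs, got ' . @paras . "\n" unless @paras == 2;
    die "unexpected text in first paragraph\n"
        unless $paras[0]->textContent eq 'Hello world';
    print "ok\n";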

Conclusion

While the current regression testing setup is not perfect, there is no need to radically change it, as the proposed alternative does not bring many real benefits. The biggest concern is a considerable increase in the size of the release tarballs. However, we should limit the tests to the raw format and move the tests of the output generators to the libraries that implement them. It makes sense to use Miklos’s approach to test those.