Why this post?

I am writing this as a direct response to Miklos’s blog post on the same theme. Miklos argues against the current setup for regression testing that all our import libraries use. I do not believe his approach would be substantially better than the current one, and I will try to summarize my thoughts in the text below. I admit, however, that the current setup is not perfect, and I can envision some improvements…

How the current regression test suites work

For every import library, there is a separate repository that contains the regression test suite. It consists of sample documents and pre-generated output files in several formats, produced by the command line conversion tools that every library provides. The most important of these is the so-called “raw” format: it is simply a serialization of the librevenge API calls. Additional output formats include ODF and, for the graphics libraries, SVG.
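
For illustration, the raw output of a trivial text document is essentially a trace of the librevenge interface calls made by the parser. The call names below are real librevenge API, but the exact formatting produced by the raw generators may differ; schematically it looks something like this:

    startDocument()
    openPageSpan(...)
    openParagraph(...)
    openSpan(...)
    insertText(Hello, world!)
    closeSpan()
    closeParagraph()
    closePageSpan()
    endDocument()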

The test suite is driven by two Perl scripts: regression.pl checks that the current output matches the saved output and writes a diff file for any difference; regenerate_raw.pl regenerates the saved output files. These scripts are copied from test suite to test suite and adapted for each library (e.g., which formats are checked, the location of the test directories, etc.).
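
Schematically, the check that regression.pl performs looks roughly like the following. This is a minimal sketch only: the real script is parameterized per library, and the tool name and directory layout here are made up.

    #!/usr/bin/perl
    # Minimal sketch of the regression check; the tool name and the
    # directory layout are illustrative, not any library's real setup.
    use strict;
    use warnings;
    use File::Basename;

    my $tool = 'doc2raw';    # hypothetical conversion tool

    for my $doc (glob 'testdocs/*') {
        my $base  = basename($doc);
        my $saved = "raw/$base.raw";    # pre-generated reference output
        my $new   = "$saved.new";

        system("$tool '$doc' > '$new'") == 0
            or die "conversion of $doc failed\n";

        if (system("cmp -s '$saved' '$new'") == 0) {
            unlink $new;    # output unchanged
        } else {
            # record the difference for case-by-case evaluation
            system("diff -u '$saved' '$new' > '$saved.diff'");
            print "FAIL: $base\n";
        }
    }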

Better way? Or maybe not…

This section discusses the pros and cons of Miklos’s approach in the context of the DLP import libraries. The quoted passages are from Miklos’s blog post.

Better focused checks

“Being automatically generated, you have no control over what part of the output is important and what part is not — both parts are recorded and when some part changes, you have to carefully evaluate on a case by case basis if the change is OK or not.”

This is not as big a deal as it might seem, especially if the changes are checked regularly and the test repository is kept up to date. Usually the changes are quite localized and easy to verify.

Single-point failure

“… from time to time you just end up regenerating your reference testsuite and till the maintainer doesn’t do that, everyone can only ignore the test results — so it doesn’t really scale.”

The test suite is in gerrit, next to the main repository. If someone submits a fix for review, he can submit an update to the test suite too. I admit that the two changes would not be linked in any way, but we do not get so many contributions that this would be a problem. And it would be possible to make the test suite a submodule of the main repository, which would fix this.

No way to forget to run the tests

“Provided that make distcheck is ran before committing, you can’t forget to clone and run the tests.”

As the de-facto release engineer for the majority of DLP’s libraries, I have a checklist of things to do before a new release. Running the regression tests is just one item on that list.

Less prone to unrelated changes

“Writing explicit assertions means that it’s rarely needed to adjust existing tests.”

On the other hand, it is extra work to write them and, more importantly, to keep them in sync with the code, so that they cover everything that is necessary. With the current approach, any change in the output is immediately visible.

Possible to commit code change + test in a single commit

“Having testcase + code change in the same commit is one step closer to the dream …”

Not my dream, though. I prefer to push test cases as separate commits anyway…

To be fair, the current approach does make it rather difficult to run the test suite for an older checkout, because there is no association with a particular commit in the test repository. But I do not think I have ever needed this, so I do not see it as a problem.

Big increase in size of the main repository

LibreOffice’s code base is huge: 20 MB of test files would be about 1% of its size. This is not true for the libraries we are talking about. Their size is several MB at most, so the addition of a number of data files immediately shows up in the repository size. It also shows up in the size of the release tarballs, which is an even more important point.

Let me show an anecdotal example: the current size of the unpacked tarball of libetonyek is 3 MB. The cumulative size of the test documents in its test repository is 24 MB. And these documents only cover the Keynote 5 format…

Testing of multiple versions of a format induces copy-paste

We typically have tests for multiple versions of the same file format, and these often have approximately the same content across all versions. I assume that, when adding a new test file that is based on a similar file produced by a different version of the application, the test case would most probably be copied from the test case for that other file. That means that if a change is needed later (e.g., to add a new check), it has to be duplicated in several places. This increases the risk that some of the test cases will not be updated.

Possible improvements

Diff is not always good enough

If there is a change in the output, regression.pl generates a diff. This, however, is not always the best way to show the changes. In some cases a word diff (e.g., one generated by dwdiff) would be much better.
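
In the sketch shown earlier, this would be a small change; something like the following, assuming dwdiff is installed (the per-format switch is, of course, made up):

    # hypothetical switch between line diff and word diff,
    # e.g., selected per output format
    my $use_word_diff = 1;
    my $diff = $use_word_diff ? 'dwdiff' : 'diff -u';
    system("$diff '$saved' '$new' > '$saved.diff'");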

Dependency on other libraries

All the output generators are implemented in external libraries. This is not a problem for the “raw” output, as that is not expected to change. But the ODF output is often affected by changes in libodfgen. Unfortunately, this also means that the tests only work with a specific version of libodfgen, typically the current master. This is a problem, and I think that our decision to test ODF conversion in the libraries’ test suites was wrong and counter-productive. IMHO the output generators should be tested in the libraries that implement them.

This is already partly done for libodfgen, as we have test code that generates various ODF documents programmatically. But the output is just saved to files that must be examined manually; there is no automated check of the output. IMHO Miklos’s approach would be really beneficial here.
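
As a sketch of what such a check could look like, here is an assertion on generated ODF content using XML::LibXML. The file name, the expected values, and the idea of driving the check from Perl are all just illustrative assumptions, not how the libodfgen test code actually works:

    #!/usr/bin/perl
    # Illustrative assertion-style check on a generated flat ODF file;
    # the file name and the expected values are made up.
    use strict;
    use warnings;
    use XML::LibXML;

    my $doc = XML::LibXML->load_xml(location => 'out/two-paragraphs.fodt');
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(text => 'urn:oasis:names:tc:opendocument:xmlns:text:1.0');

    # explicit assertions on the parts of the output we care about
    my @paras = $xpc->findnodes('//text:p');
    die 'expected 2 paragraphs, got ' . @paras . "\n" unless @paras == 2;
    die "unexpected text in first paragraph\n"
        unless $paras[0]->textContent eq 'Hello world';
    print "ok\n";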

Conclusion

While the current regression testing setup is not perfect, there is no need to radically change it, as the proposed alternative does not bring many real benefits. The biggest concern is a considerable increase in the size of the release tarballs. However, we should limit the tests to the raw format and move the tests of the output generators to the libraries that implement them. It makes sense to use Miklos’s approach to test those.