Test Imports module

Revision as of 16:36, 22 December 2016 by Prculley (talk | contribs)

Import Tests

This module (test_imports.py) implements a method of testing the Gramps import modules. The method is best described as a functional test (although it is integrated into unit-test): it compares the Gramps database produced by an import with the one created when the developer prepared the test file. It will only occasionally find bugs by itself, but it is good for making sure that code changes did not break anything.

With suitable care, bug-specific test files can be created that demonstrate both a bug and its fix. For example, create the test file and expected results on code with the proposed patch in place, then run the test on an unpatched version to demonstrate the bug.

The files to test and expected results are stored in the 'data/tests' directory. Test files may be any type of files that Gramps can import except '.gramps' files. For example, 'imp_sample.ged', 'imp_vcard.vcf', etc. The program runs through all the files in this directory that start with 'imp_', so there is no limit on the number of tests performed.

Note: the '.gramps' file importer gets well exercised by this test as it is used for expected results.

The expected result files are '.gramps' XML files, as exported during the test development phase. They must have the same base name as the test files. For example, 'imp_sample.ged' must have an 'imp_sample.gramps' in the same directory.
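The naming convention above can be sketched in a few lines of Python. This is only an illustration of the pairing rule, not the code used in test_imports.py; the function name find_import_tests is hypothetical, and skipping '.difs' files is an assumption made so that recorded difference files are not mistaken for test inputs.

```python
from pathlib import Path


def find_import_tests(test_dir):
    """Pair each 'imp_*' test file with its expected '.gramps' result file.

    Returns a list of (test_file, expected_file) tuples; expected_file is
    None when no '.gramps' file with the same base name exists.
    """
    pairs = []
    for path in sorted(Path(test_dir).iterdir()):
        # Only files whose names start with 'imp_' are import tests.
        if not path.is_file() or not path.name.startswith("imp_"):
            continue
        # '.gramps' files are expected results and '.difs' files record
        # differences; neither is a test input.
        if path.suffix in (".gramps", ".difs"):
            continue
        expected = path.with_suffix(".gramps")
        pairs.append((path, expected if expected.exists() else None))
    return pairs
```

A test file that comes back paired with None has no expected result yet and needs one prepared before it can be used.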

Differences between the 'expected result' and the actual import data are filtered and reported during unit-test execution. The differences are also stored as '.difs' text files, one for each import, in the Gramps TEMP_DIR ('.gramps/temp' on Linux, and typically 'C:\Users\name\AppData\Roaming\gramps\temp' on Windows). The files are named with the same base file name as the test file (ex: 'imp_sample.ged' --> 'imp_sample.difs').

If a test fails, presumably due to a bug that has not been patched yet, and the developer wants to allow an exception for that file, they can copy the '.difs' file created at the time of the failure to the test directory. The module will compare that exception file with the failures found in future runs and, if the exception '.difs' file and the new differences are identical, will declare the test a pass. If the results differ, the test fails again. This allows testing to continue even in the presence of known bugs.
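The pass/fail rule for exception files can be pictured roughly as follows. This is a minimal sketch under the assumption that differences are compared as plain text; check_import_result and its parameters are hypothetical names, and the real module may compare the data differently.

```python
from pathlib import Path


def check_import_result(new_difs_text, exception_file):
    """Return True (pass) or False (fail) for one import test.

    new_difs_text: differences reported by the current run ('' if none).
    exception_file: path of an optional stored '.difs' exception file.
    """
    if not new_difs_text:
        return True  # no differences at all: a clean pass
    exception_path = Path(exception_file)
    if exception_path.exists():
        # A known failure is tolerated only when it is reproduced exactly.
        return exception_path.read_text(encoding="utf-8") == new_difs_text
    return False  # differences found and no exception recorded
```

Because the comparison is exact, any new difference on top of the known ones makes the test fail again, which is the behaviour described above.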

The module is integrated into unit-test, so it will execute with other tests in that environment. With your working directory set to the Gramps source directory, the module can also be executed by:

   'python3 -m unittest gramps/plugins/test/test_imports.py'
   'python3 gramps/plugins/test/test_imports.py'

If you want to only run a single import test, rather than all of them, then use:

   'python3 gramps/plugins/test/test_imports.py' -i 'sample.ged'

The file to test must still be located in the 'data/tests' directory.

Sample output:

  1. Mismatch on file: imp_FTM_16dec2015a-mod.ged
  2. Media: M159 Handle: x0000000300000003
  3. Diff on: Media, mime
  4. <class 'str'>10: image/jpeg
  5. <class 'str'>7: unknown
  6. Missing Note: GEDCOM import [N0001]
  7. Added Media: D:\Users\name\Downloads\1850 United States Federal Census(11)-1.jpg [M158]

  1. file name
  2. Object with Difference problem
  3. The object data path with the differences
  4. For Differences, the result from the test import file
  5. For Differences, the result from the expected result '.gramps' file
  6. Missing, this was not found in the expected result '.gramps' file
  7. Added; this was not found in the test file, so it seems to be added to the '.gramps' file

Debugging with the output:

This test method is quite sensitive to changes in the number of items in a category. For example, if one version of an importer raises an error which is added as a Note to the database, then a different version of the importer which does not produce this note will probably generate many mismatches. So always look at the Missing and Added sections of the errors list first. They may explain why one new item results in lots of 'Diff' items as well: the underlying handles and Gramps IDs have been renumbered and consequently show all sorts of data differences. Any other change that adds or removes a handle or Gramps ID has the same effect.
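When a '.difs' file is long, a small helper can pull the Missing and Added entries out first. The sketch below assumes that each entry is one line of text beginning with 'Missing ' or 'Added ', as in the sample output; split_difs is a hypothetical helper and is not part of the test module.

```python
def split_difs(lines):
    """Separate 'Missing'/'Added' lines from the rest of a '.difs' listing.

    Returns (structural, other): the lines to read first, then the rest.
    """
    structural, other = [], []
    for line in lines:
        text = line.strip()
        if text.startswith(("Missing ", "Added ")):
            structural.append(text)
        elif text:  # ignore blank lines
            other.append(text)
    return structural, other
```

Reading the first list before the second follows the advice above: one missing or added object is often the cause of many of the remaining 'Diff' entries.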

The items in the output lists are sorted by handle order. Since the handles are based on IDs which, under 'Deterministic mode', are sequential in time, differences early in the list are likely to be the easiest to debug. The handles are also shown, so the user can see in what order the importer assigned elements.

If the developer takes some care in naming the items to import, they can often use the names to zero in on the issues that this test reports. For instance, instead of naming every place in a test file 'the place', using 'place 1', 'place 2', etc. can help in finding bugs. Of course, if you are debugging issues related to merging identical places, this doesn't work. Even then, the related events or notes can contain some debugging data.

Preparing 'expected' results:

Making the '.gramps' files used as 'expected' results is basically an import of the original file followed by an export as a '.gramps' XML file. I strongly suggest reviewing the expected results file in detail against the import file to make sure that the import process worked correctly. This is a good reason to keep test files small and specific to the bug.

In order to prepare the expected results, the IDs generated by Gramps for handles and other purposes must be made consistent with the ones generated during test execution. Putting the ID generator into the 'deterministic' mode can be done in one of two ways, depending on your preference for GUI or CLI methods.
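To see why this matters, consider a toy ID source with two modes. The class below is purely illustrative and is not the Gramps implementation in 'gramps/gen/utils/id.py'; the class name, the method names and the 16-digit hex format are all assumptions chosen for the example.

```python
import itertools
import random
import time


class IdGenerator:
    """Toy ID source with a normal and a deterministic mode (hypothetical)."""

    def __init__(self, deterministic=False):
        self.deterministic = deterministic
        self._counter = itertools.count()  # 0, 1, 2, ...

    def reset(self):
        """Restart the counter at zero, e.g. before a new import."""
        self._counter = itertools.count()

    def create_id(self):
        if self.deterministic:
            # Same sequence on every run: 0, 1, 2, ... as fixed-width hex.
            return "%016x" % next(self._counter)
        # Normal mode: time plus randomness, different on every run.
        return "%08x%08x" % (int(time.time()) & 0xFFFFFFFF,
                             random.getrandbits(32))
```

In deterministic mode two runs over the same import file hand out the same IDs in the same order, so the 'expected' file and the test run can be compared object by object. In normal mode every handle would differ and the comparison would be meaningless.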

GUI

There is a small add-on called 'Deterministic ID' which changes and resets the ID generation algorithm from a combination of date/random to a simple counter, starting at zero. This makes handles easier to read, and you can infer the order of handle creation from the handle number during debugging. As I have not yet published this add-on for general consumption (I'm not sure it ever needs publishing, as it is used only for testing), it must be installed by copying it from the gramps-project/addons-source repository at GitHub (https://github.com/gramps-project/addons-source/tree/master/DetId) or from a local cloned copy. The two files 'DetId.py' and 'DetId.gpr.py' must be copied into the appropriate directory:

'~/.gramps/gramps42/plugins/DetId'     # for Linux gramps42 versions
'~/.gramps/gramps50/plugins/DetId'     # for Linux gramps50 versions
'C:\Users\name\AppData\Roaming\gramps\gramps50\plugins\DetId'  # for Windows gramps50 versions

When this is done, a new menu item, 'Tools->Utilities->Deterministic ID', appears in Gramps after a restart.
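If you would rather script the copy than do it by hand, something like the following would work. It is only a sketch: install_detid is a hypothetical helper, and both directory arguments are whatever applies on your system (a cloned 'addons-source/DetId' directory and your Gramps plugins directory).

```python
import shutil
from pathlib import Path

# The two files that make up the add-on, as listed above.
ADDON_FILES = ("DetId.py", "DetId.gpr.py")


def install_detid(source_dir, plugins_dir):
    """Copy the DetId add-on files into '<plugins_dir>/DetId'.

    Returns the directory the files were copied into.
    """
    target = Path(plugins_dir) / "DetId"
    target.mkdir(parents=True, exist_ok=True)
    for name in ADDON_FILES:
        shutil.copy2(Path(source_dir) / name, target / name)
    return target
```

For example, on Linux with a gramps50 version one might call install_detid("addons-source/DetId", Path.home() / ".gramps" / "gramps50" / "plugins") and then restart Gramps.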

With a new family tree, use 'Tools->Utilities->Deterministic ID' to get started. Import the file of interest (ex: 'imp_sample.ged') and immediately, without modifying the data, export to Gramps XML. To make the Gramps XML easier to read, turn off the 'Use Compression' checkbox in the 'Export options' dialog. Make sure to use the same base file name as your import file (ex: 'imp_sample.gramps').

It is possible to process several test files without restarting Gramps, as long as the 'Tools->Utilities->Deterministic ID' is run before each import.

CLI

If you prefer to use the CLI, you will need a small module that puts Gramps into 'deterministic' mode and then starts Gramps the usual way. The method described here assumes you have a Gramps source tree set up so that you can run directly from source. Prepare a Python file containing the following code (I called mine 'GrampsDetmode.py' and put it in my '~/Documents' folder):

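# Launch Gramps with deterministic ID generation enabled.
import gramps.grampsapp as app
from gramps.gen.utils.id import set_det_id

set_det_id(True)  # switch the ID generator to deterministic mode
app.main()        # then start Gramps the usual way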

Then run Gramps in deterministic mode to do the import/export from the root of the source tree:

python3 "/home/name/Documents/GrampsDetmode.py" -i "/home/name/Documents/sample.ged" -e "/home/name/Documents/sample.gramps"
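When several expected-result files are needed, the command above can be driven from a short script. The sketch below only wraps the '-i' and '-e' options shown in the command; the function names are hypothetical, and it assumes that starting a fresh Gramps process for each file gives each import its own ID sequence.

```python
import subprocess
import sys
from pathlib import Path


def build_export_command(launcher, import_file):
    """Build the command that imports one file and exports '.gramps' XML.

    The export file gets the same base name as the import file.
    """
    import_path = Path(import_file)
    expected = import_path.with_suffix(".gramps")
    return [sys.executable, str(launcher),
            "-i", str(import_path), "-e", str(expected)]


def prepare_expected(launcher, import_files, source_root):
    """Run the launcher once per import file, from the source tree root."""
    for import_file in import_files:
        subprocess.run(build_export_command(launcher, import_file),
                       cwd=source_root, check=True)
```

Here 'launcher' would be the path to the 'GrampsDetmode.py' file described above, and 'source_root' the root of the Gramps source tree.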

Assumptions behind the test:

  1. Gramps XML contains ALL useful data in a family tree.
  2. Gramps XML is sufficiently stable in layout so as to not require constant careful rebuilding of 'expected' files.
  3. Using import files for test cases is easier than writing unit tests for the many methods in the importers. It may not be as thorough, but covering the importers method by method would take a long and daunting time.
  4. Most import bugs end up with sample import files in the issue tracker to aid in debugging; these can easily be used as test files, either modified or as is.

Notes:

Make sure that the imported file layout is identical to the one used when creating the 'expected result' file. That is, if the file contains references to media, and those references resolved when the 'expected result' was created, then they must also resolve during testing. The 'base path for relative media paths' should be the same for both environments (not clear how to do this for unit-test mode).

To run a test with 'preferences.default-source' and 'preferences.tag-on-import' turned on, add '_dfs' at the end of the test file's base name. Example: 'imp_sample_dfs.ged'.
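The filename rule amounts to a simple suffix check on the base name. The helper below is a hypothetical illustration of that check and is not taken from the test module.

```python
from pathlib import Path


def wants_default_source(test_file):
    """Return True when the test file's base name ends with '_dfs'."""
    return Path(test_file).stem.endswith("_dfs")
```

With this rule 'imp_sample_dfs.ged' selects the two preferences and 'imp_sample.ged' does not.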

The gramps XML export/import code acts as a filter for some operations, leading to minor differences between databases. For example:

  1. Exported strings have had leading and trailing spaces removed.
  2. Media file paths are converted to UNIX convention. For example:
     "d:\users\name\photos\myphoto.jpg" --> "d:/users/name/photos/myphoto.jpg"
  3. The 'change' dates on most objects are updated by the import process.

The test_imports module allows for these changes and accepts them.
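One way to picture how such differences can be tolerated is to normalise both sides before comparing them. The sketch below works on a flat dictionary per object, which is a simplification; the field names 'path' and 'change' and the function name are assumptions for the example, and the real module works on Gramps objects rather than plain dictionaries.

```python
def normalize_for_compare(record):
    """Return a copy of a flat record with the tolerated changes applied."""
    result = {}
    for key, value in record.items():
        if key == "change":
            continue  # 'change' dates are updated on import: ignore them
        if isinstance(value, str):
            value = value.strip()  # leading/trailing spaces are dropped
            if key == "path":
                # media paths are written with UNIX-style separators
                value = value.replace("\\", "/")
        result[key] = value
    return result
```

Two records that differ only in the three ways listed above compare equal after normalisation, while any other difference is still reported.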

Test module files:

'gramps/plugins/test/test_imports.py'  # new file
'gramps/gen/utils/id.py'               # changed to support deterministic ID

Add-on files in the user's plugins directory:

'DetId/DetId.py'
'DetId/DetId.gpr.py'

Contributors:

Paul Culley
Doug Blank
Many thanks to Doug Blank, both for his Differences Addon, which contained many ideas and much code used for this module, and for the Gramps code (diff.py) on which this module depends.

See also