Database Formats

From Gramps
Revision as of 23:05, 13 June 2007 by Don


History

The GRAMPS default data format has evolved over time. Each major change in format usually results in an increase in the major version number.

GRAMPS 1.0

GRAMPS 1.0 used XML as its default format. This format is portable and easily read by both computers and people. However, it had two major issues that caused problems when used as a default format:

  • Slow to load and save. The entire file had to be parsed to load the data, and parsing XML is not fast. Similarly, to save any changes, the entire file had to be written. People with larger databases found the load and save times unacceptably long.
  • Consumes a lot of memory. The XML format required that all data be loaded and stored in memory. Larger databases could consume all the memory on the system, bringing the system to a virtual halt.

GRAMPS 2.0

To solve these capacity issues, GRAMPS 2.0 switched to the Berkeley database, using the ".grdb" extension to identify the file. All database information was stored in this one file. This resolved both the load/save time issues and the memory consumption issues, because a real database backend let us load data into memory only when it was needed.

The grdb format was a significant step forward for GRAMPS. However, it was susceptible to data corruption. Since data commits were not atomic (that is, a set of related changes was not guaranteed to be applied all at once or not at all), data could be corrupted if an error occurred while a change was being made.

GRAMPS 2.2

GRAMPS 2.2 started using the transaction capability of the Berkeley database. This feature ensures that all related data is committed at once. So, if an error occurs while that data is being saved, the database remains intact. Either the entire set of changes makes it to the database, or none of the changes make it.
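The all-or-nothing behavior described above can be illustrated with a small transaction. GRAMPS 2.2 relies on the Berkeley database's transactions; the sketch below substitutes Python's built-in sqlite3 module only because it ships with the standard library, and the person/family tables, handles, and function names are hypothetical.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two related (hypothetical) tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE family (handle TEXT PRIMARY KEY, father TEXT)")
    conn.commit()
    return conn


def add_father_and_family(conn, fail_midway=False):
    """Write a person and the family that refers to them as a single transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO person VALUES (?, ?)", ("I0001", "John Smith"))
            if fail_midway:
                # Simulate an error between the two related writes.
                raise RuntimeError("simulated failure")
            conn.execute("INSERT INTO family VALUES (?, ?)", ("F0001", "I0001"))
    except RuntimeError:
        return False  # the partial write has already been rolled back
    return True


def row_counts(conn):
    """Return (people, families) currently stored."""
    people = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    families = conn.execute("SELECT COUNT(*) FROM family").fetchone()[0]
    return people, families


if __name__ == "__main__":
    conn = make_db()
    add_father_and_family(conn, fail_midway=True)
    print(row_counts(conn))  # (0, 0): nothing from the failed attempt survives
    add_father_and_family(conn)
    print(row_counts(conn))  # (1, 1): both related rows were committed together
```

The failed attempt leaves no orphaned person record behind, which is exactly the kind of half-finished change that could corrupt a non-transactional grdb file.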

The problem with the 2.2 approach is that the Berkeley database can no longer keep all of its data in a single file (the grdb file). It also needs an "environment" directory to store the log files that make transactions possible. We needed a place to keep these files where the user would not delete them, so we chose to store them in an environment directory under the ~/.gramps directory. The log files are mapped to the database using the path name of the original grdb file.

This works very well as long as the file is never moved. If the user renames the file, restores a backup of it, or copies it to another machine, the file will no longer work, since it no longer corresponds to the log files stored under ~/.gramps.
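Why a path-keyed mapping breaks can be shown with a short sketch. The exact naming scheme GRAMPS 2.2 uses is not described here, so `ENV_ROOT`, `env_dir_for`, and `has_logs` are hypothetical: the environment directory name is simply derived from the grdb file's absolute path.

```python
import os

# Hypothetical location of the shared environment area; not the actual GRAMPS layout.
ENV_ROOT = os.path.join(os.path.expanduser("~"), ".gramps", "env")


def env_dir_for(grdb_path):
    """Hypothetical scheme: name the environment directory after the file's absolute path."""
    key = os.path.abspath(grdb_path).replace(os.sep, "_").replace(":", "_")
    return os.path.join(ENV_ROOT, key)


def has_logs(grdb_path, known_envs):
    """True if an environment directory is registered for this exact path."""
    return env_dir_for(grdb_path) in known_envs


if __name__ == "__main__":
    # The environment was created when the file was first opened at this path.
    known_envs = {env_dir_for("/home/ann/smith.grdb")}
    print(has_logs("/home/ann/smith.grdb", known_envs))         # True
    print(has_logs("/home/ann/backup/smith.grdb", known_envs))  # False: file was moved
    print(has_logs("/home/ann/jones.grdb", known_envs))         # False: file was renamed
```

Because the key depends on where the file lives rather than on the file itself, any move, rename, or copy to another machine (where no environment is registered at all) loses the link to the log files.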

The Future - GRAMPS 3.0

We are taking a new approach with GRAMPS 3.0. The grdb file is being replaced. While we still use the Berkeley database, the user will no longer see a file. Instead of opening a file, the user will open a symbolic database name. This name will map to a directory under ~/.gramps that contains all the needed database files.

Since all files will be in the same directory, advanced users can back up the entire directory, preserving all of their data. New users, who may not be familiar with the Linux filesystem, will not have to worry about finding their database, since a new Family Tree Manager will replace the old Open File dialog.
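The name-to-directory mapping and the whole-directory backup described above can be sketched as follows. This is a hypothetical layout, not the actual GRAMPS 3.0 on-disk format: each family tree gets its own directory under a root folder, a `name.txt` file records the symbolic name, and `create_tree`, `find_tree`, and `backup_tree` are illustrative names.

```python
import os
import shutil
import tempfile
import uuid

NAME_FILE = "name.txt"  # hypothetical: file holding the tree's symbolic name


def create_tree(root, name):
    """Create a new directory under root that will hold every file for one family tree."""
    tree_dir = os.path.join(root, uuid.uuid4().hex)
    os.makedirs(tree_dir)
    with open(os.path.join(tree_dir, NAME_FILE), "w", encoding="utf-8") as f:
        f.write(name)
    return tree_dir


def find_tree(root, name):
    """Map a symbolic tree name back to its directory, or return None."""
    for entry in os.listdir(root):
        name_path = os.path.join(root, entry, NAME_FILE)
        if os.path.isfile(name_path):
            with open(name_path, encoding="utf-8") as f:
                if f.read().strip() == name:
                    return os.path.join(root, entry)
    return None


def backup_tree(tree_dir, dest):
    """Copy the whole tree directory, so database and log files stay together."""
    return shutil.copytree(tree_dir, dest)


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # stands in for the directory under ~/.gramps
    tree = create_tree(root, "Smith Family")
    # Empty placeholder for a database or log file living beside the name file.
    open(os.path.join(tree, "log.0000000001"), "w").close()
    print(find_tree(root, "Smith Family") == tree)  # True
    backup = backup_tree(tree, os.path.join(tempfile.mkdtemp(), "smith-backup"))
    print(sorted(os.listdir(backup)))  # ['log.0000000001', 'name.txt']
```

A manager built this way only needs to list the names it finds under the root, and the user never has to know which directory holds which tree.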