DB-API Database Backend
Gramps 5.0 comes with a new, alternative database backend called DB-API. The default database will be the old standard Berkeley DB (often abbreviated as BSDDB). Gramps 5.0 is the initial test of the DB-API backend. If it works well, then Gramps 5.1 will make it the default.
What is DB-API? DB-API is a very loose standard to connect to "SQL engines". DB-API doesn't standardize any of the SQL, but merely standardizes the ability to connect, query, and get results back. You can read more about the DB-API specification.
Gramps has been designed to work with two DB-API SQL engines:
- sqlite3 - single or multi-user on single operating system
- postgresql - local or remote, single or multi-user
sqlite3 has been tested the most and is the default SQL engine. However, advanced users can adjust the details of the SQL connect. See #Advanced uses for more details.
Gramps 5.0 provides the opportunity to set up your databases in many different configurations. Some possible uses:
- Users will just use DB-API with the default sqlite. Nothing will really be different from previous uses of BSDDB, but there is no lock file, and the data is much less likely to become corrupted.
- Users can share their sqlite file on a single file system. Any user logged into that computer can work together on the same file. Operations like import will lock the file, but individual edits can be made simultaneously. Note: there is no code to prevent editing the same record at the same time. For sqlite, we would have to implement our own row-locking code. Or perhaps just give a warning if the data has changed since we started editing it.
- Users can put their sqlite database on a file-sharing drive, like Dropbox, Google Drive, etc. This is tricky to understand. These systems are file-sharing, not record-sharing. That means that you can't have two users writing at the same time. In fact, if you have two users writing, you probably want to keep the session short. Start gramps, make changes, exit. Or just browse. Or take turns.
- Multiple users can use postgresql locally. This is relatively speedy, but only really useful if the database is also made available over the network. Otherwise, you would probably just use sqlite.
- Multiple users can use postgresql remotely. This is the most powerful option, but also the slowest. Gramps is currently written in a manner that assumes a fast, local connection to the database. Over time, we can refine Gramps code to support this mode better, which will make the other uses faster as well. In any event, this mode should work well, once you have your large data loaded. All operations should work as normal.
There are other possibilities, and even combinations of the above.
The easiest way to use a variation in SQL engines is to make a new DB-API database. You can do that by changing the default backend in Preferences -> Family Tree -> Database backend, Converting an existing database (new button in Manager), or CLI importing:
gramps --config=database.backend:dbapi -C "DB-API" --import filename
Once you have a database, you can edit the default_settings.py file in your grampsdb/*/ directory.
For a remote postgresql database, it might look like:
from dbapi_support.postgresql import Postgresql dbapi = Postgresql(dbname='postgres', user='postgres', host='remote.computer.com', password='MyPassW0rd8723')
Getting a postgresql server running on Ubuntu is fairly easy.
DB-API is just one implementation of a generic DB interface. The class gramps.gen.db.generic.DbGeneric implements all of the methods of a database while the actual DB-interfacing code would be in a subclass of DbGeneric. For example, gramps.plugins.db.dbapi implements the DB-API SQL code. Other future versions need only implement a subclass. Other possibilities might be one of the so-called "no sql" database, such as CouchDB or MongoDB.
Does the use of DB-API mean that Gramps data are now relational? Does it provide additional data integrity? No. Currently, we are using the SQL engines in exactly the same way that we used BSDDB: all of our data is pickled "blobs" (serialized tuples). However, the flat (non-hierarchical) data is also stored in their own column, and indexes are made on each of those. This allows us to have very fast ad hoc queries on any non-hierarchical data, and JOINS between primary objects. As far as I know, we always did sequential scans in BSDDB. If we connect the SQL WHERE capability up to Filters, then our filters will not require a linear scan through all data. This will be a huge win, but requires a some serious refactoring of views, filters, and the use of the database.
Gramps could have used an SQL abstraction layer, such as SQLAlchemy or Django, but that would be a large dependency for Gramps. The DB-API layer is completely separate; each SQL engine provides its own DB-API interface. As this first step away from BSDDB, we avoided any dependencies. That allows us to write directly in SQL, but also requires that all of the SQL that we do write has to be understood by our SQL engines.
All of the SQL statements are in a single file: gramps/plugins/database/dbapi.py. If we wish to write more abstractly than raw SQL using, say, SQLAlchemy, then we need to just re-implement that file. That will add a dependency, but will allow us to add Oracle and SQL Server to the list of supported SQL engines, and allow us to write more sophisticated and optimized SQL. Of course, as this is a plugin, any new backend can be added to addons at any time. Two other possible future backends are MongoDB and CouchDB. Both of these match our data better (hierarchical), but are also newer/less stable, and limited in variability. Sqlite file structure is well-known, backwards compatible, and rock solid. Postgresql provide server/shared abilities.
The SQL engines are all placed into "autocommit mode". That means that any updates automatically will be placed into a transaction. However, batch transactions (such as an import) will explicitly use a SQL "BEGIN TRANSACTION;" statement, and end with a "COMMIT TRANSACTION;" statement.