This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS.

A proposed implementation is being developed in trunk/src/plugins/Sql.py You can export most of GRAMPS through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything).

SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel). After you export your GRAMPS data into a file such as Untitled_1.sql using the above Exporter, then you can use SQL queries like:

$ sqlite3 Untitled_1.sql
SQLite version 3.5.9
Enter ".help" for instructions

sqlite> .tables
dates       family      names       people      repository
events      media       notes       places      sources   

sqlite> .headers on
.headers on

sqlite> select * from people;
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0

sqlite> select * from names where surname like "%Smith%";
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call
0|Test|Smith|||2|||||0|0|

sqlite> .exit
$

The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.

SQL Backend

Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL, specifically SQLite as a replacement.

First, there are a number of facts related to this proposal:

BSDDB is being removed from the standard distribution of Python (as of Python 2.6)
SQLITE is being added to the standard distribution
BSDDB is not a relational database, but a hierarchical one
BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code
BSDDB is a programmer's API
SQL is a declarative, independent abstraction layer
SQLite tables of a database reside in a single file
SQLite can optimize queries (in low-level C) whereas BSDDB is done in Python

Next, are a number of issues that need to be tested:

An SQLite version of a GRAMPS BSDDB may be 4 times smaller
An SQLite version of a GRAMPS BSDDB may be faster
1. The files may be smaller
2. The smaller files may allow more into memory
3. More code would reside in C, rather than in Python
4. SQL Engines can perform query optimizations
An SQLite version of GRAMPS might allow people to create bigger trees
1. Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases
SQLite code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API

Further implications:

A fullscale MySQL backend would be a trivial step from SQLite (although maybe harder to setup and maintain; although see Django)
Easy to allow multiple users in a SQLite database (uses file-locking)
There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)
We will have to develop SQL experts

DB-API

Python has a API to make it easy to move from one SQL-based DB to another called DB-API:

http://wiki.python.org/moin/DatabaseProgramming/

I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends.

Django also provides DB independence, but is geared towards web deployment:

http://www.djangoproject.com/

Discussions of BSDDB in Python

BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:

PEP 3108 marks BSDDB to be removed: http://www.python.org/dev/peps/pep-3108/

A sqlite shelve interface for Python: http://bugs.python.org/issue3783

From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors: SQLite versus Berkeley DB:

      Berkeley DB (BDB) is just the data storage layer - it does not
      support SQL or schemas. In spite of this, BDB is twice the size
      of SQLite. A comparison between BDB and SQLite is similar to a
      comparison between assembly language and a dynamic language like
      Python or Tcl. BDB is probably much faster if you code it
      carefully. But it is much more difficult to use and considerably
      less flexible.

      On the other hand BDB has very fine grained locking (although
      it's not very well documented), while SQLite currently has only
      database-level locking. -- fine grain locking is important for
      enterprise database engines, but much less so for embedded
      databases. In SQLite, a writer gets a lock, does an update, and
      releases the lock all in a few milliseconds. Other readers have
      to wait a few milliseconds to access the database, but is that
      really ever a serious problem?

Comparing from BSDDB to SQLite

A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:

Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:

Berkeley DB Offers APIs, not Query Languages 

Berkeley DB was designed for software developers, by software
developers.  Relational database systems generally provide SQL access
to the data that they manage, and usually offer some SQL abstraction,
like ODBC or JDBC, for use in applications.

What BSDDB is not:

http://pybsddb.sourceforge.net/ref/intro/dbisnot.html

From previous GRAMPS discussions:

http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml&ei=2MYxSanZNaCgesqz0KQB&usg=AFQjCNG1l3yKZ4YP_L7Yo0cQ8bqWmoJKTQ&sig2=H8x1qf4YrFYlsLFlJUsZ-w

From the GRAMPS archives:

> Now, sometimes we get a request for a major architectural change that we
> will accept. A good example is the new database backend for the upcoming
> GRAMPS 2.0. The request came in to support a real database backend so we
> could support larger databases. We analyzed the request, and felt that
> it matched the goals of the project and would provide a significant step
> forward in the usability of the program. The result was a major redesign
> effort that will soon be released.

I think I and few others are the ones that impacted this decision.  Having an 
850,000 person database tends to be deadly to the XML architecture that we 
were with.  I've been the main person to test the integrity of the system 
with my Gedcom file importing.  When I found that I couldn't import my file 
without extensive data loss, I came to Don and Alex and we all sought for 
solutions.  We found that the XML interface was taking huge amounts of 
memory, and we looked for database backends that would handle the load.  Don 
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been 
happy as a clam with the Gramps project, because I'm one step closer to 
killing Windows.

I personally want to do away with it, but I need it for other applications.  
I've also come to the realization that both Windows and Linux are good, but 
in their own realms.  I don't want this to become a huge flame war about 
Linux and Windows. so if you have other questions as to why I feel this way, 
email me.

> So, would we accept a mySQL database backend? There is a good chance we
> would (depending on the implementation), as long did not impact Aunt
> Martha. We have even architected the backend to support this, since we
> can see that higher end databases could provide additional functionality
> such as versioning and multiuser support.

We could accept mySQL because of this, but I agree with Don.  If it negatively 
impacts the end user, why would we want to proceed with it?  I have a friend 
that wondered about mySQL interaction, but he can see the impact that BSDDB 
has had on my database, and he has sided with me as well as the rest of the 
team.  Not to say that this is not a possibility, but we need to remain 
focused on the tasks at hand.

> So, in summary, the project is going in a direction that seems to meet
> the needs of our users. If we changed directions, we might or might not
> be able to reach a larger audience, but numbers are not our goal. We
> fully support others submitting patches and other contributions, but
> they will be weighed on how they match the goals of the project (and
> most of the patches we've received to date do match the goals). If
> someone wants us take the project in a different direction, we may or
> may not be receptive depending if the direction matches our goals.
> However, we will support your efforts if you decide to fork the project.
> Who knows, maybe a remerge will occur in the future, or a forked project
> will make us irrelevent.

I agree with Don on this, numbers don't matter as long as the users are happy.  
Getting things appropriately nailed down and ready for the end user's use is 
what is paramount.  After all, if there were no users, why would we even have 
a project with which to collaborate in the first place?

We are here for the users, especially Aunt Martha, because of the fact that 
many people are just moving over to Linux and having something familiar to 
them, like a genealogical program is what matters to them.  Making the 
transition to Linux is hard, don't get me wrong.  But we are making it one 
step easier by not complicating the user's experience in their move.

Like I said before, I'm just a bug finder.  I'm not really a Python 
programmer, or anything, but I like to find bugs.  Even if that's all I do on 
this project, I'm rather content.  Everyone else that wants to port over to 
other toolkits and whatnot is free to do so.

But also as an end user that's still a greenie to Linux in general, I can say 
that this program has helped my move over to Linux that much easier.  Even if 
I have only contributed a little in the way of feedback (mostly from the 
end-user perspective).

-Jason

From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:

Alex said:

    SQLite might be better or it might not, we haven't tried it. A great factor
    speaking for BSDDB is that it is supported by a standard Python module,
    bsddb. 


Don said:

This is an important factor here - ease of setup and use. GRAMPS is
difficult enough to get installed on some platforms (especially
KDE-centric systems). Requiring someone to get an SQL database up and
running to try out the program is probably too much effort. What I've
discovered is that GRAMPS is one of the first programs that a lot of
new users want to get running - usually before they have a lot of
Linux experience. So we can't make the barriers to entry too high.

GEPS 010: Relational Backend

Contents

SQL Backend

DB-API

Discussions of BSDDB in Python

Comparing from BSDDB to SQLite

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Contributor help pages

wiki

Tools