Difference between revisions of "GEPS 010: Relational Backend"

From Gramps
Jump to: navigation, search
(adding and refactoring questions and concerns)
(Power vs Dependencies: answering questions)
Line 78: Line 78:
 
Is the ORM available for all platforms?
 
Is the ORM available for all platforms?
  
 +
I'm not certain if we need the orm layer or not. One option would be to just use sqlAlchemy's  SQL Expression Language layer for abstraction and forgo the orm. sqlAlchemy's [http://www.sqlalchemy.org/ home] and [http://www.sqlalchemy.org/docs/05/intro.html intro] might help explain some things. Another is that what ever we do we will be using a DB-API so perhaps we should test how far that takes us first before adding yet another dependency. Is the ORM available for all platforms? if your asking about programming languages no sqlAlchemy is a python module. if your asking about databases the [http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis support databases list] is decent.
  
 
= What now? =
 
= What now? =

Revision as of 19:15, 26 March 2009

This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS.

SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).

Reasons for adding a sql backend

Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow easy, single db file implementations (eg, SQLite) to more complex and sophisticated client/server (eg, MySQL).

First, there are a number of facts related to this proposal:

  1. BSDDB is being removed from the standard distribution of Python (as of Python 2.6)
  2. SQLITE is being added to the standard Python distribution
  3. BSDDB is not a relational database, but a hierarchical one
  4. BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code
  5. BSDDB is a programmer's API
  6. SQL is a declarative, independent abstraction layer
  7. SQL can optimize queries (in low-level C) whereas BSDDB is done in Python
  8. An SQL version of a GRAMPS DB should be faster
  9. An SQLite version of a GRAMPS DB should be smaller than a BSDDB file.
  10. SQL Engines can perform query optimizations
  11. More code would reside in the db, rather than in Python
  12. Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees
  13. An SQLite version of GRAMPS might allow people to create larger trees
    1. Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases
  14. SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API

Further implications:

  1. A fullscale MySQL backend would be a trivial step from SQLite
  2. Easy to allow multiple users in a SQLite database (uses file-locking)
  3. There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)
  4. We will have to develop SQL experts


Discussions of BSDDB in Python

BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:

PEP 3108 marks BSDDB to be removed: http://www.python.org/dev/peps/pep-3108/ Development is not death however, it will only be out of sync of the python cycle. The home of pybsdb offering the bsddb3 package is hereL http://www.jcea.es/programacion/pybsddb.htm


Research

Relational database comparison

Database abstraction layers comparison


Questions and concerns

Native DB access for other languages

If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?

Using a Python based ORM wont tie the data just to python. any language should be able to access the db just fine. However, they wouldn't have access to pythons orm layer. Since I haven't used a true orm before I'm not certain exactly how it will effect our table relationships but I don't believe they wont make some sense in a relational way. Not that I'm saying we should use it but a quick google search started to bring up things like this php python package. so there may be some hope for even using the orm layer but how complex would we really want to make it! And of course there is always the option of just using an orm and building similar objects in the new language. --AaronS 03:30, 26 March 2009 (UTC)

Power vs Dependencies

Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)?

PROS:

  1. Makes GRAMPS code more abstract

CONS:

  1. Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)
  2. Adds a dependency

Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?

Is the ORM available for all platforms?

I'm not certain if we need the orm layer or not. One option would be to just use sqlAlchemy's SQL Expression Language layer for abstraction and forgo the orm. sqlAlchemy's home and intro might help explain some things. Another is that what ever we do we will be using a DB-API so perhaps we should test how far that takes us first before adding yet another dependency. Is the ORM available for all platforms? if your asking about programming languages no sqlAlchemy is a python module. if your asking about databases the support databases list is decent.

What now?

Create Object model

Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.

For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.

Select an SQL framework

  1. finish research and pick a database.
  2. finish research and pick a database abstraction layer.

Create models/tables

  1. use the framework to set up a model of the database
  2. generate the tables
  3. create a dump of bsddb database in the sql database
  4. validate that all things present in bsddb are present in the sql database
  5. check validation rules. Eg, handle should be unique, rules must be added to ensure adding to the family table an object with handle like a person object is impossible on the database layer. These kind of rules can be done technically (a primary object table with key on handle) or with rules.
  6. best would be a framework that based on the model can generate an admin module to browse the database, see eg the admin module in django.

New db backend for GRAMPS

  1. write an implementation of src/gen/db/base.py to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: src/gen/db/dbdir.py, but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).
  2. once written, this can be added as an experimental backend to GRAMPS
    1. Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.
    2. User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.
  3. it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle, should only hit the person table. Only when the calling method needs attributes, should the attribute table be hit. This requires attributes that are not yet defined up to the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database as it needs to know where to obtain these values... . This looks like a huge work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier though, but that means rewriting the core of GRAMPS....

I don't understand the use of slots in the above. How is that idea related to db access? --Dsblank 11:14, 26 March 2009 (UTC)

Extending base.py

Once an sql backend is stable, base.py can be extended to offer extra functionality, or better optimize for SQL. Eg, in SQL one would have probably an attribute table. To know which persons have a specific attribute, SQL would select that from the attributes table, and then look up the people. In bsddb it means however to loop over all persons, and obtain the attribute sub table of a person and looking if attribute is present there.

Above clearly indicates that how one goes about in the two backends is very different. The bsddb way will work in sql though (as the get_person method works, and speed should be comparable to bsddb if above deferred obtaining of values via slots is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing above example, see _HasAttributeBase.py

For sql, one would use the prepare method, obtain all people in a list, then return True or False if person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look very ideal though.

See Also

ExportSql.py