Difference between revisions of "GEPS 010: Relational Backend"

From Gramps
(Discussion: removing to discussion page)
m (Reasons for adding a relational backend)
 
(29 intermediate revisions by 6 users not shown)
Line 1: Line 1:
This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS.  
+
{{man warn|This proposal has been withdrawn.|This original GEP was a design to use a relational database backend for Gramps. Experiments showed that approach to be too slow, but they also showed that other backends could be used in a non-relational manner. There are also many [http://en.wikipedia.org/wiki/NoSQL NOSQL] solutions that would work. Thus, a new proposal, [[GEPS 032: Database Backend API]], was created.}}
  
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).
+
This proposal is also related to [[GEPS 013: Gramps Webapp]] which can create a SQL interface.  
  
= Reasons for making the switch =
+
There is also a SQL import and export Addon.
= Research =
 
[[Relational database comparison]]
 
  
[[Database abstraction layers comparison]]
+
<hr>
  
 +
This page is for the discussion of a proposed implementation of a relational (SQL) backend for GRAMPS.
  
= SQL Backend =
+
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).
  
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow easy, single db file implementations (eg, SQLite) to more complex and sophisticated client/server (eg, MySQL).
+
= Reasons for adding a relational backend =
 +
Currently, GRAMPS uses a [[Gramps_Glossary#bsddb|BSDDB]] database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (eg, SQLite) to more complex and sophisticated client/server setups (eg, MySQL).
  
 
First, there are a number of facts related to this proposal:
 
First, there are a number of facts related to this proposal:
Line 18: Line 18:
 
# BSDDB is being removed from the standard distribution of Python (as of Python 2.6)
 
# BSDDB is being removed from the standard distribution of Python (as of Python 2.6)
 
# SQLITE is being added to the standard Python distribution
 
# SQLITE is being added to the standard Python distribution
# BSDDB is not a relational database, but a hierarchical one
+
# BSDDB is not a relational database, but a hierarchical datastore
 
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code
 
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code
 
# BSDDB is a programmer's API
 
# BSDDB is a programmer's API
 
# SQL is a declarative, independent abstraction layer
 
# SQL is a declarative, independent abstraction layer
 
# SQL can optimize queries (in low-level C) whereas BSDDB is done in Python
 
# SQL can optimize queries (in low-level C) whereas BSDDB is done in Python
# SQLite tables of a database reside in a single file
+
# An SQL version of a GRAMPS DB should be faster
 
+
# An SQLite version of a GRAMPS DB should be smaller than a BSDDB file.
Next, are a number of claims that need to be tested:
+
# SQL Engines can perform query optimizations
 
+
# More code would reside in the db, rather than in Python
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller
 
# An SQLite version of a GRAMPS BSDDB may be faster
 
## The files may be smaller
 
## The smaller files may allow more into memory
 
## More code would reside in C, rather than in Python
 
## SQL Engines can perform query optimizations
 
 
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees
 
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees
 
# An SQLite version of GRAMPS might allow people to create larger trees
 
# An SQLite version of GRAMPS might allow people to create larger trees
Line 40: Line 34:
 
Further implications:
 
Further implications:
  
# A fullscale MySQL backend would be a trivial step from SQLite (although maybe harder to setup and maintain; although see Django)
+
# A fullscale MySQL backend would be a trivial step from SQLite
 +
True as long as the SQL written to access the database conforms to a particular defined standard. SQLite appears to support (most of) SQL-92. So for portability reasons we must code to the SQL-92 standard if SQLite is chosen as the lowest common denominator. --[[User:Gburto01|Gburto01]] 23:06, 4 April 2009 (UTC)
 
# Easy to allow multiple users in a SQLite database (uses file-locking)
 
# Easy to allow multiple users in a SQLite database (uses file-locking)
 
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)
 
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)
 
# We will have to develop SQL experts
 
# We will have to develop SQL experts
 +
I have been writing SQL for 20 years :) --[[User:Gburto01|Gburto01]] 23:06, 4 April 2009 (UTC)
  
<pre>
+
Perhaps one of the most important reasons for moving away from a low-level datastore is to allow more sophisticated interfaces. Currently BSDDB offers transactions and recover, or multiuser access, but not both together. If we want to have a GRAMPS webapp or a multiuser GTK interface (for examples), then we would have to either disable transactions and recover (dangerous!) or switch backends. (Another option would be to write a server interface, but the complexity of that is probably on par with just switching backends. See [[GEPS 013: Gramps Webapp]] for more discussion.)
It's good to see this discussion on gramps and is actually why I'm thinking of giving
 
it another try depending on how hard it is to implement this. Yes I know it will be hard
 
but probably much easier and productive than starting my own project. I'm  a developer my
 
self and when it came time to evaluate gramps the lack of a relational db backend was one
 
of the main reasons I decided to keep looking.
 
 
 
Don't discount MySQL over SQLite. While I haven't tried it out yet there is an embeddable
 
version of MySQL which might overcome some of sqlites advantages. If a database abstraction
 
layer is used both could be
 
easily supported. They both have their advantages and disadvantages.
 
 
 
MySQL
 
Advantages
 
*far better tools for management and reporting
 
*a true enterprise level database capable of handling serious loads
 
*far more is built into the db. ie auto incrementing fields, stored procedures and on and on.
 
(sqlite may not even have triggers but I can't remember)
 
*far more extensive user base and support.
 
 
 
Disadvantages
 
*install size (bloat)
 
*an actual server to setup run and maintain.
 
** there are tools that can do this automatically though and make things almost none
 
existent for an end user. also the embeddable mysql might be an option.
 
*may be difficult to manage / share multiple databases. more difficult but very do able.
 
  maybe not even that difficult. it would just take some planning.
 
 
 
SQLite
 
Advantages
 
*far easier to setup. just start writing to the file! no connection or user accounts.
 
*smaller install (code) size.
 
*easier for users to manage and share separate db's
 
*single file
 
*good support.
 
 
 
Disadvantage
 
*while great for what it is it's not an enterprise level database
 
*many "traditional" relational db things are lacking.
 
*while tools exist they aren't as fleshed out and solid as the mysql ones.
 
 
 
Personally I think SQLite makes more sense for genealogical software. but mysqls
 
tools and the fact that it's a "real" enterprise level relational db are serious advantages.
 
-- AaronS
 
</pre>
 
 
 
== Transportable Trees ==
 
 
 
From http://www.sqlite.org/onefile.html:
 
 
 
'''Single-file Cross-platform Database'''
 
 
 
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''
 
 
 
''The SQLite database file format is also stable. All releases of of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatiblity" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''
 
 
 
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''
 
 
 
<pre>
 
The Single disk file of sqlite db would be a major selling point for sqlite
 
for genealogy software since users share and compare db's all the time.
 
--Aarons
 
</pre>
 
 
 
== Additional Issues ==
 
 
 
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?
 
 
 
Using a Python-based ORM won't tie the data just to Python; any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before I'm not certain exactly how it will affect our table relationships, but I believe they will still make some sense in a relational way. Not that I'm saying we should use it, but a quick Google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer, but how complex would we really want to make it! And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)
 
 
 
 
 
 
 
 
 
 
 
=== Power vs Dependencies ===
 
 
 
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)?
 
 
 
PROS:
 
 
 
# Makes GRAMPS code more abstract
 
 
 
CONS:
 
 
 
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)
 
# Adds a dependency
 
 
 
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?
 
 
 
Is the ORM available for all platforms?
 
  
 
== Discussions of BSDDB in Python ==
 
== Discussions of BSDDB in Python ==
Line 151: Line 58:
 
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm
 
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm
  
A sqlite shelve interface for Python:
+
= Research =
http://bugs.python.org/issue3783
+
*[[Relational database comparison]]
 +
*[[Database abstraction layers comparison]]
  
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:
+
= Questions and concerns =
SQLite versus Berkeley DB:
 
  
<pre>
+
== Native DB access for other languages ==
      Berkeley DB (BDB) is just the data storage layer - it does not
+
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?
      support SQL or schemas. In spite of this, BDB is twice the size
 
      of SQLite. A comparison between BDB and SQLite is similar to a
 
      comparison between assembly language and a dynamic language like
 
      Python or Tcl. BDB is probably much faster if you code it
 
      carefully. But it is much more difficult to use and considerably
 
      less flexible.
 
  
      On the other hand BDB has very fine grained locking (although
+
Using a Python-based ORM won't tie the data just to Python; any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before I'm not certain exactly how it will affect our table relationships, but I believe they will still make some sense in a relational way. Not that I'm saying we should use it, but a quick Google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer, but how complex would we really want to make it! And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)
      it's not very well documented), while SQLite currently has only
 
      database-level locking. -- fine grain locking is important for
 
      enterprise database engines, but much less so for embedded
 
      databases. In SQLite, a writer gets a lock, does an update, and
 
      releases the lock all in a few milliseconds. Other readers have
 
      to wait a few milliseconds to access the database, but is that
 
      really ever a serious problem?
 
</pre>
 
  
== Comparing from BSDDB to SQLite ==
+
== Power vs Dependencies ==
 +
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)?
  
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:
+
PROS:
  
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:
+
# Makes GRAMPS code more abstract
  
<pre>
+
CONS:
Berkeley DB Offers APIs, not Query Languages
 
  
Berkeley DB was designed for software developers, by software
+
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)
developers.  Relational database systems generally provide SQL access
+
# Adds a dependency
to the data that they manage, and usually offer some SQL abstraction,
 
like ODBC or JDBC, for use in applications.
 
</pre>
 
  
What BSDDB is not:
+
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?
  
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html
+
Is the ORM available for all platforms?
  
From previous GRAMPS discussions:
+
I'm not certain if we need the ORM layer or not. One option would be to just use SQLAlchemy's SQL Expression Language layer for abstraction and forgo the ORM. SQLAlchemy's [http://www.sqlalchemy.org/ home] and [http://www.sqlalchemy.org/docs/05/intro.html intro] might help explain some things. Another is that whatever we do we will be using a DB-API, so perhaps we should test how far that takes us first before adding yet another dependency. Is the ORM available for all platforms? If you're asking about programming languages, no: SQLAlchemy is a Python module. If you're asking about databases, the [http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported databases list] is decent. --[[User:AaronS|AaronS]] 19:16, 26 March 2009 (UTC)
  
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml&ei=2MYxSanZNaCgesqz0KQB&usg=AFQjCNG1l3yKZ4YP_L7Yo0cQ8bqWmoJKTQ&sig2=H8x1qf4YrFYlsLFlJUsZ-w
+
Actually, I was wondering about OS platforms, but that info is useful, too. --[[User:Dsblank|Dsblank]] 00:40, 27 March 2009 (UTC)
  
From the GRAMPS archives:
+
I only researched python orms so they should work where ever python does. I confirmed sqlalchemy probably does since it supports MS-SQL and MSAccess --[[User:AaronS|AaronS]] 03:33, 27 March 2009 (UTC)
<pre>
 
> Now, sometimes we get a request for a major architectural change that we
 
> will accept. A good example is the new database backend for the upcoming
 
> GRAMPS 2.0. The request came in to support a real database backend so we
 
> could support larger databases. We analyzed the request, and felt that
 
> it matched the goals of the project and would provide a significant step
 
> forward in the usability of the program. The result was a major redesign
 
> effort that will soon be released.
 
  
I think I and few others are the ones that impacted this decision.  Having an
+
= What now? =
850,000 person database tends to be deadly to the XML architecture that we
 
were with.  I've been the main person to test the integrity of the system
 
with my Gedcom file importing.  When I found that I couldn't import my file
 
without extensive data loss, I came to Don and Alex and we all sought for
 
solutions.  We found that the XML interface was taking huge amounts of
 
memory, and we looked for database backends that would handle the load.  Don
 
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been
 
happy as a clam with the Gramps project, because I'm one step closer to
 
killing Windows.
 
 
 
I personally want to do away with it, but I need it for other applications. 
 
I've also come to the realization that both Windows and Linux are good, but
 
in their own realms.  I don't want this to become a huge flame war about
 
Linux and Windows. so if you have other questions as to why I feel this way,
 
email me.
 
 
 
> So, would we accept a mySQL database backend? There is a good chance we
 
> would (depending on the implementation), as long did not impact Aunt
 
> Martha. We have even architected the backend to support this, since we
 
> can see that higher end databases could provide additional functionality
 
> such as versioning and multiuser support.
 
 
 
We could accept mySQL because of this, but I agree with Don.  If it negatively
 
impacts the end user, why would we want to proceed with it?  I have a friend
 
that wondered about mySQL interaction, but he can see the impact that BSDDB
 
has had on my database, and he has sided with me as well as the rest of the
 
team.  Not to say that this is not a possibility, but we need to remain
 
focused on the tasks at hand.
 
  
> So, in summary, the project is going in a direction that seems to meet
+
Current DBI:
> the needs of our users. If we changed directions, we might or might not
 
> be able to reach a larger audience, but numbers are not our goal. We
 
> fully support others submitting patches and other contributions, but
 
> they will be weighed on how they match the goals of the project (and
 
> most of the patches we've received to date do match the goals). If
 
> someone wants us take the project in a different direction, we may or
 
> may not be receptive depending if the direction matches our goals.
 
> However, we will support your efforts if you decide to fork the project.
 
> Who knows, maybe a remerge will occur in the future, or a forked project
 
> will make us irrelevent.
 
  
I agree with Don on this, numbers don't matter as long as the users are happy. 
+
* [[Using_database_API]]
Getting things appropriately nailed down and ready for the end user's use is
 
what is paramount.  After all, if there were no users, why would we even have
 
a project with which to collaborate in the first place?
 
  
We are here for the users, especially Aunt Martha, because of the fact that
 
many people are just moving over to Linux and having something familiar to
 
them, like a genealogical program is what matters to them.  Making the
 
transition to Linux is hard, don't get me wrong.  But we are making it one
 
step easier by not complicating the user's experience in their move.
 
 
Like I said before, I'm just a bug finder.  I'm not really a Python
 
programmer, or anything, but I like to find bugs.  Even if that's all I do on
 
this project, I'm rather content.  Everyone else that wants to port over to
 
other toolkits and whatnot is free to do so.
 
 
But also as an end user that's still a greenie to Linux in general, I can say
 
that this program has helped my move over to Linux that much easier.  Even if
 
I have only contributed a little in the way of feedback (mostly from the
 
end-user perspective).
 
 
-Jason
 
</pre>
 
 
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:
 
 
<pre>
 
Alex said:
 
 
    SQLite might be better or it might not, we haven't tried it. A great factor
 
    speaking for BSDDB is that it is supported by a standard Python module,
 
    bsddb.
 
 
 
Don said:
 
 
This is an important factor here - ease of setup and use. GRAMPS is
 
difficult enough to get installed on some platforms (especially
 
KDE-centric systems). Requiring someone to get an SQL database up and
 
running to try out the program is probably too much effort. What I've
 
discovered is that GRAMPS is one of the first programs that a lot of
 
new users want to get running - usually before they have a lot of
 
Linux experience. So we can't make the barriers to entry too high.
 
</pre>
 
 
<pre>
 
"Requiring someone to get an SQL database up and
 
running to try out the program is probably too much effort." This simply isn't true of sqlite
 
at all. The program would simply write to the db file. No server setup, no user accounts, no
 
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL
 
may be similar but I haven't tried it out. This might be true of MySQL though. However, I believe
 
it's possible to use scripts and/or code to manage launching and stopping the server. It might
 
be possible to make it seamless for the user but would depend on the implementation.
 
--AaronS
 
</pre>
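The point about SQLite needing nothing more than a file name can be illustrated with Python's standard <code>sqlite3</code> module. This is only a minimal sketch: the <code>open_tree</code> helper and the one-table <code>person</code> schema are assumptions made up for the example, not the GRAMPS data model.

```python
import sqlite3


def open_tree(path):
    """Open (or create) a family tree kept in a single SQLite file.

    Hypothetical helper: no server, user account or connection
    settings are involved, only a file path.
    """
    conn = sqlite3.connect(path)  # the file is created if it does not exist
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person ("
        " handle TEXT PRIMARY KEY,"
        " gramps_id TEXT,"
        " surname TEXT)"
    )
    conn.commit()
    return conn


def add_person(conn, handle, gramps_id, surname):
    """Insert one row; parameters are bound, never pasted into the SQL."""
    conn.execute(
        "INSERT INTO person (handle, gramps_id, surname) VALUES (?, ?, ?)",
        (handle, gramps_id, surname),
    )
    conn.commit()
```

Copying the resulting file to another machine and calling <code>open_tree</code> on it there would be the whole "sharing" step, which is the property argued for above.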
 
 
=== Recommendations ===

Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code and the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding in case they just don't work.
 
 
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite.
 
 
sqlite. no server to manage and single file db's will make them easy to share and manage multiple dbs at the same time. also make merging simpler. will allow websites to be developed that will work directly from the db. As long as gramps doesn't switch focus to be some kind of mass user website for editing large trees I think sqlite will fit the bill.
 
 
sqlalchemy. this seems to have a large following and good documentation. It should allow us to support different db back ends easier in the future. at least some people think it's the best python orm available. it seems to provide good tools for when the ORM starts to get in the way.
 
 
Reasons I don't recommend the other options include:
 
 
MySQL. probably not as user friendly and since gramps isn't a client / server sort of program I don't think it's necessary.
 
 
DB-API. with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] It sounds as if the DB-API in practice doesn't support the changing of dbs as much as might be thought. If we commit to sqlite though this might be an option.
 
 
SQLObject. this seems like a viable alternative to sqlalchemy but sqlalchemy seems to have more documentation and user acceptance. Also the ORM layer might not step out of the way very nicely. the website says it will but I wasn't quite buying it from the examples.
 
 
Storm. while this project looks promising and may be easier to use than sqlalchemy it's only a year old and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.
 
 
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon I do think an ORM layer might do gramps some good. There will be situations where simply writing queries will be far easier. Our implementation model should take that into account. From the website, sqlalchemy sounds like it will provide both abstraction and an ORM and we'll be able to use both as the needs determine. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article] he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]
 
 
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)
 
 
= What now? =
 
 
== Create Object model==  
 
== Create Object model==  
  
Line 338: Line 104:
  
 
== Select an SQL framework==
 
== Select an SQL framework==
 
+
# finish research and pick a database.
 
# finish research and pick a database abstraction layer.
 
# finish research and pick a database abstraction layer.
# finish research and pick a database.
 
  
 
== Create models/tables ==
 
== Create models/tables ==
Line 357: Line 122:
 
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.
 
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.
 
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.
 
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle, should only hit the person table. Only when the calling method needs attributes, should the attribute table be hit. This requires attributes that are not yet defined up to the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database as it needs to know where to obtain these values... . This looks like a huge work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier though, but that means rewriting the core of GRAMPS....
+
# it will be very important to use properties in src/gen/lib to make this work. Obtaining a person via get_person_from_handle, should only hit the person table. Only when the calling method needs attributes, should the attribute table be hit. This requires attributes that are not yet defined up to the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database as it needs to know where to obtain these values... . This looks like a huge work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier though, but that means rewriting the core of GRAMPS....
  
 
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)
 
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)
 +
 +
Doug, I meant '''properties'''. The idea is the following. In gen/person a person contains all secondary objects and all references. In a database, this depends on the implementation. So bsddb has a flat table with the person object in one table. So one get. For sql, the secondary objects will be in other tables. Hitting all relevant tables on get of a person would be stupid. We only need to access the tables when the data is needed. In base.py, a person is obtained with get_person_from_handle. This fills up a Person() object. dbdir.py uses the unserialize method to do this. My suggestion of properties would be that with an sql backend, the handle is fetched, but all the other Person attributes are properties that remain None. If data is needed, the getter checks if the value is None; if so, the sql database is hit, otherwise the data is just returned. Just an idea, the concept needs to be worked out. The alternative is changing /gen/lib dramatically, but I don't see the benefit of that. Note that making all objects in /gen/lib new style classes with slots would be a good move nonetheless. [[User:Bmcage|bmcage]] 13:22, 8 April 2009 (UTC)
 +
 +
Thanks for the clarification... that makes sense now :) The integration of the SQL DB with the existing GRAMPS gen/lib and gen/db does seem to be the biggest job to be done in this project. I've had a look at gen/lib and gen/db and there are lots of assumptions all over the place about how the data is stored. Trying to hide this through the use of properties may work, but in the long run, GRAMPS may need an API overhaul. But I like the idea of seeing how far we can get by using properties. --[[User:Dsblank|Dsblank]] 12:56, 9 April 2009 (UTC)
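The properties idea discussed above could look roughly like the following. This is a sketch under stated assumptions, not the gen/lib implementation: <code>LazyPerson</code>, the two-table schema and <code>make_demo_db</code> are hypothetical names invented for illustration, and only <code>get_person_from_handle</code> is borrowed (by name) from the discussion.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
        CREATE TABLE attribute (person_handle TEXT, name TEXT, value TEXT);
        INSERT INTO person VALUES ('h1', 'I0001');
        INSERT INTO attribute VALUES ('h1', 'Occupation', 'Farmer');
        """
    )
    return conn


class LazyPerson:
    """Hypothetical stand-in for a Person object backed by SQL."""

    def __init__(self, conn, handle):
        self._conn = conn          # the object has to know its database
        self.handle = handle
        self._attributes = None    # stays None until first accessed

    @property
    def attributes(self):
        # Hit the attribute table only when a caller asks for the data.
        if self._attributes is None:
            self._attributes = self._conn.execute(
                "SELECT name, value FROM attribute"
                " WHERE person_handle = ? ORDER BY name",
                (self.handle,),
            ).fetchall()
        return self._attributes


def get_person_from_handle(conn, handle):
    """Touch only the person table; secondary data is loaded lazily."""
    row = conn.execute(
        "SELECT handle FROM person WHERE handle = ?", (handle,)
    ).fetchone()
    return LazyPerson(conn, row[0]) if row else None
```

The sketch makes the cost mentioned above visible: each object carries a reference to its connection, which is exactly the coupling between gen/lib and the database that the discussion worries about.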
  
 
== Extending base.py ==
 
== Extending base.py ==
Line 369: Line 138:
 
For sql, one would use the prepare method, obtain all people in a list, then return True or False if person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look very ideal though.
 
For sql, one would use the prepare method, obtain all people in a list, then return True or False if person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look very ideal though.
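A rough sketch of the prepare-then-test approach described above follows. Everything here is an assumption for illustration: <code>SqlDb</code>, <code>HasSurname</code> and the <code>person</code> table are hypothetical, and only the <code>support_sql</code> attribute name comes from the text. A set is used in place of the list mentioned above so that the membership test stays cheap.

```python
import sqlite3


class SqlDb:
    """Hypothetical database wrapper exposing the support_sql flag."""

    support_sql = True

    def __init__(self, conn):
        self.conn = conn


class HasSurname:
    """Hypothetical filter rule using a prepare/apply pair."""

    def __init__(self, surname):
        self.surname = surname
        self.matches = None

    def prepare(self, db):
        # With an SQL backend, run one query up front and keep the handles.
        if getattr(db, "support_sql", False):
            rows = db.conn.execute(
                "SELECT handle FROM person WHERE surname = ?",
                (self.surname,),
            )
            self.matches = {row[0] for row in rows}

    def apply(self, db, handle):
        # Afterwards each test is only a membership check.
        if self.matches is None:
            raise RuntimeError("prepare() was not run on an SQL-capable db")
        return handle in self.matches
```

The branch on <code>support_sql</code> is the part the text calls not very ideal: every rule would need a second code path for the non-SQL backend.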
  
== See Also ==
+
= See Also =
[[ExportSql.py]]
+
* [[SQLite Export Import]]
  
 
[[Category:GEPS|S]]
 
[[Category:GEPS|S]]

Latest revision as of 04:29, 23 December 2020

This proposal has been withdrawn.

This original GEP was a design to use a relational database backend for Gramps. Experiments showed that approach to be too slow, but they also showed that other backends could be used in a non-relational manner. There are also many NOSQL solutions that would work. Thus, a new proposal, GEPS 032: Database Backend API, was created.

This proposal is also related to GEPS 013: Gramps Webapp which can create a SQL interface.

There is also a SQL import and export Addon.


This page is for the discussion of a proposed implementation of a relational (SQL) backend for GRAMPS.

SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).

Reasons for adding a relational backend

Currently, GRAMPS uses a BSDDB database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).

First, there are a number of facts related to this proposal:

  1. BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)
  2. SQLite has been added to the standard Python distribution (the sqlite3 module, since Python 2.5)
  3. BSDDB is not a relational database, but a key/value datastore
  4. BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code
  5. BSDDB is a programmer's API
  6. SQL is a declarative, independent abstraction layer
  7. SQL can optimize queries (in low-level C) whereas BSDDB is done in Python
  8. An SQL version of a GRAMPS DB should be faster
  9. An SQLite version of a GRAMPS DB should be smaller than a BSDDB file.
  10. SQL Engines can perform query optimizations
  11. More code would reside in the db, rather than in Python
  12. Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees
  13. An SQLite version of GRAMPS might allow people to create larger trees
    1. Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases
  14. SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API
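
To illustrate the points above about schemas and declarative queries, here is a minimal sketch using the sqlite3 module from the Python standard library. The table layout, the column names, and the helper functions build_demo_db and surnames_matching are assumptions made up for this example; they are not the Gramps data model.

```python
import sqlite3

# Hypothetical, heavily simplified schema (NOT the real Gramps model).
# The structure lives in the database as data definitions, not in code.
SCHEMA = """
CREATE TABLE person (
    handle     TEXT PRIMARY KEY,
    gramps_id  TEXT NOT NULL UNIQUE,
    surname    TEXT,
    first_name TEXT
);
"""


def build_demo_db():
    """Create an in-memory SQLite database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?, ?)",
        [
            ("h1", "I0001", "Smith", "Anna"),
            ("h2", "I0002", "Jones", "Bob"),
            ("h3", "I0003", "Smith", "Carl"),
        ],
    )
    conn.commit()
    return conn


def surnames_matching(conn, surname):
    """Declarative query: say what is wanted and let the engine plan it."""
    rows = conn.execute(
        "SELECT gramps_id FROM person WHERE surname = ? ORDER BY gramps_id",
        (surname,),
    )
    return [row[0] for row in rows]
```

With BSDDB the same lookup would mean iterating over the stored person records in Python and testing each one, unless a secondary index had been written by hand.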

Further implications:

  1. A fullscale MySQL backend would be a trivial step from SQLite

True as long as the SQL written to access the database conforms to a particular defined standard. SQLite appears to support (most of) SQL-92. So for portability reasons we must code to the SQL-92 standard if SQLite is chosen as the lowest common denominator. --Gburto01 23:06, 4 April 2009 (UTC)

  2. Easy to allow multiple users in a SQLite database (uses file-locking)
  3. There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)
  4. We will have to develop SQL experts

I have been writing SQL for 20 years :) --Gburto01 23:06, 4 April 2009 (UTC)

Perhaps one of the most important reasons for moving away from a low-level datastore is to allow more sophisticated interfaces. Currently BSDDB offers transactions and recovery, or multiuser access, but not both together. If we want to have a GRAMPS webapp or a multiuser GTK interface (for example), then we would have to either disable transactions and recovery (dangerous!) or switch backends. (Another option would be to write a server interface, but the complexity of that is probably on par with just switching backends. See GEPS 013: Gramps Webapp for more discussion.)

Discussions of BSDDB in Python

BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:

PEP 3108 marks BSDDB to be removed: http://www.python.org/dev/peps/pep-3108/ Removal does not mean development is dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, offering the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm

Research

Questions and concerns

Native DB access for other languages

If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?

Using a Python-based ORM won't tie the data just to Python. Any language should be able to access the db just fine. However, they wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search started to bring up things like this php python package, so there may be some hope for even using the ORM layer, but how complex would we really want to make it! And of course there is always the option of just using an ORM and building similar objects in the new language. --AaronS 03:30, 26 March 2009 (UTC)

Power vs Dependencies

Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)?

PROS:

  1. Makes GRAMPS code more abstract

CONS:

  1. Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)
  2. Adds a dependency

Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?

Is the ORM available for all platforms?

I'm not certain if we need the ORM layer or not. One option would be to just use sqlAlchemy's SQL Expression Language layer for abstraction and forgo the ORM. sqlAlchemy's home and intro might help explain some things. Another is that whatever we do we will be using a DB-API, so perhaps we should test how far that takes us first before adding yet another dependency. Is the ORM available for all platforms? If you're asking about programming languages, no: sqlAlchemy is a Python module. If you're asking about databases, the supported databases list is decent. --AaronS 19:16, 26 March 2009 (UTC)

Actually, I was wondering about OS platforms, but that info is useful, too. --Dsblank 00:40, 27 March 2009 (UTC)

I only researched Python ORMs, so they should work wherever Python does. I confirmed sqlalchemy probably does since it supports MS-SQL and MS Access --AaronS 03:33, 27 March 2009 (UTC)

What now?

Current DBI:

Create Object model

Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.

For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.

Select an SQL framework

  1. finish research and pick a database.
  2. finish research and pick a database abstraction layer.

Create models/tables

  1. use the framework to set up a model of the database
  2. generate the tables
  3. create a dump of bsddb database in the sql database
  4. validate that all things present in bsddb are present in the sql database
  5. check validation rules. E.g., handles should be unique, and rules must be added so that the database layer makes it impossible to add to the family table an object whose handle belongs to a person object. These kinds of rules can be enforced structurally (a primary object table keyed on handle) or with explicit rules.
  6. best would be a framework that based on the model can generate an admin module to browse the database, see eg the admin module in django.
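
The validation rule in step 5 (a primary object table keyed on handle) could look roughly like the following sketch in SQLite. Everything here is an assumption for illustration: the table names, the kind column, and the helpers open_db and add_object are hypothetical, and a real schema would carry many more columns.

```python
import sqlite3

# Hypothetical schema: every primary object registers its handle once,
# together with its kind. The composite foreign key stops a handle that
# was registered as a person from also appearing in the family table.
SCHEMA = """
CREATE TABLE primary_object (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL CHECK (kind IN ('person', 'family')),
    UNIQUE (handle, kind)
);
CREATE TABLE person (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL DEFAULT 'person' CHECK (kind = 'person'),
    FOREIGN KEY (handle, kind) REFERENCES primary_object (handle, kind)
);
CREATE TABLE family (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL DEFAULT 'family' CHECK (kind = 'family'),
    FOREIGN KEY (handle, kind) REFERENCES primary_object (handle, kind)
);
"""


def open_db():
    """Open an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, table, handle):
    """Register a handle and create its row in 'person' or 'family'."""
    if table not in ("person", "family"):
        raise ValueError("unknown table: %s" % table)
    with conn:  # commit on success, roll back on any error
        conn.execute(
            "INSERT INTO primary_object (handle, kind) VALUES (?, ?)",
            (handle, table),
        )
        conn.execute("INSERT INTO %s (handle) VALUES (?)" % table, (handle,))
```

With this layout, inserting an existing person handle into the family table raises sqlite3.IntegrityError, so the rule holds even if application code forgets to check it.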

New db backend for GRAMPS

  1. write an implementation of src/gen/db/base.py to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: src/gen/db/dbdir.py, but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).
  2. once written, this can be added as an experimental backend to GRAMPS
    1. Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.
    2. User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.
  3. it will be very important to use properties in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier though, but that means rewriting the core of GRAMPS....

I don't understand the use of slots in the above. How is that idea related to db access? --Dsblank 11:14, 26 March 2009 (UTC)

Doug, I meant properties. The idea is the following. In gen/person a person contains all secondary objects and all references. In a database, this depends on the implementation. So bsddb has a flat table with the person object in one table. So one get. For sql, the secondary objects will be in other tables. Hitting all relevant tables on get of a person would be stupid. We only need to access the tables when the data is needed. In base.py, a person is obtained with get_person_from_handle. This fills up a Person() object. dbdir.py uses the unserialize method to do this. My suggestion of properties would be that with an sql backend, the handle is fetched, but all the other Person attributes are properties that remain None. If data is needed, the getter checks if the value is None; if so, the sql database is hit, otherwise the data is just returned. Just an idea, the concept needs to be worked out. The alternative is changing /gen/lib dramatically, but I don't see the benefit of that. Note that making all objects in /gen/lib new style classes with slots would be a good move nonetheless. bmcage 13:22, 8 April 2009 (UTC)

Thanks for the clarification... that makes sense now :) The integration of the SQL DB with the existing GRAMPS gen/lib and gen/db does seem to be the biggest job to be done in this project. I've had a look at gen/lib and gen/db and there are lots of assumptions all over the place about how the data is stored. Trying to hide this through the use of properties may work, but in the long run, GRAMPS may need an API overhaul. But I like the idea of seeing how far we can get by using properties. --Dsblank 12:56, 9 April 2009 (UTC)
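
The deferred-loading idea discussed above can be sketched as follows. This is only an illustration under stated assumptions: LazyPerson and SqlDb are hypothetical stand-ins (not the real gen/lib Person or a gen/db backend), the two-table schema is invented, and only the method name get_person_from_handle is taken from the discussion.

```python
import sqlite3


class LazyPerson:
    """Hypothetical stand-in for a gen/lib Person with a lazy property."""

    def __init__(self, db, handle):
        self._db = db              # the object must know its database
        self.handle = handle
        self._attributes = None    # None means "not loaded yet"

    @property
    def attributes(self):
        # Hit the attribute table only on first access, then cache.
        if self._attributes is None:
            self._attributes = self._db.load_attributes(self.handle)
        return self._attributes


class SqlDb:
    """Hypothetical SQL backend with an invented two-table schema."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(
            """
            CREATE TABLE person (handle TEXT PRIMARY KEY);
            CREATE TABLE attribute (
                person_handle TEXT REFERENCES person (handle),
                name TEXT,
                value TEXT
            );
            """
        )
        self.conn.execute("INSERT INTO person VALUES ('h1')")
        self.conn.execute(
            "INSERT INTO attribute VALUES ('h1', 'Nickname', 'Annie')"
        )
        self.conn.commit()
        self.attribute_queries = 0  # counts hits on the attribute table

    def get_person_from_handle(self, handle):
        # Only the person table is touched here.
        row = self.conn.execute(
            "SELECT handle FROM person WHERE handle = ?", (handle,)
        ).fetchone()
        return LazyPerson(self, row[0]) if row else None

    def load_attributes(self, handle):
        self.attribute_queries += 1
        rows = self.conn.execute(
            "SELECT name, value FROM attribute "
            "WHERE person_handle = ? ORDER BY name",
            (handle,),
        )
        return [(name, value) for name, value in rows]
```

The counter attribute_queries exists only to make the behaviour visible: it stays at zero after get_person_from_handle and becomes one after the first access to person.attributes, however often the property is read afterwards.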

Extending base.py

Once an sql backend is stable, base.py can be extended to offer extra functionality, or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To know which persons have a specific attribute, SQL would select that from the attribute table, and then look up the people. In bsddb, however, it means looping over all persons, obtaining the attribute sub-table of each person, and checking whether the attribute is present there.
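
The difference between the two approaches can be sketched with the standard sqlite3 module. The schema and the function names below are assumptions for illustration only; the loop version imitates the bsddb access pattern on top of the same tables so that the two results can be compared.

```python
import sqlite3


def make_db():
    """Build a small in-memory database with an invented schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (handle TEXT PRIMARY KEY);
        CREATE TABLE attribute (
            person_handle TEXT REFERENCES person (handle),
            name TEXT,
            value TEXT
        );
        CREATE INDEX attribute_name ON attribute (name);
        """
    )
    conn.executemany(
        "INSERT INTO person VALUES (?)", [("h1",), ("h2",), ("h3",)]
    )
    conn.executemany(
        "INSERT INTO attribute VALUES (?, ?, ?)",
        [
            ("h1", "Occupation", "Farmer"),
            ("h2", "Nickname", "Bob"),
            ("h3", "Occupation", "Miller"),
        ],
    )
    conn.commit()
    return conn


def handles_with_attribute_sql(conn, name):
    """SQL way: one query on the attribute table, using its index."""
    rows = conn.execute(
        "SELECT DISTINCT person_handle FROM attribute "
        "WHERE name = ? ORDER BY person_handle",
        (name,),
    )
    return [row[0] for row in rows]


def handles_with_attribute_loop(conn, name):
    """bsddb-style way: visit every person and inspect its attributes."""
    found = []
    people = conn.execute("SELECT handle FROM person ORDER BY handle")
    for (handle,) in people.fetchall():
        attrs = conn.execute(
            "SELECT name FROM attribute WHERE person_handle = ?", (handle,)
        ).fetchall()
        if any(attr_name == name for (attr_name,) in attrs):
            found.append(handle)
    return found
```

Both functions return the same handles; the first issues a single query regardless of tree size, while the second issues one query per person.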

The above clearly indicates that how one goes about this in the two backends is very different. The bsddb way will still work in sql, though (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via properties described above is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing the above example, see _HasAttributeBase.py

For sql, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.
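
A rough sketch of that prepare/apply split follows. The class HasAttributeRule, the two fake database classes, and the method handles_with_attribute are all hypothetical names invented for this example; only the support_sql attribute and the prepare idea come from the text above, and this is not the code in _HasAttributeBase.py.

```python
from collections import namedtuple

# Hypothetical minimal person record: a handle plus (name, value) pairs.
Person = namedtuple("Person", ["handle", "attributes"])


class FakeSqlDb:
    """Stand-in for an SQL backend that can answer the question in bulk."""

    support_sql = True

    def __init__(self, people):
        self._people = people

    def handles_with_attribute(self, name):
        # In a real backend this would be one SELECT on an attribute table.
        return [
            p.handle
            for p in self._people
            if any(attr_name == name for attr_name, _ in p.attributes)
        ]


class FakeBsddbDb:
    """Stand-in for a backend without SQL support."""

    support_sql = False


class HasAttributeRule:
    """Hypothetical filter rule that picks a strategy in prepare()."""

    def __init__(self, attribute_name):
        self.attribute_name = attribute_name
        self._matches = None

    def prepare(self, db):
        if getattr(db, "support_sql", False):
            # SQL path: fetch all matching handles once, up front.
            self._matches = set(db.handles_with_attribute(self.attribute_name))
        else:
            self._matches = None

    def apply(self, db, person):
        if self._matches is not None:
            return person.handle in self._matches
        # Fallback path: inspect this person's own attributes.
        return any(
            name == self.attribute_name for name, _ in person.attributes
        )
```

The branch on support_sql inside every rule is exactly the part that looks less than ideal: each rule has to carry two code paths, which is an argument for putting such queries behind the db interface instead.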

See Also