https://www.gramps-project.org/wiki/api.php?action=feedcontributions&user=AaronS&feedformat=atomGramps - User contributions [en]2024-03-28T16:29:47ZUser contributionsMediaWiki 1.31.3https://www.gramps-project.org/wiki/index.php?title=User_talk:Dsblank&diff=15222User talk:Dsblank2009-03-30T18:24:28Z<p>AaronS: </p>
<hr />
<div>Hello Doug,<br />
<br />
Thanks for everything. I appreciate all your work with GRAMPS very much. That should be obvious anyway: who wouldn't? --[[User:Lcc|Lcc]] 19:46, 22 March 2009 (UTC)<br />
<br />
== DB Schema ==<br />
<br />
How goes the work on the DB schema? Could I help? I think I've done all the research on DB abstraction that I can. I'm not certain it's necessary to push the main devs on an abstraction layer quite yet. I'd like to test them out a little first. The DB schema is something we can nail down and push for a consensus on though. --[[User:AaronS|AaronS]] 04:37, 29 March 2009 (UTC)<br />
<br />
If you wanted to help explore the DB schema, you could attempt to derive a set of tables from the [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/data/grampsxml.dtd?view=markup GRAMPS XML dtd]. This is a different approach from what I was doing (converting the Python BSDDB code directly to DB schema, keeping intact all GRAMPS handles, and making up new ones where required). Thanks for getting this ball rolling! --[[User:Dsblank|Dsblank]] 23:19, 29 March 2009 (UTC)<br />
<br />
Is your most current work on the schema at [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/export/ExportSql.py?view=markup trunk/src/plugins/export/ExportSql.py]? I'd prefer to keep our work coordinated. We could work directly with the ExportSql.py module, but it may make more sense to work with a plain DB import file containing just the table creates and drops. Relationships could be put in comments. That might make it less cluttered for the main devs when we pitch it to them. Another option would be to create an actual table chart. Have you ever used dia? I haven't found a really great table charting app yet, but dia does an OK job. I don't have svn commit access yet. How would you prefer to pass work back and forth: a tracker item, a wiki page, or a wiki media file? Also, to recap: should we use the ExportSql.py code, a DB import file, or a table mapping chart? --[[User:AaronS|AaronS]] 04:35, 30 March 2009 (UTC)<br />
<br />
Yes, that is the first stab at just creating a simple sqlite export... mostly just thinking out loud there. I agree that it would be a good idea to diagram the tables. It would be nice to have a table representation that could '''generate''' UML, dia, code... perhaps all we need to do is describe the tables in a text file, which could be read in by our programs to generate these outputs? I guess the XML doesn't work for that role, as it is hierarchical ... maybe a flat XML DTD version? --[[User:Dsblank|Dsblank]] 14:50, 30 March 2009 (UTC)<br />
<br />
Do you have a specific solution in mind? That would be nice, but I haven't heard of anything that can do all that. I was thinking the easiest would be to just have an import.sql file containing all the SQL to create the tables. We might need some additional notes, but there aren't too many tables, so a full table mapping chart might not be necessary, though it would be nice. Most third-party SQLite tools should be able to load and export such a file automatically. Firefox has a SQLite manager plugin that I've used in the past: https://addons.mozilla.org/en-US/firefox/addon/5817. I know we haven't fully settled on SQLite, but it should be fine for a schema mockup. --[[User:AaronS|AaronS]] 17:36, 30 March 2009 (UTC)<br />
<br />
I don't have time right now to start looking into them, but there may be some options that help with at least some of it. Unless you have a specific solution in mind, I'll do some research on them later.<br />
*http://ubuntuforums.org/showthread.php?s=f09969c8aacbf58986fe849e69193901&t=306068&page=2<br />
*http://discuss.joelonsoftware.com/default.asp?joel.3.85804.13<br />
*http://tedia2sql.tigris.org/ (I know dia is cross-platform, so I'll check this out first). I don't think it has SQLite support, but for a schema discussion its MySQL output should look close enough.<br />
*http://www.databaseanswers.com/modelling_tools.htm<br />
--[[User:AaronS|AaronS]] 18:23, 30 March 2009 (UTC)</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Brief_introduction_to_SVN&diff=15193Brief introduction to SVN2009-03-27T23:38:44Z<p>AaronS: clarifying this step since it hung me up.</p>
<hr />
<div>The development source code of GRAMPS is stored in the SVN repository. This helps with synchronizing changes from various developers, tracking changes, managing releases, etc. If you are reading this, you probably want to do one of two things with SVN: download the latest source or the development version, or upload your changes.<br />
<br />
== Types of branches ==<br />
There are two kinds of branches in the Subversion Repository: "trunk" and "maintenance branches". <br />
<br />
The first type is a "trunk". There is only one trunk. All new feature development happens in the trunk. New releases never come from the trunk.<br />
The trunk for GRAMPS can be found here: https://gramps.svn.sourceforge.net/svnroot/gramps/trunk<br />
<br />
The other type of branch is a "maintenance branch". There are many maintenance branches. A maintenance branch is created from the trunk when all the features for a release are complete. New features are not committed to a maintenance branch. Releases only come from maintenance branches. The purpose of maintenance branches is to allow the line of code to stabilize while new features are added in trunk.<br />
<br />
The first two digits of the GRAMPS version number are reserved to indicate the maintenance branch the code came from. The last digit indicates the revision from that maintenance branch. For example, 3.0.4 would indicate the 5th release from the 3.0 branch (3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4).<br />
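<br />
The numbering scheme can be illustrated with a small Python sketch. The helper names <code>parse_gramps_version</code> and <code>releases_up_to</code> are invented for this illustration and are not part of GRAMPS; the sketch assumes a plain three-part version string.<br />

```python
def parse_gramps_version(version):
    """Split 'X.Y.Z' into (maintenance branch, revision within that branch)."""
    major, minor, revision = (int(part) for part in version.split("."))
    branch = f"{major}.{minor}"  # the first two numbers name the branch
    return branch, revision


def releases_up_to(version):
    """List every release of the branch up to and including `version`."""
    branch, revision = parse_gramps_version(version)
    # Revisions start at 0, so revision N is release number N + 1.
    return [f"{branch}.{n}" for n in range(revision + 1)]


branch, revision = parse_gramps_version("3.0.4")
print(branch, revision)
print(releases_up_to("3.0.4"))
```

Because revisions start at 0, the list for 3.0.4 has five entries, which matches the "5th release" wording above.<br />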
<br />
Here is a hypothetical example:<br />
Imagine that the current version of GRAMPS is 8.3.2. A new series of features has been added in trunk and are ready for release. A new maintenance branch is created from trunk named 8.4 (or possibly 9.0 depending on the nature of the new features). New features continue to be added in trunk that will not be included in the 8.4 series of releases, but will be included in the 8.5 series. Bug fixes continue to occur in the 8.4 branch until the code is deemed worthy of release. At that time, a release is tagged from the 8.4 maintenance branch and named 8.4.0. Some time after the release of 8.4.0, some bugs are found and fixed in the 8.4 maintenance branch. Those bug fixes are released as 8.4.1.<br />
<br />
== Stable version 3.0.x ==<br />
* To download the source to a /home/~user/gramps30 directory, you can use two methods to access the SVN repository:<br />
# An http frontend to gramps SVN<br />
# SVN access <br />
* To upload your changes, you have to have developer access. <br />
<br />
The second method requires that svn be installed on your system (Debian/Ubuntu: <code>apt-get install subversion</code>; Fedora: <code>yum install subversion</code>).<br />
With the SVN method, type the following in the command line: <code><br />
svn co https://gramps.svn.sourceforge.net/svnroot/gramps/branches/gramps30 gramps30</code><br />
<br />
You should see the downloading progress reported in your terminal. If you would like to update your source tree after some time, execute the following command in the top directory of the gramps30 source tree:<br />
<br />
<code><br />
svn update<br />
</code><br />
<br />
To commit your changes, you can execute:<br />
<br />
<code><br />
svn commit -m "message describing the nature of the change"<br />
</code><br />
<br />
Since uploading is a potentially dangerous operation, you have to explicitly obtain a write access to the SVN repository from Don Allingham, Alex Roitman, or Brian Matherly.<br />
<br />
== Unstable development: "trunk" ==<br />
<br />
:Also see: [[Running_a_development_version_of_Gramps|Running a development version of GRAMPS]], and [[Getting Started with GRAMPS 3]]<br />
<br />
=== Packages ===<br />
<br />
* '''Unix-like systems''': a first beta has been released as a downloadable package, see [http://sourceforge.net/project/showfiles.php?group_id=25770 sourceforge unstable]. <br />
* '''Windows systems''': follow the directions at http://www.dancingpaper.com/gramps/. <br />
<br />
=== Obtain it===<br />
As of January 2009, there are several versions of the gramps code in SVN. The development branch for small changes and bug fixes is 'gramps31' and 'trunk' has been created for the ongoing unstable version. If this talk of 'branch' and 'trunk' sounds confusing you might like to read the list message [http://article.gmane.org/gmane.comp.genealogy.gramps.devel/8678 explaining branch and trunk].<br />
<br />
To check out a copy of the possibly unstable trunk to ./gramps32:<br />
<code><br />
svn co https://gramps.svn.sourceforge.net/svnroot/gramps/trunk gramps32<br />
</code><br />
<br />
To check out a copy of the next branch, GRAMPS 3.1, to ./gramps31:<br />
<code><br />
svn co https://gramps.svn.sourceforge.net/svnroot/gramps/branches/gramps31 gramps31<br />
</code><br />
<br />
To check out a copy of the stable GRAMPS 3.0 to ./gramps30:<br />
<code><br />
svn co https://gramps.svn.sourceforge.net/svnroot/gramps/branches/gramps30 gramps30<br />
</code><br />
<br />
To check out a copy of the older stable GRAMPS 2.2 to ./gramps22:<br />
<code><br />
svn co https://gramps.svn.sourceforge.net/svnroot/gramps/branches/gramps22 gramps22<br />
</code><br />
<br />
=== Prepare it ===<br />
Now go into the <code>gramps30</code> directory and type<br />
./autogen.sh<br />
You will get warnings about any missing packages that GRAMPS needs in order to build from source. The most common warning, if you run Linux with GNOME, is that the gnome-common package is missing. On Ubuntu, install 'gnome-common' (version 2.20.0-0ubuntu1) via Synaptic. gnome-common provides common scripts and macros for developing with GNOME: it is an extension to autoconf, automake and libtool for the GNOME environment and for applications that use GNOME, and it includes gnome-autogen.sh and several macros that help in both GNOME and GNOME 2.0 source trees. Install this and any other missing packages; read the INSTALL and README files in the gramps30 directory for pointers. Another important library is libglib2.0-dev, so check whether your system has this package installed.<br />
Running ./autogen.sh will also execute the make command. If it does not, run the following afterwards:<br />
make<br />
<br />
{{man warn|1=Warning|2=Do not install the development version. That is, do '''not''' type {{man label|sudo make install}}. }}<br />
<br />
==== Building with Fedora 8 - 10 ====<br />
<br />
These are the packages you need:<br />
<br />
<code><br />
yum install gnome-common intltool glib2-devel gnome-doc-utils gcc emacs gettext subversion make rcs<br />
</code><br />
<br />
Now you can run the '''./autogen.sh''' script and then '''make'''.<br />
<br />
==== Windows ====<br />
This step appears to be unnecessary on Windows. See [[Installation#Installing_from_source_code_on_Windows]]<br />
<br />
=== Run the development version ===<br />
As you should not install the development version, how can you try it out? <br />
Easy, just type the following in the <code>gramps30</code> directory<br />
python src/gramps.py<br />
<br />
<br />
{{man warn|1=warning|2=Do not open your existing databases with gramps 3.0, it might destroy your data, and will make it impossible to use the data in the stable version 2.2.x. To try it out, export your database to a gramps xml file, eg <code>test_version_3.0.gramps</code>, create a new family tree in GRAMPS 3.0, and import this xml file.}}<br />
<br />
=== Where for bugs? ===<br />
The [http://bugs.gramps-project.org bug tracker] lists the different projects in its top right corner. Choose project 3.x and submit an issue.<br />
<br />
== Useful things to know ==<br />
=== Subversion commands ===<br />
svn help add<br />
<br />
svn help commit<br />
<br />
svn help log<br />
<br />
Adding files to the repository requires you to set some properties on the files and to have a [http://apps.sourceforge.net/trac/sitedocs/wiki/Subversion sourceforge account]. See <code>svn help propset</code>. You can use <code>propget</code> on existing files to see which properties to set on a new file. A convenient way is to add auto-props for common file types to your <code>~/.subversion/config</code> file, e.g. in my config I have:<br />
<br />
[miscellany]<br />
enable-auto-props = yes<br />
<br />
[auto-props]<br />
*.py = svn:eol-style=native;svn:mime-type=text/plain;svn:keywords=Author Date Id Revision<br />
*.po = svn:eol-style=native;svn:mime-type=text/plain;svn:keywords=Author Date Id Revision<br />
*.sh = svn:eol-style=native;svn:executable<br />
Makefile = svn:eol-style=native<br />
*.png = svn:mime-type=application/octet-stream<br />
*.svg = svn:eol-style=native;svn:mime-type=text/plain<br />
<br />
=== svn2cl ===<br />
The Gramps project does not keep a ChangeLog file under source control. All change history is captured by Subversion automatically when it is committed. A ChangeLog file is generated from the SVN commit logs before each release using [[How to use svn2cl|svn2cl]]. Developers should take care to make useful commit log messages when committing changes to Subversion. Here are some guidelines:<br />
<br />
*Try to make a descriptive message about the change.<br />
*Use complete sentences when possible.<br />
*When committing a change that fixes a bug on the tracker, use the bug's number and summary as the message.<br />
*When committing a patch from a contributor, put the contributor's name and e-mail address in the commit message.<br />
*It is not necessary to put the names of the files you have modified in the commit message because Subversion stores that automatically.<br />
<br />
=== Other usage tips ===<br />
<br />
* Additional tips and recommendations related to committing changes: [[SVN Commit Tips]]<br />
* GRAMPS [[Committing Policies]]<br />
<br />
=== Browse svn ===<br />
<br />
An alternative to the command line tools to view the svn repository is<br />
the [http://gramps.svn.sourceforge.net/viewvc/gramps/ online interface].<br />
<br />
[[Category:Developers/General|B]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15190GEPS 010: Relational Backend2009-03-27T03:33:57Z<p>AaronS: /* Power vs Dependencies */ answer</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for adding a sql backend =<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow easy, single db file implementations (eg, SQLite) to more complex and sophisticated client/server (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL can optimize queries (in low-level C) whereas BSDDB is done in Python<br />
# An SQL version of a GRAMPS DB should be faster<br />
# An SQLite version of a GRAMPS DB should be smaller than a BSDDB file.<br />
# SQL Engines can perform query optimizations<br />
# More code would reside in the db, rather than in Python<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
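<br />
The contrast between declarative SQL and a low-level key/value API can be sketched as follows. This is only an illustration: the <code>person</code> table, its columns and the sample rows are invented here and are not the GRAMPS schema, and a plain Python dict stands in for a key/value store such as BSDDB.<br />

```python
import sqlite3

# Hypothetical, heavily simplified person table (not the GRAMPS schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        handle     TEXT PRIMARY KEY,
        surname    TEXT,
        birth_year INTEGER
    );
    CREATE INDEX person_surname ON person (surname);
""")
people = [("p1", "Smith", 1850), ("p2", "Jones", 1872), ("p3", "Smith", 1801)]
conn.executemany(
    "INSERT INTO person (handle, surname, birth_year) VALUES (?, ?, ?)", people
)


def handles_by_surname_sql(conn, surname):
    """Declarative: state what is wanted; the engine picks the access path."""
    rows = conn.execute(
        "SELECT handle FROM person WHERE surname = ? ORDER BY birth_year",
        (surname,),
    )
    return [handle for (handle,) in rows]


# A dict standing in for a key/value store: handle -> (surname, birth_year).
store = {handle: (surname, year) for handle, surname, year in people}


def handles_by_surname_keyvalue(store, surname):
    """Imperative: scan, filter and sort by hand in application code."""
    matches = [
        (year, handle)
        for handle, (name, year) in store.items()
        if name == surname
    ]
    return [handle for year, handle in sorted(matches)]


print(handles_by_surname_sql(conn, "Smith"))
print(handles_by_surname_keyvalue(store, "Smith"))
```

Both functions return the same handles. The difference is where the filtering and sorting logic lives: in the query, or in application code.<br />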
<br />
Further implications:<br />
<br />
# A fullscale MySQL backend would be a trivial step from SQLite <br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Removal from the standard library is not the death of the module, however; its development will simply no longer be in sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
= Questions and concerns =<br />
<br />
== Native DB access for other languages ==<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data to Python alone. Any language should be able to access the DB just fine; it simply wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search brought up things like this [http://pecl.php.net/package/python php python package], so there may be some hope of even using the ORM layer from other languages, but how complex would we really want to make it? And of course there is always the option of using an ORM in the new language and building similar objects there. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
== Power vs Dependencies ==<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
I'm not certain whether we need the ORM layer or not. One option would be to use just SQLAlchemy's SQL Expression Language layer for abstraction and forgo the ORM. SQLAlchemy's [http://www.sqlalchemy.org/ home] and [http://www.sqlalchemy.org/docs/05/intro.html intro] might help explain some things. Another consideration is that whatever we do, we will be using a DB-API driver, so perhaps we should first test how far that takes us before adding yet another dependency. Is the ORM available for all platforms? If you're asking about programming languages, no: SQLAlchemy is a Python module. If you're asking about databases, the [http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported databases list] is decent. --[[User:AaronS|AaronS]] 19:16, 26 March 2009 (UTC)<br />
<br />
Actually, I was wondering about OS platforms, but that info is useful, too. --[[User:Dsblank|Dsblank]] 00:40, 27 March 2009 (UTC)<br />
<br />
I only researched Python ORMs, so they should work wherever Python does. SQLAlchemy in particular probably does, since it supports MS-SQL and MS Access. --[[User:AaronS|AaronS]] 03:33, 27 March 2009 (UTC)<br />
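<br />
To give a feel for how far a plain DB-API driver goes without an ORM, here is a minimal sketch. It assumes only the standard DB-API 2.0 interface (<code>paramstyle</code>, <code>cursor()</code>, <code>execute()</code>, <code>fetchall()</code>); the <code>person</code> table and the function <code>find_by_surname</code> are hypothetical. One thing a thin layer has to handle itself is that drivers differ in their placeholder style.<br />

```python
import sqlite3

# Placeholder syntax for two common DB-API paramstyles. sqlite3 reports
# "qmark"; some other drivers (for example MySQLdb) report "format".
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def find_by_surname(module, conn, surname):
    """Run the same query through any DB-API 2.0 driver module.

    `module` is the driver module itself (needed for its paramstyle) and
    `conn` is a connection opened with that module.
    """
    mark = PLACEHOLDERS[module.paramstyle]
    cursor = conn.cursor()
    # Only the placeholder token is interpolated; the value is still bound.
    cursor.execute(
        f"SELECT handle FROM person WHERE surname = {mark} ORDER BY handle",
        (surname,),
    )
    return [row[0] for row in cursor.fetchall()]


# Demonstration with SQLite; the table is invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("p1", "Smith"), ("p2", "Jones"), ("p3", "Smith")],
)
print(find_by_surname(sqlite3, conn, "Smith"))
```

Differences in SQL dialect, such as column types or auto-increment syntax, are not covered by DB-API at all, which is the gap that an abstraction layer would fill.<br />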
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
# finish research and pick a database.<br />
# finish research and pick a database abstraction layer.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, a handle should be unique: rules must be added to ensure that adding an object with the handle of a person object to the family table is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table with a key on handle) or with explicit rules.<br />
# ideally, the framework could generate an admin module from the model to browse the database; see eg the admin module in Django.<br />
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier, but that means rewriting the core of GRAMPS.<br />
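<br />
The deferred loading described in the last step could look roughly like the sketch below. It is only an illustration under assumptions: the class LazyPerson, the table and column names, and the use of a Python property (rather than slots) are all made up here and are not taken from src/gen/lib.<br />

```python
import sqlite3


class LazyPerson(object):
    """Hypothetical stand-in for a gen/lib Person backed by SQL."""

    def __init__(self, db, handle, gramps_id):
        self._db = db              # the object has to know its database
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None    # nothing loaded yet

    @property
    def attributes(self):
        # The attribute table is only queried on first access.
        if self._attributes is None:
            rows = self._db.execute(
                "SELECT type, value FROM attribute WHERE person_handle = ?",
                (self.handle,))
            self._attributes = rows.fetchall()
        return self._attributes


def get_person_from_handle(db, handle):
    # Only the person table is hit here.
    row = db.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?",
        (handle,)).fetchone()
    if row is None:
        return None
    return LazyPerson(db, row[0], row[1])


# Tiny in-memory database with an assumed two-table layout.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT)")
db.execute("CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT)")
db.execute("INSERT INTO person VALUES ('h1', 'I0001')")
db.execute("INSERT INTO attribute VALUES ('h1', 'Nickname', 'Bob')")

person = get_person_from_handle(db, "h1")
```

Reading person.attributes for the first time triggers the second query; later reads reuse the cached list. The cost is that every such object carries a reference to the database.<br />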
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an sql backend is stable, base.py can be extended to offer extra functionality, or to better optimize for SQL. Eg, in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there.<br />
<br />
The above clearly shows that the two backends call for very different approaches. The bsddb way will work in sql, though (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For sql, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
<br />
= See Also =<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15187GEPS 010: Relational Backend2009-03-26T19:16:10Z<p>AaronS: /* Power vs Dependencies */</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for adding a sql backend =<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (eg, SQLite) to more complex and sophisticated client/server setups (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite is included in the standard Python distribution (the sqlite3 module, since Python 2.5)<br />
# BSDDB is not a relational database, but a hierarchical one<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL can optimize queries (in low-level C) whereas BSDDB is done in Python<br />
# An SQL version of a GRAMPS DB should be faster<br />
# An SQLite version of a GRAMPS DB should be smaller than a BSDDB file.<br />
# SQL Engines can perform query optimizations<br />
# More code would reside in the db, rather than in Python<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A fullscale MySQL backend would be a trivial step from SQLite <br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
This is not the death of its development, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
= Questions and concerns =<br />
<br />
== Native DB access for other languages ==<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data to Python; any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope of even using the ORM layer, but how complex would we really want to make it? And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
== Power vs Dependencies ==<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
I'm not certain whether we need the ORM layer. One option would be to use only SQLAlchemy's SQL Expression Language layer for abstraction and forgo the ORM. SQLAlchemy's [http://www.sqlalchemy.org/ home] and [http://www.sqlalchemy.org/docs/05/intro.html intro] might help explain some things. Another point is that whatever we do, we will be using a DB-API, so perhaps we should test how far that takes us before adding yet another dependency. Is the ORM available for all platforms? If you're asking about programming languages, no: SQLAlchemy is a Python module. If you're asking about databases, the [http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported databases list] is decent. --[[User:AaronS|AaronS]] 19:16, 26 March 2009 (UTC)<br />
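<br />
To get a feel for how far the plain DB-API goes without any extra dependency, here is a minimal sketch using the sqlite3 module from the standard library. The person table and the helper count_people are made up for illustration. The helper only uses calls that the DB-API defines (cursor, execute, fetchone, close), so it should in principle accept a connection from another DB-API module as well, but that is untested here.<br />

```python
import sqlite3


def count_people(conn):
    """Count rows in a (hypothetical) person table over any DB-API connection."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM person")
    (total,) = cur.fetchone()
    cur.close()
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT)")
# sqlite3 uses the 'qmark' parameter style; other DB-API modules may use a
# different one, which is one of the differences an abstraction layer hides.
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("h1", "Ann"), ("h2", "Bob")])

total = count_people(conn)
style = sqlite3.paramstyle
```

Queries without parameters port easily; the placeholder syntax and SQL dialect differences are where a hand-rolled approach would start to need per-database code.<br />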
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
# finish research and pick a database.<br />
# finish research and pick a database abstraction layer.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, a handle should be unique: rules must be added to ensure that adding an object with the handle of a person object to the family table is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table with a key on handle) or with explicit rules.<br />
# ideally, the framework could generate an admin module from the model to browse the database; see eg the admin module in Django.<br />
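<br />
One possible way to get the "primary object table with key on handle" from the validation step is sketched below with SQLite. The table layout, the obj_type column and the helper try_insert_family are assumptions for illustration only, and the sketch relies on SQLite's foreign key enforcement being switched on for the connection.<br />

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")   # SQLite leaves enforcement off by default
db.executescript("""
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,           -- a handle can exist only once
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)
);
CREATE TABLE person (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    FOREIGN KEY (handle, obj_type)
        REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    FOREIGN KEY (handle, obj_type)
        REFERENCES primary_object (handle, obj_type)
);
""")

# Register handle 'h1' as a person, then store the person row itself.
db.execute("INSERT INTO primary_object VALUES ('h1', 'person')")
db.execute("INSERT INTO person (handle) VALUES ('h1')")


def try_insert_family(handle):
    """Return True if the family row was accepted, False if the db refused it."""
    try:
        db.execute("INSERT INTO family (handle) VALUES (?)", (handle,))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout, try_insert_family("h1") is refused by the database itself, because 'h1' is registered as a person and cannot also be registered as a family.<br />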
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier, but that means rewriting the core of GRAMPS.<br />
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an sql backend is stable, base.py can be extended to offer extra functionality, or to better optimize for SQL. Eg, in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there.<br />
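<br />
The difference between the two approaches can be sketched as follows. Both functions run against the same made-up SQLite tables (person and attribute are assumed names), so the loop version only imitates the bsddb access pattern; it is not bsddb code.<br />

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT)")
db.executemany("INSERT INTO person VALUES (?, ?)",
               [("h1", "Ann"), ("h2", "Bob"), ("h3", "Cy")])
db.executemany("INSERT INTO attribute VALUES (?, ?, ?)",
               [("h1", "Occupation", "Farmer"),
                ("h2", "Nickname", "Bobby"),
                ("h3", "Occupation", "Smith")])


def people_with_attribute_loop(db, attr_type):
    """bsddb-style: visit every person and inspect that person's attributes."""
    found = []
    handles = db.execute("SELECT handle FROM person ORDER BY handle").fetchall()
    for (handle,) in handles:
        attrs = db.execute(
            "SELECT type FROM attribute WHERE person_handle = ?",
            (handle,)).fetchall()
        if any(t == attr_type for (t,) in attrs):
            found.append(handle)
    return found


def people_with_attribute_sql(db, attr_type):
    """SQL-style: start from the attribute table and join to the people."""
    rows = db.execute(
        "SELECT DISTINCT p.handle FROM attribute a "
        "JOIN person p ON p.handle = a.person_handle "
        "WHERE a.type = ? ORDER BY p.handle",
        (attr_type,))
    return [handle for (handle,) in rows]
```

Both return the same handles; the loop issues one query per person, while the join is a single statement that the database engine can optimize.<br />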
<br />
The above clearly shows that the two backends call for very different approaches. The bsddb way will work in sql, though (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For sql, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
<br />
= See Also =<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15186GEPS 010: Relational Backend2009-03-26T19:15:40Z<p>AaronS: /* Power vs Dependencies */ answering questions</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for adding a sql backend =<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (eg, SQLite) to more complex and sophisticated client/server setups (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite is included in the standard Python distribution (the sqlite3 module, since Python 2.5)<br />
# BSDDB is not a relational database, but a hierarchical one<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL can optimize queries (in low-level C) whereas BSDDB is done in Python<br />
# An SQL version of a GRAMPS DB should be faster<br />
# An SQLite version of a GRAMPS DB should be smaller than a BSDDB file.<br />
# SQL Engines can perform query optimizations<br />
# More code would reside in the db, rather than in Python<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A fullscale MySQL backend would be a trivial step from SQLite <br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
This is not the death of its development, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
= Questions and concerns =<br />
<br />
== Native DB access for other languages ==<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data to Python; any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope of even using the ORM layer, but how complex would we really want to make it? And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
== Power vs Dependencies ==<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
I'm not certain whether we need the ORM layer. One option would be to use only SQLAlchemy's SQL Expression Language layer for abstraction and forgo the ORM. SQLAlchemy's [http://www.sqlalchemy.org/ home] and [http://www.sqlalchemy.org/docs/05/intro.html intro] might help explain some things. Another point is that whatever we do, we will be using a DB-API, so perhaps we should test how far that takes us before adding yet another dependency. Is the ORM available for all platforms? If you're asking about programming languages, no: SQLAlchemy is a Python module. If you're asking about databases, the [http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported databases list] is decent.<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
# finish research and pick a database.<br />
# finish research and pick a database abstraction layer.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, a handle should be unique: rules must be added to ensure that adding an object with the handle of a person object to the family table is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table with a key on handle) or with explicit rules.<br />
# ideally, the framework could generate an admin module from the model to browse the database; see eg the admin module in Django.<br />
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier, but that means rewriting the core of GRAMPS.<br />
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an sql backend is stable, base.py can be extended to offer extra functionality, or to better optimize for SQL. Eg, in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there.<br />
<br />
The above clearly shows that the two backends call for very different approaches. The bsddb way will work in sql, though (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For sql, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
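<br />
A rough sketch of the prepare-then-apply idea with a support_sql switch follows. Everything here is hypothetical: the two small db classes, their methods, the attribute table, and the rule class HasAttributeSketch are stand-ins written for this example and do not reproduce the real _HasAttributeBase.py.<br />

```python
import sqlite3


class SqlDbSketch(object):
    """Hypothetical SQL-backed db; support_sql is the flag suggested above."""
    support_sql = True

    def __init__(self, rows):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE attribute (person_handle TEXT, type TEXT)")
        self.conn.executemany("INSERT INTO attribute VALUES (?, ?)", rows)


class DictDbSketch(object):
    """Hypothetical bsddb-like db: attributes are stored with each person."""
    support_sql = False

    def __init__(self, rows):
        self.people = {}
        for handle, attr_type in rows:
            self.people.setdefault(handle, []).append(attr_type)

    def get_attribute_types(self, handle):
        return self.people.get(handle, [])


class HasAttributeSketch(object):
    def __init__(self, attr_type):
        self.attr_type = attr_type
        self.matches = None

    def prepare(self, db):
        # SQL backend: one query up front collects every matching handle.
        self.matches = None
        if getattr(db, "support_sql", False):
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE type = ?",
                (self.attr_type,))
            self.matches = set(handle for (handle,) in rows)

    def apply(self, db, handle):
        if self.matches is not None:
            return handle in self.matches
        # Fallback: inspect this person's own attribute list.
        return self.attr_type in db.get_attribute_types(handle)


ROWS = [("h1", "Occupation"), ("h2", "Nickname")]
```

The branch on support_sql inside the rule is exactly the part that does not look ideal: each rule that wants an SQL fast path would need its own version of it.<br />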
<br />
= See Also =<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Talk:GEPS_010:_Relational_Backend&diff=15185Talk:GEPS 010: Relational Backend2009-03-26T19:07:46Z<p>AaronS: /* Recomendations */</p>
<hr />
<div>I tried to keep these in order and make a little more sense out of them but it's still a bit rough. --[[User:AaronS|AaronS]] 18:33, 26 March 2009 (UTC)<br />
<br />
=Databases=<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. -- ?<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml&ei=2MYxSanZNaCgesqz0KQB&usg=AFQjCNG1l3yKZ4YP_L7Yo0cQ8bqWmoJKTQ&sig2=H8x1qf4YrFYlsLFlJUsZ-w<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite
at all. The program would simply write to the db file. No server setup, no user accounts, no
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL
may be similar, but I haven't tried it out. It might be true of regular MySQL, though. However, I believe
it's possible to use scripts and/or code to manage launching and stopping the server. It might
be possible to make it seamless for the user, but that would depend on the implementation.
--AaronS<br />
</pre><br />
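<br />
To illustrate the "just a file name" point, here is a minimal sketch using Python's built-in sqlite3 module. The file name family_tree.db and the person table are made up for the example.<br />

```python
import os
import sqlite3
import tempfile

# The whole "setup" is choosing a file name; no server or account is involved.
path = os.path.join(tempfile.mkdtemp(), "family_tree.db")

conn = sqlite3.connect(path)   # creates the file if it does not exist yet
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (?, ?)", ("h1", "Aunt Martha"))
conn.commit()
conn.close()

# Reopening the same file later gives the data back.
conn = sqlite3.connect(path)
names = [name for (name,) in conn.execute("SELECT name FROM person")]
conn.close()
```

From the user's side this is no different from saving any other document.<br />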
<br />
<br />
It's good to see this discussion on GRAMPS; it is actually why I'm thinking of giving GRAMPS another try, depending on how hard this is to implement. Yes, I know it will be hard, but probably much easier and more productive than starting my own project. I'm a developer myself, and when it came time to evaluate GRAMPS, the lack of a relational db backend was one of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable version of MySQL which might offset some of SQLite's advantages. If a database abstraction layer is used, both could be supported fairly easily. Each has its advantages and disadvantages.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
<br />
=Database Abstraction=<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies. --?<br />
<br />
<br />
What you're looking for here is called a database abstraction layer. They are indeed quite powerful and are exactly what you need. If you're going to bother switching the back end, don't waste your time by not using one; you'll kick yourself later if you don't. Just be careful which one you choose. I know that in PHP every web framework seems to have its own, and I suspect the same is true for Python. Django has its own but allows the use of others (if that tells you anything), so it might be a place to check for alternatives. While Django is a framework for websites, that shouldn't matter for the database abstraction layer.<br />
<br />
What to look for in a database abstraction layer is which dbs it can use. SQLite and MySQL are musts; you may even find one that can talk to BSDDB, but probably not. Oracle and PostgreSQL are pluses that will probably never be used, but who knows what will happen in 5 or 10 years. Maybe Oracle would get fed up with MySQL and release its db as open source, charging for service. Stranger things have happened. Ease of use, readability and outer joins are also important. Don't worry too much about how complex the SQL queries it supposedly lets you create are, since complex queries through a db layer (subqueries and the like) tend to be difficult to create, read and predict. Usually those queries are far easier to just write as plain SQL.<br />
<br />
In my experience a db abstraction layer is good for most of the db I/O. However, for the complex stuff a sort of localization object (or even file) with named queries is a good bet. This would work similarly to how different languages are usually supported in projects, with a different object or file per db. I'd recommend an actual object with a function per query over a file of constants/variables, since some dbs might require a little more manipulation than others. Again, this would only be for the most complex queries. A good rule of thumb would be: if you have to start writing parts of the query as strings, move it to the db localization object. The db localization object isn't used for all queries because you want to keep the number of queries that need tweaking across dbs to a minimum. --Aarons<br />
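<br />
A rough sketch of what such a db localization object could look like in Python. All class, method, table and column names here are hypothetical and only meant to show the shape of the idea: one object per db, one function per named query.<br />

```python
import sqlite3


class SQLiteQueries:
    """Named queries written in the SQLite dialect (hypothetical example)."""

    def people_without_events(self):
        # SQLite concatenates strings with the || operator.
        return (
            "SELECT p.surname || ', ' || p.given "
            "FROM person p "
            "LEFT OUTER JOIN event e ON e.person_handle = p.handle "
            "WHERE e.person_handle IS NULL "
            "ORDER BY p.surname"
        )


class MySQLQueries:
    """The same named queries written in the MySQL dialect."""

    def people_without_events(self):
        # In its default SQL mode MySQL treats || as logical OR,
        # so the query needs CONCAT() instead.
        return (
            "SELECT CONCAT(p.surname, ', ', p.given) "
            "FROM person p "
            "LEFT OUTER JOIN event e ON e.person_handle = p.handle "
            "WHERE e.person_handle IS NULL "
            "ORDER BY p.surname"
        )


QUERIES = {"sqlite": SQLiteQueries, "mysql": MySQLQueries}


def get_queries(backend):
    """Return the query object for the configured backend name."""
    return QUERIES[backend]()


def demo():
    """Run the SQLite variant against a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT, given TEXT);
        CREATE TABLE event (id INTEGER PRIMARY KEY, person_handle TEXT, kind TEXT);
        INSERT INTO person VALUES ('I0001', 'Smith', 'Martha');
        INSERT INTO person VALUES ('I0002', 'Adams', 'John');
        INSERT INTO event (person_handle, kind) VALUES ('I0002', 'Birth');
        """
    )
    queries = get_queries("sqlite")
    rows = conn.execute(queries.people_without_events()).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(demo())
```

The calling code only ever asks for people_without_events(); which dialect it gets depends on the backend name, so only the handful of complex queries needs a per-db variant.<br />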
<br />
= Recommendations =<br />
== AaronS ==<br />
Let me preface this by restating that I've never actually used any of these abstraction layers, and I'm not yet familiar with the GRAMPS code or the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding, in case they just don't work.<br />
<br />
I've spent the last few days looking at the current options for db abstraction. From what I currently know, I'm going to recommend we use SQLAlchemy with SQLite.<br />
<br />
SQLite. No server to manage, and single-file dbs will make it easy to share databases and to manage several at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as GRAMPS doesn't switch focus to become some kind of mass-user website for editing large trees, I think SQLite will fit the bill.<br />
<br />
SQLAlchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way.<br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since GRAMPS isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't make switching dbs as easy as might be thought. If we commit to SQLite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to SQLAlchemy, but SQLAlchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than SQLAlchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack, I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
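<br />
On the DB-API point above, one concrete place where drivers differ is the placeholder style for query parameters. Each DB-API module advertises its style in a paramstyle attribute: sqlite3 reports qmark, while drivers such as MySQLdb report format, as far as I know. The helper below is a hypothetical sketch of the kind of glue code that ends up being needed; insert_sql and the person table are made up for illustration.<br />

```python
import sqlite3

# Placeholder text for the two positional DB-API paramstyles handled here.
POSITIONAL_MARKERS = {"qmark": "?", "format": "%s"}


def insert_sql(module, table, columns):
    """Build an INSERT statement using the placeholder style of `module`.

    `module` is any DB-API module (or object) exposing `paramstyle`.
    Table and column names are trusted identifiers, not user input.
    """
    marker = POSITIONAL_MARKERS[module.paramstyle]
    placeholders = ", ".join(marker for _ in columns)
    return "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), placeholders
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (handle TEXT, surname TEXT)")
    sql = insert_sql(sqlite3, "person", ["handle", "surname"])
    conn.execute(sql, ("I0001", "Smith"))
    print(sql)
    conn.close()
```

Placeholder style is only one of the differences (SQL dialect and type handling are others), which is roughly why an abstraction layer on top of the DB-API modules looks attractive.<br />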
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do GRAMPS some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From the website, SQLAlchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use either as the need arises. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], the author does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
'''Amendment''' Whatever abstraction we use, it would appear that almost all of them sit on top of a DB-API module. [http://www.sqlalchemy.org/docs/05/intro.html SQLAlchemy's] intro does a good job of explaining the layers. Therefore we will at least be using one of those. One option would be to start coding around a couple of DB-API modules first, one for each database (SQLite and MySQL), and see how the abstraction would work in each case. If we start the test development side by side, then we will be able to see how interchangeable they are. Note that there are multiple DB-API modules available for each database. There appear to be leaders, but there might be choices for a reason. If we pick an ORM then the choice will be made for us, but if we don't then a choice will have to be made. Personally I'm more familiar with a model (MVC) approach where each table or basic construct has an object that handles the db I/O but isn't actually trying to be that table. I only suggested SQLAlchemy because that appears to be the current consensus for abstraction. --[[User:AaronS|AaronS]] 19:07, 26 March 2009 (UTC)</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Relational_database_comparison&diff=15184Relational database comparison2009-03-26T18:43:06Z<p>AaronS: removing sql discussion</p>
<hr />
<div>This page is for a comparison of different databases, specifically in terms of how they might be used for GRAMPS. It was started to help with [[GEPS 010: SQL Backend]]. <br />
<br />
=SQLite=<br />
==Advantages==<br />
*Far easier to set up: just start writing to the file! No connection or user accounts.<br />
*Smaller install (code) size.<br />
*Easier for users to manage and share separate dbs.<br />
*Single file.<br />
*Good support.<br />
<br />
=== Transportable Trees ===<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatiblity" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The single disk file of an SQLite db would be a major selling point for SQLite<br />
in genealogy software, since users share and compare dbs all the time.<br />
--Aarons<br />
</pre><br />
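<br />
As a small illustration of the single-file point, sharing a tree can be as simple as copying the file once the connection is closed. This is only a sketch: the person table and the file names are hypothetical, and a real exporter would need to make sure no writer has the file open.<br />

```python
import os
import shutil
import sqlite3
import tempfile


def create_tree(path):
    """Create a tiny example database with one person in it."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
    conn.execute("INSERT INTO person VALUES ('I0001', 'Smith')")
    conn.commit()
    conn.close()  # close before copying so the file on disk is complete


def share_tree(src, dst):
    """'Share' a tree by copying its single database file."""
    shutil.copyfile(src, dst)


def count_people(path):
    """Open any copy of the file and count the people in it."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    original = os.path.join(folder, "my_tree.db")
    copy = os.path.join(folder, "copy_for_cousin.db")
    create_tree(original)
    share_tree(original, copy)
    print(count_people(copy))
```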
<br />
==Disadvantages==<br />
*While great for what it is, it's not an enterprise-level database.<br />
*Many "traditional" relational db features are lacking.<br />
*While tools exist, they aren't as fleshed out and solid as the MySQL ones.<br />
*There may be some limits on size, mostly due to RAM and storage limits; see [http://www.sqlite.org/limits.html Limits In SQLite]. Since the whole db doesn't need to be loaded on reads and writes, these limits should be far larger than for BSDDB.<br />
<br />
=MySQL=<br />
==Advantages==<br />
*Far better tools for management and reporting.<br />
*A true enterprise-level database capable of handling serious loads.<br />
*Far more is built into the db, e.g. stored procedures and so on. (SQLite does have auto-incrementing keys and triggers, but no stored procedures.)<br />
*Far more extensive user base and support.<br />
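<br />
As a quick check on what SQLite itself provides, the sketch below uses Python's sqlite3 module to create an auto-incrementing key and a trigger. The person and audit tables are invented for the example and have nothing to do with the GRAMPS schema.<br />

```python
import sqlite3


def demo():
    """Show AUTOINCREMENT keys and an AFTER INSERT trigger in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            surname TEXT
        );
        CREATE TABLE audit (person_id INTEGER, note TEXT);
        CREATE TRIGGER person_added AFTER INSERT ON person
        BEGIN
            INSERT INTO audit VALUES (NEW.id, 'added ' || NEW.surname);
        END;
        """
    )
    # No id is supplied; SQLite assigns 1, 2, ... automatically.
    conn.execute("INSERT INTO person (surname) VALUES ('Smith')")
    conn.execute("INSERT INTO person (surname) VALUES ('Adams')")
    ids = [row[0] for row in conn.execute("SELECT id FROM person ORDER BY id")]
    # The trigger wrote one audit row per inserted person.
    notes = [
        row[0]
        for row in conn.execute("SELECT note FROM audit ORDER BY person_id")
    ]
    conn.close()
    return ids, notes


if __name__ == "__main__":
    print(demo())
```

Stored procedures are the part that has no SQLite equivalent; that kind of logic would have to live in the application code instead.<br />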
<br />
==Disadvantages==<br />
*Install size (bloat).<br />
*An actual server to set up, run and maintain.<br />
** There are tools that can do this automatically, though, and make this almost nonexistent for an end user. The embeddable MySQL might also be an option.<br />
*May be difficult to manage and share multiple databases. More difficult, but very doable; maybe not even that difficult. It should just take some planning.<br />
<br />
=MySQL Embedded=<br />
Needs to be looked at; may be more powerful than SQLite but easier for end users than full MySQL.<br />
<br />
= Comparing BSDDB to SQLite =<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
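<br />
To give a feel for what a shelve-style interface over SQLite involves, here is a minimal dict-like sketch. It is not the code from the linked issue; the SQLiteShelf class, its shelf table and the pickled values are assumptions made purely for illustration.<br />

```python
import pickle
import sqlite3


class SQLiteShelf:
    """Minimal dict-like store backed by one SQLite table (illustrative only)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf (key TEXT PRIMARY KEY, value BLOB)"
        )

    def __setitem__(self, key, value):
        # Values are pickled, much like the standard shelve module does.
        blob = pickle.dumps(value)
        self.conn.execute(
            "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
            (key, blob),
        )
        self.conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __contains__(self, key):
        row = self.conn.execute(
            "SELECT 1 FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        return row is not None

    def close(self):
        self.conn.close()


if __name__ == "__main__":
    shelf = SQLiteShelf()
    shelf["I0001"] = {"surname": "Smith", "given": "Martha"}
    print(shelf["I0001"]["surname"])
    shelf.close()
```

A wrapper like this keeps the handle-to-object style of access that BSDDB offers, but it gives up most of what SQL adds, so it seems more useful as a migration aid than as a final design.<br />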
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Talk:GEPS_010:_Relational_Backend&diff=15183Talk:GEPS 010: Relational Backend2009-03-26T18:42:53Z<p>AaronS: /* Databases */ moving sql discussion</p>
<hr />
<div>I tried to keep these in order and make a little more sense out of them but it's still a bit rough. --[[User:AaronS|AaronS]] 18:33, 26 March 2009 (UTC)<br />
<br />
=Databases=<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. -- ?<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar, but I haven't tried it out. The objection might hold for full MySQL, though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
<br />
<br />
It's good to see this discussion on GRAMPS; it is actually why I'm thinking of giving GRAMPS another try, depending on how hard this is to implement. Yes, I know it will be hard, but probably much easier and more productive than starting my own project. I'm a developer myself, and when it came time to evaluate GRAMPS, the lack of a relational db backend was one of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable version of MySQL which might offset some of SQLite's advantages. If a database abstraction layer is used, both could be supported fairly easily. Each has its advantages and disadvantages.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
<br />
=Database Abstraction=<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies. --?<br />
<br />
<br />
What you're looking for here is called a database abstraction layer. They are indeed quite powerful and are exactly what you need. If you're going to bother switching the back end, don't waste your time by not using one; you'll kick yourself later if you don't. Just be careful which one you choose. I know that in PHP every web framework seems to have its own, and I suspect the same is true for Python. Django has its own but allows the use of others (if that tells you anything), so it might be a place to check for alternatives. While Django is a framework for websites, that shouldn't matter for the database abstraction layer.<br />
<br />
What to look for in a database abstraction layer is which dbs it can use. SQLite and MySQL are musts; you may even find one that can talk to BSDDB, but probably not. Oracle and PostgreSQL are pluses that will probably never be used, but who knows what will happen in 5 or 10 years. Maybe Oracle would get fed up with MySQL and release its db as open source, charging for service. Stranger things have happened. Ease of use, readability and outer joins are also important. Don't worry too much about how complex the SQL queries it supposedly lets you create are, since complex queries through a db layer (subqueries and the like) tend to be difficult to create, read and predict. Usually those queries are far easier to just write as plain SQL.<br />
<br />
In my experience a db abstraction layer is good for most of the db I/O. However, for the complex stuff a sort of localization object (or even file) with named queries is a good bet. This would work similarly to how different languages are usually supported in projects, with a different object or file per db. I'd recommend an actual object with a function per query over a file of constants/variables, since some dbs might require a little more manipulation than others. Again, this would only be for the most complex queries. A good rule of thumb would be: if you have to start writing parts of the query as strings, move it to the db localization object. The db localization object isn't used for all queries because you want to keep the number of queries that need tweaking across dbs to a minimum. --Aarons<br />
<br />
= Recommendations =<br />
Let me preface this by restating that I've never actually used any of these abstraction layers, and I'm not yet familiar with the GRAMPS code or the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding, in case they just don't work.<br />
<br />
I've spent the last few days looking at the current options for db abstraction. From what I currently know, I'm going to recommend we use SQLAlchemy with SQLite.<br />
<br />
SQLite. No server to manage, and single-file dbs will make it easy to share databases and to manage several at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as GRAMPS doesn't switch focus to become some kind of mass-user website for editing large trees, I think SQLite will fit the bill.<br />
<br />
SQLAlchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way.<br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since GRAMPS isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't make switching dbs as easy as might be thought. If we commit to SQLite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to SQLAlchemy, but SQLAlchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than SQLAlchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack, I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do GRAMPS some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From the website, SQLAlchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use either as the need arises. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], the author does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Relational_database_comparison&diff=15182Relational database comparison2009-03-26T18:37:04Z<p>AaronS: /* Disadvantages */ added sqlite limits</p>
<hr />
<div>This page is for a comparison of different databases, specifically in terms of how they might be used for GRAMPS. It was started to help with [[GEPS 010: SQL Backend]]. <br />
<br />
=SQLite=<br />
==Advantages==<br />
*Far easier to set up: just start writing to the file! No connection or user accounts.<br />
*Smaller install (code) size.<br />
*Easier for users to manage and share separate dbs.<br />
*Single file.<br />
*Good support.<br />
<br />
=== Transportable Trees ===<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatiblity" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The single disk file of an SQLite db would be a major selling point for SQLite<br />
in genealogy software, since users share and compare dbs all the time.<br />
--Aarons<br />
</pre><br />
<br />
==Disadvantages==<br />
*While great for what it is, it's not an enterprise-level database.<br />
*Many "traditional" relational db features are lacking.<br />
*While tools exist, they aren't as fleshed out and solid as the MySQL ones.<br />
*There may be some limits on size, mostly due to RAM and storage limits; see [http://www.sqlite.org/limits.html Limits In SQLite]. Since the whole db doesn't need to be loaded on reads and writes, these limits should be far larger than for BSDDB.<br />
<br />
=MySQL=<br />
==Advantages==<br />
*Far better tools for management and reporting.<br />
*A true enterprise-level database capable of handling serious loads.<br />
*Far more is built into the db, e.g. stored procedures and so on. (SQLite does have auto-incrementing keys and triggers, but no stored procedures.)<br />
*Far more extensive user base and support.<br />
<br />
==Disadvantages==<br />
*Install size (bloat).<br />
*An actual server to set up, run and maintain.<br />
** There are tools that can do this automatically, though, and make this almost nonexistent for an end user. The embeddable MySQL might also be an option.<br />
*May be difficult to manage and share multiple databases. More difficult, but very doable; maybe not even that difficult. It would just take some planning.<br />
<br />
=MySQL Embedded=<br />
Needs to be looked at; may be more powerful than SQLite but easier for end users than full MySQL.<br />
<br />
= Comparing BSDDB to SQLite =<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of sqlite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar, but I haven't tried it out. The quote might be true of a full MySQL server, though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre></div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Talk:GEPS_010:_Relational_Backend&diff=15181Talk:GEPS 010: Relational Backend2009-03-26T18:33:21Z<p>AaronS: refactoring</p>
<hr />
<div>I tried to keep these in order and make a little more sense out of them but it's still a bit rough. --[[User:AaronS|AaronS]] 18:33, 26 March 2009 (UTC)<br />
<br />
=Databases=<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. -- ?<br />
<br />
<br />
It's good to see this discussion on gramps; it is actually why I'm thinking of giving it another try, depending on how hard this is to implement. Yes, I know it will be hard, but probably much easier and more productive than starting my own project. I'm a developer myself, and when it came time to evaluate gramps the lack of a relational db backend was one of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favour of SQLite. While I haven't tried it out yet, there is an embeddable version of MySQL which might offset some of SQLite's advantages. If a database abstraction layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
<br />
<br />
=Database Abstraction=<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies. --?<br />
<br />
<br />
What you're looking for here is called a Database Abstraction Layer. They are indeed quite powerful and are exactly what you need. If you're going to bother switching the back end, don't waste your time by not using one; you'll kick yourself later if you don't. Just be careful which one you choose. I know that in PHP every web framework seems to have its own, and I suspect the same for Python. Django has its own but allows for the use of others (if that tells you anything), so it might be a place to check for alternatives. While their framework is for websites, that shouldn't matter for the DB abstraction layer.<br />
<br />
What to look for in a db abstraction layer is which dbs it can use. SQLite and MySQL are musts; you may even find one that can talk to BSDDB, but probably not. Oracle and PostgreSQL are pluses that will probably never be used, but who knows what will happen in 5 or 10 years. Maybe Oracle will get fed up with MySQL and release its db as open source, charging for service; stranger things have happened. Ease of use, readability and outer joins are also important. Don't worry too much about how complex the SQL queries it lets you create are, since complex queries (sub queries and the like) built through a db layer tend to be difficult to create, read and predict. Usually those queries are far easier to just write as plain SQL.<br />
<br />
In my experience a db abstraction layer is good for most of the db I/O. For the complex stuff, however, a sort of localization object (or even file) with named queries is a good bet. This would work much like the way different human languages are usually supported in projects, with a different object or file per db. I'd recommend an actual object with a function per query over a file of constants/variables, since some dbs might require a little more manipulation than others. Again, this would only be for the most complex queries. A good rule of thumb: if you have to start writing parts of the query as strings, move it to the db localization object. The localization object isn't used for all queries because you want to minimize the number of queries that have to be tweaked per db. --Aarons<br />
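<br />
A minimal sketch of the "db localization object" idea, using only Python's standard sqlite3 module. The class names, the <code>person</code> table and the <code>full_names</code> query are made up for illustration; only the SQLite variant is actually executed here.<br />

```python
import sqlite3


class SQLiteQueries:
    """Hypothetical holder for hand-written queries, SQLite dialect."""

    def full_names(self):
        # SQLite concatenates strings with the || operator.
        return ("SELECT first_name || ' ' || surname FROM person "
                "ORDER BY surname, first_name")


class MySQLQueries:
    """Same query names, MySQL dialect (not executed in this sketch)."""

    def full_names(self):
        # In MySQL's default mode || means logical OR, so CONCAT() is used.
        return ("SELECT CONCAT(first_name, ' ', surname) FROM person "
                "ORDER BY surname, first_name")


# One object per db, picked by a (hypothetical) configuration key.
QUERIES = {"sqlite": SQLiteQueries(), "mysql": MySQLQueries()}


def full_names(conn, engine):
    """Run the named query for whichever engine is configured."""
    cursor = conn.cursor()
    cursor.execute(QUERIES[engine].full_names())
    return [row[0] for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (first_name TEXT, surname TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("Martha", "Smith"), ("Jason", "Jones")])
names = full_names(conn, "sqlite")
```

The rest of the program only ever calls <code>full_names(conn, engine)</code>, so supporting another db means adding one more class rather than touching the callers.<br />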
<br />
= Recommendations =<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code or the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding, in case they just don't work.<br />
<br />
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
SQLite. With no server to manage and single-file dbs, it will be easy to share databases and to manage several at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to become some kind of mass-user website for editing large trees, I think SQLite will fit the bill.<br />
<br />
SQLAlchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best Python ORM available, and it seems to provide good tools for when the ORM starts to get in the way.<br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since gramps isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't make switching dbs as easy as might be thought. If we commit to SQLite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to SQLAlchemy, but SQLAlchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than SQLAlchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
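<br />
On the DB-API point above, one concrete reason plain DB-API code doesn't move between dbs unchanged is that PEP 249 lets each driver pick its own placeholder style. The sketch below assumes a made-up <code>person</code> table and a hypothetical helper <code>insert_sql</code>; sqlite3 reports "qmark", while "format" is the style used by some MySQL drivers.<br />

```python
import sqlite3

# PEP 249 placeholder markers for the two positional styles used here.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def insert_sql(table, columns, paramstyle):
    """Build an INSERT statement for a driver's positional paramstyle.

    Table and column names are pasted into the SQL text, so they must
    come from trusted code, never from user input.
    """
    mark = PLACEHOLDERS[paramstyle]
    marks = ", ".join(mark for _ in columns)
    return "INSERT INTO %s (%s) VALUES (%s)" % (table, ", ".join(columns), marks)


# The same logical statement, rendered for two different drivers.
sqlite_sql = insert_sql("person", ["handle", "surname"], sqlite3.paramstyle)
format_sql = insert_sql("person", ["handle", "surname"], "format")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
conn.execute(sqlite_sql, ("I0001", "Smith"))
row = conn.execute("SELECT handle, surname FROM person").fetchone()
```

An abstraction layer hides exactly this kind of difference, which is part of why committing to SQLite alone makes raw DB-API more palatable.<br />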
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do gramps some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From the website, SQLAlchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use both as needs dictate. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Relational_database_comparison&diff=15180Relational database comparison2009-03-26T18:26:05Z<p>AaronS: </p>
<hr />
<div>This page is for a comparison of different databases, specifically how they might be used for GRAMPS. It was started to help with [[GEPS 010: SQL Backend]]. <br />
<br />
=SQLite=<br />
==Advantages==<br />
*far easier to set up: just start writing to the file! No connection settings or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
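<br />
To show what "just start writing to the file" looks like in practice, here is a small sketch with Python's standard sqlite3 module. The file name and the <code>person</code> table are made up for the example.<br />

```python
import os
import sqlite3
import tempfile

# Hypothetical file name; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "family_tree.db")

# No server, account or connection settings: connecting creates the file.
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
conn.execute("INSERT INTO person VALUES (?, ?)", ("I0001", "Smith"))
conn.commit()
conn.close()

created = os.path.exists(path)
```

The whole "setup" is the file path, which is the property that matters for first-time users.<br />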
<br />
=== Transportable Trees ===<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The single disk file of an SQLite db would be a major selling point for SQLite<br />
for genealogy software, since users share and compare dbs all the time.<br />
--Aarons<br />
</pre><br />
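<br />
Since each tree is one file, comparing two trees can be sketched with SQLite's ATTACH DATABASE, which lets one connection query a second file. The file names, the one-column <code>person</code> table and the helper <code>make_tree</code> are invented for this example.<br />

```python
import os
import sqlite3
import tempfile


def make_tree(path, surnames):
    """Create a tiny stand-in 'tree' file holding one surname per row."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE person (surname TEXT)")
    conn.executemany("INSERT INTO person VALUES (?)", [(s,) for s in surnames])
    conn.commit()
    conn.close()


workdir = tempfile.mkdtemp()
mine = os.path.join(workdir, "mine.db")      # hypothetical file names
theirs = os.path.join(workdir, "theirs.db")
make_tree(mine, ["Smith", "Jones"])
make_tree(theirs, ["Smith", "Brown"])

# Open my tree and attach the other file under the schema name "other".
conn = sqlite3.connect(mine)
conn.execute("ATTACH DATABASE ? AS other", (theirs,))
rows = conn.execute(
    "SELECT surname FROM other.person "
    "WHERE surname NOT IN (SELECT surname FROM main.person) "
    "ORDER BY surname").fetchall()
only_in_theirs = [row[0] for row in rows]
conn.close()
```

A real comparison would of course match on more than surnames; the point is only that a shared file can be queried next to your own without any import step.<br />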
<br />
==Disadvantages==<br />
*while great for what it is, it's not an enterprise level database<br />
*some "traditional" relational db features are lacking, e.g. stored procedures and user accounts.<br />
*while tools exist, they aren't as fleshed out and solid as the MySQL ones.<br />
<br />
<br />
=MySQL=<br />
==Advantages==<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db, e.g. stored procedures, and so on.<br />
(SQLite does support auto-incrementing fields and triggers, but not stored procedures)<br />
*far more extensive user base and support.<br />
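<br />
On the question of what SQLite has built in: it does provide auto-incrementing keys and triggers. A quick check with Python's sqlite3 module, using made-up <code>person</code> and <code>change_log</code> tables:<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-incrementing key
    surname TEXT NOT NULL
);
CREATE TABLE change_log (person_id INTEGER, action TEXT);
-- Trigger that records every insert into person.
CREATE TRIGGER person_added AFTER INSERT ON person
BEGIN
    INSERT INTO change_log VALUES (NEW.id, 'added');
END;
""")

conn.execute("INSERT INTO person (surname) VALUES ('Smith')")
conn.execute("INSERT INTO person (surname) VALUES ('Jones')")

ids = [row[0] for row in conn.execute("SELECT id FROM person ORDER BY id")]
log = conn.execute(
    "SELECT person_id, action FROM change_log ORDER BY person_id").fetchall()
```

Stored procedures are the part that really is missing; that kind of logic would have to stay in Python with an SQLite backend.<br />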
<br />
==Disadvantages==<br />
*install size (bloat)<br />
*an actual server to set up, run and maintain.<br />
** there are tools that can do this automatically, though, and make the work almost nonexistent for an end user. The embeddable MySQL might also be an option.<br />
*may be more difficult to manage / share multiple databases, but very doable. Maybe not even that difficult; it would just take some planning.<br />
<br />
=MySQL Embedded=<br />
Needs to be looked at. It may be more powerful than SQLite but easier for end users than a full MySQL server.<br />
<br />
= Comparing BSDDB to SQLite =<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of sqlite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar, but I haven't tried it out. The quote might be true of a full MySQL server, though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre></div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15179GEPS 010: Relational Backend2009-03-26T18:21:35Z<p>AaronS: adding and refactoring questions and concerns</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is often pronounced "sequel", after its original name SEQUEL ("Structured English Query Language").<br />
<br />
= Reasons for adding an SQL backend =<br />
Currently, GRAMPS uses a Berkeley DB (BSDDB) database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (e.g. SQLite) to more complex and sophisticated client/server setups (e.g. MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas with BSDDB the query logic is written by hand in Python<br />
# An SQL version of a GRAMPS DB should be faster<br />
# An SQLite version of a GRAMPS DB should be smaller than a BSDDB file.<br />
# SQL Engines can perform query optimizations<br />
# More code would reside in the db, rather than in Python<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
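<br />
As a rough illustration of the "declarative versus low-level API" points in the list above, the sketch below answers the same question twice. A plain dict of pickled tuples stands in for a BSDDB-style key/value store (an assumption made to keep the example self-contained), and the <code>person</code> table and sample records are invented.<br />

```python
import pickle
import sqlite3

people = [("I0001", "Martha", "Smith", 1901),
          ("I0002", "Jason", "Jones", 1975),
          ("I0003", "Don", "Smith", 1950)]

# Key/value style: values are opaque blobs, so filtering happens in Python.
store = {handle: pickle.dumps((first, surname, born))
         for handle, first, surname, born in people}


def smiths_from_store(store):
    found = []
    for blob in store.values():
        first, surname, born = pickle.loads(blob)  # every record is unpickled
        if surname == "Smith":
            found.append((born, first))
    return [first for born, first in sorted(found)]


# Relational style: the schema is declared once and the query says what is wanted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, "
             "first_name TEXT, surname TEXT, born INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", people)


def smiths_from_sql(conn):
    rows = conn.execute(
        "SELECT first_name FROM person WHERE surname = ? ORDER BY born",
        ("Smith",))
    return [row[0] for row in rows]
```

Both functions return the Smiths ordered by birth year, but only the SQL version leaves the db free to use an index on <code>surname</code> instead of scanning every record.<br />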
<br />
Further implications:<br />
<br />
# A fullscale MySQL backend would be a trivial step from SQLite <br />
# Easy to allow multiple users in a SQLite database (it uses file locking: many readers, but only one writer at a time)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Removal is not death, however: development continues, just out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
= Questions and concerns =<br />
<br />
== Native DB access for other languages ==<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data to Python; any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search brought up things like this [http://pecl.php.net/package/python php python package], so there may be some hope of even using the ORM layer from other languages, but how complex would we really want to make it? And of course there is always the option of using an ORM in the new language and building similar objects there. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
== Power vs Dependencies ==<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
# finish research and pick a database.<br />
# finish research and pick a database abstraction layer.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, handles should be unique: rules must be added to ensure that adding to the family table an object whose handle belongs to a person object is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table keyed on handle) or with explicit rules.<br />
# ideally, the framework would be able to generate an admin module from the model for browsing the database; see eg the admin module in Django.<br />
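A minimal sketch of the structural variant of the handle rule in step 5, assuming SQLite through Python's sqlite3 module. The table and column names (primary_object, obj_type, gramps_id) and the add_object helper are made up for illustration and are not an agreed GRAMPS schema.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,          -- one row per handle, whatever the type
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)           -- target for the composite foreign keys
);
CREATE TABLE person (
    handle    TEXT PRIMARY KEY,
    obj_type  TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    gramps_id TEXT,
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle    TEXT PRIMARY KEY,
    obj_type  TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    gramps_id TEXT,
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
""")


def add_object(conn, table, handle, gramps_id):
    """Hypothetical helper: register the handle, then insert the typed row."""
    if table not in ("person", "family"):
        raise ValueError("unknown table: %r" % (table,))
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute(
            "INSERT INTO primary_object (handle, obj_type) VALUES (?, ?)",
            (handle, table))
        conn.execute(
            "INSERT INTO %s (handle, gramps_id) VALUES (?, ?)" % table,
            (handle, gramps_id))


add_object(conn, "person", "a1b2c3", "I0001")
try:
    # Re-using a person's handle for a family is refused by the database.
    add_object(conn, "family", "a1b2c3", "F0001")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

The handle is unique across all primary tables because every row must first be registered in primary_object, and the composite foreign key keeps a family row from pointing at a handle that was registered as a person.<br />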
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, though that means rewriting the core of GRAMPS.<br />
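One possible reading of the deferred loading described in the last step, sketched with sqlite3. Note the assumptions: LazyPerson, the two-table schema and the attribute_list name are hypothetical, and the sketch uses __getattr__ (which Python calls only when normal attribute lookup fails) rather than __slots__, so it does not settle whether slots are the right mechanism.<br />

```python
import sqlite3


class LazyPerson(object):
    """Hypothetical stand-in for a gen/lib Person backed by SQL."""

    def __init__(self, db, handle, gramps_id):
        self._db = db            # the object must know where its data lives
        self.handle = handle
        self.gramps_id = gramps_id

    def __getattr__(self, name):
        # Only reached when the attribute has not been set yet.
        if name == "attribute_list":
            rows = self._db.execute(
                "SELECT key, value FROM attribute "
                "WHERE person_handle = ? ORDER BY key",
                (self.handle,)).fetchall()
            self.attribute_list = rows   # cache: later reads skip the query
            return rows
        raise AttributeError(name)


def get_person_from_handle(db, handle):
    """Touches the person table only; attributes are fetched on demand."""
    row = db.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?",
        (handle,)).fetchone()
    if row is None:
        return None
    return LazyPerson(db, row[0], row[1])


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT);
INSERT INTO person VALUES ('a1b2c3', 'I0001');
INSERT INTO attribute VALUES ('a1b2c3', 'Nickname', 'Bob');
""")
person = get_person_from_handle(conn, "a1b2c3")
```

Reading person.gramps_id costs nothing extra, while the first read of person.attribute_list runs the second query and stores the result on the object.<br />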
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. Eg, in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining each person's attribute list, and checking whether the attribute is present there. <br />
<br />
The above clearly shows that the two backends call for very different approaches. The bsddb way will still work in SQL, though (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, and then return True or False depending on whether the person is in this list. As db is passed in, db could have a support_sql attribute, and code could be written depending on this setting. This does not look ideal, though.<br />
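A rough sketch of the support_sql switch described above. Only the prepare step and the support_sql attribute come from the text; the rule class, the two db classes, the connection attribute and the dict-based person records are invented for this example and do not mirror the real _HasAttributeBase.py.<br />

```python
import sqlite3


class HasAttributeSketch(object):
    """Hypothetical filter rule: does a person have an attribute with this key?"""

    def __init__(self, key):
        self.key = key
        self.matches = None

    def prepare(self, db):
        if getattr(db, "support_sql", False):
            # SQL path: one query up front collects every matching handle.
            rows = db.connection.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE key = ?",
                (self.key,))
            self.matches = set(row[0] for row in rows)
        else:
            self.matches = None

    def apply(self, db, person):
        if self.matches is not None:
            return person["handle"] in self.matches
        # bsddb-style path: inspect the attribute list of each person in turn.
        return any(key == self.key for key, _value in person["attributes"])


class SqlDbSketch(object):
    """Hypothetical SQL-backed db exposing a raw DB-API connection."""
    support_sql = True

    def __init__(self, people):
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute(
            "CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT)")
        for person in people:
            for key, value in person["attributes"]:
                self.connection.execute(
                    "INSERT INTO attribute VALUES (?, ?, ?)",
                    (person["handle"], key, value))


class PlainDbSketch(object):
    """Hypothetical db without SQL support."""
    support_sql = False


def run_filter(db, rule, people):
    rule.prepare(db)
    return [p["handle"] for p in people if rule.apply(db, p)]


people = [
    {"handle": "h1", "attributes": [("Nickname", "Bob")]},
    {"handle": "h2", "attributes": [("Occupation", "Farmer")]},
]
```

Both backends return the same handles from run_filter; the difference is that the SQL path does its work once in prepare instead of once per person.<br />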
<br />
= See Also =<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15178GEPS 010: Relational Backend2009-03-26T18:09:45Z<p>AaronS: moving recomendations to discussion</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for making the switch =<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (eg, SQLite) to more complex and sophisticated client/server setups (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but an embedded key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries in low-level C code, whereas with BSDDB the query logic has to be written in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next, there are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data to Python; any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search brought up things like this [http://pecl.php.net/package/python php python package], so there may be some hope of even using the ORM layer from another language, but how complex would we really want to make it? And of course there is always the option of using the ORM from Python and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
<br />
<br />
<br />
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
# finish research and pick a database.<br />
# finish research and pick a database abstraction layer.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, handles should be unique: rules must be added to ensure that adding to the family table an object whose handle belongs to a person object is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table keyed on handle) or with explicit rules.<br />
# ideally, the framework would be able to generate an admin module from the model for browsing the database; see eg the admin module in Django.<br />
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, though that means rewriting the core of GRAMPS.<br />
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. Eg, in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining each person's attribute list, and checking whether the attribute is present there. <br />
<br />
The above clearly shows that the two backends call for very different approaches. The bsddb way will still work in SQL, though (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, and then return True or False depending on whether the person is in this list. As db is passed in, db could have a support_sql attribute, and code could be written depending on this setting. This does not look ideal, though.<br />
<br />
= See Also =<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Talk:GEPS_010:_Relational_Backend&diff=15177Talk:GEPS 010: Relational Backend2009-03-26T18:09:36Z<p>AaronS: adding recomendations</p>
<hr />
<div>I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
It's good to see this discussion on GRAMPS, and it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate GRAMPS the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might offset some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one; you'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in PHP every web framework seems to have its own, and I suspect the same<br />
is true for Python. Django has its own but allows the use of others (if that<br />
tells you anything), so it might be a place to check for alternatives. While<br />
their framework is for websites, that shouldn't matter for the DB<br />
abstraction layer.<br />
<br />
The first thing to look for in a db abstraction layer is which dbs it can use. SQLite and<br />
MySQL are musts; you may even find one that can talk to BSDDB, but probably not.<br />
Oracle and PostgreSQL are pluses. They will probably never be used, but who knows what will<br />
happen in 5 or 10 years. Maybe Oracle would get fed up with MySQL and release its db<br />
as open source, charging for service. Stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the SQL queries it lets you create are, since<br />
complex queries (sub-queries and the like) built through a db layer tend to be<br />
difficult to create, read and predict. Usually those queries are far easier to just write as plain SQL.<br />
<br />
In my experience a db abstraction layer is good for most of the db I/O. However, for<br />
the complex stuff, a sort of localization object (or even file) holding named<br />
queries is a good bet. This would work similarly to how different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables, since some dbs might require a little more<br />
manipulation than others. Again, this would only be for the most complex queries. A good rule of thumb would<br />
be: if you have to start writing parts of the query as strings, move it to the db localization<br />
object. This db localization object isn't used for all queries because you only want to have<br />
to tweak the minimum number of queries across dbs.<br />
--Aarons<br />
</pre><br />
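The "db localization object" idea from the comment above could look roughly like this: one object per database, one method per named query. The class names, the QUERIES registry and the person table are made up for illustration. The MySQL variant is shown only as SQL text and is not executed here; it relies on MySQL's CONCAT() where SQLite uses the || operator.<br />

```python
import sqlite3


class SqliteQueries(object):
    """Named queries written for SQLite."""

    def full_names_sql(self):
        return ("SELECT first_name || ' ' || surname FROM person "
                "WHERE surname = ? ORDER BY first_name")

    def full_names(self, conn, surname):
        cur = conn.cursor()
        cur.execute(self.full_names_sql(), (surname,))
        return [row[0] for row in cur.fetchall()]


class MysqlQueries(SqliteQueries):
    """Same query names; only the SQL text differs.

    The %s placeholder assumes a driver that uses the DB-API "format" style.
    """

    def full_names_sql(self):
        return ("SELECT CONCAT(first_name, ' ', surname) FROM person "
                "WHERE surname = %s ORDER BY first_name")


# The rest of the program asks for queries by backend name only.
QUERIES = {"sqlite": SqliteQueries(), "mysql": MysqlQueries()}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (first_name TEXT, surname TEXT);
INSERT INTO person VALUES ('Bob', 'Smith');
INSERT INTO person VALUES ('Ann', 'Smith');
INSERT INTO person VALUES ('Cy', 'Jones');
""")
names = QUERIES["sqlite"].full_names(conn, "Smith")
```

Calling code never sees the SQL string, so supporting another db means adding one more class with the same method names.<br />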
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the GRAMPS code or the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding, in case they just don't work.<br />
<br />
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know, I recommend we use SQLAlchemy with SQLite. <br />
<br />
SQLite. There is no server to manage, and single-file dbs are easy to share; it is also easy to manage multiple dbs at the same time. It should also make merging simpler, and it will allow websites to be developed that work directly from the db. As long as GRAMPS doesn't switch focus to become some kind of mass-user website for editing large trees, I think SQLite will fit the bill.<br />
<br />
SQLAlchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since GRAMPS isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't make changing dbs as easy as might be thought. If we commit to SQLite, though, this might be an option.<br />
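To illustrate the DB-API point, here is a small sketch using only the sqlite3 module. Each DB-API driver declares its own paramstyle: sqlite3 uses "qmark" (?), while other drivers may use "format" (%s) or further styles, so the same SQL text is not automatically portable. The adapt_placeholders helper is hypothetical and deliberately naive (it would also rewrite a ? inside a string literal).<br />

```python
import sqlite3


def adapt_placeholders(sql, paramstyle):
    """Rewrite qmark-style SQL for a driver's declared paramstyle (naive)."""
    if paramstyle == "qmark":
        return sql
    if paramstyle == "format":
        return sql.replace("?", "%s")
    raise NotImplementedError(
        "paramstyle %r not handled in this sketch" % paramstyle)


QUERY = "SELECT gramps_id FROM person WHERE handle = ?"

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT)")
cur.execute("INSERT INTO person VALUES (?, ?)", ("a1b2c3", "I0001"))

# The driver module itself says which placeholder style it expects.
cur.execute(adapt_placeholders(QUERY, sqlite3.paramstyle), ("a1b2c3",))
row = cur.fetchone()
```

Placeholders are only one of the differences; SQL dialect and type handling also vary between drivers, which is part of the case for an abstraction layer.<br />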
<br />
SQLObject. This seems like a viable alternative to SQLAlchemy, but SQLAlchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than SQLAlchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do GRAMPS some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From its website, SQLAlchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use both as needs dictate. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15176GEPS 010: Relational Backend2009-03-26T18:01:53Z<p>AaronS: more minor edits moving some discussion</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for making the switch =<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (eg, SQLite) to more complex and sophisticated client/server setups (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but an embedded key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries in low-level C code, whereas with BSDDB the query logic has to be written in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next, there are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data to Python; any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search brought up things like this [http://pecl.php.net/package/python php python package], so there may be some hope of even using the ORM layer from another language, but how complex would we really want to make it? And of course there is always the option of using the ORM from Python and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
<br />
<br />
<br />
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
<br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the GRAMPS code or the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding, in case they just don't work.<br />
<br />
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know, I recommend we use SQLAlchemy with SQLite. <br />
<br />
SQLite. There is no server to manage, and single-file dbs are easy to share; it is also easy to manage multiple dbs at the same time. It should also make merging simpler, and it will allow websites to be developed that work directly from the db. As long as GRAMPS doesn't switch focus to become some kind of mass-user website for editing large trees, I think SQLite will fit the bill.<br />
<br />
SQLAlchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since GRAMPS isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't make changing dbs as easy as might be thought. If we commit to SQLite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to SQLAlchemy, but SQLAlchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than SQLAlchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do GRAMPS some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From its website, SQLAlchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use both as needs dictate. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, a handle should be unique, and rules must be added to ensure that adding to the family table an object whose handle belongs to a person object is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table keyed on handle) or with explicit rules.<br />
# best would be a framework that based on the model can generate an admin module to browse the database, see eg the admin module in django.<br />
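<br />
The "primary object table with key on handle" idea from the validation step above could be sketched roughly as follows. This is only an illustration using the sqlite3 module; the table names, column names and helper functions are assumptions made up for the example, not an agreed schema.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default
conn.executescript("""
-- Hypothetical schema: every handle is registered exactly once, with its type.
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)
);
CREATE TABLE person (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
""")


def add_object(conn, table, handle):
    """Register a handle and insert it into 'person' or 'family'.

    Returns False (and rolls back) if the database rejects the insert.
    """
    if table not in ("person", "family"):
        raise ValueError(table)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO primary_object VALUES (?, ?)", (handle, table))
            conn.execute("INSERT INTO %s (handle) VALUES (?)" % table, (handle,))
        return True
    except sqlite3.IntegrityError:
        return False


add_object(conn, "person", "H1")
# Reusing a person's handle for a family is refused by the database itself.
print(add_object(conn, "family", "H1"))
```

With this layout, even code that bypasses the helper and writes straight to the family table cannot reuse a person's handle, because the composite foreign key has no matching ('H1', 'family') row to point at.<br />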
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier, but that means rewriting the core of GRAMPS.<br />
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an sql backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. Eg, in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, it means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there.<br />
<br />
The above clearly shows that the two backends call for very different approaches. The bsddb way will still work in sql (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For sql, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed in, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal though.<br />
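<br />
A rough sketch of that prepare-then-check pattern, assuming an attribute table exists. The class name, the table layout and the simplified prepare/apply signatures are all made up for illustration and are not the actual _HasAttributeBase interface.<br />
<br />
```python
import sqlite3


class HasAttributeSql:
    """Hypothetical rule: one query in prepare(), then cheap lookups in apply()."""

    def __init__(self, attr_type, value):
        self.attr_type = attr_type
        self.value = value
        self.matches = set()

    def prepare(self, conn):
        # Let the SQL engine find every person that has the attribute.
        rows = conn.execute(
            "SELECT DISTINCT person_handle FROM attribute"
            " WHERE type = ? AND value = ?",
            (self.attr_type, self.value),
        )
        self.matches = {row[0] for row in rows}

    def apply(self, person_handle):
        # No database access here, only a set membership test.
        return person_handle in self.matches


# Assumed, minimal attribute table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [("P1", "Occupation", "Farmer"),
     ("P2", "Occupation", "Miller"),
     ("P3", "Occupation", "Farmer")],
)

rule = HasAttributeSql("Occupation", "Farmer")
rule.prepare(conn)
print(sorted(h for h in ("P1", "P2", "P3") if rule.apply(h)))
```

The bsddb equivalent would have to unpack every person and walk its attribute list, which is where the difference between the two backends shows up.<br />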
<br />
= See Also =<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Talk:GEPS_010:_Relational_Backend&diff=15175Talk:GEPS 010: Relational Backend2009-03-26T18:01:42Z<p>AaronS: adding old discusion</p>
<hr />
<div><br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
It's good to see this discussion on gramps, and it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might offset some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one; you'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in php every web framework seems to have its own. I suspect the same<br />
for python. Django has its own but allows for the use of others (if that<br />
tells you anything), so it might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
Abstraction layer.<br />
<br />
What to look for in a db Abstraction Layer is which dbs it can use. SQLite and<br />
MySQL are musts; you may even find one that can talk to BSDDB, but probably not.<br />
Oracle and PostgreSQL are pluses but will probably never be used, though who knows<br />
what will happen in 5 or 10 yrs. Maybe Oracle would get fed up with MySQL and release<br />
its db as open source, charging for service. Stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the SQL queries it supposedly lets you create are, since<br />
complex queries through a db layer (ie sub queries and the like) tend to be difficult<br />
to create, read and predict. Usually those queries are far easier to just write as plain SQL.<br />
<br />
In my experience a db abstraction layer is good for most of the db io. However, for<br />
the complex stuff a sort of localization object (or even file) with named<br />
queries is a good bet. This would work similarly to how different languages are usually<br />
supported in projects, with a different object or file per db. I'd recommend an actual<br />
object with a function per query over a file of constants/variables, since some dbs might<br />
require a little more manipulation than others. Again, this would only be for the most<br />
complex queries. A good rule of thumb would be: if you have to start writing parts of the<br />
query as strings, move it to the db localization object. This db localization object isn't<br />
used for all queries because you only want to have to tweak the minimum number of queries<br />
across dbs.<br />
--Aarons<br />
</pre></div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15174GEPS 010: Relational Backend2009-03-26T17:58:46Z<p>AaronS: heading cleanup</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for making the switch =<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow easy, single db file implementations (eg, SQLite) to more complex and sophisticated client/server (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is deprecated as of Python 2.6 and is removed from the standard library in Python 3.0 (see PEP 3108)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries in low-level C, whereas with BSDDB the query logic has to be written in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next, are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps, and it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might offset some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Removal from the standard library does not mean the end of development, however; the module will simply be released out of sync with the Python cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data just to Python. Any language should be able to access the db just fine. However, it wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer, but how complex would we really want to make it? And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
<br />
<br />
<br />
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
<br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers, and I'm not yet familiar with the gramps code or the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding in case they just don't work.<br />
<br />
I've spent the last few days looking at the current options for db abstraction. From what I currently know, I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
sqlite. There is no server to manage, and single-file dbs will be easy to share and will make it easy to manage multiple dbs at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to be some kind of mass-user website for editing large trees, I think sqlite will fit the bill.<br />
<br />
sqlalchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best python ORM available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since gramps isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't support changing dbs as much as might be thought. If we commit to sqlite, though, this might be an option.<br />
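<br />
One concrete reason the DB-API doesn't make dbs interchangeable is that each driver module announces its own placeholder style through its paramstyle attribute, so the same query string can't simply be reused. A small sketch of what coping with that looks like; the placeholder() and find_person() helpers and the person table are invented for the example and only the "qmark" and "format" styles are handled.<br />
<br />
```python
import sqlite3


def placeholder(module):
    """Return the positional parameter marker a DB-API module expects."""
    style = module.paramstyle
    if style == "qmark":      # used by sqlite3
        return "?"
    if style == "format":     # used by some other drivers
        return "%s"
    raise NotImplementedError("paramstyle %r not handled in this sketch" % style)


def find_person(conn, module, handle):
    # The SQL text has to be assembled per driver, even for a trivial query.
    sql = "SELECT name FROM person WHERE handle = " + placeholder(module)
    cur = conn.cursor()
    cur.execute(sql, (handle,))
    return cur.fetchone()


# Assumed, minimal person table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES ('P1', 'Smith, John')")

print(sqlite3.paramstyle, find_person(conn, sqlite3, "P1"))
```

This is the kind of per-driver bookkeeping (plus differences in SQL dialect) that an abstraction layer such as sqlalchemy is meant to hide.<br />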
<br />
SQLObject. This seems like a viable alternative to sqlalchemy, but sqlalchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than sqlalchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack, I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do gramps some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From its website, sqlalchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use either as the need arises. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, a handle should be unique, and rules must be added to ensure that adding to the family table an object whose handle belongs to a person object is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table keyed on handle) or with explicit rules.<br />
# best would be a framework that based on the model can generate an admin module to browse the database, see eg the admin module in django.<br />
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier, but that means rewriting the core of GRAMPS.<br />
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
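<br />
One possible reading of the deferred loading described in item 3, sketched here with a plain Python property rather than slots. The LazyPerson class, the table layout and the simplified get_person_from_handle signature are assumptions for illustration only, not the gen/lib design.<br />
<br />
```python
import sqlite3


class LazyPerson:
    """Hypothetical person object that loads its attributes on first access."""

    def __init__(self, conn, handle, name):
        self._conn = conn          # the object has to know where its data lives
        self.handle = handle
        self.name = name
        self._attributes = None    # None means "not fetched yet"

    @property
    def attributes(self):
        if self._attributes is None:
            # The attribute table is only hit when a caller actually asks.
            rows = self._conn.execute(
                "SELECT type, value FROM attribute"
                " WHERE person_handle = ? ORDER BY type",
                (self.handle,),
            )
            self._attributes = rows.fetchall()
        return self._attributes


def get_person_from_handle(conn, handle):
    # Only the person table is queried here.
    row = conn.execute(
        "SELECT handle, name FROM person WHERE handle = ?", (handle,)
    ).fetchone()
    return LazyPerson(conn, row[0], row[1]) if row else None


# Assumed, minimal tables for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT)")
conn.execute("INSERT INTO person VALUES ('P1', 'Smith, John')")
conn.execute("INSERT INTO attribute VALUES ('P1', 'Occupation', 'Farmer')")

person = get_person_from_handle(conn, "P1")
print(person.name, person.attributes)
```

Whatever mechanism is used, the trade-off is the one noted in the list: the data objects stop being plain containers and carry a reference to the database.<br />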
<br />
== Extending base.py ==<br />
<br />
Once an sql backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. Eg, in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, it means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there.<br />
<br />
The above clearly shows that the two backends call for very different approaches. The bsddb way will still work in sql (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For sql, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed in, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal though.<br />
<br />
== See Also ==<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15173GEPS 010: Relational Backend2009-03-26T17:57:01Z<p>AaronS: moving bssdb comparion to db comparison page</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for making the switch =<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow easy, single db file implementations (eg, SQLite) to more complex and sophisticated client/server (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is deprecated as of Python 2.6 and is removed from the standard library in Python 3.0 (see PEP 3108)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries in low-level C, whereas with BSDDB the query logic has to be written in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next, are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps, and it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might offset some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Removal from the standard library does not mean the end of development, however; the module will simply be released out of sync with the Python cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data just to Python. Any language should be able to access the db just fine. However, it wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer, but how complex would we really want to make it? And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
<br />
<br />
<br />
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
<br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers, and I'm not yet familiar with the gramps code or the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding in case they just don't work.<br />
<br />
I've spent the last few days looking at the current options for db abstraction. From what I currently know, I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
sqlite. There is no server to manage, and single-file dbs will be easy to share and will make it easy to manage multiple dbs at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to be some kind of mass-user website for editing large trees, I think sqlite will fit the bill.<br />
<br />
sqlalchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best python ORM available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since gramps isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't support changing dbs as much as might be thought. If we commit to sqlite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to sqlalchemy, but sqlalchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than sqlalchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack, I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do gramps some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From its website, sqlalchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use either as the need arises. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., a handle should be unique, and rules must be added to ensure that adding to the family table an object whose handle already belongs to a person object is '''impossible''' at the database layer. These kinds of rules can be done technically (a primary object table with a key on handle) or with rules.<br />
# ideally, the framework could generate, based on the model, an admin module to browse the database; see e.g. the admin module in django.<br />
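<br />
The "primary object table with key on handle" idea from the validation step could be sketched as follows. This is only an illustration using Python's sqlite3 module; the table and column names (object, kind, person, family) and the helper functions are assumptions made for the example, not an agreed GRAMPS schema.<br />

```python
import sqlite3

# Hypothetical schema: every primary object registers its handle once in
# "object"; the composite foreign key ties each typed table to its own kind.
SCHEMA = """
CREATE TABLE object (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL,
    UNIQUE (handle, kind)
);
CREATE TABLE person (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL DEFAULT 'person' CHECK (kind = 'person'),
    FOREIGN KEY (handle, kind) REFERENCES object (handle, kind)
);
CREATE TABLE family (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL DEFAULT 'family' CHECK (kind = 'family'),
    FOREIGN KEY (handle, kind) REFERENCES object (handle, kind)
);
"""

INSERT_SQL = {
    "person": "INSERT INTO person (handle) VALUES (?)",
    "family": "INSERT INTO family (handle) VALUES (?)",
}


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, kind, handle):
    """Register the handle, then add the row to its typed table (one transaction)."""
    with conn:
        conn.execute("INSERT INTO object (handle, kind) VALUES (?, ?)", (handle, kind))
        conn.execute(INSERT_SQL[kind], (handle,))


def is_rejected(action):
    """Return True if the database refuses the action with an integrity error."""
    try:
        action()
    except sqlite3.IntegrityError:
        return True
    return False


conn = make_db()
add_object(conn, "person", "H0001")

# Reusing a person's handle for a family fails on the object table's primary key.
reuse_rejected = is_rejected(lambda: add_object(conn, "family", "H0001"))

# Writing to the family table directly fails on the composite foreign key.
bypass_rejected = is_rejected(
    lambda: conn.execute("INSERT INTO family (handle) VALUES (?)", ("H0001",))
)
conn.rollback()
```

In this sketch both attempts are refused by the database itself, so the rule does not depend on every caller going through add_object.<br />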
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier, but that means rewriting the core of GRAMPS.<br />
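<br />
One possible reading of the deferred loading described in the last item, sketched with a plain Python property rather than any particular slots mechanism. LazyPerson, the table layout and the column names are hypothetical; only the name get_person_from_handle comes from the text above, and its signature here is an assumption.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT);
INSERT INTO person VALUES ('H0001', 'I0001');
INSERT INTO attribute VALUES ('H0001', 'Occupation', 'Farmer');
""")

# Record every SQL statement from here on, to see which tables get hit.
statements = []
conn.set_trace_callback(statements.append)


class LazyPerson:
    """Hypothetical stand-in for a gen/lib object that knows its database."""

    def __init__(self, db, handle, gramps_id):
        self._db = db
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None  # not fetched yet

    @property
    def attributes(self):
        # The attribute table is only queried on first access, then cached.
        if self._attributes is None:
            self._attributes = self._db.execute(
                "SELECT key, value FROM attribute "
                "WHERE person_handle = ? ORDER BY key",
                (self.handle,),
            ).fetchall()
        return self._attributes


def get_person_from_handle(db, handle):
    row = db.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?", (handle,)
    ).fetchone()
    return LazyPerson(db, *row) if row else None


person = get_person_from_handle(conn, "H0001")
seen_before = list(statements)   # only the person query so far
occupation = person.attributes   # first access triggers the attribute query
seen_after = list(statements)
```

The object has to carry a reference to the database, which is the coupling between gen/lib and the backend that the item above points out.<br />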
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an sql backend is stable, base.py can be extended to offer extra functionality, or to better optimize for SQL. E.g., in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, it means looping over all persons, obtaining the attribute sub-table of each person, and checking whether the attribute is present there. <br />
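<br />
The two lookups described above might look like this side by side. It is a sketch only: the person and attribute tables and the person_map dictionary (standing in for the bsddb person table) are made up for the example.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT);
CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT);
""")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("H1", "Martha"), ("H2", "Jason"), ("H3", "Don")],
)
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [("H1", "Occupation", "Farmer"),
     ("H3", "Occupation", "Smith"),
     ("H3", "Nickname", "Donny")],
)


def people_with_attribute_sql(conn, key):
    """SQL way: start from the attribute table, then join to the people."""
    rows = conn.execute(
        "SELECT DISTINCT p.handle FROM attribute AS a "
        "JOIN person AS p ON p.handle = a.person_handle "
        "WHERE a.key = ? ORDER BY p.handle",
        (key,),
    )
    return [row[0] for row in rows]


# Stand-in for the bsddb layout: each person record carries its own attributes.
person_map = {
    "H1": {"name": "Martha", "attributes": [("Occupation", "Farmer")]},
    "H2": {"name": "Jason", "attributes": []},
    "H3": {"name": "Don", "attributes": [("Occupation", "Smith"),
                                         ("Nickname", "Donny")]},
}


def people_with_attribute_loop(person_map, key):
    """bsddb way: visit every person and inspect its attribute list."""
    return sorted(
        handle
        for handle, person in person_map.items()
        if any(k == key for k, _value in person["attributes"])
    )


via_sql = people_with_attribute_sql(conn, "Occupation")
via_loop = people_with_attribute_loop(person_map, "Occupation")
```

Both return the same handles; the difference is that the loop touches every person, while the query only touches matching attribute rows.<br />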
<br />
The above clearly indicates that the two backends go about this very differently. The bsddb way will still work in sql, though (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For sql, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
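<br />
A rough sketch of that prepare/apply split follows. It does not reproduce the real _HasAttributeBase.py: the class HasAttributeSketch, the SqlDb and PlainDb wrappers and the Person tuple are all hypothetical, and only the prepare method and the support_sql attribute are taken from the description above.<br />

```python
import sqlite3
from collections import namedtuple

Person = namedtuple("Person", "handle attributes")


class SqlDb:
    """Hypothetical db wrapper that advertises SQL support."""
    support_sql = True

    def __init__(self, conn):
        self.conn = conn


class PlainDb:
    """Hypothetical db wrapper without SQL support (bsddb-like)."""
    support_sql = False


class HasAttributeSketch:
    def __init__(self, key):
        self.key = key
        self.matches = None

    def prepare(self, db):
        # With SQL available, fetch all matching handles once, up front.
        if getattr(db, "support_sql", False):
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE key = ?",
                (self.key,),
            )
            self.matches = {row[0] for row in rows}

    def apply(self, db, person):
        if self.matches is not None:
            return person.handle in self.matches
        # Fallback: inspect the attributes carried by the person object.
        return any(k == self.key for k, _value in person.attributes)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT)")
conn.execute("INSERT INTO attribute VALUES ('H1', 'Occupation', 'Farmer')")

martha = Person("H1", [("Occupation", "Farmer")])
jason = Person("H2", [])

sql_rule = HasAttributeSketch("Occupation")
sql_rule.prepare(SqlDb(conn))
sql_results = [sql_rule.apply(None, p) for p in (martha, jason)]

plain_rule = HasAttributeSketch("Occupation")
plain_rule.prepare(PlainDb())
plain_results = [plain_rule.apply(None, p) for p in (martha, jason)]
```

The branch on support_sql inside every rule is the part that "does not look ideal": each rule ends up carrying two implementations.<br />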
<br />
== See Also ==<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Relational_database_comparison&diff=15172Relational database comparison2009-03-26T17:56:26Z<p>AaronS: adding bsddb sqlite comparison</p>
<hr />
<div>This page is for a comparison of different databases, specifically with regard to how they might be used for GRAMPS. It was started to help with [[GEPS 010: SQL Backend]]. <br />
<br />
=SQLite=<br />
==Advantages==<br />
*far easier to set up: just start writing to the file! No connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
<br />
=== Transportable Trees ===<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The single disk file of an sqlite db would be a major selling point for sqlite<br />
for genealogy software, since users share and compare dbs all the time.<br />
--AaronS<br />
</pre><br />
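<br />
To make the single-file point concrete, here is a small sketch using Python's sqlite3 module. The file names and the one-table layout are invented for the example; sharing a tree is nothing more than copying the file.<br />

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
tree_path = os.path.join(workdir, "my_tree.db")

# No server, user account or connection settings: just a file name.
conn = sqlite3.connect(tree_path)
with conn:
    conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO person VALUES (?, ?)", ("H0001", "Martha"))
conn.close()

# Sharing the tree is an ordinary file copy.
shared_path = os.path.join(workdir, "copy_for_cousin.db")
shutil.copyfile(tree_path, shared_path)

copy = sqlite3.connect(shared_path)
names = [row[0] for row in copy.execute("SELECT name FROM person")]
copy.close()

shutil.rmtree(workdir)  # clean up the temporary files
```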
<br />
==Disadvantages==<br />
*while great for what it is, it's not an enterprise-level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist, they aren't as fleshed out and solid as the mysql ones.<br />
<br />
<br />
=MySQL=<br />
==Advantages==<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db, e.g. stored procedures and on and on.<br />
(sqlite does have auto-incrementing keys and triggers, but no stored procedures)<br />
*far more extensive user base and support.<br />
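<br />
On the question of what sqlite has built in: auto-incrementing keys and triggers can be checked quickly with Python's sqlite3 module. The tables and the trigger name below are made up for the check.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
);
CREATE TABLE change_log (
    person_id INTEGER,
    note      TEXT
);
-- The trigger runs inside the database on every insert into person.
CREATE TRIGGER log_person_insert AFTER INSERT ON person
BEGIN
    INSERT INTO change_log (person_id, note) VALUES (NEW.id, 'added');
END;
""")

conn.execute("INSERT INTO person (name) VALUES ('Martha')")
conn.execute("INSERT INTO person (name) VALUES ('Jason')")

ids = [row[0] for row in conn.execute("SELECT id FROM person ORDER BY id")]
log = conn.execute(
    "SELECT person_id, note FROM change_log ORDER BY person_id"
).fetchall()
```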
<br />
==Disadvantages==<br />
*install size (bloat)<br />
*an actual server to set up, run, and maintain.<br />
** there are tools that can do this automatically, though, and make the work almost nonexistent<br />
for an end user. Also, the embeddable mysql might be an option.<br />
*may be more difficult to manage / share multiple databases, but very doable.<br />
Maybe not even that difficult; it would just take some planning.<br />
<br />
= Comparing BSDDB to SQLite =<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of sqlite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. It might be true of MySQL, though.<br />
The embedded version of MySQL may be similar to sqlite, but I haven't tried it out. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre></div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15171GEPS 010: Relational Backend2009-03-26T17:52:25Z<p>AaronS: /* Transportable Trees */ moving to db comparison</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for making the switch =<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-db-file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite is included in the standard Python distribution (the sqlite3 module, since Python 2.5)<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas with BSDDB the query logic is written in Python<br />
# All tables of an SQLite database reside in a single file<br />
<br />
Next, there are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# It would be easy to allow multiple users in a SQLite database (it uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps, and it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps, the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL over SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might overcome some of sqlite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but mysql's<br />
tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
<br />
<br />
<br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data just to Python. Any language should be able to access the db just fine. However, other languages wouldn't have access to Python's orm layer. Since I haven't used a true orm before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the orm layer, but how complex would we really want to make it? And of course there is always the option of just using an orm and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
<br />
<br />
<br />
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of sqlite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. It might be true of MySQL, though.<br />
The embedded version of MySQL may be similar to sqlite, but I haven't tried it out. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers, and I'm not yet familiar with the gramps code and the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding, in case they just don't work.<br />
<br />
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
sqlite. No server to manage, and single-file dbs will make it easy to share dbs and to manage multiple dbs at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to be some kind of mass-user website for editing large trees, I think sqlite will fit the bill.<br />
<br />
sqlalchemy. This seems to have a large following and good documentation. It should allow us to support different db back ends more easily in the future. At least some people think it's the best python orm available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since gramps isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't make switching dbs as easy as might be thought. If we commit to sqlite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to sqlalchemy, but sqlalchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than sqlalchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack, I'm a bit skittish about anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an orm layer. I've never used a true orm and can't fully say how they work in practice. While I'm not solidly on the orm bandwagon, I do think an orm layer might do gramps some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From the website, sqlalchemy sounds like it will provide both abstraction and an orm, and we'll be able to use either as the need arises. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, handle should be unique, and rules must be added to ensure that adding an object with a handle like that of a person object to the family table is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table with a key on handle) or with rules.<br />
# best would be a framework that, based on the model, can generate an admin module to browse the database; see eg the admin module in django.<br />
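<br />
The "primary object table with key on handle" idea in the validation step could look roughly like the sketch below. It uses the standard sqlite3 module; the table names, the obj_type column and the composite foreign key are assumptions made for illustration, not an agreed schema.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,          -- a handle can exist only once
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)           -- target for the composite foreign keys
);
CREATE TABLE person (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
""")

conn.execute("INSERT INTO primary_object VALUES ('h0001', 'person')")
conn.execute("INSERT INTO person (handle) VALUES ('h0001')")


def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Registering the same handle a second time is rejected by the primary key.
duplicate_ok = try_insert("INSERT INTO primary_object VALUES ('h0001', 'family')")
# Putting a person's handle into the family table is rejected by the foreign key.
wrong_table_ok = try_insert("INSERT INTO family (handle) VALUES ('h0001')")
print(duplicate_ok, wrong_table_ok)
```
<br />
With this layout the database itself refuses both cases, so no Python-side check is needed for them. Whether the same constraints carry over unchanged to other backends would need to be tested.<br />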
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier, but that means rewriting the core of GRAMPS.<br />
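<br />
The deferred loading described in the last step can be sketched as follows. This is only an illustration: LazyPerson, the table layout and the standalone get_person_from_handle function are hypothetical, and the sketch uses a plain Python property rather than taking a position on whether slots are the right mechanism.<br />
<br />
```python
import sqlite3


class LazyPerson:
    """Hypothetical stand-in for a gen/lib Person that knows its database."""

    def __init__(self, conn, handle, gramps_id):
        self._conn = conn
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None  # not loaded yet

    @property
    def attributes(self):
        # The attribute table is only queried on first access.
        if self._attributes is None:
            self._attributes = self._conn.execute(
                "SELECT type, value FROM attribute WHERE person_handle = ?",
                (self.handle,),
            ).fetchall()
        return self._attributes


def get_person_from_handle(conn, handle):
    """Fetch a person using the person table only."""
    row = conn.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?", (handle,)
    ).fetchone()
    return LazyPerson(conn, *row) if row else None


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT);
INSERT INTO person VALUES ('h0001', 'I0001');
INSERT INTO attribute VALUES ('h0001', 'Occupation', 'Farmer');
""")

# Record every SQL statement from here on, to see which tables get hit.
queries = []
conn.set_trace_callback(queries.append)

person = get_person_from_handle(conn, "h0001")
touched_before = any("attribute" in q for q in queries)
attrs = person.attributes
touched_after = any("attribute" in q for q in queries)
print(touched_before, touched_after, attrs)
```
<br />
The object has to carry a reference to the connection, which is the "gen/lib objects need to be aware of the database" cost mentioned above.<br />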
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an sql backend is stable, base.py can be extended to offer extra functionality, or to better optimize for SQL. Eg, in SQL one would probably have an attribute table. To know which persons have a specific attribute, SQL would select that from the attribute table, and then look up the people. In bsddb, however, it means looping over all persons, obtaining the attribute sub table of each person, and checking whether the attribute is present there. <br />
<br />
The above clearly indicates that the approach in the two backends is very different. The bsddb way will work in sql though (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For sql, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal though.<br />
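<br />
A rough sketch of the prepare-then-test idea, assuming a db object that exposes a support_sql flag and a sqlite3 connection. FakeSqlDb, HasAttributeSql, the apply method and the table layout are invented for the example and do not mirror the real _HasAttributeBase.py.<br />
<br />
```python
import sqlite3


class FakeSqlDb:
    """Hypothetical db wrapper; support_sql is the flag suggested in the text."""

    support_sql = True

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
        CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
        CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT);
        INSERT INTO person VALUES ('h0001', 'I0001');
        INSERT INTO person VALUES ('h0002', 'I0002');
        INSERT INTO attribute VALUES ('h0001', 'Occupation', 'Farmer');
        """)


class HasAttributeSql:
    """Filter rule sketch: one query in prepare(), cheap membership test after."""

    def __init__(self, attr_type):
        self.attr_type = attr_type
        self.matching = set()

    def prepare(self, db):
        if getattr(db, "support_sql", False):
            # Let the database find the matching people in one go.
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE type = ?",
                (self.attr_type,),
            )
            self.matching = {handle for (handle,) in rows}

    def apply(self, db, person_handle):
        return person_handle in self.matching


db = FakeSqlDb()
rule = HasAttributeSql("Occupation")
rule.prepare(db)
results = [rule.apply(db, h) for h in ("h0001", "h0002")]
print(results)
```
<br />
A backend without support_sql would still need the per-person loop as a fallback, which is the branching on a flag that the paragraph above calls less than ideal.<br />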
<br />
== See Also ==<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Relational_database_comparison&diff=15170Relational database comparison2009-03-26T17:51:57Z<p>AaronS: /* Advantages */ adding transportable trees</p>
<hr />
<div>This page is for a comparison of different databases, specifically in terms of how they might be used for GRAMPS. It was started to help with [[GEPS 010: SQL Backend]]. <br />
<br />
=SQLite=<br />
==Advantages==<br />
*far easier to set up: just start writing to the file! No connection settings or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
<br />
=== Transportable Trees ===<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatiblity" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
==Disadvantages==<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
<br />
=MySQL=<br />
==Advantages==<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db, eg stored procedures and so on.<br />
(sqlite does have triggers and auto-incrementing integer keys, but no stored procedures)<br />
*far more extensive user base and support.<br />
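<br />
On the question of what sqlite has built in: the following sketch, using the sqlite3 module from the Python standard library, exercises an auto-incrementing key and a trigger. The table and trigger names are made up for the example.<br />
<br />
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- ids are assigned by sqlite
    surname TEXT
);
CREATE TABLE change_log (person_id INTEGER, note TEXT);

-- Trigger that records every insert into person.
CREATE TRIGGER log_person_insert AFTER INSERT ON person
BEGIN
    INSERT INTO change_log VALUES (NEW.id, 'added ' || NEW.surname);
END;
""")

conn.execute("INSERT INTO person (surname) VALUES ('Smith')")
conn.execute("INSERT INTO person (surname) VALUES ('Jones')")

ids = [row[0] for row in conn.execute("SELECT id FROM person ORDER BY id")]
log = conn.execute(
    "SELECT person_id, note FROM change_log ORDER BY person_id"
).fetchall()
print(ids, log)
```
<br />
Stored procedures are the part of that list that sqlite does not offer; that kind of logic would have to live in the application code.<br />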
<br />
==Disadvantages==<br />
*install size (bloat)<br />
*an actual server to set up, run and maintain.<br />
** there are tools that can do this automatically though, and make the server almost<br />
nonexistent for an end user. Also, the embeddable mysql might be an option.<br />
*may be difficult to manage / share multiple databases. More difficult, but very doable.<br />
maybe not even that difficult. it would just take some planning.</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15169GEPS 010: Relational Backend2009-03-26T17:49:33Z<p>AaronS: Begining to move around reasons</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is often pronounced "sequel", after its original name, SEQUEL ("Structured English Query Language").<br />
<br />
= Reasons for making the switch =<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a BSD database (BSDDB) as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file db implementations (eg, SQLite) to more complex and sophisticated client/server setups (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL can optimize queries (in low-level C) whereas BSDDB is done in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next, are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although it may be harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps, and it is actually why I'm thinking of giving<br />
gramps another try, depending on how hard this is to implement. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps, the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might match some of sqlite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but mysql's<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, offering the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
<br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatiblity" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
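<br />
As a small illustration of the single-file point, sharing a tree could be as simple as copying the file. The sketch below uses only the Python standard library; the file names and the one-table schema are placeholders.<br />
<br />
```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "smith_tree.db")       # hypothetical tree file
shared = os.path.join(workdir, "copy_for_cousin.db")    # the copy handed to someone else

conn = sqlite3.connect(original)
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
conn.execute("INSERT INTO person VALUES ('h0001', 'Smith')")
conn.commit()
conn.close()  # close first so no write is in flight while copying

# The whole database travels as one ordinary file.
shutil.copy(original, shared)

copy_conn = sqlite3.connect(shared)
surnames = [row[0] for row in copy_conn.execute("SELECT surname FROM person")]
copy_conn.close()
print(surnames)
```
<br />
Comparing or merging two trees would then start from two such files opened side by side.<br />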
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data just to Python. Any language should be able to access the db just fine. However, other languages wouldn't have access to Python's orm layer. Since I haven't used a true orm before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the orm layer from other languages. But how complex would we really want to make it? And of course there is always the option of using an orm in the other language and building similar objects there. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
<br />
<br />
<br />
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to work with GRAMPS data the way GRAMPS does (although they can still use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of sqlite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar but I haven't tried it out. Don's concern might apply to a regular MySQL server though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers, and I'm not yet familiar with the gramps code or the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decision needs to be revisitable once we actually start coding, in case it just doesn't work.<br />
<br />
I've spent the last few days looking at the current options for db abstraction. From what I currently know, I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
sqlite. No server to manage, and single-file dbs will be easy to share and will make it easy to manage multiple dbs at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to being some kind of mass-user website for editing large trees, I think sqlite will fit the bill.<br />
<br />
sqlalchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best python orm available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since gramps isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't make switching dbs as easy as might be thought. If we commit to sqlite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to sqlalchemy, but sqlalchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than sqlalchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack, I'm a bit skittish about anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an orm layer. I've never used a true orm and can't fully say how they work in practice. While I'm not solidly on the orm bandwagon, I do think an orm layer might do gramps some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From the website, sqlalchemy sounds like it will provide both abstraction and an orm, and we'll be able to use either as the need arises. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, handle should be unique, and rules must be added to ensure that adding an object with a handle like that of a person object to the family table is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table with a key on handle) or with rules.<br />
# best would be a framework that, based on the model, can generate an admin module to browse the database; see eg the admin module in django.<br />
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## The Family Tree manager needs to list the family tree type (bsddb, sqlite); on creation of a new family tree, the user must choose the backend.<br />
## Once the family tree is set up, the user can import .gramps/gedcom files just as with the bsddb backend.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table; the attribute table should be hit only when the calling method needs attributes. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, since they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
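<br />
Whatever role slots end up playing, the deferred loading described in the last list item can be sketched on its own. The following is only an illustration under assumptions: LazyPerson is a hypothetical stand-in (not the real gen/lib Person class), and the person and attribute tables and their columns are made up for the example.<br />

```python
import sqlite3


class LazyPerson:
    """Hypothetical stand-in for a gen/lib Person; not the real class."""

    def __init__(self, db, handle, gramps_id):
        self._db = db              # the object must know its database
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None    # not loaded until first access

    @property
    def attribute_list(self):
        # The attribute table is queried only the first time it is needed.
        if self._attributes is None:
            self._attributes = self._db.execute(
                "SELECT key, value FROM attribute "
                "WHERE person_handle = ? ORDER BY key",
                (self.handle,),
            ).fetchall()
        return self._attributes


def get_person_from_handle(db, handle):
    """Fetch a person by touching the person table only."""
    row = db.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?",
        (handle,),
    ).fetchone()
    if row is None:
        return None
    return LazyPerson(db, row[0], row[1])
```

The cost of this design is visible in the constructor: each object keeps a reference to the database so that it can fetch the missing values later.<br />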
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there.<br />
<br />
The above clearly indicates that the two backends call for very different approaches. The bsddb way will still work in SQL (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py].<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed in, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal though.<br />
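<br />
The two approaches to the attribute lookup can be put side by side. This is a rough sketch only: the attribute table, its columns, and the class HasAttributeSql are assumptions for illustration, loosely modelled on the prepare/apply idea and not taken from _HasAttributeBase.py.<br />

```python
import sqlite3


def handles_with_attribute_loop(people, key):
    """bsddb-style: visit every person and inspect its attribute list.

    `people` maps handle -> list of (key, value) pairs (an assumed shape).
    """
    return {
        handle
        for handle, attributes in people.items()
        if any(attr_key == key for attr_key, _value in attributes)
    }


def handles_with_attribute_sql(conn, key):
    """SQL-style: one query against the (assumed) attribute table."""
    rows = conn.execute(
        "SELECT DISTINCT person_handle FROM attribute WHERE key = ?",
        (key,),
    )
    return {row[0] for row in rows}


class HasAttributeSql:
    """Hypothetical filter rule: query once in prepare, then test membership."""

    def __init__(self, key):
        self.key = key
        self.matches = set()

    def prepare(self, conn):
        self.matches = handles_with_attribute_sql(conn, self.key)

    def apply(self, conn, person_handle):
        return person_handle in self.matches
```

In this sketch the per-person apply call never touches the database; all of the work happens once in prepare.<br />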
<br />
== See Also ==<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Talk:Relational_database_comparison&diff=15168Talk:Relational database comparison2009-03-26T17:41:51Z<p>AaronS: New page: It might make sense to keep this discussion in one place for now. Talk:GEPS 010: SQL Backend</p>
<hr />
<div>It might make sense to keep this discussion in one place for now.<br />
<br />
[[Talk:GEPS 010: SQL Backend]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Talk:Database_abstraction_layers_comparison&diff=15167Talk:Database abstraction layers comparison2009-03-26T17:41:41Z<p>AaronS: </p>
<hr />
<div>It might make sense to keep this discussion in one place for now.<br />
<br />
[[Talk:GEPS 010: SQL Backend]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Relational_database_comparison&diff=15166Relational database comparison2009-03-26T17:40:54Z<p>AaronS: </p>
<hr />
<div>This page is for a comparison of different databases, specific to how they might be used for GRAMPS. It was started to help with [[GEPS 010: SQL Backend]]. <br />
<br />
=SQLite=<br />
==Advantages==<br />
*far easier to set up: just start writing to the file! No connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate db's<br />
*single file<br />
*good support.<br />
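<br />
To illustrate the "just start writing to the file" point, here is a small sketch using Python's standard sqlite3 module. The file name example_tree.db and the person table are assumptions made for the example.<br />

```python
import os
import sqlite3
import tempfile


def create_tree_file(path):
    """Create (or open) a single-file database and store one row."""
    conn = sqlite3.connect(path)  # no server, account or connection setup
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS person "
            "(handle TEXT PRIMARY KEY, gramps_id TEXT)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO person VALUES (?, ?)",
            ("h1", "I0001"),
        )
    conn.close()


def count_people(path):
    """Reopen the file, as another user or machine would after a copy."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "example_tree.db")
        create_tree_file(path)
        print(count_people(path))
```

Sharing the tree then amounts to copying the one file.<br />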
<br />
==Disadvantages==<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
<br />
=MySQL=<br />
==Advantages==<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db, e.g. stored procedures and so on (SQLite does have triggers and auto-incrementing fields, but no stored procedures).<br />
*far more extensive user base and support.<br />
<br />
==Disadvantages==<br />
*install size (bloat)<br />
*an actual server to set up, run and maintain.<br />
** there are tools that can do this automatically and make the server almost nonexistent for an end user. The embeddable MySQL might also be an option.<br />
*may be difficult to manage / share multiple databases: more difficult than with SQLite, but very doable.<br />
maybe not even that difficult. it would just take some planning.</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Relational_database_comparison&diff=15165Relational database comparison2009-03-26T17:40:39Z<p>AaronS: </p>
<hr />
<div>This page is for a comparison of different databases, specific to how they might be used for GRAMPS. It was started to help with GEPS 010: SQL Backend. <br />
<br />
=SQLite=<br />
==Advantages==<br />
*far easier to set up: just start writing to the file! No connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate db's<br />
*single file<br />
*good support.<br />
<br />
==Disadvantages==<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
<br />
=MySQL=<br />
==Advantages==<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db, e.g. stored procedures and so on (SQLite does have triggers and auto-incrementing fields, but no stored procedures).<br />
*far more extensive user base and support.<br />
<br />
==Disadvantages==<br />
*install size (bloat)<br />
*an actual server to set up, run and maintain.<br />
** there are tools that can do this automatically and make the server almost nonexistent for an end user. The embeddable MySQL might also be an option.<br />
*may be difficult to manage / share multiple databases: more difficult than with SQLite, but very doable.<br />
maybe not even that difficult. it would just take some planning.</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Relational_database_comparison&diff=15164Relational database comparison2009-03-26T17:39:08Z<p>AaronS: New page: =SQLite= ==Advantages== *far easier to setup. just start writing to the file! no connection or user accounts. *smaller install (code) size. *easier for users to manage / and share sepperat...</p>
<hr />
<div>=SQLite=<br />
==Advantages==<br />
*far easier to set up: just start writing to the file! No connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate db's<br />
*single file<br />
*good support.<br />
<br />
==Disadvantages==<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
<br />
=MySQL=<br />
==Advantages==<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db, e.g. stored procedures and so on (SQLite does have triggers and auto-incrementing fields, but no stored procedures).<br />
*far more extensive user base and support.<br />
<br />
==Disadvantages==<br />
*install size (bloat)<br />
*an actual server to set up, run and maintain.<br />
** there are tools that can do this automatically and make the server almost nonexistent for an end user. The embeddable MySQL might also be an option.<br />
*may be difficult to manage / share multiple databases: more difficult than with SQLite, but very doable.<br />
maybe not even that difficult. it would just take some planning.</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15163GEPS 010: Relational Backend2009-03-26T17:38:54Z<p>AaronS: starting to move out database comparison</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for making the switch =<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a Berkeley DB (BSDDB) database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (e.g. SQLite) to more complex and sophisticated client/server setups (e.g. MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas with BSDDB the query logic is written in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next, are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a small step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps, and it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL over SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might overcome some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatiblity" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data to Python; any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I expect they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search brought up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer from another language, but how complex would we really want to make it? And of course there is always the option of just using an ORM in the new language and building similar objects there. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
<br />
<br />
<br />
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to work with GRAMPS data the way GRAMPS does (although they can still access the underlying db directly)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Being moved out of the standard library does not mean development is dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. It might be true of MySQL,<br />
though the embedded version of MySQL may be similar to SQLite (I haven't tried it out). I also<br />
believe it's possible to use scripts and/or code to manage launching and stopping the server. It<br />
might be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code and the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding in case they just don't work.<br />
<br />
I've spent the last few days looking at the current options for db abstraction. From what I currently know, I'm going to recommend we use SQLAlchemy with SQLite. <br />
<br />
SQLite. No server to manage, and single-file dbs will make it easy to share and manage multiple dbs at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to be some kind of mass-user website for editing large trees, I think SQLite will fit the bill.<br />
<br />
SQLAlchemy. This seems to have a large following and good documentation. It should make it easier to support different db backends in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user-friendly, and since gramps isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't make changing dbs as easy as might be thought. If we commit to SQLite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to SQLAlchemy, but SQLAlchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than SQLAlchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do GRAMPS some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From its website, SQLAlchemy sounds like it provides both abstraction and an ORM, so we'll be able to use whichever the need calls for. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], the author does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed, it is extremely important that the experienced developers on the devel list agree with the object model''', and more generally with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., a handle should be unique across object types: adding an object to the family table with the same handle as an existing person object must be '''impossible''' at the database layer. Rules of this kind can be enforced structurally (a primary object table keyed on handle) or with explicit validation rules.<br />
# ideally, the framework could generate an admin module for browsing the database from the model; see e.g. the admin module in Django.<br />
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## The Family Tree manager needs to list the family tree type (bsddb, sqlite); on creation of a new family tree, the user must choose the backend.<br />
## Once the family tree is set up, the user can import .gramps/gedcom files just as with the bsddb backend.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table; the attribute table should be hit only when the calling method needs attributes. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, since they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there.<br />
<br />
The above clearly indicates that the two backends call for very different approaches. The bsddb way will still work in SQL (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py].<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed in, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal though.<br />
<br />
== See Also ==<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15162GEPS 010: Relational Backend2009-03-26T17:28:32Z<p>AaronS: /* Discussion */ removing to discussion page</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for making the switch =<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow easy, single db file implementations (eg, SQLite) to more complex and sophisticated client/server (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL can optimize queries (in low-level C) whereas BSDDB is done in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a small step from SQLite (although it may be harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps and is actually why I'm thinking of giving<br />
it another try depending on how hard it is to implement this. Yes I know it will be hard<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL over SQLite. While I haven't tried it out yet there is an embeddable<br />
version of MySQL which might overcome some of SQLite's advantages. If a database abstraction<br />
layer is used both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db. ie auto incrementing fields, stored procedures and on and on.<br />
(sqlite may not even have triggers but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to set up, run and maintain.<br />
** there are tools that can do this automatically though and make this work almost nonexistent<br />
for an end user. also the embeddable mysql might be an option.<br />
*may be difficult to manage / share multiple databases. more difficult but very doable.<br />
maybe not even that difficult. it would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to setup. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
<br />
Disadvantage<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatiblity" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data just to Python. Any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer, but how complex would we really want to make it! And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
<br />
<br />
<br />
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to work with GRAMPS data through the same objects (but they can still use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Removal is not death, however: development will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar but I haven't tried it out. The objection might hold for a MySQL server though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user but would depend on the implementation.<br />
--AaronS<br />
</pre><br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code or the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding in case they just don't work.<br />
<br />
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
sqlite. With no server to manage and single-file dbs, it will be easy to share dbs and to manage multiple dbs at the same time. It should also make merging simpler, and it will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to be some kind of mass user website for editing large trees I think sqlite will fit the bill.<br />
<br />
sqlalchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. probably not as user friendly and since gramps isn't a client / server sort of program I don't think it's necessary.<br />
<br />
DB-API. With [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3], it sounds as if the DB-API in practice doesn't make switching dbs as easy as might be thought. If we commit to sqlite though this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to sqlalchemy, but sqlalchemy seems to have more documentation and user acceptance. Also the ORM layer might not step out of the way very nicely. The website says it will but I wasn't quite buying it from the examples.<br />
<br />
Storm. while this project looks promising and may be easier to use than sqlalchemy it's only a year old and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon I do think an ORM layer might do gramps some good. There will be situations where simply writing queries will be far easier. Our implementation model should take that into account. From the website, sqlalchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use both as the needs determine. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article] he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. For example, a handle should be unique, and rules must be added to ensure that adding an object to the family table with the same handle as a person object is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table with a key on handle) or with explicit rules.<br />
# ideally, the framework could generate an admin module from the model to browse the database; see, e.g., the admin module in Django.<br />
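<br />
The handle-uniqueness rule in step 5 can be sketched with Python's standard sqlite3 module. This is only an illustration under assumptions: the table and column names (primary_object, obj_type, person, family, gramps_id) and the helper functions are hypothetical, not an agreed GRAMPS schema, and the sketch relies on SQLite's foreign key enforcement being switched on per connection.<br />

```python
import sqlite3

# Hypothetical schema: every primary object registers its handle once in
# primary_object, so one handle can never belong to two object types.
SCHEMA = """
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)
);
CREATE TABLE person (
    handle    TEXT PRIMARY KEY,
    obj_type  TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    gramps_id TEXT,
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle    TEXT PRIMARY KEY,
    obj_type  TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    gramps_id TEXT,
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
"""


def open_db(path=":memory:"):
    """Open a database and create the hypothetical tables."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, table, handle, gramps_id):
    """Register a handle and store the object; return False if the DB rejects it."""
    if table not in ("person", "family"):
        raise ValueError("unknown table: %r" % (table,))
    try:
        with conn:  # one transaction: both inserts succeed or neither does
            conn.execute(
                "INSERT INTO primary_object (handle, obj_type) VALUES (?, ?)",
                (handle, table))
            conn.execute(
                "INSERT INTO %s (handle, gramps_id) VALUES (?, ?)" % table,
                (handle, gramps_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout, an attempt to store a family under a handle already used by a person is refused by the database itself rather than by application code.<br />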
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
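<br />
One possible reading of the deferred-loading idea in step 3 is sketched here. Whether this is what was meant by "slots" is an assumption; the LazyPerson class, the table layout (person, attribute) and the column names are hypothetical and do not come from gen/lib. The sketch only shows that fetching a person touches the person table alone, and that the attribute table is queried on first access.<br />

```python
import sqlite3


class LazyPerson(object):
    """Hypothetical stand-in for a gen.lib Person backed by SQL."""

    # __slots__ fixes the set of instance fields; _attributes stays unset
    # until the attributes property is read for the first time.
    __slots__ = ("_db", "handle", "gramps_id", "_attributes")

    def __init__(self, db, handle, gramps_id):
        self._db = db  # the object must know where to fetch deferred values
        self.handle = handle
        self.gramps_id = gramps_id

    @property
    def attributes(self):
        try:
            return self._attributes
        except AttributeError:  # slot not filled yet: hit the attribute table
            rows = self._db.execute(
                "SELECT key, value FROM attribute "
                "WHERE person_handle = ? ORDER BY key",
                (self.handle,)).fetchall()
            self._attributes = rows
            return rows


def get_person_from_handle(db, handle):
    """Fetch a person using the person table only."""
    row = db.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?",
        (handle,)).fetchone()
    if row is None:
        return None
    return LazyPerson(db, row[0], row[1])
```

The cost of this design is the one noted above: each object carries a reference to the database, so gen/lib objects are no longer plain data containers.<br />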
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. For example, an SQL schema would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining each person's attribute list, and checking whether the attribute is present there. <br />
<br />
The above clearly indicates that the two backends approach the same task very differently. The bsddb way will still work in SQL, though (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
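<br />
A minimal sketch of the prepare-based approach described above, assuming a rule with a prepare(db) / apply(db, person) shape. The class name HasAttributeSql, the db.conn attribute, and the attribute table with person_handle and key columns are all hypothetical; only the support_sql flag is taken from the text. It is not the code in _HasAttributeBase.py.<br />

```python
import sqlite3


class HasAttributeSql(object):
    """Hypothetical filter rule: does a person have an attribute with this key?"""

    def __init__(self, key):
        self.key = key
        self.matching = None  # set of handles, filled by prepare() on SQL backends

    def prepare(self, db):
        if getattr(db, "support_sql", False):
            # SQL backend: one query over the attribute table up front.
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE key = ?",
                (self.key,))
            self.matching = set(row[0] for row in rows)
        else:
            # Other backends: nothing to precompute.
            self.matching = None

    def apply(self, db, person):
        if self.matching is not None:
            return person.handle in self.matching
        # bsddb-style fallback: inspect the person's own attribute list.
        return any(key == self.key for key, _value in person.attributes)
```

The branch on support_sql is exactly the part that "does not look ideal": every optimized rule would need two code paths, which is an argument for a cleaner extension mechanism in base.py.<br />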
<br />
== See Also ==<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Talk:Database_abstraction_layers_comparison&diff=15161Talk:Database abstraction layers comparison2009-03-26T17:26:40Z<p>AaronS: New page: It might make more sense to keep this discussion in one place. Talk:GEPS 010: SQL Backend</p>
<hr />
<div>It might make more sense to keep this discussion in one place.<br />
<br />
[[Talk:GEPS 010: SQL Backend]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Talk:GEPS_010:_Relational_Backend&diff=15160Talk:GEPS 010: Relational Backend2009-03-26T17:25:29Z<p>AaronS: moving discussion to the discussion page</p>
<hr />
<div>=== Discussion ===<br />
<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one. You'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in PHP every web framework seems to have its own. I suspect the same<br />
for Python. Django has its own but allows for the use of others (if that<br />
tells you anything). It might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
abstraction layer.<br />
<br />
What to look for in a db abstraction layer is which dbs it can use. sqlite and<br />
mysql are musts; you may even find one that can talk to BSDDB but probably not.<br />
Oracle and PostgreSQL are pluses but will probably never be used, but who knows<br />
what will happen in 5 or 10 yrs. Maybe oracle would get fed up with mysql and<br />
release the db open source, charging for service. Stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the sql queries it's supposed to let you create are, since<br />
complex queries through a db layer tend to be difficult to create, read and predict<br />
(ie sub queries and the like). Usually those queries are far easier to just build as a query.<br />
<br />
In my experience a db abstraction layer is good for most of the db io. However, for<br />
the complex stuff a sort of localization object (or even file) with named queries is a<br />
good bet. This would work similarly to how different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables since some dbs might require a<br />
little more manipulation than others. Again this would only be for the most complex<br />
queries. A good rule of thumb would be: if you have to start writing parts of the query<br />
as strings, move it to the db localization object. This db localization object isn't used<br />
for all queries because you only want to have to tweak the minimum number of queries<br />
across dbs.<br />
--Aarons<br />
</pre></div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15159GEPS 010: Relational Backend2009-03-26T17:23:27Z<p>AaronS: removal of dbal comparison</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for making the switch =<br />
= Research =<br />
[[Relational database comparison]]<br />
<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow easy, single db file implementations (eg, SQLite) to more complex and sophisticated client/server (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL can optimize queries (in low-level C) whereas BSDDB is done in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
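<br />
One way to start testing the query optimization claim is to ask SQLite how it plans to run a query. The sketch below is only an assumption about how such a check could look: the person table is invented for illustration, and the wording of the EXPLAIN QUERY PLAN output differs between SQLite versions, so the code only looks at the last column of each row.<br />
<br />
```python
import sqlite3


def make_db():
    """In-memory database with an invented person table of 1000 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [("h%d" % i, "Name%d" % (i % 50)) for i in range(1000)],
    )
    return conn


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail text is the last column


def uses_index(plan):
    return any("INDEX" in detail for detail in plan)


def is_full_scan(plan):
    return any("SCAN" in detail for detail in plan)
```
<br />
A lookup by handle should report an index search (the primary key), a lookup by surname should report a full scan until an index on surname is created. Real timing and file size comparisons against BSDDB would still be needed for the other claims.<br />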
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps; it is actually why I'm thinking of giving<br />
it another try, depending on how hard this is to implement. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might offset some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db. ie auto incrementing fields, stored procedures and on and on.<br />
(sqlite may not even have triggers but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to set up, run and maintain.<br />
** there are tools that can do this automatically though and make this almost<br />
nonexistent for an end user. also the embeddable mysql might be an option.<br />
*may be difficult to manage / share multiple databases. more difficult but very doable.<br />
maybe not even that difficult. it would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to setup. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate db's<br />
*single file<br />
*good support.<br />
<br />
Disadvantage<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
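<br />
On the open question in the comment above: SQLite does support triggers and auto-incrementing integer keys, although it has no stored procedures. A minimal sketch using the standard sqlite3 module follows; the person table and the person_touch trigger are invented for illustration.<br />
<br />
```python
import sqlite3

SCHEMA = """
CREATE TABLE person (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    surname TEXT NOT NULL,
    changed TEXT
);

-- Record when a surname was last edited.
CREATE TRIGGER person_touch AFTER UPDATE OF surname ON person
BEGIN
    UPDATE person SET changed = datetime('now') WHERE id = NEW.id;
END;
"""


def demo():
    """Return the generated ids and the `changed` value before/after an edit."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO person (surname) VALUES ('Smith')")
    conn.execute("INSERT INTO person (surname) VALUES ('Jones')")
    ids = [row[0] for row in conn.execute("SELECT id FROM person ORDER BY id")]
    before = conn.execute("SELECT changed FROM person WHERE id = 1").fetchone()[0]
    conn.execute("UPDATE person SET surname = 'Smythe' WHERE id = 1")
    after = conn.execute("SELECT changed FROM person WHERE id = 1").fetchone()[0]
    conn.close()
    return ids, before, after
```
<br />
The ids are assigned by the database, and the trigger fills in the changed column without any Python code being involved in the update.<br />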
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
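<br />
To illustrate the single-file point, sharing a tree could be as simple as copying one file. This is a sketch under the assumption that no connection is writing to the database while it is copied; the file names and the person table are invented.<br />
<br />
```python
import os
import shutil
import sqlite3
import tempfile


def count_people(path):
    """Open the database file at `path` and count the rows in person."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    finally:
        conn.close()


def demo():
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "my_tree.db")
        shared = os.path.join(folder, "copy_for_cousin.db")

        conn = sqlite3.connect(original)
        conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
        conn.executemany(
            "INSERT INTO person VALUES (?, ?)",
            [("h1", "Smith"), ("h2", "Jones")],
        )
        conn.commit()
        conn.close()  # copy only while nothing is writing to the file

        shutil.copyfile(original, shared)  # sharing a tree is a plain file copy
        return count_people(original), count_people(shared)
```
<br />
The copy is a complete, independent database that can be opened on another machine.<br />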
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data just to Python. Any language should be able to access the db just fine. However, they wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer, but how complex would we really want to make it! And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
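Regarding access from other languages: whatever Python layer issues the CREATE TABLE statements, SQLite keeps the resulting schema as plain SQL text inside the database (in its sqlite_master table), so a PHP or other client sees ordinary tables. A small sketch, with invented table names:<br />
<br />
```python
import sqlite3


def describe_tables(conn):
    """Return {table name: CREATE statement} as stored in the database itself."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return dict(rows)


def make_example():
    """Stand-in for tables that an ORM or abstraction layer might create."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
    conn.execute("CREATE TABLE family (handle TEXT PRIMARY KEY, father_handle TEXT)")
    return conn
```
<br />
What other languages would not get is the Python object layer on top; they would have to work with the rows directly or build their own objects.<br />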
<br />
<br />
=== Discussion ===<br />
<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. if you're going to bother<br />
switching the back end, don't waste your time by not using one. you'll kick<br />
yourself later if you don't. just be careful which one you choose. I know<br />
that in php every web framework seems to have its own. I suspect the same<br />
for python. Django has its own but allows for the use of others (if that<br />
tells you anything). might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
Abstraction layer.<br />
<br />
What to look for in a db Abstraction Layer is which dbs it can use. sqlite and<br />
mysql are musts, you may even find one that can talk to BSDDB but probably not.<br />
Oracle and PostgreSQL are pluses but will probably never be used, but who knows<br />
what will happen in 5 or 10 yrs. maybe oracle would get fed up with mysql and<br />
release the db open source, charging for service. stranger things have happened.<br />
ease of use, readability and outer joins are also important. don't worry too<br />
much about how complex the sql queries it lets you create are, since<br />
complex queries through a db layer tend to be difficult to create, read and predict.<br />
ie sub queries and the like. usually those queries are far easier to just build as a query.<br />
<br />
in my experience a db abstraction layer is good for most of the db io. however, for<br />
the complex stuff a sort of localization object (or even file) with named queries is<br />
a good bet. this would work similar to how different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables, since some db's might require a<br />
little more manipulation than others. again this would only be for the most complex<br />
queries. a good rule of thumb would be: if you have to start writing parts of the query<br />
as strings, move it to the db localization object. This db localization object isn't used<br />
for all queries because you only want to have to tweak the minimum number of queries across dbs<br />
--Aarons<br />
</pre><br />
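<br />
The "db localization object" idea from the comment above could look roughly like the following sketch: one class per database, one method per named query. All class, method and table names are hypothetical. The example query differs between the two databases because SQLite concatenates strings with || while MySQL uses CONCAT(). Only the SQLite variant is actually run here.<br />
<br />
```python
import sqlite3


class SqliteQueries:
    """Named queries written in SQLite's dialect."""

    def full_names(self):
        return (
            "SELECT first_name || ' ' || surname FROM person "
            "ORDER BY surname, first_name"
        )


class MysqlQueries:
    """The same named queries written in MySQL's dialect (not executed here)."""

    def full_names(self):
        return (
            "SELECT CONCAT(first_name, ' ', surname) FROM person "
            "ORDER BY surname, first_name"
        )


QUERIES = {"sqlite": SqliteQueries, "mysql": MysqlQueries}


def queries_for(backend):
    """Pick the query object for a backend, like picking a translation."""
    return QUERIES[backend]()


def make_example():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (first_name TEXT, surname TEXT)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [("Bob", "Smith"), ("Ada", "Jones")],
    )
    return conn
```
<br />
The rest of the code would call queries_for(backend).full_names() and never see which dialect is behind it. Simple queries would still go through the abstraction layer; only the ones that need hand-written SQL end up in these objects.<br />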
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, offering the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
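<br />
The database-level locking described in the quote can be observed with two connections to the same file. This is a sketch using the standard sqlite3 module with SQLite's default journal mode; the note table and function names are invented. The second connection is told not to wait (timeout=0), so it fails immediately instead of pausing for the lock.<br />
<br />
```python
import os
import sqlite3
import tempfile


def second_writer_blocked(path):
    """Return True if a second writer is refused while the first holds the lock."""
    # isolation_level=None: no implicit transactions, BEGIN/COMMIT are explicit.
    first = sqlite3.connect(path, isolation_level=None)
    second = sqlite3.connect(path, timeout=0, isolation_level=None)  # do not wait
    try:
        first.execute("CREATE TABLE note (body TEXT)")
        first.execute("BEGIN IMMEDIATE")  # take the database-wide write lock
        first.execute("INSERT INTO note VALUES ('first writer')")
        try:
            second.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError:  # reported as "database is locked"
            blocked = True
        else:
            second.execute("ROLLBACK")
            blocked = False
        first.execute("COMMIT")  # lock released

        # Once the first writer has committed, the second one can proceed.
        second.execute("BEGIN IMMEDIATE")
        second.execute("INSERT INTO note VALUES ('second writer')")
        second.execute("COMMIT")
        return blocked
    finally:
        first.close()
        second.close()


def demo():
    with tempfile.TemporaryDirectory() as folder:
        return second_writer_blocked(os.path.join(folder, "locks.db"))
```
<br />
With the default timeout the second writer would simply wait a moment instead of failing, which for a desktop program with one or two users is probably acceptable.<br />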
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A project that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of sqlite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. users wouldn't even know. The embedded version of MySQL<br />
may be similar but I haven't tried it out. The quoted concern might be true of a MySQL server though.<br />
However, I believe it's possible to use scripts and/or code to manage launching and stopping the<br />
server. It might be possible to make it seamless for the user but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code and the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding in case they just don't work.<br />
<br />
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
sqlite. No server to manage, and single-file dbs will make it easy to share trees and to manage multiple dbs at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to be some kind of mass user website for editing large trees I think sqlite will fit the bill.<br />
<br />
sqlalchemy. This seems to have a large following and good documentation. It should allow us to support different db back ends more easily in the future. At least some people think it's the best python orm available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since gramps isn't a client / server sort of program I don't think it's necessary.<br />
<br />
DB-API (with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]). It sounds as if the DB-API in practice doesn't make changing dbs as easy as might be thought. If we commit to sqlite though this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to sqlalchemy but sqlalchemy seems to have more documentation and user acceptance. Also the ORM layer might not step out of the way very nicely. The website says it will but I wasn't quite buying it from the examples.<br />
<br />
Storm. while this project looks promising and may be easier to use than sqlalchemy it's only a year old and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an orm layer. I've never used a true orm and can't fully say how they work in practice. While I'm not solidly on the orm bandwagon I do think an orm layer might do gramps some good. There will be situations where simply writing queries will be far easier. Our implementation model should take that into account. From the website, sqlalchemy sounds like it will provide both abstraction and an orm, and we'll be able to use both as the needs determine. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article] he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., handles should be unique: rules must be added to ensure that adding an object to the family table with the handle of a person object is '''impossible''' at the database layer. These kinds of rules can be enforced technically (a primary object table with a key on handle) or with rules.<br />
# best would be a framework that based on the model can generate an admin module to browse the database, see eg the admin module in django.<br />
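<br />
The handle rule from the list above (a primary object table with a key on handle) could be enforced by the database roughly as follows. This is a sketch with an invented schema, not a proposal for the real tables: every handle is registered once in object together with its kind, and person and family rows can only reference a handle registered with the matching kind. SQLite only enforces foreign keys when PRAGMA foreign_keys is switched on for the connection.<br />
<br />
```python
import sqlite3

SCHEMA = """
CREATE TABLE object (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL CHECK (kind IN ('person', 'family')),
    UNIQUE (handle, kind)
);
CREATE TABLE person (
    handle    TEXT PRIMARY KEY,
    kind      TEXT NOT NULL DEFAULT 'person' CHECK (kind = 'person'),
    gramps_id TEXT,
    FOREIGN KEY (handle, kind) REFERENCES object (handle, kind)
);
CREATE TABLE family (
    handle    TEXT PRIMARY KEY,
    kind      TEXT NOT NULL DEFAULT 'family' CHECK (kind = 'family'),
    gramps_id TEXT,
    FOREIGN KEY (handle, kind) REFERENCES object (handle, kind)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, table, handle, gramps_id):
    """Register a handle and store the object in one transaction."""
    if table not in ("person", "family"):
        raise ValueError("unknown table: %r" % table)
    with conn:  # both rows are written, or neither
        conn.execute(
            "INSERT INTO object (handle, kind) VALUES (?, ?)", (handle, table)
        )
        conn.execute(
            "INSERT INTO %s (handle, gramps_id) VALUES (?, ?)" % table,
            (handle, gramps_id),
        )
```
<br />
Reusing a person's handle for a family then fails with sqlite3.IntegrityError, both through add_object (the handle is already registered) and through a direct INSERT into family (no matching object row of kind 'family').<br />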
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle, should only hit the person table. Only when the calling method needs attributes, should the attribute table be hit. This requires attributes that are not yet defined up to the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database as it needs to know where to obtain these values... . This looks like a huge work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier though, but that means rewriting the core of GRAMPS....<br />
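<br />
One possible reading of the deferred loading described in the last item, sketched with a Python property rather than any particular slots mechanism: get_person_from_handle only touches the person table, and the attribute table is queried the first time the attributes are asked for. The LazyPerson and SqlDb classes, the table layout and the query counter are all invented for illustration.<br />
<br />
```python
import sqlite3


class LazyPerson:
    """Person whose attribute list is fetched only on first access."""

    def __init__(self, db, handle, gramps_id):
        self._db = db  # the object has to know where its data lives
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None  # None means "not loaded yet"

    @property
    def attributes(self):
        if self._attributes is None:
            self._attributes = self._db.load_attributes(self.handle)
        return self._attributes


class SqlDb:
    def __init__(self, conn):
        self.conn = conn
        self.query_count = 0  # only here to make the laziness visible

    def _fetch(self, sql, params):
        self.query_count += 1
        return self.conn.execute(sql, params).fetchall()

    def get_person_from_handle(self, handle):
        rows = self._fetch(
            "SELECT handle, gramps_id FROM person WHERE handle = ?", (handle,)
        )
        return LazyPerson(self, *rows[0]) if rows else None

    def load_attributes(self, handle):
        return self._fetch(
            "SELECT type, value FROM attribute "
            "WHERE person_handle = ? ORDER BY type",
            (handle,),
        )


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT)")
    conn.execute(
        "CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT)"
    )
    conn.execute("INSERT INTO person VALUES ('h1', 'I0001')")
    conn.execute("INSERT INTO attribute VALUES ('h1', 'Occupation', 'Farmer')")
    return SqlDb(conn)
```
<br />
Code that only needs the person's id costs one query; the second query happens only if something reads person.attributes, and then only once.<br />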
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an sql backend is stable, base.py can be extended to offer extra functionality, or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To know which persons have a specific attribute, SQL would select that from the attribute table, and then look up the people. In bsddb, however, it means looping over all persons, obtaining the attribute sub-table of each person, and checking whether the attribute is present there. <br />
<br />
The above clearly indicates that the approach in the two backends is very different. The bsddb way will work in sql though (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For sql, one would use the prepare method, obtain all people in a list, then return True or False if person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look very ideal though.<br />
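<br />
A rough sketch of that idea, loosely modeled on a filter rule with prepare and apply methods. The classes below are simplified stand-ins written for this example, not the actual GRAMPS rule or database classes; only the support_sql attribute name comes from the paragraph above.<br />
<br />
```python
import sqlite3


class Person:
    def __init__(self, handle, attributes):
        self.handle = handle
        self.attributes = attributes  # list of (type, value) pairs


class SqlDb:
    support_sql = True

    def __init__(self, conn):
        self.conn = conn


class PlainDb:
    support_sql = False


class HasAttribute:
    """Match people that have an attribute of the given type."""

    def __init__(self, attr_type):
        self.attr_type = attr_type
        self.matches = None

    def prepare(self, db):
        # With SQL available, find all matching handles in a single query.
        self.matches = None
        if getattr(db, "support_sql", False):
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE type = ?",
                (self.attr_type,),
            )
            self.matches = {row[0] for row in rows}

    def apply(self, db, person):
        if self.matches is not None:
            return person.handle in self.matches  # SQL path: set lookup
        # Fallback path: inspect the person's own attribute list.
        return any(kind == self.attr_type for kind, _value in person.attributes)


def make_example():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO attribute VALUES (?, ?, ?)",
        [("h1", "Occupation", "Farmer"), ("h2", "Nickname", "Bo")],
    )
    people = [
        Person("h1", [("Occupation", "Farmer")]),
        Person("h2", [("Nickname", "Bo")]),
    ]
    return conn, people
```
<br />
Both paths give the same answer; the branching on support_sql inside every rule is exactly the part that does not look ideal.<br />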
<br />
== See Also ==<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Database_abstraction_layers_comparison&diff=15158Database abstraction layers comparison2009-03-26T17:22:52Z<p>AaronS: started page</p>
<hr />
<div>This page is for a comparison of Python database abstraction layers. It was started to help with [[GEPS 010: SQL Backend]] and is specific to how they might be used for GRAMPS.<br />
<br />
This question was asked on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
== <strike> CouchDB </strike> ==<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
== DB-API ==<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API to make it easy to move from one SQL-based DB to another called DB-API. Each DB may have multiple different modules available for it. If we settle on this solution then we should do some quick searches to make sure we pick the right modules.<br />
<br />
*MySQL: Yes [http://sourceforge.net/projects/mysql-python MySQLDB] used by SQLAlchemy<br />
*SQLite: Yes [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] (included in Python 2.5 or greater) [http://oss.itsystementwicklung.de/trac/pysqlite pysqlite] both used by SQLAlchemy<br />
*BSDDB: No. The DB-API looks to be only for relational dbs.<br />
<br />
*ORM (Object Relational Mapper): no<br />
<br />
"While all DB-API modules have identical APIs (or very similar; not all backends support all features), if you are writing the SQL yourself, you will probably be writing SQL in a product-specific dialect, so they are not as interchangeable in practice as they are in theory." [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python kquinn]<br />
<br />
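A small illustration of the point in the quote, using the standard sqlite3 module: DB-API modules share the same connect/cursor/execute shape, but details such as the parameter placeholder differ (sqlite3 uses "?", MySQLdb uses "%s"), before even getting to SQL dialects. The helper functions and the person table are invented for this sketch.<br />
<br />
```python
import sqlite3

# Placeholder text for the two positional DB-API parameter styles.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def placeholder_for(module):
    """Return the positional placeholder a DB-API module expects."""
    try:
        return PLACEHOLDERS[module.paramstyle]
    except KeyError:
        raise ValueError("unsupported paramstyle: %r" % module.paramstyle)


def handles_with_surname(module, conn, surname):
    """Run the same query through any DB-API module with a positional style."""
    sql = "SELECT handle FROM person WHERE surname = {0} ORDER BY handle".format(
        placeholder_for(module)
    )
    cursor = conn.cursor()
    cursor.execute(sql, (surname,))
    return [row[0] for row in cursor.fetchall()]


def make_example():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [("h1", "Smith"), ("h2", "Jones"), ("h3", "Smith")],
    )
    return conn
```
<br />
This kind of glue is what an abstraction layer such as SQLAlchemy takes care of, which is one reason to prefer it over using the DB-API modules directly if more than one database is to be supported.<br />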
== <strike> Django </strike> ==<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While powerful I doubt any projects are using it outside of Django.<br />
<br />
==<strike> pydo </strike>==<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and limited documentation, and is tied to a web framework.<br />
<br />
== SQLAlchemy ==<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. <br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes but doesn't force it.<br />
*multiple database support?: yes [http://www.sqlalchemy.org/ source]<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. Project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
== SQLObject ==<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes this is what sqlobject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08, 7 developers on the project. couldn't find old release dates but first was posted 2003-04-09. link to wiki is broken which isn't the best sign but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
== Storm ==<br />
https://storm.canonical.com/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
*ORM (Object Relational Mapper): yes<br />
*multiple database support?: yes<br />
<br />
*Viability: currently developed. Seems like a fairly good site with some documentation. The biggest drawback is that the project is only a year old. A pro is that it may be easier to use than sqlalchemy.</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15157GEPS 010: Relational Backend2009-03-26T17:17:09Z<p>AaronS: finishing removal of exporter adding new pages</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel).<br />
<br />
= Reasons for making the switch =<br />
= Research =<br />
[[Relational database comparison]]<br />
[[Database abstraction layers comparison]]<br />
<br />
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses Berkeley DB (BSDDB) as its internal storage format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (eg, SQLite) to more complex and sophisticated client/server setups (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6 and removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas with BSDDB the query logic is written in Python<br />
# SQLite tables of a database reside in a single file<br />
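<br />
As a rough illustration of the points above about schemas living in SQL rather than in code, here is a minimal sketch using the sqlite3 module from the standard library. The person table and its columns are hypothetical and heavily simplified; they are not a proposed GRAMPS schema.<br />
<br />
```python
import sqlite3

# Hypothetical, heavily simplified table; the real GRAMPS schema is undecided.
SCHEMA = """
CREATE TABLE person (
    handle    TEXT PRIMARY KEY,
    gramps_id TEXT NOT NULL UNIQUE,
    surname   TEXT NOT NULL
);
"""


def smiths(db_path=":memory:"):
    """Create the table, add three people and return the ids of the Smiths."""
    # A real file name instead of ":memory:" would give a single-file database.
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)  # the structure is declared in SQL, not in Python logic
    conn.executemany(
        "INSERT INTO person (handle, gramps_id, surname) VALUES (?, ?, ?)",
        [("h1", "I0001", "Smith"), ("h2", "I0002", "Jones"), ("h3", "I0003", "Smith")],
    )
    rows = conn.execute(
        "SELECT gramps_id FROM person WHERE surname = ? ORDER BY gramps_id",
        ("Smith",),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```
<br />
The uniqueness rules and the selection are stated declaratively; the engine decides how to enforce and execute them.<br />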
<br />
Next are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; although see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps and is actually why I'm thinking of giving<br />
it another try depending on how hard it is to implement this. Yes I know it will be hard<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL over SQLite. While I haven't tried it out yet there is an embeddable<br />
version of MySQL which might overcome some of SQLite's advantages. If a database abstraction<br />
layer is used both could be easily supported. They both have their advantages and<br />
disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db. ie auto incrementing fields, stored procedures and on and on.<br />
(sqlite may not even have triggers but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to setup run and maintain.<br />
** there are tools that can do this automatically though and make things almost<br />
nonexistent for an end user. also the embeddable mysql might be an option.<br />
*may be difficult to manage / share multiple databases. more difficult but very doable.<br />
maybe not even that difficult. it would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to setup. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
<br />
Disadvantage<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software. but MySQL's<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data just to Python. Any language should be able to access the db just fine. However, they wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer, but how complex would we really want to make it! And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API called DB-API that makes it easier to move from one SQL-based DB to another. Each DB may have multiple different modules available for it. If we settle on this solution then we should do some quick searches to make sure we pick the right modules.<br />
<br />
*MySQL: Yes [http://sourceforge.net/projects/mysql-python MySQLDB] used by SQLAlchemy<br />
*SQLite: Yes [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] (included in Python 2.5 or greater) [http://oss.itsystementwicklung.de/trac/pysqlite pysqlite] both used by SQLAlchemy<br />
*BSDDB: No. The DB-API looks to be only for relational dbs.<br />
<br />
*ORM (Object Relational Mapper): no<br />
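<br />
A small sketch of what "the same API, but not quite interchangeable" can look like in practice. Every DB-API module exposes connect(), cursors and a module-level paramstyle, but the placeholder syntax in the SQL text depends on that paramstyle. The helper find_by_surname and the person table are made up for illustration, and only sqlite3 is exercised here; treating other modules the same way is an assumption.<br />
<br />
```python
import sqlite3

# Placeholder text for two DB-API parameter styles; other styles exist
# ("named", "numeric", "pyformat") and are not handled in this sketch.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def find_by_surname(dbapi_module, conn, surname):
    """Run the same query through any DB-API module using one of the two styles."""
    mark = PLACEHOLDERS[dbapi_module.paramstyle]
    cur = conn.cursor()
    cur.execute(
        "SELECT gramps_id FROM person WHERE surname = " + mark + " ORDER BY gramps_id",
        (surname,),
    )
    return [row[0] for row in cur.fetchall()]


def demo():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE person (gramps_id TEXT, surname TEXT)")
    cur.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [("I0001", "Smith"), ("I0002", "Jones")],
    )
    conn.commit()
    result = find_by_surname(sqlite3, conn, "Smith")
    conn.close()
    return result
```
<br />
The calls are portable; the SQL string is where product-specific details start to leak in.<br />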
<br />
"While all DB-API modules have identical APIs (or very similar; not all backends support all features), if you are writing the SQL yourself, you will probably be writing SQL in a product-specific dialect, so they are not as interchangeable in practice as they are in theory." [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python kquinn]<br />
<br />
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While it is powerful, I doubt any projects are using it outside of Django.<br />
<br />
===<strike> pydo </strike>===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and limited documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. <br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes but doesn't force it.<br />
*multiple database support?: yes [http://www.sqlalchemy.org/ source]<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. The project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes this is what sqlobject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08, with 7 developers on the project. I couldn't find older release dates, but the first was posted 2003-04-09. The link to the wiki is broken, which isn't the best sign, but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
=== Storm ===<br />
https://storm.canonical.com/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
*ORM (Object Relational Mapper): yes<br />
*multiple database support?: yes<br />
<br />
*Viability: currently developed. The site seems fairly good, with some documentation. The biggest drawback is that the project is only a year old. A pro is that it may be easier to use than sqlalchemy.<br />
<br />
=== Discussion ===<br />
<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. they are<br />
indeed quite powerful and are exactly what you need. if you're going to bother<br />
switching the back end don't waste your time and not use one. you'll kick<br />
yourself later if you don't. just be careful which one you choose. I know<br />
that in php every web framework seems to have their own. I suspect the same<br />
for python. Django has their own but allows for the use of others (if that<br />
tells you anything). might be a place to check for alternatives. While<br />
their framework might be for websites that shouldn't matter for the DB<br />
Abstraction layer.<br />
<br />
What to look for in a db Abstraction Layer is which dbs it can use. sqlite and<br />
mysql are musts, you may even find one that can talk to BSDDB but probably not.<br />
Oracle and PostgreSQL are pluses but will probably never be used but who knows what<br />
will happen in 5 or 10 yrs. who knows maybe oracle would get fed up with mysql and<br />
release the db open source charging for service. stranger things have happened.<br />
ease of use, readability and outer joins are also important. don't worry too<br />
much about how complex of sql queries it's supposed to allow you to create since<br />
complex queries through a db layer tend to be difficult to create, read and predict.<br />
ie sub queries and the like. usually those queries are far easier to just build as a query.<br />
<br />
in my experience a db abstraction layer is good for most of the db io. however, for<br />
the complex stuff a sort of localization object (or even file) is a good bet with named<br />
queries. this would work similar to how different languages are usually supported in<br />
projects. with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables since some db's might require a<br />
little more manipulation than<br />
others. again this would only be for the most complex queries. a good rule of thumb would<br />
be if you had to start writing parts of the query as strings move it to the db localization<br />
object. This db localization object isn't used for all queries because you only want to have<br />
to tweak the minimum amount of queries across dbs<br />
--Aarons<br />
</pre><br />
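<br />
One possible reading of the "db localization object" idea above, as a hedged sketch: one class per backend, one method per named query, and a lookup that picks the class for the active backend. The class names, the QUERIES registry and the person table are all hypothetical. The example relies on the fact that SQLite concatenates strings with the || operator while MySQL uses CONCAT(); only the SQLite variant is actually executed here.<br />
<br />
```python
import sqlite3


class SQLiteQueries:
    """Named queries written in the SQLite dialect."""

    def full_names(self):
        return ("SELECT first_name || ' ' || surname FROM person "
                "ORDER BY surname, first_name")


class MySQLQueries:
    """The same named queries in the MySQL dialect (not executed in this sketch)."""

    def full_names(self):
        return ("SELECT CONCAT(first_name, ' ', surname) FROM person "
                "ORDER BY surname, first_name")


# Hypothetical registry: one query object per supported backend.
QUERIES = {"sqlite": SQLiteQueries(), "mysql": MySQLQueries()}


def full_names(conn, backend):
    """Callers ask for a query by name and never see the dialect-specific SQL."""
    cur = conn.cursor()
    cur.execute(QUERIES[backend].full_names())
    return [row[0] for row in cur.fetchall()]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (first_name TEXT, surname TEXT)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [("Bob", "Smith"), ("Ann", "Jones")],
    )
    names = full_names(conn, "sqlite")
    conn.close()
    return names
```
<br />
Using methods rather than string constants leaves room for a backend that needs extra manipulation before or after running its query.<br />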
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, offering the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of sqlite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. users wouldn't even know. The embedded version of MySQL<br />
may be similar but I haven't tried it out. It might be true of a regular MySQL server though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user but would depend on the implementation.<br />
--AaronS<br />
</pre><br />
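<br />
To make the "just a file name" point concrete, here is a minimal sketch with the standard sqlite3 module. The file name family_tree.db, the helper names and the one-table layout are assumptions for illustration only.<br />
<br />
```python
import os
import sqlite3
import tempfile


def create_tree(path):
    """Opening the path is the whole setup: no server, accounts or settings."""
    conn = sqlite3.connect(path)  # creates the file if it does not exist yet
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person (handle TEXT PRIMARY KEY, surname TEXT)"
    )
    conn.execute("INSERT INTO person VALUES (?, ?)", ("h1", "Smith"))
    conn.commit()
    conn.close()


def count_people(path):
    conn = sqlite3.connect(path)
    (count,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()
    conn.close()
    return count


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "family_tree.db")  # hypothetical file name
        create_tree(path)
        return os.path.exists(path), count_people(path)
```
<br />
Sharing a tree would then amount to copying that one file.<br />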
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code and the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding in case they just don't work.<br />
<br />
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
sqlite. No server to manage, and single-file dbs will make it easy to share and manage multiple dbs at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to be some kind of mass-user website for editing large trees, I think sqlite will fit the bill.<br />
<br />
sqlalchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. probably not as user friendly and since gramps isn't a client / server sort of program I don't think it's necessary.<br />
<br />
DB-API. With [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]: it sounds as if the DB-API in practice doesn't make changing dbs as easy as might be thought. If we commit to sqlite though this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to sqlalchemy, but sqlalchemy seems to have more documentation and user acceptance. Also the ORM layer might not step out of the way very nicely. The website says it will but I wasn't quite buying it from the examples.<br />
<br />
Storm. while this project looks promising and may be easier to use than sqlalchemy it's only a year old and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction not an orm layer. I've never used a true orm and can't fully say how they work in practice. While I'm not solidly on the orm bandwagon I do think an orm layer might do gramps some good. There will be situations where simply writing queries will be far easier. Our implementation model should take that into account. From the website sqlalchemy sounds like it will provide both abstraction and an orm and we'll be able to use both as the needs determine. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article] he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, handle should be unique, and rules must be added to ensure that adding to the family table an object with the handle of a person object is '''impossible''' on the database layer. These kinds of rules can be done technically (a primary object table with key on handle) or with rules.<br />
# best would be a framework that based on the model can generate an admin module to browse the database, see eg the admin module in django.<br />
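<br />
The "primary object table with key on handle" idea from the validation step could be sketched as follows. This is only one possible layout under stated assumptions: the table and column names are hypothetical, and it relies on SQLite foreign key enforcement, which has to be switched on per connection with PRAGMA foreign_keys and needs a reasonably recent SQLite.<br />
<br />
```python
import sqlite3

# Hypothetical layout: every handle is registered once, together with its type.
# The composite foreign key forces a family row to point at a handle that was
# registered as a family, so a person handle cannot be reused there.
SCHEMA = """
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)
);
CREATE TABLE person (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, table, handle):
    """Register the handle, then add the row to its own table."""
    if table not in ("person", "family"):
        raise ValueError("unknown table: %r" % (table,))
    conn.execute(
        "INSERT INTO primary_object (handle, obj_type) VALUES (?, ?)",
        (handle, table),
    )
    conn.execute("INSERT INTO %s (handle) VALUES (?)" % table, (handle,))


def demo():
    conn = open_db()
    add_object(conn, "person", "H1")
    outcomes = []
    try:
        add_object(conn, "family", "H1")  # handle already registered
    except sqlite3.IntegrityError:
        outcomes.append("rejected")
    try:
        # Bypassing the registry fails too: H1 is not registered as a family.
        conn.execute("INSERT INTO family (handle) VALUES (?)", ("H1",))
    except sqlite3.IntegrityError:
        outcomes.append("rejected")
    conn.close()
    return outcomes
```
<br />
Both attempts are refused by the database itself, without any checking code in Python.<br />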
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
## User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier though, but that means rewriting the core of GRAMPS.<br />
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
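<br />
Whatever mechanism is finally meant by "slots", the deferred loading described in item 3 can be sketched with an ordinary Python property. This is an illustration under assumptions, not existing GRAMPS code: LazyPerson, the table layout and the signature of get_person_from_handle as a free function taking a connection are all hypothetical.<br />
<br />
```python
import sqlite3


class LazyPerson:
    """Person whose attributes are fetched only on first access."""

    def __init__(self, conn, handle, surname):
        self._conn = conn          # the object has to know its database
        self.handle = handle
        self.surname = surname
        self._attributes = None    # not loaded yet

    @property
    def attributes(self):
        if self._attributes is None:
            rows = self._conn.execute(
                "SELECT key, value FROM attribute WHERE person_handle = ? ORDER BY key",
                (self.handle,),
            ).fetchall()
            self._attributes = dict(rows)
        return self._attributes


def get_person_from_handle(conn, handle):
    """Touches only the person table; attribute rows are left alone."""
    row = conn.execute(
        "SELECT handle, surname FROM person WHERE handle = ?", (handle,)
    ).fetchone()
    return LazyPerson(conn, row[0], row[1]) if row else None


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (handle TEXT PRIMARY KEY, surname TEXT);
        CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT);
        INSERT INTO person VALUES ('h1', 'Smith');
        INSERT INTO attribute VALUES ('h1', 'Occupation', 'Farmer');
    """)
    person = get_person_from_handle(conn, "h1")
    untouched = person._attributes is None   # nothing fetched so far
    return untouched, person.attributes      # first access runs the second query
```
<br />
The cost is the one noted above: each object carries a reference to the database it came from.<br />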
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, it means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there. <br />
<br />
The above clearly shows that the two backends go about this very differently. The bsddb way will still work in SQL (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
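<br />
A rough sketch of that prepare/apply idea follows. The class names, the attribute table and the simplified apply signature are assumptions made for illustration; this is not the code in _HasAttributeBase.py:<br />

```python
import sqlite3


class SqlDb(object):
    """Hypothetical db object that advertises SQL support."""
    support_sql = True

    def __init__(self, conn):
        self.conn = conn


class HasAttributeSql(object):
    """Hypothetical SQL-aware variant of a HasAttribute filter rule."""

    def __init__(self, attribute_name):
        self.attribute_name = attribute_name
        self.use_sql = False
        self.matching_handles = set()

    def prepare(self, db):
        # One query up front instead of loading every person.
        self.use_sql = getattr(db, "support_sql", False)
        if self.use_sql:
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE name = ?",
                (self.attribute_name,))
            self.matching_handles = set(row[0] for row in rows)

    def apply(self, db, person_handle):
        if not self.use_sql:
            # The existing per-person loop would go here (omitted).
            raise NotImplementedError("non-SQL fallback not shown")
        return person_handle in self.matching_handles


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attribute (person_handle TEXT, name TEXT, value TEXT)")
conn.execute("INSERT INTO attribute VALUES ('h1', 'Nickname', 'Tess')")
db = SqlDb(conn)

rule = HasAttributeSql("Nickname")
rule.prepare(db)
has_h1 = rule.apply(db, "h1")
has_h2 = rule.apply(db, "h2")
```

The branch on support_sql is exactly the part that does not look ideal: every rule that wants the faster path needs its own copy of it.<br />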
<br />
== See Also ==<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Addon:SQLite_Export_Import&diff=15156Addon:SQLite Export Import2009-03-26T17:11:31Z<p>AaronS: added back link</p>
<hr />
<div>An exporter is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/export/ExportSql.py?view=markup trunk/src/plugins/export/ExportSql.py]. You can export most of GRAMPS through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything.)<br />
<br />
SQL stands for "Structured Query Language" and is often pronounced "sequel", after its original name, SEQUEL. After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above exporter, you can run SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
With the current GRAMPS database (BSDDB), getting the same answers would require writing code, and you would need to know some details about how the data is stored.<br />
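<br />
The queries shown above can also be run from Python with the standard sqlite3 module. The sketch below builds a tiny in-memory stand-in with a subset of the columns listed above so that it runs without an export; against a real export, ":memory:" would be replaced by the path of the exported file (assuming, as the session above suggests, that the export is an SQLite database file):<br />

```python
import sqlite3

# Stand-in for the exported file; only a few of the columns are recreated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (private INTEGER, first_name TEXT, surname TEXT)")
conn.execute("INSERT INTO names VALUES (0, 'Test', 'Smith')")
conn.execute("INSERT INTO names VALUES (0, 'Other', 'Jones')")

# Same query as in the sqlite3 shell, with the pattern passed as a parameter.
rows = conn.execute(
    "SELECT first_name, surname FROM names WHERE surname LIKE ?",
    ("%Smith%",)).fetchall()

for first_name, surname in rows:
    print("%s %s" % (first_name, surname))
```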
<br />
== See also ==<br />
[[GEPS 010: SQL Backend]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=Addon:SQLite_Export_Import&diff=15155Addon:SQLite Export Import2009-03-26T17:10:41Z<p>AaronS: New page: An exporter is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/export/ExportSql.py?view=markup trunk/src/plugins/export/ExportSql.py] You can export m...</p>
<hr />
<div>An exporter is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/export/ExportSql.py?view=markup trunk/src/plugins/export/ExportSql.py]. You can export most of GRAMPS through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything.)<br />
<br />
SQL stands for "Structured Query Language" and is often pronounced "sequel", after its original name, SEQUEL. After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above exporter, you can run SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15154GEPS 010: Relational Backend2009-03-26T17:08:36Z<p>AaronS: begining some refactoring of the page. it was starting to get unweildy an mediawiki was warning about length</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
A proposed implementation is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/export/ExportSql.py?view=markup trunk/src/plugins/export/ExportSql.py]. You can export most of GRAMPS through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything.)<br />
<br />
SQL stands for "Structured Query Language" and is often pronounced "sequel", after its original name, SEQUEL. After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above exporter, you can run SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
With the current GRAMPS database (BSDDB), getting the same answers would require writing code, and you would need to know some details about how the data is stored.<br />
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a Berkeley DB (BSDDB) database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries in low-level C, whereas with BSDDB the query logic is written in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next, there are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps; it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL over SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might offset some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db. ie auto incrementing fields, stored procedures and on and on.<br />
(sqlite may not even have triggers but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to setup run and maintain.<br />
** there are tools that can do this automatically, though, and make the work almost<br />
nonexistent for an end user. also the embeddable mysql might be an option.<br />
*may be difficult to manage / share multiple databases. more difficult but very doable.<br />
maybe not even that difficult. it would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to setup. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate db's<br />
*single file<br />
*good support.<br />
<br />
Disadvantage<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software. but mysqls<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data just to Python. Any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer, but how complex would we really want to make it! And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API to make it easy to move from one SQL-based DB to another called DB-API. Each DB may have multiple different modules available for it. If we settle on this solution then we should do some quick searches to make sure we pick the right modules.<br />
<br />
*MySQL: Yes [http://sourceforge.net/projects/mysql-python MySQLDB] used by SQLAlchemy<br />
*SQLite: Yes [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] (included in Python 2.5 or greater) [http://oss.itsystementwicklung.de/trac/pysqlite pysqlite] both used by SQLAlchemy<br />
*BSDDB: No. The DB-API looks to be only for relational dbs.<br />
<br />
*ORM (Object Relational Mapper): no<br />
<br />
"While all DB-API modules have identical APIs (or very similar; not all backends support all features), if you are writing the SQL yourself, you will probably be writing SQL in a product-specific dialect, so they are not as interchangeable in practice as they are in theory." [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python kquinn]<br />
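<br />
One small, concrete instance of the dialect problem is the parameter placeholder: each DB-API module advertises its own style in its <code>paramstyle</code> attribute (sqlite3 reports "qmark", MySQLdb reports "format"). The sketch below uses only sqlite3 so that it runs anywhere; the helper placeholder() is hypothetical and only covers these two styles:<br />

```python
import sqlite3


def placeholder(dbapi_module):
    """Return the positional placeholder for a DB-API module (hypothetical helper)."""
    styles = {"qmark": "?", "format": "%s"}
    return styles[dbapi_module.paramstyle]


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE people (handle TEXT PRIMARY KEY, gramps_id TEXT)")

# Build the statement with whatever placeholder the module expects.
mark = placeholder(sqlite3)
sql = "INSERT INTO people (handle, gramps_id) VALUES (%s, %s)" % (mark, mark)
cur.execute(sql, ("b247d7186567ff472ef", "I0000"))
conn.commit()

cur.execute("SELECT gramps_id FROM people")
result = cur.fetchone()
```

Placeholders are the easy part; differences in the SQL itself (functions, types, limits) are not hidden by the DB-API at all.<br />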
<br />
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While powerful I doubt any projects are using it outside of Django.<br />
<br />
===<strike> pydo </strike>===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and little documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. <br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes but doesn't force it.<br />
*multiple database support?: yes [http://www.sqlalchemy.org/ source]<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. Project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes this is what sqlobject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08, 7 developers on the project. couldn't find old release dates but first was posted 2003-04-09. link to wiki is broken which isn't the best sign but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
=== Storm ===<br />
https://storm.canonical.com/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
*ORM (Object Relational Mapper): yes<br />
*multiple database support?: yes<br />
<br />
*Viability: currently developed. Seems like a fairly good site with some documentation. The biggest drawback is that the project is only a year old. A pro is that it may be easier to use than sqlalchemy.<br />
<br />
=== Discussion ===<br />
<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one. You'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in php every web framework seems to have their own. I suspect the same<br />
for python. Django has their own but allows for the use of others (if that<br />
tells you anything). Might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
Abstraction layer.<br />
<br />
What to look for in a db Abstraction Layer is which dbs it can use. sqlite and<br />
mysql are musts; you may even find one that can talk to BSDDB, but probably not.<br />
Oracle and PostgreSQL are pluses but will probably never be used, but who knows<br />
what will happen in 5 or 10 yrs. Who knows, maybe oracle would get fed up with<br />
mysql and release the db open source, charging for service. Stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the sql queries are that it's supposed to allow you to create, since<br />
complex queries through a db layer tend to be difficult to create, read and predict,<br />
ie sub queries and the like. Usually those queries are far easier to just build as a query.<br />
<br />
In my experience a db abstraction layer is good for most of the db io. However, for<br />
the complex stuff a sort of localization object (or even file) with named queries is a<br />
good bet. This would work similarly to how different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables, since some db's might require a<br />
little more manipulation than others. Again, this would only be for the most complex queries.<br />
A good rule of thumb would be: if you had to start writing parts of the query as strings,<br />
move it to the db localization object. This db localization object isn't used for all<br />
queries because you only want to have to tweak the minimum amount of queries across dbs.<br />
--Aarons<br />
</pre><br />
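<br />
The "db localization object" idea from the comment above could look roughly like the sketch below: one class per database, one method per complex query. All class, method and table names are hypothetical. The example query concatenates names, which is written with <code>||</code> in SQLite and with <code>CONCAT()</code> in MySQL; only the SQLite variant is exercised here:<br />

```python
import sqlite3


class BaseQueries(object):
    """One method per complex query; one subclass per database."""

    def __init__(self, conn):
        self.conn = conn

    def _fetch(self, sql):
        # Plain DB-API calls, so the same helper works for any module.
        cur = self.conn.cursor()
        cur.execute(sql)
        return cur.fetchall()

    def full_names(self):
        raise NotImplementedError


class SqliteQueries(BaseQueries):
    def full_names(self):
        return self._fetch(
            "SELECT first_name || ' ' || surname FROM names "
            "ORDER BY surname, first_name")


class MysqlQueries(BaseQueries):
    def full_names(self):
        return self._fetch(
            "SELECT CONCAT(first_name, ' ', surname) FROM names "
            "ORDER BY surname, first_name")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.execute("INSERT INTO names VALUES ('Test', 'Smith')")
conn.execute("INSERT INTO names VALUES ('Amy', 'Jones')")

queries = SqliteQueries(conn)   # chosen once, when the tree is opened
names = queries.full_names()
```

The calling code only ever sees full_names(); which dialect is behind it is decided in one place.<br />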
<br />
=== Power vs Dependencies ===<br />
<br />
Do we want to have an additional layer over the Database Abstraction Layer (eg, an ORM)? <br />
<br />
PROS:<br />
<br />
# Makes GRAMPS code more abstract<br />
<br />
CONS:<br />
<br />
# Makes it harder for other languages to use the native GRAMPS db (but they can use the native db)<br />
# Adds a dependency <br />
<br />
Given that GRAMPS's developers have, in the past, written their own db transactions, and their own HTML abstractions, does it make sense to add such a dependency?<br />
<br />
Is the ORM available for all platforms?<br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A justification for switching from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar, but I haven't tried it out. The quote might be true of regular MySQL, though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
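<br />
To illustrate the "just a file name" point for SQLite, here is a minimal sketch using Python's sqlite3 module. The file name and the one-table layout are made up for the example:<br />

```python
import os
import sqlite3
import tempfile

# Hypothetical location for a family tree file.
path = os.path.join(tempfile.mkdtemp(), "family_tree.db")

# Connecting creates the file if it does not exist: no server, no accounts.
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE people (handle TEXT PRIMARY KEY, gramps_id TEXT)")
conn.execute("INSERT INTO people VALUES (?, ?)", ("b247d7186567ff472ef", "I0000"))
conn.commit()
conn.close()

# Reopening later (or on another machine, after copying the file)
# needs nothing but the path.
conn = sqlite3.connect(path)
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
conn.close()
```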
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code and the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding, in case they just don't work.<br />
<br />
I've spent the last few days looking at the current options for db abstraction. From what I currently know, I recommend we use SQLAlchemy with SQLite.<br />
<br />
SQLite. There is no server to manage, and single-file dbs will be easy to share; users can also manage multiple dbs at the same time. It should make merging simpler, and it will allow websites to be developed that work directly from the db. As long as GRAMPS doesn't switch focus to become some kind of mass-user website for editing large trees, I think SQLite will fit the bill.<br />
<br />
SQLAlchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way.<br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since GRAMPS isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't support changing dbs as much as might be thought. If we commit to SQLite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to SQLAlchemy, but SQLAlchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than SQLAlchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do GRAMPS some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From its website, SQLAlchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use either as needs dictate. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], the author does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''. This goes without saying. For this to succeed, the developers should agree with all of the major decisions.<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., handles should be unique, and rules must be added to ensure that adding an object with a person's handle to the family table is '''impossible''' at the database layer. These kinds of rules can be enforced technically (a primary object table with a key on handle) or with rules.<br />
# ideally, the framework could generate an admin module from the model to browse the database; see, e.g., the admin module in Django.<br />
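One way the "primary object table with key on handle" idea could be sketched in SQLite is shown below. The table and column names (primary_object, obj_type) are assumptions made for illustration, not an agreed GRAMPS schema, and foreign key enforcement must be switched on explicitly in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)
);
CREATE TABLE person (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
""")

# Register a handle as a person, then store the person row.
conn.execute("INSERT INTO primary_object VALUES ('abc123', 'person')")
conn.execute("INSERT INTO person (handle) VALUES ('abc123')")

# Reusing the same handle for a family is refused by the database itself.
try:
    conn.execute("INSERT INTO family (handle) VALUES ('abc123')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Because each handle appears only once in primary_object, with exactly one type, the constraint holds no matter which program writes to the file.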
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## Family Tree manager needs to list the family tree type (bsddb, sqlite); on creation of a new family tree the user must choose the backend.<br />
## User can import .gramps/gedcom files just as is done with the bsddb backend once the family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, since they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
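A rough sketch of the deferred loading described in the last item, assuming that "slots" means attributes that are filled in on first access. It uses a plain Python property rather than `__slots__`; the Person class, the two tables and the queries list are hypothetical stand-ins, not the real gen/lib code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT);
INSERT INTO person VALUES ('abc123', 'I0000');
INSERT INTO attribute VALUES ('abc123', 'Occupation', 'Farmer');
""")

queries = []  # records which tables were hit, for illustration only


class Person:
    """Hypothetical stand-in for a gen/lib object that knows its database."""

    def __init__(self, db, handle, gramps_id):
        self._db = db
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None  # not loaded yet

    @property
    def attributes(self):
        # Hit the attribute table only on first access, then cache the rows.
        if self._attributes is None:
            queries.append("attribute")
            rows = self._db.execute(
                "SELECT type, value FROM attribute WHERE person_handle = ?",
                (self.handle,),
            )
            self._attributes = rows.fetchall()
        return self._attributes


def get_person_from_handle(db, handle):
    """Fetch only the person row; related tables are left untouched."""
    queries.append("person")
    row = db.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?", (handle,)
    ).fetchone()
    return Person(db, *row)


person = get_person_from_handle(conn, "abc123")
tables_before = list(queries)  # only the person table so far
attrs = person.attributes      # first access triggers the second query
```

The cost is exactly what the item warns about: each object has to carry a reference to the database it came from.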
<br />
I don't understand the use of '''slots''' in the above. How is that idea related to db access? --[[User:Dsblank|Dsblank]] 11:14, 26 March 2009 (UTC)<br />
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To find out which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there.<br />
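The two approaches can be put side by side in a small sketch. The tables are invented for illustration, and the bsddb side is imitated with a plain dictionary of records, since the real storage layout is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT);
INSERT INTO person VALUES ('h1', 'I0001');
INSERT INTO person VALUES ('h2', 'I0002');
INSERT INTO person VALUES ('h3', 'I0003');
INSERT INTO attribute VALUES ('h1', 'Occupation', 'Farmer');
INSERT INTO attribute VALUES ('h2', 'Nickname', 'Red');
INSERT INTO attribute VALUES ('h3', 'Occupation', 'Miller');
""")

# SQL style: select from the attribute table, then look up the people.
rows = conn.execute(
    """
    SELECT DISTINCT p.gramps_id
    FROM attribute AS a
    JOIN person AS p ON p.handle = a.person_handle
    WHERE a.type = ?
    ORDER BY p.gramps_id
    """,
    ("Occupation",),
).fetchall()
sql_result = [gramps_id for (gramps_id,) in rows]

# bsddb style (imitated): every person record carries its own attribute list,
# so every record has to be visited and inspected.
records = {
    "h1": {"gramps_id": "I0001", "attributes": [("Occupation", "Farmer")]},
    "h2": {"gramps_id": "I0002", "attributes": [("Nickname", "Red")]},
    "h3": {"gramps_id": "I0003", "attributes": [("Occupation", "Miller")]},
}
loop_result = sorted(
    record["gramps_id"]
    for record in records.values()
    if any(attr_type == "Occupation" for attr_type, _ in record["attributes"])
)
```

Both give the same answer; the difference is that the SQL version lets the engine decide how to find the rows.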
<br />
The above clearly shows that the two backends call for very different approaches. The bsddb way will still work in SQL (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, and then return True or False depending on whether the person is in this list. As db is passed in, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
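A simplified sketch of that prepare/apply idea follows. The SqlDb and HasAttribute classes are hypothetical and much smaller than the real _HasAttributeBase rule, and a set is used instead of a list so that the membership test stays cheap.

```python
import sqlite3


class SqlDb:
    """Hypothetical database object that advertises SQL support."""

    support_sql = True

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE attribute (person_handle TEXT, type TEXT, value TEXT);
            INSERT INTO attribute VALUES ('h1', 'Occupation', 'Farmer');
            INSERT INTO attribute VALUES ('h2', 'Nickname', 'Red');
        """)


class HasAttribute:
    """Simplified, hypothetical filter rule; not the real _HasAttributeBase."""

    def __init__(self, attr_type):
        self.attr_type = attr_type
        self.matching_handles = None

    def prepare(self, db):
        # One query up front instead of one lookup per person.
        if getattr(db, "support_sql", False):
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE type = ?",
                (self.attr_type,),
            )
            self.matching_handles = {handle for (handle,) in rows}

    def apply(self, db, person_handle):
        if self.matching_handles is None:
            # The non-SQL path would inspect the person's own attribute list.
            raise NotImplementedError("non-SQL path is not sketched here")
        return person_handle in self.matching_handles


db = SqlDb()
rule = HasAttribute("Occupation")
rule.prepare(db)
results = {handle: rule.apply(db, handle) for handle in ("h1", "h2")}
```

The `getattr` check is the part that "does not look ideal": every rule that wants the fast path has to branch on the backend.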
<br />
=== See Also ===<br />
[[ExportSql.py]]<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15148GEPS 010: Relational Backend2009-03-26T04:51:27Z<p>AaronS: /* Additional Issues */</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
A proposed implementation is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/Sql.py?view=markup trunk/src/plugins/Sql.py]. You can export most of your GRAMPS data through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything.)<br />
<br />
SQL stands for "Structured Query Language" and is often pronounced "sequel" (its original name at IBM was SEQUEL, "Structured English Query Language"). After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above Exporter, you can use SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.<br />
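By contrast, the surname query from the session above takes only a few lines of Python against an SQLite file. This is a sketch: an in-memory database with a cut-down names table stands in for the exported ''Untitled_1.sql'' file, and only three of the columns shown above are recreated.

```python
import sqlite3

# Stand-in for sqlite3.connect("Untitled_1.sql") on a real export.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets columns be read by name

conn.execute("CREATE TABLE names (private INTEGER, first_name TEXT, surname TEXT)")
conn.execute("INSERT INTO names VALUES (0, 'Test', 'Smith')")
conn.execute("INSERT INTO names VALUES (0, 'Other', 'Jones')")

# Same query as in the shell session, with the pattern passed as a parameter.
pattern = "%Smith%"
rows = conn.execute(
    "SELECT first_name, surname FROM names WHERE surname LIKE ?", (pattern,)
).fetchall()
found = [(row["first_name"], row["surname"]) for row in rows]
```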
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a Berkeley DB (BSDDB) database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schemas or data definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas with BSDDB the query logic is written in Python<br />
# All tables of an SQLite database reside in a single file<br />
<br />
Next are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although it may be harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in an SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would all have to be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on GRAMPS; it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate GRAMPS the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might offset some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise-level database capable of handling serious loads<br />
*far more is built into the db, e.g. stored procedures and on and on.<br />
(SQLite does have auto-incrementing fields and triggers, though.)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to set up, run and maintain.<br />
** there are tools that can do this automatically, though, and make the burden almost<br />
nonexistent for an end user. Also, the embeddable MySQL might be an option.<br />
*may be difficult to manage / share multiple databases. More difficult, but very doable;<br />
maybe not even that difficult. It would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to set up. Just start writing to the file! No connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
<br />
Disadvantages<br />
*while great for what it is, it's not an enterprise-level database<br />
*many "traditional" relational db features are lacking.<br />
*while tools exist, they aren't as fleshed out and solid as the MySQL ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatiblity" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The single disk file of an SQLite db would be a major selling point<br />
for genealogy software, since users share and compare dbs all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data to Python; any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search turned up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer from other languages, but how complex would we really want to make it? And of course there is always the option of just using an ORM and building similar objects in the new language. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API to make it easy to move from one SQL-based DB to another called DB-API. Each DB may have multiple different modules available for it. If we settle on this solution then we should do some quick searches to make sure we pick the right modules.<br />
<br />
*MySQL: Yes, [http://sourceforge.net/projects/mysql-python MySQLdb], used by SQLAlchemy<br />
*SQLite: Yes, [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] (included in Python 2.5 or greater) and [http://oss.itsystementwicklung.de/trac/pysqlite pysqlite], both used by SQLAlchemy<br />
*BSDDB: No. The DB-API looks to be only for relational dbs.<br />
<br />
*ORM (Object Relational Mapper): no<br />
<br />
"While all DB-API modules have identical APIs (or very similar; not all backends support all features), if you are writing the SQL yourself, you will probably be writing SQL in a product-specific dialect, so they are not as interchangeable in practice as they are in theory." [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python kquinn]<br />
<br />
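One concrete place where DB-API modules differ is the parameter placeholder, which each module reports in its `paramstyle` attribute: sqlite3 uses "qmark" (`?`), while MySQLdb, as far as I know, uses "format" (`%s`). The sketch below adapts a single query to the module in use; the helper name and the names table are made up for illustration, and only the sqlite3 path is actually exercised.

```python
import sqlite3

# Placeholder syntax for two of the DB-API "paramstyle" values.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def find_first_names(dbapi_module, conn, surname):
    """Run one query through any DB-API module with a supported paramstyle."""
    mark = PLACEHOLDERS[dbapi_module.paramstyle]
    cur = conn.cursor()
    # Only the placeholder is spliced into the SQL text, never the data.
    cur.execute(
        "SELECT first_name FROM names WHERE surname = " + mark
        + " ORDER BY first_name",
        (surname,),
    )
    return [row[0] for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [("Test", "Smith"), ("Anna", "Smith"), ("Bob", "Jones")],
)
smiths = find_first_names(sqlite3, conn, "Smith")
```

This kind of glue is what an abstraction layer such as SQLAlchemy takes care of, and it only covers placeholders, not dialect differences in the SQL itself.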
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While it is powerful, I doubt any projects are using it outside of Django.<br />
<br />
===<strike> pydo </strike>===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and limited documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. <br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes, but doesn't force it.<br />
*multiple database support?: yes [http://www.sqlalchemy.org/ source]<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. Project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes, this is what SQLObject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08, with 7 developers on the project. I couldn't find old release dates, but the first was posted 2003-04-09. The link to the wiki is broken, which isn't the best sign, but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
=== Storm ===<br />
https://storm.canonical.com/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
*ORM (Object Relational Mapper): yes<br />
*multiple database support?: yes<br />
<br />
*Viability: currently developed. Seems like a fairly good site with some documentation. The biggest drawback is that the project is only a year old. A pro is that it may be easier to use than SQLAlchemy.<br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the GRAMPS code or the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding in case they just don't work.<br />
<br />
I've spent the last few days looking at the current options for db abstraction. From what I currently know, I recommend we use SQLAlchemy with SQLite.<br />
<br />
SQLite. There is no server to manage, and single-file dbs will be easy to share; users can also manage multiple dbs at the same time. It should make merging simpler, and it will allow websites to be developed that work directly from the db. As long as GRAMPS doesn't switch focus to become some kind of mass-user website for editing large trees, I think SQLite will fit the bill.<br />
<br />
SQLAlchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way.<br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since GRAMPS isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't support changing dbs as much as might be thought. If we commit to SQLite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to SQLAlchemy, but SQLAlchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than SQLAlchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do GRAMPS some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From its website, SQLAlchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use either as needs dictate. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], the author does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
=== Discussion ===<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one; you'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in PHP every web framework seems to have its own, and I suspect the same<br />
for Python. Django has its own but allows for the use of others (if that<br />
tells you anything), so it might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
abstraction layer.<br />
<br />
The first thing to look for in a db abstraction layer is which dbs it can use. SQLite and<br />
MySQL are musts; you may even find one that can talk to BSDDB, but probably not.<br />
Oracle and PostgreSQL are pluses. They will probably never be used, but who knows what<br />
will happen in 5 or 10 years. Maybe Oracle would get fed up with MySQL and release its db<br />
as open source, charging for service; stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the SQL queries it lets you create are, since<br />
complex queries (sub queries and the like) built through a db layer tend to be difficult<br />
to create, read and predict. Usually those queries are far easier to just write as SQL.<br />
<br />
In my experience a db abstraction layer is good for most of the db I/O. However, for<br />
the complex stuff a sort of localization object (or even file) with named queries is a<br />
good bet. This would work similarly to how different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables, since some dbs might require a<br />
little more manipulation than others. Again, this would only be for the most complex<br />
queries. A good rule of thumb: if you have to start writing parts of the query as strings,<br />
move it to the db localization object. The db localization object isn't used for all<br />
queries because you want to keep the number of queries that need tweaking across dbs<br />
to a minimum.<br />
--Aarons<br />
</pre><br />
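The "db localization object" suggested above might look something like the following sketch: one class per database, one method per named query. The class names, the full_names query and the names table are all hypothetical; only the SQLite variant is run here, and the MySQL variant is included to show where a dialect difference would live.

```python
import sqlite3


class SqliteQueries:
    """Named queries in the SQLite dialect (hypothetical names)."""

    def full_names(self):
        # SQLite concatenates strings with the || operator.
        return ("SELECT first_name || ' ' || surname FROM names "
                "ORDER BY surname, first_name")


class MysqlQueries:
    """The same named queries in the MySQL dialect; not exercised here."""

    def full_names(self):
        # MySQL uses CONCAT(); by default || is a logical OR there.
        return ("SELECT CONCAT(first_name, ' ', surname) FROM names "
                "ORDER BY surname, first_name")


QUERIES = {"sqlite": SqliteQueries(), "mysql": MysqlQueries()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)", [("Test", "Smith"), ("Bob", "Jones")]
)

queries = QUERIES["sqlite"]  # chosen once, when the backend is selected
full_names = [row[0] for row in conn.execute(queries.full_names())]
```

The rest of the code only ever asks for a query by name, so supporting another db means adding one more class rather than hunting for SQL strings.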
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Removal is not death, however; development will simply be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing BSDDB to SQLite ==<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration.<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar, but I haven't tried it out. It might be true of a regular MySQL server, though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
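<br />
A minimal sketch of the point above, assuming Python's standard sqlite3 module: opening a database needs nothing more than a file name. The file name family_tree.db and the two-column people table are made up for illustration and are not the GRAMPS schema.<br />

```python
import os
import sqlite3
import tempfile

# Hypothetical file name in a scratch directory; any writable path works.
db_path = os.path.join(tempfile.mkdtemp(), "family_tree.db")

# No server, user account, or connection settings: just a file name.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE people (handle TEXT PRIMARY KEY, gramps_id TEXT)")
conn.execute("INSERT INTO people VALUES (?, ?)", ("b247d7186567ff472ef", "I0000"))
conn.commit()
conn.close()

# Re-opening the same file later gives the data back.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT handle, gramps_id FROM people").fetchall()
conn.close()
```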
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., handles should be unique: rules must be added to ensure that adding an object to the family table with the same handle as a person object is '''impossible''' at the database layer. These kinds of rules can be enforced technically (a primary object table keyed on handle) or with explicit validation rules.<br />
# ideally, the framework could generate an admin module from the model to browse the database; see, e.g., the admin module in Django.<br />
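<br />
One way the "primary object table with key on handle" idea could look is sketched below with Python's sqlite3 module. The table and column names (primary_object, obj_type) are assumptions for illustration, not an agreed schema. Each typed table references the shared table through a composite foreign key, so one handle can belong to exactly one object type.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless asked to.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,           -- a handle exists at most once overall
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)            -- target for the composite foreign keys
);
CREATE TABLE person (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
""")

# Register the handle as a person, then store the person row.
conn.execute("INSERT INTO primary_object VALUES ('abc123', 'person')")
conn.execute("INSERT INTO person (handle) VALUES ('abc123')")

# Reusing the same handle for a family must fail at the database layer.
try:
    conn.execute("INSERT INTO family (handle) VALUES ('abc123')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

family_count = conn.execute("SELECT COUNT(*) FROM family").fetchone()[0]
```

The design choice here is to let the database refuse the bad row, so no Python-side check can be forgotten. It relies on a SQLite build with foreign key support enabled.<br />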
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
:# Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
:# User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
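<br />
The deferred loading described in the last item could be sketched roughly as follows. LazyPerson, the two-table schema, and this get_person_from_handle signature are hypothetical simplifications, not the real gen/lib or base.py code. A trace callback records every statement SQLite runs, which shows that the attribute table is only queried on first access.<br />

```python
import sqlite3


class LazyPerson:
    """Hypothetical stand-in for a gen/lib Person backed by SQL."""

    __slots__ = ("_db", "handle", "gramps_id", "_attributes")

    def __init__(self, db, handle, gramps_id):
        self._db = db              # the object must know where its data lives
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None    # deferred: nothing fetched yet

    @property
    def attributes(self):
        # First access runs the query; later accesses reuse the result.
        if self._attributes is None:
            self._attributes = self._db.execute(
                "SELECT attr_type, attr_value FROM attribute "
                "WHERE person_handle = ? ORDER BY attr_type",
                (self.handle,),
            ).fetchall()
        return self._attributes


def get_person_from_handle(db, handle):
    """Touch only the person table."""
    row = db.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?", (handle,)
    ).fetchone()
    return LazyPerson(db, *row) if row else None


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
CREATE TABLE attribute (person_handle TEXT, attr_type TEXT, attr_value TEXT);
INSERT INTO person VALUES ('abc123', 'I0000');
INSERT INTO attribute VALUES ('abc123', 'Occupation', 'Farmer');
""")

executed = []                           # every SQL statement SQLite runs from now on
db.set_trace_callback(executed.append)

person = get_person_from_handle(db, "abc123")
attribute_hit_before = any("FROM attribute" in sql for sql in executed)
attrs = person.attributes               # first access: attribute table is queried
attribute_hit_after = any("FROM attribute" in sql for sql in executed)
```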
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality, or to better optimize for SQL. E.g., in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, it means looping over all persons, obtaining the attribute sub-table of each person, and checking whether the attribute is present there. <br />
<br />
The above clearly indicates that the two backends call for very different approaches. The bsddb way will work in SQL, though (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
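<br />
A rough sketch of the prepare-then-test idea, assuming a hypothetical attribute table and a simplified rule class; the real _HasAttributeBase rule takes different arguments and works on person objects rather than bare handles. A set is used instead of a list so that each membership test is cheap.<br />

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
CREATE TABLE attribute (person_handle TEXT, attr_type TEXT, attr_value TEXT);
INSERT INTO person VALUES ('h1', 'I0001'), ('h2', 'I0002'), ('h3', 'I0003');
INSERT INTO attribute VALUES ('h1', 'Occupation', 'Farmer'),
                             ('h3', 'Occupation', 'Farmer'),
                             ('h2', 'Occupation', 'Miller');
""")


class HasAttributeSql:
    """Hypothetical SQL-aware variant of a HasAttribute filter rule."""

    def __init__(self, attr_type, attr_value):
        self.attr_type = attr_type
        self.attr_value = attr_value
        self.matching_handles = set()

    def prepare(self, db):
        # One query against the attribute table replaces the per-person loop.
        rows = db.execute(
            "SELECT DISTINCT person_handle FROM attribute "
            "WHERE attr_type = ? AND attr_value = ?",
            (self.attr_type, self.attr_value),
        )
        self.matching_handles = {row[0] for row in rows}

    def apply(self, db, person_handle):
        # The per-person test is now a set lookup, with no database access.
        return person_handle in self.matching_handles


rule = HasAttributeSql("Occupation", "Farmer")
rule.prepare(db)
handles = [row[0] for row in db.execute("SELECT handle FROM person ORDER BY handle")]
farmers = [h for h in handles if rule.apply(db, h)]
```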
<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15147GEPS 010: Relational Backend2009-03-26T03:55:43Z<p>AaronS: /* Additional Issues */ more notes</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
A proposed implementation is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/Sql.py?view=markup trunk/src/plugins/Sql.py]. You can export most of your GRAMPS data through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything.)<br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel). After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above Exporter, then you can use SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.<br />
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a Berkeley DB (BSDDB) database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is deprecated as of Python 2.6 and removed from the standard library in Python 3.0<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas queries against BSDDB are written in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next, there are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although it may be harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on GRAMPS, and it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate GRAMPS the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL over SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might overcome some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db. ie auto incrementing fields, stored procedures and on and on.<br />
(sqlite may not even have triggers but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to setup run and maintain.<br />
** there are tools that can do this automatically, though, and make the work almost<br />
nonexistent for an end user. Also, the embeddable MySQL might be an option.<br />
*may be difficult to manage / share multiple databases. More difficult, but very doable.<br />
Maybe not even that difficult. It would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to setup. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
<br />
Disadvantage<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data just to Python. Any language should be able to access the db just fine. However, other languages wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. Not that I'm saying we should use it, but a quick Google search started to bring up things like this [http://pecl.php.net/package/python php python package], so there may be some hope for even using the ORM layer. But how complex would we really want to make it? --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API to make it easy to move from one SQL-based DB to another called DB-API. Each DB may have multiple different modules available for it. If we settle on this solution then we should do some quick searches to make sure we pick the right modules.<br />
<br />
*MySQL: Yes [http://sourceforge.net/projects/mysql-python MySQLDB] used by SQLAlchemy<br />
*SQLite: Yes [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] (included in Python 2.5 or greater) [http://oss.itsystementwicklung.de/trac/pysqlite pysqlite] both used by SQLAlchemy<br />
*BSDDB: No. The DB-API looks to be only for relational dbs.<br />
<br />
*ORM (Object Relational Mapper): no<br />
<br />
"While all DB-API modules have identical APIs (or very similar; not all backends support all features), if you are writing the SQL yourself, you will probably be writing SQL in a product-specific dialect, so they are not as interchangeable in practice as they are in theory." [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python kquinn]<br />
<br />
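A small sketch of what the quote means in practice, using only the standard sqlite3 module. The cursor calls (connect, cursor, execute, fetchall) are the same across DB-API modules, but even the parameter placeholder differs between drivers: sqlite3 reports the "qmark" style, while MySQLdb reports "format". The names table and the find_by_surname helper are made up for illustration.<br />

```python
import sqlite3

# Placeholder syntax per DB-API paramstyle (only the two styles discussed here).
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def find_by_surname(dbapi_module, conn, surname):
    """Run the same query through any DB-API module using one of the styles above."""
    mark = PLACEHOLDERS[dbapi_module.paramstyle]
    cur = conn.cursor()
    cur.execute(
        "SELECT first_name, surname FROM names WHERE surname = " + mark,
        (surname,),
    )
    rows = cur.fetchall()
    cur.close()
    return rows


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
cur.execute("INSERT INTO names VALUES (?, ?)", ("Test", "Smith"))
conn.commit()

smiths = find_by_surname(sqlite3, conn, "Smith")
```

Differences in the SQL dialect itself (types, auto-increment syntax, date functions) are not covered by this kind of shim, which is the limitation the quote points at.<br />
<br />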
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While powerful I doubt any projects are using it outside of Django.<br />
<br />
===<strike> pydo </strike>===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and limited documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. <br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes but doesn't force it.<br />
*multiple database support?: yes [http://www.sqlalchemy.org/ source]<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. Project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes this is what sqlobject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08, 7 developers on the project. couldn't find old release dates but first was posted 2003-04-09. link to wiki is broken which isn't the best sign but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
=== Storm ===<br />
https://storm.canonical.com/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
*ORM (Object Relational Mapper): yes<br />
*multiple database support?: yes<br />
<br />
*Viability: currently developed. Seems like a fairly good site with some documentation. The biggest drawback is that the project is only a year old. A pro is that it may be easier to use than SQLAlchemy.<br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the GRAMPS code and the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding in case they just don't work.<br />
<br />
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
SQLite: no server to manage, and single-file dbs will make it easy to share and manage multiple dbs at the same time. It should also make merging simpler, and it will allow websites to be developed that work directly from the db. As long as GRAMPS doesn't switch focus to be some kind of mass-user website for editing large trees, I think SQLite will fit the bill.<br />
<br />
SQLAlchemy: this seems to have a large following and good documentation. It should allow us to support different db back ends more easily in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL: probably not as user friendly, and since GRAMPS isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]: it sounds as if the DB-API in practice doesn't make changing dbs as easy as might be thought. If we commit to SQLite, though, this might be an option.<br />
<br />
SQLObject: this seems like a viable alternative to SQLAlchemy, but SQLAlchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm: while this project looks promising and may be easier to use than SQLAlchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do GRAMPS some good. There will be situations where simply writing queries will be far easier. Our implementation model should take that into account. From the website, SQLAlchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use both as the needs determine. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
=== Discussion ===<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one. You'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in PHP every web framework seems to have its own. I suspect the same<br />
for Python. Django has its own but allows for the use of others (if that<br />
tells you anything). It might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
abstraction layer.<br />
<br />
What to look for in a db abstraction layer is which dbs it can use. SQLite and<br />
MySQL are musts; you may even find one that can talk to BSDDB, but probably not.<br />
Oracle and PostgreSQL are pluses. They will probably never be used, but who knows<br />
what will happen in 5 or 10 years. Maybe Oracle would get fed up with MySQL and<br />
release the db open source, charging for service. Stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the SQL queries it is supposed to let you create are, since<br />
complex queries through a db layer tend to be difficult to create, read and predict<br />
(i.e. sub queries and the like). Usually those queries are far easier to just build as a query.<br />
<br />
In my experience a db abstraction layer is good for most of the db I/O. However, for<br />
the complex stuff a sort of localization object (or even file) with named queries<br />
is a good bet. This would work similarly to how different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables, since some dbs might require a<br />
little more manipulation than others. Again, this would only be for the most complex<br />
queries. A good rule of thumb would be: if you have to start writing parts of the query<br />
as strings, move it to the db localization object. This db localization object isn't used<br />
for all queries because you only want to have to tweak the minimum number of queries<br />
across dbs.<br />
--Aarons<br />
</pre><br />
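<br />
The "db localization object" idea from the comment above might look something like this sketch: one class per database, one method per named query. All class, method, and table names here are hypothetical. Only the SQLite variant is exercised; the MySQL variant is an untested illustration that assumes a DB-API connection using the "format" parameter style.<br />

```python
import sqlite3


class SqliteQueries:
    """Hand-written queries for SQLite, one method per named query."""

    def surnames_with_at_least(self, conn, min_count):
        return conn.execute(
            "SELECT surname, COUNT(*) FROM names "
            "GROUP BY surname HAVING COUNT(*) >= ? ORDER BY surname",
            (min_count,),
        ).fetchall()


class MysqlQueries:
    """Same query names for MySQL (untested sketch, assumes %s placeholders)."""

    def surnames_with_at_least(self, conn, min_count):
        cur = conn.cursor()
        cur.execute(
            "SELECT surname, COUNT(*) FROM names "
            "GROUP BY surname HAVING COUNT(*) >= %s ORDER BY surname",
            (min_count,),
        )
        return cur.fetchall()


# Picked once at startup, the way a translation catalogue is picked per language.
QUERIES_BY_BACKEND = {"sqlite": SqliteQueries(), "mysql": MysqlQueries()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [("Ann", "Smith"), ("Bob", "Smith"), ("Cy", "Jones")],
)

queries = QUERIES_BY_BACKEND["sqlite"]
common = queries.surnames_with_at_least(conn, 2)
```

Calling code only ever sees the method name, so a query that needs backend-specific tweaking is changed in one place per database.<br />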
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration.<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. It might be true of MySQL,<br />
though the embedded version of MySQL may be similar; I haven't tried it out. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
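<br />
As a rough illustration of the "just a file name" point, here is a minimal sketch using the sqlite3 module from the Python standard library. The function names, the file name and the reduced <code>people</code> table are assumptions made up for this example; they are not taken from GRAMPS code.<br />

```python
import os
import sqlite3
import tempfile


def open_tree(path):
    """Open (or create) a family tree stored in a single SQLite file.

    No server, user account or connection settings are involved:
    sqlite3.connect() creates the file if it does not exist yet.
    """
    conn = sqlite3.connect(path)
    # Hypothetical, heavily reduced table; a real schema would have more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people ("
        " handle TEXT PRIMARY KEY, gramps_id TEXT, gender INTEGER)"
    )
    conn.commit()
    return conn


def demo():
    """Create a tree file in a temporary directory and read one row back."""
    path = os.path.join(tempfile.mkdtemp(), "example_tree.db")
    conn = open_tree(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO people VALUES (?, ?, ?)",
            ("b247d7186567ff472ef", "I0000", 1),
        )
    rows = conn.execute("SELECT gramps_id FROM people").fetchall()
    conn.close()
    return path, rows
```

Calling <code>demo()</code> leaves one ordinary file on disk, which could be copied or mailed like any other document.<br />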
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., a handle should be unique: rules must be added to ensure that adding an object to the family table with the handle of a person object is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table keyed on handle) or with explicit rules.<br />
# ideally, the framework could generate an admin module from the model to browse the database; see, e.g., the admin module in Django.<br />
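<br />
One possible reading of "a primary object table with key on handle" is sketched below with SQLite. All table and column names (<code>primary_object</code>, <code>obj_type</code>, and so on) and the helper functions are hypothetical. The sketch assumes an SQLite build with foreign key support, which has to be switched on per connection.<br />

```python
import sqlite3

# Hypothetical schema: every handle is registered once in primary_object,
# together with the kind of object it belongs to. The composite foreign key
# makes it impossible to store a family row under a person's handle.
SCHEMA = """
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)
);
CREATE TABLE person (
    handle    TEXT PRIMARY KEY,
    obj_type  TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    gramps_id TEXT,
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle    TEXT PRIMARY KEY,
    obj_type  TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    gramps_id TEXT,
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
"""


def make_db():
    """Create an in-memory database with foreign key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, obj_type, handle, gramps_id):
    """Register the handle, then insert the object row, in one transaction."""
    if obj_type not in ("person", "family"):
        raise ValueError("unknown object type: %r" % (obj_type,))
    with conn:  # rolls back both inserts if either one fails
        conn.execute(
            "INSERT INTO primary_object (handle, obj_type) VALUES (?, ?)",
            (handle, obj_type),
        )
        conn.execute(
            "INSERT INTO %s (handle, gramps_id) VALUES (?, ?)" % obj_type,
            (handle, gramps_id),
        )
```

With this layout a duplicate handle is rejected by the primary key of <code>primary_object</code>, and a direct insert into <code>family</code> that reuses a person's handle is rejected by the foreign key, so the rule does not depend on application code.<br />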
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
:# The Family Tree manager needs to list the family tree type (bsddb, sqlite); on creation of a new family tree, the user must choose the backend.<br />
:# The user can import .gramps/gedcom files once the family tree is set up, just as is done with the bsddb backend.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
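<br />
The deferred loading described in the last item could look roughly like the sketch below. Only <code>get_person_from_handle</code> is a name from the text above; <code>LazyPerson</code>, <code>SqlDb</code>, <code>load_attributes</code> and the two tables are hypothetical and much simpler than the real gen/lib objects.<br />

```python
import sqlite3


class LazyPerson(object):
    """Person whose attributes are fetched only when first accessed."""

    __slots__ = ("_db", "handle", "gramps_id", "_attributes")

    def __init__(self, db, handle, gramps_id):
        self._db = db              # the object must know where its data lives
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None    # not loaded yet

    @property
    def attributes(self):
        # First access hits the attribute table; later accesses reuse the result.
        if self._attributes is None:
            self._attributes = self._db.load_attributes(self.handle)
        return self._attributes


class SqlDb(object):
    """Tiny stand-in for an SQL backend, with sample data for the demo."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
            CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT);
            INSERT INTO person VALUES ('H1', 'I0000');
            INSERT INTO attribute VALUES ('H1', 'Occupation', 'Farmer');
        """)
        self.attribute_queries = 0  # counts hits on the attribute table

    def get_person_from_handle(self, handle):
        # Only the person table is touched here.
        row = self.conn.execute(
            "SELECT handle, gramps_id FROM person WHERE handle = ?", (handle,)
        ).fetchone()
        if row is None:
            return None
        return LazyPerson(self, row[0], row[1])

    def load_attributes(self, handle):
        self.attribute_queries += 1
        return self.conn.execute(
            "SELECT key, value FROM attribute WHERE person_handle = ? ORDER BY key",
            (handle,),
        ).fetchall()
```

The counter is only there to make the behaviour visible: it stays at zero after <code>get_person_from_handle</code> and becomes one the first time <code>attributes</code> is read.<br />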
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality, or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table, and then look up the people. In bsddb, however, it means looping over all persons, obtaining the attribute sub-table of each person, and checking whether the attribute is present there.<br />
<br />
The above clearly indicates that the two backends go about this very differently. The bsddb way will still work in SQL, though (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py].<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed in, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
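<br />
A minimal sketch of that idea follows. It is not the code in _HasAttributeBase.py: the <code>HasAttribute</code> class, both backend classes and the <code>attribute</code> table are hypothetical, and only the <code>prepare</code> method and the <code>support_sql</code> attribute come from the paragraph above.<br />

```python
import sqlite3
from collections import namedtuple

# attributes is a list of (key, value) pairs
Person = namedtuple("Person", ["handle", "attributes"])


class DictBackend(object):
    """Stand-in for the bsddb way: attributes live inside each person."""
    support_sql = False

    def __init__(self, people):
        self.people = list(people)


class SqlBackend(object):
    """Stand-in for an SQL backend with a separate attribute table."""
    support_sql = True

    def __init__(self, people):
        self.people = list(people)
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT)"
        )
        for person in self.people:
            self.conn.executemany(
                "INSERT INTO attribute VALUES (?, ?, ?)",
                [(person.handle, k, v) for k, v in person.attributes],
            )
        self.conn.commit()


class HasAttribute(object):
    """Filter rule: does the person have an attribute with the given key?"""

    def __init__(self, key):
        self.key = key
        self.matching = None

    def prepare(self, db):
        # SQL path: one query up front collects every matching handle.
        if getattr(db, "support_sql", False):
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE key = ?",
                (self.key,),
            )
            self.matching = set(row[0] for row in rows)
        else:
            self.matching = None

    def apply(self, db, person):
        if self.matching is not None:
            return person.handle in self.matching
        # bsddb-style path: inspect the attribute list of each person.
        return any(k == self.key for k, _value in person.attributes)


def filter_people(db, rule):
    """Run a rule over all people and return the matching handles."""
    rule.prepare(db)
    return [p.handle for p in db.people if rule.apply(db, p)]
```

Both backends give the same answer here; the difference is that the SQL path asks the attribute table once, while the other path walks every person. The branch on <code>support_sql</code> inside the rule is exactly the part that "does not look ideal".<br />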
<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15146GEPS 010: Relational Backend2009-03-26T03:30:49Z<p>AaronS: /* Additional Issues */</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
A proposed implementation is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/Sql.py?view=markup trunk/src/plugins/Sql.py]. You can export most of GRAMPS through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything.)<br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel). After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above Exporter, then you can use SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.<br />
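<br />
The same kind of query can also be run from Python with the standard sqlite3 module. The sketch below is an assumption-laden illustration: it builds a small in-memory <code>names</code> table with only three of the columns shown in the session above, and the helper names are made up for this example.<br />

```python
import sqlite3


def find_names(conn, surname_part):
    """Return (first_name, surname) pairs whose surname contains surname_part."""
    cur = conn.execute(
        "SELECT first_name, surname FROM names "
        "WHERE surname LIKE ? ORDER BY surname, first_name",
        ("%" + surname_part + "%",),  # parameter binding instead of string pasting
    )
    return cur.fetchall()


def make_sample_db():
    """Build a reduced stand-in for the exported names table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE names (private INTEGER, first_name TEXT, surname TEXT)"
    )
    conn.executemany(
        "INSERT INTO names VALUES (?, ?, ?)",
        [(0, "Test", "Smith"), (0, "Ann", "Jones")],
    )
    conn.commit()
    return conn
```

Against a real export one would presumably pass <code>sqlite3.connect("Untitled_1.sql")</code> instead of the sample database, assuming the exported file has a <code>names</code> table like the one shown above.<br />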
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a Berkeley DB (BSDDB) database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as an alternative backend. This should allow anything from easy, single-file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
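# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6 and slated for removal in Python 3.0 by PEP 3108)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but an embedded key/value store<br />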
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas with BSDDB the query logic is written in Python<br />
# All tables of an SQLite database reside in a single file<br />
<br />
Next, there are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps, and it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL over SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might offset some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and<br />
disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise-level database capable of handling serious loads<br />
*far more is built into the db, e.g. stored procedures and on and on.<br />
(SQLite does have auto-incrementing keys and triggers, but no stored procedures)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to set up, run and maintain.<br />
** there are tools that can do this automatically, though, and make the server almost<br />
nonexistent for an end user. Also, the embeddable MySQL might be an option.<br />
*may be difficult to manage / share multiple databases. More difficult, but very doable.<br />
Maybe not even that difficult. It would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to set up. Just start writing to the file! No connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
<br />
Disadvantages<br />
*while great for what it is, it's not an enterprise-level database<br />
*many "traditional" relational db features are lacking.<br />
*while tools exist, they aren't as fleshed out and solid as the MySQL ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The single disk file of an SQLite db would be a major selling point for SQLite<br />
in genealogy software, since users share and compare dbs all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Additional Issues ==<br />
<br />
If we use a well-known SQL backend, we should consider the ability for other languages to be able to natively access the database. For example, a PHP program should be able to use the same database. Does using a Python-based ORM tie the data to Python? Or can the database still be used natively from other systems?<br />
<br />
Using a Python-based ORM won't tie the data to Python. Any language should be able to access the db just fine. However, it wouldn't have access to Python's ORM layer. Since I haven't used a true ORM before, I'm not certain exactly how it will affect our table relationships, but I believe they will still make sense in a relational way. --[[User:AaronS|AaronS]] 03:30, 26 March 2009 (UTC)<br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API called DB-API that makes it easy to move from one SQL-based DB to another. Each DB may have multiple different modules available for it. If we settle on this solution, then we should do some quick searches to make sure we pick the right modules.<br />
<br />
*MySQL: Yes. [http://sourceforge.net/projects/mysql-python MySQLdb], used by SQLAlchemy<br />
*SQLite: Yes. [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] (included in Python 2.5 or greater) and [http://oss.itsystementwicklung.de/trac/pysqlite pysqlite], both used by SQLAlchemy<br />
*BSDDB: No. The DB-API looks to be only for relational dbs.<br />
<br />
*ORM (Object Relational Mapper): no<br />
<br />
"While all DB-API modules have identical APIs (or very similar; not all backends support all features), if you are writing the SQL yourself, you will probably be writing SQL in a product-specific dialect, so they are not as interchangeable in practice as they are in theory." [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python kquinn]<br />
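<br />
The point of the quote can be seen even in something as small as parameter placeholders. The sketch below uses only sqlite3, whose <code>paramstyle</code> is <code>qmark</code>; the mapping entry for <code>format</code> reflects what MySQLdb is understood to use and is not exercised here. The helper names and the reduced <code>names</code> table are hypothetical.<br />

```python
import sqlite3

# DB-API modules advertise their placeholder style in the module-level
# attribute "paramstyle". Only two styles are handled in this sketch.
PLACEHOLDERS = {
    "qmark": "?",    # e.g. sqlite3
    "format": "%s",  # understood to be the style of MySQLdb (not tested here)
}


def placeholder(dbapi_module):
    """Return the positional placeholder for the given DB-API module."""
    return PLACEHOLDERS[dbapi_module.paramstyle]


def find_by_surname(dbapi_module, conn, surname):
    """Same call sequence for any DB-API module; only the SQL text differs."""
    sql = (
        "SELECT first_name FROM names WHERE surname = "
        + placeholder(dbapi_module)
        + " ORDER BY first_name"
    )
    cur = conn.cursor()
    cur.execute(sql, (surname,))
    return cur.fetchall()


def make_sample_db():
    """Small in-memory stand-in for a names table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
    conn.executemany(
        "INSERT INTO names VALUES (?, ?)",
        [("Test", "Smith"), ("Ann", "Smith"), ("Bob", "Jones")],
    )
    conn.commit()
    return conn
```

The cursor calls are identical across modules, but the SQL string already has to be adapted per backend, which is the kind of gap an abstraction layer is meant to hide.<br />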
<br />
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While it is powerful, I doubt any projects are using it outside of Django.<br />
<br />
===<strike> pydo </strike>===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and limited documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. <br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes but doesn't force it.<br />
*multiple database support?: yes [http://www.sqlalchemy.org/ source]<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. The project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes this is what sqlobject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08; 7 developers on the project. I couldn't find old release dates, but the first was posted 2003-04-09. The link to the wiki is broken, which isn't the best sign, but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
=== Storm ===<br />
https://storm.canonical.com/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
*ORM (Object Relational Mapper): yes<br />
*multiple database support?: yes<br />
<br />
*Viability: currently developed. Seems like a fairly good site with some documentation. The biggest drawback is that the project is only a year old. A pro is that it may be easier to use than SQLAlchemy.<br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers, and I'm not yet familiar with the gramps code and the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding, in case they just don't work.<br />
<br />
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
SQLite. No server to manage, and single-file dbs will make it easy to share them and to manage multiple dbs at the same time. It should also make merging simpler, and will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to become some kind of mass-user website for editing large trees, I think SQLite will fit the bill.<br />
<br />
SQLAlchemy. This seems to have a large following and good documentation. It should allow us to support different db backends more easily in the future. At least some people think it's the best Python ORM available. It seems to provide good tools for when the ORM starts to get in the way.<br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since gramps isn't a client/server sort of program, I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't make changing dbs as easy as might be thought. If we commit to SQLite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to SQLAlchemy, but SQLAlchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than SQLAlchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack, I'm a bit skittish about anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an ORM layer. I've never used a true ORM and can't fully say how they work in practice. While I'm not solidly on the ORM bandwagon, I do think an ORM layer might do gramps some good. There will be situations where simply writing queries will be far easier. Our implementation model should take that into account. From the website, SQLAlchemy sounds like it will provide both abstraction and an ORM, and we'll be able to use either as the need arises. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
=== Discussion ===<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one. You'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in PHP every web framework seems to have its own. I suspect the same<br />
for Python. Django has its own but allows for the use of others (if that<br />
tells you anything). It might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
abstraction layer.<br />
<br />
What to look for in a db abstraction layer is which dbs it can use. SQLite and<br />
MySQL are musts; you may even find one that can talk to BSDDB, but probably not.<br />
Oracle and PostgreSQL are pluses but will probably never be used, though who knows<br />
what will happen in 5 or 10 years. Maybe Oracle would get fed up with MySQL and<br />
release its db as open source, charging for service. Stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the SQL queries are that it's supposed to let you create, since<br />
complex queries through a db layer tend to be difficult to create, read and predict<br />
(i.e. subqueries and the like). Usually those queries are far easier to just write as SQL.<br />
<br />
In my experience a db abstraction layer is good for most of the db I/O. However, for<br />
the complex stuff a sort of localization object (or even file) with named queries is a<br />
good bet. This would work similarly to how different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables, since some dbs might require a<br />
little more manipulation than others. Again, this would only be for the most complex<br />
queries. A good rule of thumb would be: if you have to start writing parts of the query<br />
as strings, move it to the db localization object. This db localization object isn't used<br />
for all queries, because you only want to have to tweak the minimum number of queries<br />
across dbs.<br />
--Aarons<br />
</pre><br />
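<br />
The "db localization object" suggested in the comment above might be sketched as follows: one object per database, one method per named query. Everything here (<code>SqliteQueries</code>, <code>MySqlQueries</code>, <code>full_names</code>, the reduced <code>names</code> table) is hypothetical. Only the SQLite variant is actually run; the MySQL variant is shown for contrast and assumes MySQL's <code>CONCAT()</code> function and a <code>%s</code> placeholder style.<br />

```python
import sqlite3


class SqliteQueries(object):
    """Named queries written in the SQLite dialect."""

    def full_names(self, surname):
        # SQLite concatenates strings with the || operator.
        sql = ("SELECT first_name || ' ' || surname FROM names "
               "WHERE surname = ? ORDER BY first_name")
        return sql, (surname,)


class MySqlQueries(object):
    """The same named queries in the MySQL dialect (untested sketch)."""

    def full_names(self, surname):
        # MySQL uses CONCAT(); the placeholder style is assumed to be %s.
        sql = ("SELECT CONCAT(first_name, ' ', surname) FROM names "
               "WHERE surname = %s ORDER BY first_name")
        return sql, (surname,)


# One "localization" object per supported database.
QUERIES = {"sqlite": SqliteQueries(), "mysql": MySqlQueries()}


def run_full_names(conn, backend, surname):
    """Look up the query by name for the active backend and run it."""
    sql, params = QUERIES[backend].full_names(surname)
    cur = conn.cursor()
    cur.execute(sql, params)
    return [row[0] for row in cur.fetchall()]


def make_sample_db():
    """Small in-memory SQLite stand-in for a names table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
    conn.executemany(
        "INSERT INTO names VALUES (?, ?)",
        [("Test", "Smith"), ("Ann", "Smith"), ("Bob", "Jones")],
    )
    conn.commit()
    return conn
```

The calling code only knows the query name, so supporting another database means adding one more class rather than hunting for SQL strings throughout the program.<br />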
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Removal does not mean the end of development, however; bsddb will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml&ei=2MYxSanZNaCgesqz0KQB&usg=AFQjCNG1l3yKZ4YP_L7Yo0cQ8bqWmoJKTQ&sig2=H8x1qf4YrFYlsLFlJUsZ-w<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file: no server setup, no user accounts, no<br />
connection settings, just a file name. Users wouldn't even know. It might be true of MySQL,<br />
though the embedded version of MySQL may be similar to SQLite (I haven't tried it out). I also<br />
believe it's possible to use scripts and/or code to manage launching and stopping the server. It<br />
might be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
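<br />
As a minimal sketch of the "just a file name" point, using only Python's standard sqlite3 module: the file name, table and columns below are assumptions made for illustration and are not the GRAMPS schema.<br />

```python
import os
import sqlite3
import tempfile

# Hypothetical file name; any writable path would do.
db_path = os.path.join(tempfile.mkdtemp(), "family_tree.db")

# Connecting creates the file if it does not exist. No server is started,
# and no user account or connection settings are involved.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT)")
conn.execute("INSERT INTO person VALUES (?, ?)",
             ("b247d7186567ff472ef", "I0000"))
conn.commit()
conn.close()

# Reopening the same file later returns the stored data.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT handle, gramps_id FROM person").fetchall()
conn.close()
```

Sharing the tree would then amount to copying that one file.<br />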
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. Eg, a handle should be unique: rules must be added so that adding an object to the family table with the handle of an existing person object is '''impossible''' at the database layer. These kinds of rules can be done technically (a primary object table with key on handle) or with rules.<br />
# best would be a framework that based on the model can generate an admin module to browse the database, see eg the admin module in django.<br />
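<br />
One way the "primary object table with key on handle" idea could be sketched in SQLite is shown below. The table and column names (primary_object, obj_type) are hypothetical, and the sketch assumes an SQLite build with foreign key support, which has to be switched on per connection.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default in SQLite.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,           -- a handle exists only once
    obj_type TEXT NOT NULL,
    UNIQUE (handle, obj_type)
);
CREATE TABLE person (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    FOREIGN KEY (handle, obj_type)
        REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    FOREIGN KEY (handle, obj_type)
        REFERENCES primary_object (handle, obj_type)
);
""")

# Register handle 'h1' as a person and store the person row.
conn.execute("INSERT INTO primary_object VALUES ('h1', 'person')")
conn.execute("INSERT INTO person (handle) VALUES ('h1')")


def try_insert_family(handle):
    """Return True if the family row was accepted by the database."""
    try:
        conn.execute("INSERT INTO family (handle) VALUES (?)", (handle,))
        return True
    except sqlite3.IntegrityError:
        return False


# 'h1' is registered as a person, so the family table refuses it.
rejected = not try_insert_family("h1")
```

The database itself refuses the row, so no application code has to check handle types.<br />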
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
:# Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
:# User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier, but that means rewriting the core of GRAMPS.<br />
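<br />
The deferred loading described in the last point could look roughly like the sketch below. LazyPerson, the table layout and the column names are hypothetical stand-ins, not the real gen/lib classes; only the function name get_person_from_handle is taken from the text above.<br />

```python
import sqlite3


class LazyPerson(object):
    """Hypothetical stand-in for a gen/lib Person backed by SQL."""
    __slots__ = ("_db", "handle", "gramps_id", "_attributes")

    def __init__(self, db, handle, gramps_id):
        self._db = db              # the object must know its database
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None    # not fetched yet

    @property
    def attributes(self):
        # The attribute table is only hit on first access.
        if self._attributes is None:
            self._attributes = self._db.execute(
                "SELECT key, value FROM attribute "
                "WHERE person_handle = ? ORDER BY key",
                (self.handle,)).fetchall()
        return self._attributes


def get_person_from_handle(db, handle):
    """Fetch a person using the person table only."""
    row = db.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?",
        (handle,)).fetchone()
    return LazyPerson(db, *row) if row else None


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT);
CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT);
INSERT INTO person VALUES ('h1', 'I0000');
INSERT INTO attribute VALUES ('h1', 'Occupation', 'Farmer');
""")

statements = []                          # log of executed SQL
conn.set_trace_callback(statements.append)

person = get_person_from_handle(conn, "h1")
queries_before = len(statements)         # only the person query so far
attrs = person.attributes                # now the attribute table is read
```

A plain property is used here for brevity; the same effect could presumably be reached with descriptors on the existing slots.<br />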
<br />
== Extending base.py ==<br />
<br />
Once an sql backend is stable, base.py can be extended to offer extra functionality, or to better optimize for SQL. Eg, in SQL one would probably have an attribute table. To know which persons have a specific attribute, SQL would select that from the attribute table, and then look up the people. In bsddb, however, it means looping over all persons, obtaining the attribute sub-table of each person, and checking whether the attribute is present there. <br />
<br />
The above clearly indicates that the two backends call for very different approaches. The bsddb way will still work in sql (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for sql is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For sql, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal though.<br />
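<br />
A rough sketch of that idea follows. The names prepare and support_sql come from the paragraph above; the apply method, the SqlDb wrapper and the attribute table are assumptions made for illustration and do not reproduce the real _HasAttributeBase.py.<br />

```python
import sqlite3


class SqlDb(object):
    """Hypothetical db wrapper carrying the support_sql flag."""
    support_sql = True

    def __init__(self, conn):
        self.conn = conn


class HasAttribute(object):
    """Hypothetical filter rule: does a person have a given attribute?"""

    def __init__(self, key):
        self.key = key
        self.matches = None

    def prepare(self, db):
        # With SQL, one query collects every matching person up front.
        if getattr(db, "support_sql", False):
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE key = ?",
                (self.key,))
            self.matches = set(row[0] for row in rows)

    def apply(self, db, person_handle):
        if self.matches is None:
            # The non-SQL path (looping over the person's own attribute
            # list) is left out of this sketch.
            raise NotImplementedError("bsddb-style path not sketched")
        return person_handle in self.matches


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT);
INSERT INTO attribute VALUES ('h1', 'Occupation', 'Farmer');
INSERT INTO attribute VALUES ('h2', 'Nickname', 'Bud');
""")

db = SqlDb(conn)
rule = HasAttribute("Occupation")
rule.prepare(db)
results = [rule.apply(db, handle) for handle in ("h1", "h2")]
```

The branch on support_sql is exactly the part that "does not look ideal": every rule would need two code paths.<br />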
<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15142GEPS 010: Relational Backend2009-03-26T00:52:06Z<p>AaronS: /* Database Abstraction Layer */ recomendations</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
A proposed implementation is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/Sql.py?view=markup trunk/src/plugins/Sql.py]. You can export most of GRAMPS through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything).<br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel). After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above Exporter, then you can use SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.<br />
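<br />
For comparison, the surname query from the session above could also be run from Python with the standard sqlite3 module. This is a minimal sketch: it builds a tiny in-memory names table with only two of the exported columns so that it is self-contained, whereas a real run would open the exported file instead.<br />

```python
import sqlite3

# A real run would use sqlite3.connect("Untitled_1.sql") on the export;
# an in-memory stand-in with two of the columns is used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?)",
                 [("Test", "Smith"), ("Ann", "Jones")])

# Passing the pattern as a parameter avoids quoting problems.
pattern = "%Smith%"
smiths = conn.execute(
    "SELECT first_name, surname FROM names WHERE surname LIKE ?",
    (pattern,)).fetchall()
```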
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a Berkeley DB (BSDDB) database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as a replacement. This should allow anything from easy, single-file implementations (eg, SQLite) to more complex and sophisticated client/server setups (eg, MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the Python standard library (deprecated as of Python 2.6, removed in Python 3.0; see PEP 3108)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL can optimize queries (in low-level C) whereas BSDDB is done in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next, are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although it may be harder to set up and maintain; see Django, however)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps; it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might overcome some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and<br />
disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise-level database capable of handling serious loads<br />
*far more is built into the db, ie auto-incrementing fields, stored procedures and on and on.<br />
 (sqlite may not even have triggers but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to set up, run and maintain.<br />
** there are tools that can do this automatically though and make this work almost<br />
nonexistent for an end user. also the embeddable mysql might be an option.<br />
*may be difficult to manage / share multiple databases. more difficult but very doable,<br />
maybe not even that difficult. it would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to set up. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
<br />
Disadvantages<br />
*while great for what it is, it's not an enterprise-level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API to make it easy to move from one SQL-based DB to another called DB-API. Each DB may have multiple different modules available for it. If we settle on this solution then we should do some quick searches to make sure we pick the right modules.<br />
<br />
*MySQL: Yes [http://sourceforge.net/projects/mysql-python MySQLDB] used by SQLAlchemy<br />
*SQLite: Yes [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] (included in Python 2.5 or greater) [http://oss.itsystementwicklung.de/trac/pysqlite pysqlite] both used by SQLAlchemy<br />
*BSDDB: No. The DB-API looks to be only for relational dbs.<br />
<br />
*ORM (Object Relational Mapper): no<br />
<br />
"While all DB-API modules have identical APIs (or very similar; not all backends support all features), if you are writing the SQL yourself, you will probably be writing SQL in a product-specific dialect, so they are not as interchangeable in practice as they are in theory." [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python kquinn]<br />
<br />
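To illustrate the point in the quote above, the sketch below writes one small helper against the common DB-API surface (connection, cursor, execute, fetchall) and still has to adapt the parameter placeholder to the driver. The function name and the names table are hypothetical; only sqlite3 is exercised here, and the "format" entry reflects the placeholder style that MySQLdb declares.<br />

```python
import sqlite3

# DB-API modules advertise their placeholder style in module.paramstyle.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def find_first_names(module, conn, surname):
    """Run the same lookup through any DB-API module with a known style."""
    mark = PLACEHOLDERS[module.paramstyle]
    cursor = conn.cursor()
    cursor.execute(
        "SELECT first_name FROM names WHERE surname = " + mark, (surname,))
    return [row[0] for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.execute("INSERT INTO names VALUES ('Test', 'Smith')")

first_names = find_first_names(sqlite3, conn, "Smith")
```

Even this small query needs driver-specific handling, which is part of what an abstraction layer is meant to hide.<br />
<br />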
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While powerful I doubt any projects are using it outside of Django.<br />
<br />
===<strike> pydo </strike>===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and limited documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. <br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes but doesn't force it.<br />
*multiple database support?: yes [http://www.sqlalchemy.org/ source]<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. Project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes this is what sqlobject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08; 7 developers on the project. Couldn't find old release dates, but the first was posted 2003-04-09. The link to the wiki is broken, which isn't the best sign, but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
=== Storm ===<br />
https://storm.canonical.com/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
*ORM (Object Relational Mapper): yes<br />
*multiple database support?: yes<br />
<br />
*Viability: currently developed. Seems like a fairly good site with some documentation. The biggest drawback is that the project is only a year old. A pro is that it may be easier to use than sqlalchemy.<br />
<br />
=== Recommendations ===<br />
Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code and the developers' strengths. Other people with more knowledge should be the ones making the decision. Also, any decisions need to be revisitable after we actually start coding in case they just don't work.<br />
<br />
I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite. <br />
<br />
sqlite. With no server to manage, single-file dbs will be easy to share, and multiple dbs will be easy to manage at the same time. It should also make merging simpler and will allow websites to be developed that work directly from the db. As long as gramps doesn't switch focus to be some kind of mass-user website for editing large trees, I think sqlite will fit the bill.<br />
<br />
sqlalchemy. This seems to have a large following and good documentation. It should make it easier to support different db back ends in the future. At least some people think it's the best python orm available. It seems to provide good tools for when the ORM starts to get in the way. <br />
<br />
Reasons I don't recommend the other options include:<br />
<br />
MySQL. Probably not as user friendly, and since gramps isn't a client/server sort of program I don't think it's necessary.<br />
<br />
DB-API with [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3]. It sounds as if the DB-API in practice doesn't support changing dbs as much as might be thought. If we commit to sqlite, though, this might be an option.<br />
<br />
SQLObject. This seems like a viable alternative to sqlalchemy, but sqlalchemy seems to have more documentation and user acceptance. Also, the ORM layer might not step out of the way very nicely. The website says it will, but I wasn't quite buying it from the examples.<br />
<br />
Storm. While this project looks promising and may be easier to use than sqlalchemy, it's only a year old, and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.<br />
<br />
Additional notes: I was originally advocating for database abstraction, not an orm layer. I've never used a true orm and can't fully say how they work in practice. While I'm not solidly on the orm bandwagon, I do think an orm layer might do gramps some good. There will be situations where simply writing queries will be far easier, and our implementation model should take that into account. From the website, sqlalchemy sounds like it will provide both abstraction and an orm, and we'll be able to use both as the needs determine. While I don't fully agree with the severity of this [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article], he does make some valid points. There is a reason that true object databases [http://en.wikipedia.org/wiki/Object_oriented_database haven't caught on]. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." [http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx article]<br />
<br />
--[[User:AaronS|AaronS]] 00:52, 26 March 2009 (UTC)<br />
<br />
<br />
=== Discussion ===<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one; you'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in PHP every web framework seems to have its own. I suspect the same<br />
for Python. Django has its own but allows for the use of others (if that<br />
tells you anything), so it might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
abstraction layer.<br />
<br />
What to look for in a db abstraction layer is which dbs it can use. SQLite and<br />
MySQL are musts; you may even find one that can talk to BSDDB, but probably not.<br />
Oracle and PostgreSQL are pluses but will probably never be used, though who knows<br />
what will happen in 5 or 10 years. Maybe Oracle would get fed up with MySQL and<br />
release the db open source, charging for service. Stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the SQL queries it's supposed to let you create are, since<br />
complex queries through a db layer tend to be difficult to create, read and predict,<br />
i.e. subqueries and the like. Usually those queries are far easier to just write as plain SQL.<br />
<br />
In my experience a db abstraction layer is good for most of the db I/O. However, for<br />
the complex stuff, a sort of localization object (or even file) with named queries is a<br />
good bet. This would work much like the way different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables, since some dbs might require a<br />
little more manipulation than others. Again, this would only be for the most complex<br />
queries. A good rule of thumb: if you have to start writing parts of the query as strings,<br />
move it to the db localization object. The db localization object isn't used for all<br />
queries because you want to keep the number of queries that must be tweaked per db to a minimum.<br />
--Aarons<br />
</pre><br />
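<br />
The "db localization object" suggested in the comment above might be sketched as one class per database with a method per named query. All class, method and table names here are hypothetical. The example relies on one real dialect difference: SQLite concatenates strings with ||, while MySQL normally uses CONCAT(). Only the SQLite variant is executed.<br />

```python
import sqlite3


class SqliteQueries(object):
    """Named queries in the SQLite dialect (hypothetical)."""
    placeholder = "?"

    def full_names_by_surname(self):
        return ("SELECT first_name || ' ' || surname FROM names "
                "WHERE surname = " + self.placeholder)


class MysqlQueries(SqliteQueries):
    """Overrides only the queries that differ for MySQL."""
    placeholder = "%s"

    def full_names_by_surname(self):
        return ("SELECT CONCAT(first_name, ' ', surname) FROM names "
                "WHERE surname = " + self.placeholder)


# One object per db, chosen once at startup, like picking a translation.
QUERIES = {"sqlite": SqliteQueries(), "mysql": MysqlQueries()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.execute("INSERT INTO names VALUES ('Test', 'Smith')")

sql = QUERIES["sqlite"].full_names_by_surname()
full_names = [row[0] for row in conn.execute(sql, ("Smith",))]
```

Using methods rather than string constants leaves room for a db that needs extra manipulation before the query text is returned.<br />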
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Removal from the standard library does not mean development has stopped, however; the module will simply be released independently of the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml&ei=2MYxSanZNaCgesqz0KQB&usg=AFQjCNG1l3yKZ4YP_L7Yo0cQ8bqWmoJKTQ&sig2=H8x1qf4YrFYlsLFlJUsZ-w<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar but I haven't tried it out. The objection might be true of a regular MySQL server though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user but would depend on the implementation.<br />
--AaronS<br />
</pre><br />
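<br />
To illustrate the "just a file name" point, here is a small sketch using Python's standard <code>sqlite3</code> module. The <code>people</code> table layout and the function names are assumptions made for the example, not the actual GRAMPS schema:<br />

```python
import os
import sqlite3
import tempfile


def create_tree(path):
    """Create a tree database at `path`; the file name is all SQLite needs."""
    conn = sqlite3.connect(path)  # creates the file if it does not exist
    with conn:  # commits on success
        conn.execute(
            "CREATE TABLE IF NOT EXISTS people "
            "(handle TEXT PRIMARY KEY, gramps_id TEXT)"
        )
        conn.execute(
            "INSERT INTO people VALUES (?, ?)",
            ("b247d7186567ff472ef", "I0000"),
        )
    conn.close()


def count_people(path):
    """Reopen the file and count the stored people."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
    finally:
        conn.close()


def demo():
    """Create a tree in a temporary folder and read it back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "tree.db")
        create_tree(path)
        return os.path.exists(path), count_people(path)
```

No server is started and no account or connection settings are involved; the only thing the program has to remember is the path.<br />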
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., handles should be unique: rules must be added to ensure that adding an object to the family table with the same handle as a person object is '''impossible''' at the database layer. These kinds of rules can be enforced technically (a primary object table keyed on handle) or with rules.<br />
# best would be a framework that based on the model can generate an admin module to browse the database, see eg the admin module in django.<br />
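<br />
The "primary object table with key on handle" rule from the list above could be sketched as follows. This is only one possible layout, assuming SQLite with foreign keys switched on; the table and column names are hypothetical:<br />

```python
import sqlite3

# Hypothetical schema: every handle is registered once in primary_object,
# and each typed table may only reference a row of its own type.
SCHEMA = """
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)
);
CREATE TABLE person (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    FOREIGN KEY (handle, obj_type) REFERENCES primary_object (handle, obj_type)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, obj_type, handle):
    """Register a handle and create the typed row in one transaction."""
    if obj_type not in ("person", "family"):
        raise ValueError(obj_type)
    with conn:  # commit on success, roll back on any error
        conn.execute(
            "INSERT INTO primary_object VALUES (?, ?)", (handle, obj_type)
        )
        conn.execute(
            "INSERT INTO %s (handle) VALUES (?)" % obj_type, (handle,)
        )


def is_rejected(conn, sql, params):
    """Return True if the database itself refuses the statement."""
    try:
        with conn:
            conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

With this layout, reusing a person's handle for a family fails in the database even if the Python code forgets to check, which is the property the validation step asks for.<br />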
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
:# Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
:# User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
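<br />
A rough sketch of the deferred loading described in the last item, assuming a simplified <code>Person</code> class and hypothetical <code>people</code> and <code>attribute</code> tables. The real gen/lib classes are far richer; this only shows the attribute table being read on first access:<br />

```python
import sqlite3


class Person:
    """Hypothetical stand-in for a gen/lib object backed by SQL."""

    __slots__ = ("_conn", "handle", "gramps_id", "_attributes")

    def __init__(self, conn, handle, gramps_id):
        self._conn = conn  # the object must know where its data lives
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None  # not loaded yet

    @property
    def attributes(self):
        # The attribute table is queried only on first access.
        if self._attributes is None:
            rows = self._conn.execute(
                "SELECT key, value FROM attribute "
                "WHERE person_handle = ? ORDER BY key",
                (self.handle,),
            )
            self._attributes = list(rows)
        return self._attributes


def get_person_from_handle(conn, handle):
    """Fetch a person; this touches the people table only."""
    row = conn.execute(
        "SELECT handle, gramps_id FROM people WHERE handle = ?", (handle,)
    ).fetchone()
    return Person(conn, *row) if row else None


def make_demo_db():
    """Build a tiny in-memory database for trying the classes out."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (handle TEXT PRIMARY KEY, gramps_id TEXT)")
    conn.execute(
        "CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT)"
    )
    conn.execute("INSERT INTO people VALUES ('h1', 'I0000')")
    conn.execute("INSERT INTO attribute VALUES ('h1', 'Nickname', 'Bob')")
    return conn
```

The price of this design is visible in the constructor: every object carries a reference to its database connection, which is the coupling the item above warns about.<br />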
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, it means looping over all persons, obtaining the attribute sub-table of each person, and checking whether the attribute is present there. <br />
<br />
The above clearly indicates that the two backends approach the same task very differently. The bsddb way will work in SQL though (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal though.<br />
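<br />
The prepare/apply idea could look roughly like this. It is a sketch only: the rule class, the <code>SqlDb</code> wrapper and the <code>attribute</code> table are hypothetical and do not mirror the real <code>_HasAttributeBase.py</code>; only the <code>support_sql</code> attribute name comes from the text above:<br />

```python
import sqlite3
from collections import namedtuple

# Minimal stand-ins for a person object and an SQL-capable db wrapper.
Person = namedtuple("Person", "handle attributes")


class SqlDb:
    support_sql = True

    def __init__(self, conn):
        self.conn = conn


class HasAttribute:
    """Hypothetical filter rule with an SQL fast path."""

    def __init__(self, key):
        self.key = key
        self.matches = None  # filled in by prepare() when SQL is available

    def prepare(self, db):
        # With an SQL backend, fetch every matching handle in one query.
        if getattr(db, "support_sql", False):
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE key = ?",
                (self.key,),
            )
            self.matches = {row[0] for row in rows}

    def apply(self, db, person):
        if self.matches is not None:
            return person.handle in self.matches
        # Fallback (the bsddb way): inspect the person's own attributes.
        return any(key == self.key for key, _value in person.attributes)


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attribute (person_handle TEXT, key TEXT, value TEXT)"
    )
    conn.execute("INSERT INTO attribute VALUES ('h1', 'Nickname', 'Bob')")
    return SqlDb(conn)
```

A set is used instead of a list so that each <code>apply</code> call is a constant-time lookup; the branch on <code>support_sql</code> is exactly the part that "does not look ideal", since every optimized rule would need one.<br />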
<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15141GEPS 010: Relational Backend2009-03-25T22:24:38Z<p>AaronS: /* Database Abstraction Layer */ storm notes</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
A proposed implementation is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/Sql.py?view=markup trunk/src/plugins/Sql.py]. You can export most of GRAMPS through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything).<br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel). After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above Exporter, then you can use SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.<br />
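<br />
For comparison, the surname query from the session above can also be issued from Python with the standard <code>sqlite3</code> module. This sketch builds a small in-memory <code>names</code> table with only two of the exported columns so that it runs on its own; against a real export one would presumably connect to the exported file (e.g. ''Untitled_1.sql'') instead:<br />

```python
import sqlite3


def find_surnames(conn, pattern):
    """Return (first_name, surname) pairs whose surname matches `pattern`."""
    cur = conn.cursor()
    # The ? placeholder lets the driver quote the pattern safely.
    cur.execute(
        "SELECT first_name, surname FROM names WHERE surname LIKE ?",
        (pattern,),
    )
    return cur.fetchall()


def demo():
    """Query a throwaway table that mimics part of the exported layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
    conn.execute("INSERT INTO names VALUES ('Test', 'Smith')")
    conn.execute("INSERT INTO names VALUES ('Other', 'Jones')")
    result = find_surnames(conn, "%Smith%")
    conn.close()
    return result
```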
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as a replacement. This should allow anything from easy, single db file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) is included in the standard Python distribution (as of Python 2.5)<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas with BSDDB the query logic is written in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next, are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A fullscale MySQL backend would be a trivial step from SQLite (although maybe harder to setup and maintain; although see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps and is actually why I'm thinking of giving<br />
it another try depending on how hard it is to implement this. Yes I know it will be hard<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL over SQLite. While I haven't tried it out yet there is an embeddable<br />
version of MySQL which might overcome some of SQLite's advantages. If a database abstraction<br />
layer is used both could be<br />
easily supported. They both have their advantages and disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db. ie auto incrementing fields, stored procedures and on and on.<br />
(sqlite may not even have triggers but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to setup run and maintain.<br />
** there are tools that can do this automatically though and make things almost<br />
nonexistent for an end user. also the embeddable mysql might be an option.<br />
*may be difficult to manage / share multiple databases. more difficult but very doable.<br />
maybe not even that difficult. it would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to setup. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
<br />
Disadvantage<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software. but MySQL's<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API to make it easy to move from one SQL-based DB to another called DB-API. Each DB may have multiple different modules available for it. If we settle on this solution then we should do some quick searches to make sure we pick the right modules.<br />
<br />
*MySQL: Yes [http://sourceforge.net/projects/mysql-python MySQLDB] used by SQLAlchemy<br />
*SQLite: Yes [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] (included in Python 2.5 or greater) [http://oss.itsystementwicklung.de/trac/pysqlite pysqlite] both used by SQLAlchemy<br />
*BSDDB: No. The DB-API looks to be only for relational dbs.<br />
<br />
*ORM (Object Relational Mapper): no<br />
<br />
"While all DB-API modules have identical APIs (or very similar; not all backends support all features), if you are writing the SQL yourself, you will probably be writing SQL in a product-specific dialect, so they are not as interchangeable in practice as they are in theory." [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python kquinn]<br />
<br />
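One concrete place where DB-API modules differ is the parameter placeholder, which each module advertises through its <code>paramstyle</code> attribute (defined by the DB-API specification). The helper below is a hypothetical sketch of how code could adapt to it; it handles only the two styles named in the mapping and is not taken from GRAMPS:<br />

```python
import sqlite3

# Placeholder text for two of the DB-API paramstyle values.
# sqlite3 reports "qmark"; some other drivers report "format".
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def placeholder(dbapi_module):
    """Return the positional placeholder used by a DB-API module."""
    style = dbapi_module.paramstyle
    try:
        return PLACEHOLDERS[style]
    except KeyError:
        raise NotImplementedError("paramstyle %r not handled" % style)


def insert_sql(dbapi_module, table, columns):
    """Build an INSERT statement with the module's own placeholders."""
    marks = ", ".join(placeholder(dbapi_module) for _ in columns)
    return "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), marks
    )


def demo():
    """Use the generated statement with the sqlite3 module."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (handle TEXT, gramps_id TEXT)")
    sql = insert_sql(sqlite3, "people", ["handle", "gramps_id"])
    conn.execute(sql, ("b247d7186567ff472ef", "I0000"))
    row = conn.execute("SELECT handle, gramps_id FROM people").fetchone()
    conn.close()
    return sql, row
```

This covers only placeholders; differences in the SQL dialect itself (functions, types, limits) remain, which is the point of the quote above and the main argument for a higher-level abstraction layer.<br />
<br />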
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While powerful I doubt any projects are using it outside of Django.<br />
<br />
===<strike> pydo </strike>===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and little documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. <br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes but doesn't force it.<br />
*multiple database support?: maybe, need to confirm.<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. Project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes this is what sqlobject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08, 7 developers on the project. couldn't find old release dates but first was posted 2003-04-09. link to wiki is broken which isn't the best sign but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
=== Storm ===<br />
https://storm.canonical.com/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
*ORM (Object Relational Mapper): yes<br />
*multiple database support?: yes<br />
<br />
*Viability: currently developed. Seems like a fairly good site with some documentation. The biggest drawback is that the project is only a year old. A pro is that it may be easier to use than SQLAlchemy.<br />
<br />
=== Discussion ===<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. they are<br />
indeed quite powerful and are exactly what you need. if you're going to bother<br />
switching the back end don't waste your time and not use one. you'll kick<br />
yourself later if you don't. just be careful which one you choose. I know<br />
that in php every web framework seems to have their own. I suspect the same<br />
for python. Django has their own but allows for the use of others (if that<br />
tells you anything). might be a place to check for alternatives. While<br />
their framework might be for websites that shouldn't matter for the DB<br />
Abstraction layer.<br />
<br />
What to look for in a db Abstraction Layer is which dbs it can use. sqlite and<br />
mysql are musts, you may even find one that can talk to BSDDB but probably not.<br />
Oracle and PostgreSQL<br />
are pluses but will probably never be used but who knows what will happen in 5<br />
or 10 yrs. who knows maybe oracle would get fed up with mysql and release the db<br />
open source charging for service. stranger things have happened.<br />
ease of use, readability and outer joins are also important. don't worry too<br />
much about how complex of sql queries its supposed to allow you to create since<br />
complex queries through a db layer tend to be difficult to create, read and predict.<br />
ie sub queries and the like. usually those queries are far easier to just build as a query.<br />
<br />
in my experience a db abstraction layer is good for most of the db io. however, for<br />
the complex stuff a sort of localization object (or even file) is a good bet with named<br />
queries. this would work similar to how different languages are usually supported in<br />
projects. with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a<br />
file of constants/variables since some db's might require a little more manipulation than<br />
others. again this would only be for the most complex queries. a good rule of thumb would<br />
be if you had to start writing parts of the query as strings move it to the db localization<br />
object. This db localization object isn't used for all queries because you only want to have<br />
to tweak the minimum amount of queries across dbs<br />
--Aarons<br />
</pre><br />
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Removal from the standard library does not mean development has stopped, however; the module will simply be released independently of the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml&ei=2MYxSanZNaCgesqz0KQB&usg=AFQjCNG1l3yKZ4YP_L7Yo0cQ8bqWmoJKTQ&sig2=H8x1qf4YrFYlsLFlJUsZ-w<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar, but I haven't tried it out. The objection might be true of MySQL, though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. For example, handles should be unique, and rules must be added so that adding an object with a person handle to the family table is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table keyed on handle) or with rules.<br />
# ideally, the framework could generate an admin module from the model to browse the database; see, e.g., the admin module in Django.<br />
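<br />
The handle rule in the list above can be sketched directly in SQLite. This is only an illustration under assumptions: the table and column names (<code>primary_object</code>, <code>obj_type</code>) and the helper functions are made up for this example and are not part of GRAMPS, and foreign key enforcement needs a reasonably recent SQLite with <code>PRAGMA foreign_keys</code> turned on.<br />
<br />
```python
import sqlite3

# Hypothetical schema: every handle is registered once in primary_object,
# together with the type of object it belongs to.
SCHEMA = """
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)
);
CREATE TABLE person (
    handle    TEXT PRIMARY KEY,
    obj_type  TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    gramps_id TEXT,
    FOREIGN KEY (handle, obj_type)
        REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle    TEXT PRIMARY KEY,
    obj_type  TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    gramps_id TEXT,
    FOREIGN KEY (handle, obj_type)
        REFERENCES primary_object (handle, obj_type)
);
"""


def open_db(path=":memory:"):
    """Open a database and create the sketch schema."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, table, handle, gramps_id):
    """Register a handle and store the object in one transaction."""
    if table not in ("person", "family"):
        raise ValueError("unknown table: %r" % (table,))
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO primary_object (handle, obj_type) VALUES (?, ?)",
            (handle, table))
        conn.execute(
            "INSERT INTO %s (handle, gramps_id) VALUES (?, ?)" % table,
            (handle, gramps_id))
```
<br />
With this layout, inserting a row into <code>family</code> whose handle is already registered as a person fails with <code>sqlite3.IntegrityError</code>, so the check lives in the database rather than in Python code.<br />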
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
:# Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
:# User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle, should only hit the person table. Only when the calling method needs attributes, should the attribute table be hit. This requires attributes that are not yet defined up to the moment they are accessed. It also means that the gen/lib objects for sql need to be aware of the database as it needs to know where to obtain these values... . This looks like a huge work to me, but definitely doable. Just rewriting gen/lib for an sql datamodel might be easier though, but that means rewriting the core of GRAMPS....<br />
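<br />
The deferred loading described in the last item could look roughly like the sketch below. It uses a plain Python property as one possible way to get the "only hit the attribute table when needed" behaviour; the class <code>LazyPerson</code>, the <code>person</code> and <code>attribute</code> tables, and their columns are assumptions for illustration, not the real gen/lib classes or a settled schema.<br />
<br />
```python
import sqlite3


class LazyPerson(object):
    """Hypothetical stand-in for a gen/lib Person backed by SQL."""

    def __init__(self, db, handle, gramps_id):
        self._db = db              # the object has to know its database
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None    # not loaded yet

    @property
    def attributes(self):
        """Load the attributes on first access, then reuse them."""
        if self._attributes is None:
            cursor = self._db.execute(
                "SELECT key, value FROM attribute "
                "WHERE person_handle = ? ORDER BY key",
                (self.handle,))
            self._attributes = cursor.fetchall()
        return self._attributes


def get_person_from_handle(db, handle):
    """Fetch a person; only the person table is queried here."""
    row = db.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?",
        (handle,)).fetchone()
    if row is None:
        return None
    return LazyPerson(db, row[0], row[1])
```
<br />
Code that only needs the GRAMPS ID never touches the attribute table; the first read of <code>person.attributes</code> triggers the second query.<br />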
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. For example, in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, it means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there. <br />
<br />
The above clearly shows that the two backends call for very different approaches. The bsddb way will still work in SQL, though (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py]<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed in, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
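<br />
A rough sketch of that idea follows. It is not the real <code>_HasAttributeBase</code> rule: the classes <code>SqlDb</code>, <code>Person</code> and <code>HasAttributeSketch</code>, the <code>attribute</code> table, and the exact <code>prepare</code>/<code>apply</code> signatures are assumptions made for this example; only the <code>support_sql</code> flag comes from the text above.<br />
<br />
```python
import sqlite3
from collections import namedtuple

# Hypothetical minimal person: a handle plus (key, value) attribute pairs.
Person = namedtuple("Person", ["handle", "attributes"])


class SqlDb(object):
    """Hypothetical database wrapper that advertises SQL support."""
    support_sql = True

    def __init__(self, conn):
        self.conn = conn


class HasAttributeSketch(object):
    """Match people that have an attribute with the given key."""

    def __init__(self, key):
        self.key = key
        self.matching_handles = None

    def prepare(self, db):
        # With SQL, one query finds every matching person up front.
        if getattr(db, "support_sql", False):
            rows = db.conn.execute(
                "SELECT DISTINCT person_handle FROM attribute "
                "WHERE key = ?", (self.key,))
            self.matching_handles = set(row[0] for row in rows)

    def apply(self, db, person):
        if self.matching_handles is not None:
            return person.handle in self.matching_handles
        # Fallback (the bsddb way): inspect the person's own attributes.
        return any(key == self.key for key, _value in person.attributes)
```
<br />
A set is used instead of a list so that each <code>apply</code> call is a constant-time lookup. The branch on <code>support_sql</code> is exactly the part that looks less than ideal, since every rule would need two code paths.<br />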
<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15140GEPS 010: Relational Backend2009-03-25T21:15:57Z<p>AaronS: /* DB-API */ more notes</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
A proposed implementation is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/Sql.py?view=markup trunk/src/plugins/Sql.py]. You can export most of your GRAMPS data through the SQL Export in the Export Assistant. (Currently, the selection options are ignored, and everything is exported.)<br />
<br />
SQL stands for "Structured Query Language" and is often pronounced "sequel" (the language was originally named SEQUEL, for "Structured English Query Language"). After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above Exporter, you can run SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.<br />
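<br />
The same kind of query can also be run from Python with the standard <code>sqlite3</code> module. The sketch below assumes the <code>names</code> table and its <code>first_name</code> and <code>surname</code> columns as shown in the session above; the function name is made up for this example.<br />
<br />
```python
import sqlite3


def find_names_by_surname(conn, pattern):
    """Return (first_name, surname) rows whose surname matches a LIKE pattern."""
    cursor = conn.execute(
        "SELECT first_name, surname FROM names "
        "WHERE surname LIKE ? ORDER BY surname, first_name",
        (pattern,))  # the pattern is passed as a parameter, not pasted in
    return cursor.fetchall()


# Possible usage against an exported file:
#   conn = sqlite3.connect("Untitled_1.sql")
#   for first_name, surname in find_names_by_surname(conn, "%Smith%"):
#       print(first_name, surname)
```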
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a Berkeley DB (BSDDB) database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as a replacement. This should allow anything from easy, single-file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas with BSDDB the query logic is done in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in an SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would all have to be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps, and it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might overcome some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and<br />
disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db. ie auto incrementing fields, stored procedures and on and on.<br />
(sqlite may not even have triggers but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to setup run and maintain.<br />
** there are tools that can do this automatically, though, and make things almost<br />
nonexistent for an end user. Also, the embeddable MySQL might be an option.<br />
*may be difficult to manage / share multiple databases. More difficult, but very doable.<br />
Maybe not even that difficult. It would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to setup. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate db's<br />
*single file<br />
*good support.<br />
<br />
Disadvantage<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API to make it easy to move from one SQL-based DB to another called DB-API. Each DB may have multiple different modules available for it. If we settle on this solution then we should do some quick searches to make sure we pick the right modules.<br />
<br />
*MySQL: Yes, [http://sourceforge.net/projects/mysql-python MySQLdb], used by SQLAlchemy<br />
*SQLite: Yes, [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] (included in Python 2.5 or greater) and [http://oss.itsystementwicklung.de/trac/pysqlite pysqlite], both used by SQLAlchemy<br />
*BSDDB: No. The DB-API looks to be only for relational dbs.<br />
<br />
*ORM (Object Relational Mapper): no<br />
<br />
"While all DB-API modules have identical APIs (or very similar; not all backends support all features), if you are writing the SQL yourself, you will probably be writing SQL in a product-specific dialect, so they are not as interchangeable in practice as they are in theory." [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python kquinn]<br />
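<br />
As a small illustration of both the shared API and the dialect problem in the quote above, the sketch below runs one query through the DB-API calls <code>cursor()</code>, <code>execute()</code> and <code>fetchone()</code>, and picks the parameter marker from the module's <code>paramstyle</code> attribute. The <code>people</code> table and <code>gender</code> column come from the export shown earlier; the function name and the <code>PLACEHOLDERS</code> table are assumptions for this example, and only two of the possible paramstyles are handled.<br />
<br />
```python
import sqlite3

# Parameter markers for two DB-API paramstyles: sqlite3 reports "qmark",
# while MySQLdb reports "format".
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def count_people(dbapi_module, connection, gender):
    """Count people of a given gender through plain DB-API calls."""
    marker = PLACEHOLDERS[dbapi_module.paramstyle]
    cursor = connection.cursor()
    cursor.execute(
        "SELECT COUNT(*) FROM people WHERE gender = " + marker,
        (gender,))
    return cursor.fetchone()[0]


# Possible usage with SQLite:
#   conn = sqlite3.connect("Untitled_1.sql")
#   print(count_people(sqlite3, conn, 1))
```
<br />
Even in this tiny case the SQL text had to be adjusted per module, which is the kind of detail an abstraction layer such as SQLAlchemy or SQLObject takes care of.<br />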
<br />
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While it is powerful, I doubt any projects are using it outside of Django.<br />
<br />
===<strike> pydo </strike>===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and limited documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. <br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes but doesn't force it.<br />
*multiple database support?: maybe, need to confirm.<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. Project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes this is what sqlobject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08; 7 developers on the project. Couldn't find old release dates, but the first was posted 2003-04-09. The link to the wiki is broken, which isn't the best sign, but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
=== Discussion ===<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one. You'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in PHP every web framework seems to have its own. I suspect the same<br />
for Python. Django has its own but allows for the use of others (if that<br />
tells you anything). It might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
abstraction layer.<br />
<br />
What to look for in a db abstraction layer is which dbs it can use. SQLite and<br />
MySQL are musts; you may even find one that can talk to BSDDB, but probably not.<br />
Oracle and PostgreSQL are pluses but will probably never be used, though who knows<br />
what will happen in 5 or 10 years. Maybe Oracle would get fed up with MySQL and<br />
release the db open source, charging for service. Stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the SQL queries are that it is supposed to let you create, since<br />
complex queries through a db layer (i.e. sub queries and the like) tend to be difficult<br />
to create, read and predict. Usually those queries are far easier to just write as plain SQL.<br />
<br />
In my experience a db abstraction layer is good for most of the db io. However, for<br />
the complex stuff a sort of localization object (or even file) with named queries is a<br />
good bet. This would work similarly to how different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables, since some db's might require a<br />
little more manipulation than others. Again, this would only be for the most complex<br />
queries. A good rule of thumb would be: if you have to start writing parts of the query<br />
as strings, move it to the db localization object. This db localization object isn't used<br />
for all queries because you only want to have to tweak the minimum number of queries<br />
across dbs.<br />
--Aarons<br />
</pre><br />
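<br />
The "db localization object" suggested in the comment above might be sketched as follows: one class per database, with one method per complex query, chosen by backend name. All class and function names here are hypothetical, and the query is kept deliberately simple; the point is only that the SQL text differs per backend (SQLite concatenates strings with <code>||</code> and sqlite3 uses <code>?</code> markers, while MySQL uses <code>CONCAT()</code> and MySQLdb uses <code>%s</code> markers).<br />
<br />
```python
import sqlite3


class SqliteQueries(object):
    """Hand-written queries in the SQLite dialect."""

    def full_names_by_surname(self):
        return ("SELECT first_name || ' ' || surname FROM names "
                "WHERE surname = ? ORDER BY first_name")


class MysqlQueries(object):
    """The same named queries in the MySQL dialect."""

    def full_names_by_surname(self):
        return ("SELECT CONCAT(first_name, ' ', surname) FROM names "
                "WHERE surname = %s ORDER BY first_name")


# One entry per supported backend, like one translation file per language.
QUERIES = {"sqlite": SqliteQueries, "mysql": MysqlQueries}


def queries_for(backend):
    """Return the query object for the given backend name."""
    return QUERIES[backend]()


def full_names(connection, backend, surname):
    """Run the named query through a DB-API connection."""
    cursor = connection.cursor()
    cursor.execute(queries_for(backend).full_names_by_surname(), (surname,))
    return [row[0] for row in cursor.fetchall()]
```
<br />
Using methods rather than string constants leaves room for a backend that needs to build its query in several steps, as the comment recommends.<br />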
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing from BSDDB to SQLite ==<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration:<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml&ei=2MYxSanZNaCgesqz0KQB&usg=AFQjCNG1l3yKZ4YP_L7Yo0cQ8bqWmoJKTQ&sig2=H8x1qf4YrFYlsLFlJUsZ-w<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar, but I haven't tried it out. The concern might hold for a regular MySQL server, though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
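<br />
As a rough illustration of the "just a file name" point, here is a minimal sketch using Python's standard sqlite3 module. The helper name open_family_tree, the file name and the cut-down people table are assumptions made for this example, not existing GRAMPS code.<br />
<br />
```python
import os
import sqlite3
import tempfile


def open_family_tree(path):
    """Open (or create) a single-file SQLite database at `path`.

    Hypothetical helper: no server, user account or connection
    settings are involved, only a file name.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people "
        "(handle TEXT PRIMARY KEY, gramps_id TEXT)"
    )
    conn.commit()
    return conn


# Usage: the database file appears on first use.
tree_path = os.path.join(tempfile.mkdtemp(), "example_tree.db")
conn = open_family_tree(tree_path)
conn.execute("INSERT INTO people VALUES (?, ?)", ("b247d7186567ff472ef", "I0000"))
conn.commit()
conn.close()
```
<br />
Reopening the same path later gives back the stored rows, which is all an end user would need to share or back up a tree.<br />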
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., handles should be unique, and rules must be added to ensure that adding an object with the handle of a person object to the family table is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table with a key on handle) or with explicit rules.<br />
# ideally, the framework could generate an admin module from the model for browsing the database; see, e.g., the admin module in Django.<br />
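<br />
The validation step above could be sketched as follows with SQLite. This is only one possible layout under stated assumptions: the table and column names (primary_object, obj_type) and the helper functions are invented for the example and are not the GRAMPS schema. The idea is that every handle lives once in a primary object table together with its type, and each typed table points back at that (handle, type) pair.<br />
<br />
```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from GRAMPS.
SCHEMA = """
CREATE TABLE primary_object (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL CHECK (obj_type IN ('person', 'family')),
    UNIQUE (handle, obj_type)
);
CREATE TABLE person (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'person' CHECK (obj_type = 'person'),
    FOREIGN KEY (handle, obj_type)
        REFERENCES primary_object (handle, obj_type)
);
CREATE TABLE family (
    handle   TEXT PRIMARY KEY,
    obj_type TEXT NOT NULL DEFAULT 'family' CHECK (obj_type = 'family'),
    FOREIGN KEY (handle, obj_type)
        REFERENCES primary_object (handle, obj_type)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, obj_type, handle):
    """Register a handle once, then add it to its typed table."""
    if obj_type not in ("person", "family"):
        raise ValueError("unknown object type: %r" % (obj_type,))
    conn.execute(
        "INSERT INTO primary_object (handle, obj_type) VALUES (?, ?)",
        (handle, obj_type),
    )
    conn.execute("INSERT INTO %s (handle) VALUES (?)" % obj_type, (handle,))


def is_rejected(conn, sql, params):
    """Return True if the database itself refuses the statement."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


conn = open_db()
add_object(conn, "person", "h1")
# A person handle cannot be reused in the family table...
reused_in_family = is_rejected(
    conn, "INSERT INTO family (handle) VALUES (?)", ("h1",))
# ...and the same handle cannot be registered a second time.
duplicate_handle = is_rejected(
    conn,
    "INSERT INTO primary_object (handle, obj_type) VALUES (?, ?)",
    ("h1", "family"))
```
<br />
Both checks are rejected by the database rather than by Python code, which is the point of doing the validation at the database layer.<br />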
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
:# Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
:# User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
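<br />
The deferred loading described in the last point might look roughly like this. It is a sketch only: LazyPerson, the two-table layout and the column names are assumptions for illustration, and the module-level get_person_from_handle here merely stands in for the real database method of the same name.<br />
<br />
```python
import sqlite3


class LazyPerson(object):
    """Person whose attributes are fetched only on first access."""

    __slots__ = ("_db", "handle", "gramps_id", "_attributes")

    def __init__(self, db, handle, gramps_id):
        self._db = db            # the object has to know its database
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None  # None means "not loaded yet"

    @property
    def attributes(self):
        # The attribute table is only hit the first time this is read.
        if self._attributes is None:
            self._attributes = self._db.execute(
                "SELECT attr_type, attr_value FROM attribute "
                "WHERE person_handle = ? ORDER BY attr_type",
                (self.handle,),
            ).fetchall()
        return self._attributes


def get_person_from_handle(db, handle):
    """Stand-in lookup that touches the person table only."""
    row = db.execute(
        "SELECT handle, gramps_id FROM person WHERE handle = ?", (handle,)
    ).fetchone()
    if row is None:
        return None
    return LazyPerson(db, row[0], row[1])


# Hypothetical two-table layout, just enough to run the sketch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, gramps_id TEXT)")
db.execute(
    "CREATE TABLE attribute "
    "(person_handle TEXT, attr_type TEXT, attr_value TEXT)"
)
db.execute("INSERT INTO person VALUES ('p1', 'I0000')")
db.execute("INSERT INTO attribute VALUES ('p1', 'Nickname', 'Bob')")
person = get_person_from_handle(db, "p1")
```
<br />
Until person.attributes is read, only the person table has been queried; the price is that each object carries a reference to the database.<br />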
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality, or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To find out which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining the attribute sub-table of each person, and checking whether the attribute is present there. <br />
<br />
The above clearly indicates that the two backends call for very different approaches. The bsddb way will still work in SQL, though (as the get_person method works, and speed should be comparable to bsddb if the deferred obtaining of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py].<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, and then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
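<br />
A minimal sketch of that prepare-then-test idea follows. The class name HasAttributeSql, its simplified apply signature (it takes a handle rather than a person object) and the attribute table layout are assumptions made for the example; the real rule in _HasAttributeBase.py is not reproduced here.<br />
<br />
```python
import sqlite3


class HasAttributeSql(object):
    """Hypothetical SQL-aware variant of a has-attribute filter rule."""

    def __init__(self, attr_type, attr_value):
        self.attr_type = attr_type
        self.attr_value = attr_value
        self.matches = set()

    def prepare(self, db):
        # One query against the attribute table instead of a loop
        # over every person and their attribute lists.
        rows = db.execute(
            "SELECT DISTINCT person_handle FROM attribute "
            "WHERE attr_type = ? AND attr_value = ?",
            (self.attr_type, self.attr_value),
        )
        self.matches = set(row[0] for row in rows)

    def apply(self, db, person_handle):
        # After prepare(), each test is a cheap set lookup.
        return person_handle in self.matches


# Hypothetical attribute table, just enough to run the sketch.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE attribute "
    "(person_handle TEXT, attr_type TEXT, attr_value TEXT)"
)
db.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [("p1", "Occupation", "Farmer"), ("p2", "Occupation", "Miller")],
)
rule = HasAttributeSql("Occupation", "Farmer")
rule.prepare(db)
```
<br />
Using a set rather than a list keeps each membership test fast for large trees.<br />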
<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15139GEPS 010: Relational Backend2009-03-25T21:12:57Z<p>AaronS: /* Database Abstraction Layer */ DB-API notes</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
A proposed implementation is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/Sql.py?view=markup trunk/src/plugins/Sql.py]. You can export most of your GRAMPS data through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything.)<br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel). After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above Exporter, then you can use SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.<br />
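<br />
For comparison, the same kind of query can also be run from Python with the standard sqlite3 module. This is only a sketch: the helper name find_names is made up here, and the in-memory table is a cut-down stand-in for the exported names table shown above.<br />
<br />
```python
import sqlite3


def find_names(conn, surname_fragment):
    """Return (first_name, surname) rows whose surname contains the fragment."""
    cursor = conn.execute(
        "SELECT first_name, surname FROM names "
        "WHERE surname LIKE ? ORDER BY surname, first_name",
        ("%" + surname_fragment + "%",),
    )
    return cursor.fetchall()


# Stand-in for the exported file; with a real export one would call
# sqlite3.connect("Untitled_1.sql") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [("Test", "Smith"), ("Ann", "Jones")],
)
matches = find_names(conn, "Smith")
```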
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a Berkeley DB (BSDDB) database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as a replacement. This should allow anything from easy, single-file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is being removed from the standard distribution of Python (deprecated as of Python 2.6, removed in Python 3.0)<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas BSDDB query logic is written in Python<br />
# SQLite tables of a database reside in a single file<br />
<br />
Next are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; although see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps, and it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL over SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might overcome some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db. ie auto incrementing fields, stored procedures and on and on.<br />
(sqlite may not even have triggers but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to setup run and maintain.<br />
** there are tools that can do this automatically, though, and make this almost nonexistent<br />
for an end user. Also, the embeddable MySQL might be an option.<br />
*may be difficult to manage / share multiple databases. More difficult but very doable.<br />
maybe not even that difficult. It would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to setup. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate db's<br />
*single file<br />
*good support.<br />
<br />
Disadvantage<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software. but mysqls<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatiblity" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API to make it easy to move from one SQL-based DB to another called DB-API. Each DB may have multiple different modules available for it. If we settle on this solution then we should do some quick searches to make sure we pick the right modules.<br />
<br />
*MySQL: Yes [http://sourceforge.net/projects/mysql-python MySQLdb], used by SQLAlchemy<br />
*SQLite: Yes [http://www.python.org/doc/2.5.2/lib/module-sqlite3.html sqlite3] (included in Python 2.5 or greater) and [http://oss.itsystementwicklung.de/trac/pysqlite pysqlite], both used by SQLAlchemy<br />
*BSDDB: No. The DB-API looks to be only for relational dbs.<br />
<br />
*ORM (Object Relational Mapper): no<br />
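<br />
To give a feel for what plain DB-API code looks like, here is a small sketch that runs against sqlite3 and should, in principle, work with another DB-API module passed in its place. The helper name first_names_for and the names table are assumptions for the example. One wrinkle it illustrates: DB-API modules advertise different parameter styles (sqlite3 uses "qmark", MySQLdb uses "format"), so even simple queries need a little care to stay portable.<br />
<br />
```python
import sqlite3

# Placeholder text for two of the DB-API "paramstyle" values.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def first_names_for(dbapi_module, conn, surname):
    """Return first names for a surname using only DB-API calls."""
    mark = PLACEHOLDERS[dbapi_module.paramstyle]
    sql = (
        "SELECT first_name FROM names WHERE surname = "
        + mark
        + " ORDER BY first_name"
    )
    cursor = conn.cursor()
    try:
        cursor.execute(sql, (surname,))
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


# Hypothetical names table, just enough to run the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [("Test", "Smith"), ("Ann", "Smith"), ("Joe", "Jones")],
)
first_names = first_names_for(sqlite3, conn, "Smith")
```
<br />
Differences like this are part of what a higher-level abstraction layer hides.<br />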
<br />
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While powerful I doubt any projects are using it outside of Django.<br />
<br />
===<strike> pydo </strike>===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and little documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. <br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes but doesn't force it.<br />
*multiple database support?: maybe, need to confirm.<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. Project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes this is what sqlobject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08, with 7 developers on the project. Couldn't find old release dates, but the first was posted 2003-04-09. The link to the wiki is broken, which isn't the best sign, but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
=== Discussion ===<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one. You'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in PHP every web framework seems to have its own. I suspect the same<br />
for Python. Django has its own but allows for the use of others (if that<br />
tells you anything). That might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
abstraction layer.<br />
<br />
What to look for in a db abstraction layer is which dbs it can use. SQLite and<br />
MySQL are musts; you may even find one that can talk to BSDDB, but probably not.<br />
Oracle and PostgreSQL are pluses but will probably never be used, but who knows<br />
what will happen in 5 or 10 yrs. Who knows, maybe Oracle would get fed up with<br />
MySQL and release the db open source, charging for service. Stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the SQL queries it is supposed to let you create are, since<br />
complex queries through a db layer tend to be difficult to create, read and predict,<br />
i.e. sub queries and the like. Usually those queries are far easier to just build as a query.<br />
<br />
In my experience a db abstraction layer is good for most of the db I/O. However, for<br />
the complex stuff a sort of localization object (or even file) with named queries is a<br />
good bet. This would work similarly to how different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables, since some dbs might require a<br />
little more manipulation than others. Again, this would only be for the most complex<br />
queries. A good rule of thumb would be: if you have to start writing parts of the query<br />
as strings, move it to the db localization object. This db localization object isn't used<br />
for all queries because you only want to have to tweak the minimum number of queries across dbs.<br />
--Aarons<br />
</pre><br />
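<br />
The "db localization object" suggested above could be sketched like this: one small class per database, with one method per named query, so that only the dialect-specific SQL differs. All class, method and table names here are hypothetical. The example query uses string concatenation, which SQLite writes with || and MySQL with CONCAT(); only the SQLite variant is actually executed in this sketch.<br />
<br />
```python
import sqlite3


class SqliteQueries(object):
    """Named queries written in the SQLite dialect."""

    def full_names_by_surname(self):
        return (
            "SELECT first_name || ' ' || surname FROM names "
            "WHERE surname = ? ORDER BY first_name"
        )


class MysqlQueries(object):
    """The same named queries in the MySQL dialect (not run here)."""

    def full_names_by_surname(self):
        return (
            "SELECT CONCAT(first_name, ' ', surname) FROM names "
            "WHERE surname = %s ORDER BY first_name"
        )


# One localization object per supported database.
QUERIES = {"sqlite": SqliteQueries(), "mysql": MysqlQueries()}


def full_names(conn, backend, surname):
    """Run the named query for whichever backend is in use."""
    sql = QUERIES[backend].full_names_by_surname()
    cursor = conn.cursor()
    cursor.execute(sql, (surname,))
    return [row[0] for row in cursor.fetchall()]


# Hypothetical names table, just enough to run the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.execute("INSERT INTO names VALUES ('Test', 'Smith')")
sqlite_result = full_names(conn, "sqlite", "Smith")
```
<br />
Calling code only ever names the query, so adding another database means adding one more class rather than touching every caller.<br />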
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing BSDDB to SQLite ==<br />
<br />
A project that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration.<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar, but I haven't tried it out. The concern might hold for a regular MySQL server, though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., handles should be unique, and rules must be added to ensure that adding an object with the handle of a person object to the family table is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table with a key on handle) or with explicit rules.<br />
# ideally, the framework could generate an admin module from the model for browsing the database; see, e.g., the admin module in Django.<br />
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
:# Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
:# User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality, or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, it means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there. <br />
<br />
The above clearly indicates that the approach in the two backends is very different. The bsddb way will still work in SQL, though (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py].<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15138GEPS 010: Relational Backend2009-03-25T20:34:23Z<p>AaronS: /* Database Abstraction Layer */ typo correction</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
A proposed implementation is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/Sql.py?view=markup trunk/src/plugins/Sql.py]. You can export most of GRAMPS through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything.)<br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel). After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above Exporter, then you can use SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.<br />
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a BSD database as its internal file format. While this is considerably better than, say, an XML format, the choice of the BSD-DB has a considerable number of drawbacks. This proposal explores the use of SQL as a replacement. This should allow anything from easy, single db file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is deprecated in the standard distribution of Python as of Python 2.6, and is removed in Python 3.0<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but a key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas BSDDB lookups are driven from Python code<br />
# All tables of an SQLite database reside in a single file<br />
<br />
Next are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although it may be harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in a SQLite database (uses file-locking)<br />
# There is a lot of code that we have written dealing with BSDDB. It would have to all be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on gramps and is actually why I'm thinking of giving<br />
it another try depending on how hard it is to implement this. Yes I know it will be hard<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself and when it came time to evaluate gramps the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL over SQLite. While I haven't tried it out yet there is an embeddable<br />
version of MySQL which might overcome some of SQLite's advantages. If a database abstraction<br />
layer is used both could be easily supported. They both have their advantages and<br />
disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db. ie auto incrementing fields, stored procedures and on and on.<br />
(sqlite may not even have triggers but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to setup run and maintain.<br />
** there are tools that can do this automatically though and make this work almost<br />
nonexistent for an end user. also the embeddable mysql might be an option.<br />
*may be difficult to manage / share multiple databases. more difficult but very doable.<br />
maybe not even that difficult. it would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to setup. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate db's<br />
*single file<br />
*good support.<br />
<br />
Disadvantage<br />
*while great for what it is it's not an enterprise level database<br />
*many "traditional" relational db things are lacking.<br />
*while tools exist they aren't as fleshed out and solid as the mysql ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software. but MySQL's<br />
tools and the fact that it's a "real" enterprise level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
<br />
<pre><br />
The Single disk file of sqlite db would be a major selling point for sqlite<br />
for genealogy software since users share and compare db's all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
This is not a db abstraction layer and not even a relational db.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API, called DB-API, that makes it easier to move from one SQL-based DB to another.<br />
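<br />
As a minimal sketch of what DB-API code looks like, the following uses the standard sqlite3 module against an in-memory database. The names table and its first_name and surname columns mirror the SQL export shown earlier on this page; the helper surnames_like is a made-up name for illustration only. Other DB-API drivers offer the same connect/cursor/execute/fetchall calls, but may use a different parameter placeholder than "?".<br />

```python
import sqlite3


def surnames_like(conn, pattern):
    """Return (first_name, surname) rows whose surname matches a LIKE pattern."""
    cur = conn.cursor()
    # "?" is sqlite3's placeholder style; other DB-API drivers may use "%s".
    cur.execute(
        "SELECT first_name, surname FROM names "
        "WHERE surname LIKE ? ORDER BY first_name",
        (pattern,),
    )
    return cur.fetchall()


# Demo data only: an in-memory database standing in for an exported tree.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [("Test", "Smith"), ("Ann", "Jones")],
)
conn.commit()

rows = surnames_like(conn, "%Smith%")
print(rows)
```

Only the connect call and the placeholder style are tied to SQLite here, which is the portability DB-API is meant to give.<br />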
<br />
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While powerful I doubt any projects are using it outside of Django.<br />
<br />
===<strike> pydo </strike>===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and little documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. could this be added since it uses a DB-API? I'm guessing it would be a lot of work but might be worth exploring.<br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes but doesn't force it.<br />
*multiple database support?: maybe; need to confirm.<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. Project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes this is what sqlobject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08, 7 developers on the project. couldn't find old release dates but first was posted 2003-04-09. link to wiki is broken which isn't the best sign but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to be able to use it.<br />
<br />
=== Discussion ===<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. if you're going to bother<br />
switching the back end, don't waste your time by not using one. you'll kick<br />
yourself later if you don't. just be careful which one you choose. I know<br />
that in php every web framework seems to have their own. I suspect the same<br />
for python. Django has their own but allows for the use of others (if that<br />
tells you anything). might be a place to check for alternatives. While<br />
their framework might be for websites that shouldn't matter for the DB<br />
Abstraction layer.<br />
<br />
What to look for in a db Abstraction Layer is which dbs it can use. sqlite and<br />
mysql are musts, you may even find one that can talk to BSDDB but probably not.<br />
Oracle and PostgreSQL<br />
are pluses but will probably never be used but who knows what will happen in 5<br />
or 10 yrs. who knows maybe oracle would get fed up with mysql and release the db<br />
open source charging for service. stranger things have happened.<br />
ease of use, readability and outer joins are also important. don't worry too<br />
much about how complex of sql queries its supposed to allow you to create since<br />
complex queries through a db layer tend to be difficult to create, read and predict.<br />
ie sub queries and the like. usually those queries are far easier to just build as a query.<br />
<br />
in my experience a db abstraction layer is good for most of the db io. however, for<br />
the complex stuff a sort of localization object (or even file) is a good bet with named<br />
queries. this would work similar to how different languages are usually supported in<br />
projects. with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a<br />
file of constants/variables since some db's might require a little more manipulation than<br />
others. again this would only be for the most complex queries. a good rule of thumb would<br />
be if you had to start writing parts of the query as strings move it to the db localization<br />
object. This db localization object isn't used for all queries because you only want to have<br />
to tweak the minimum amount of queries across dbs<br />
--Aarons<br />
</pre><br />
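<br />
The "db localization object" idea from the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the QUERIES registry, and the attributes table with its person_handle and type columns are all hypothetical and are not part of the current SQL export. Each supported database gets one object with a method per hand-written query, and only the parts that differ between databases are overridden.<br />

```python
import sqlite3


class SqliteQueries:
    """Hypothetical holder for the few queries written by hand as SQL."""

    placeholder = "?"  # sqlite3 uses the "qmark" parameter style

    def people_with_attribute(self, conn, attr_type):
        """Return gramps_ids of people having an attribute of this type."""
        sql = (
            "SELECT DISTINCT p.gramps_id FROM people p "
            "JOIN attributes a ON a.person_handle = p.handle "
            "WHERE a.type = " + self.placeholder + " ORDER BY p.gramps_id"
        )
        cur = conn.cursor()
        cur.execute(sql, (attr_type,))
        return [row[0] for row in cur.fetchall()]


class MysqlQueries(SqliteQueries):
    """Same queries; only what differs per database is overridden."""

    placeholder = "%s"  # MySQLdb uses the "format" parameter style


# One object per supported database, selected by backend name.
QUERIES = {"sqlite": SqliteQueries(), "mysql": MysqlQueries()}

# Demo data only, using an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (handle TEXT PRIMARY KEY, gramps_id TEXT)")
conn.execute("CREATE TABLE attributes (person_handle TEXT, type TEXT, value TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("h1", "I0000"), ("h2", "I0001")])
conn.execute("INSERT INTO attributes VALUES (?, ?, ?)", ("h2", "Occupation", "Farmer"))

found = QUERIES["sqlite"].people_with_attribute(conn, "Occupation")
print(found)
```

Using methods rather than string constants leaves room for a database that needs more than a different placeholder, which is the reason given above for preferring an object over a file of constants.<br />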
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB to be removed:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Development is not dead, however; it will only be out of sync with the Python release cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing BSDDB to SQLite ==<br />
<br />
A company that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration.<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevent.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of sqlite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar but I haven't tried it out. This might be true of MySQL though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user but would depend on the implementation.<br />
--AaronS<br />
</pre><br />
<br />
= What now? =<br />
== Create Object model== <br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''<br />
<br />
== Select an SQL framework==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., a handle should be unique: rules must be added to ensure that adding an object to the family table with the handle of an existing person object is '''impossible''' at the database layer. These kinds of rules can be enforced structurally (a primary object table keyed on handle) or with explicit validation rules.<br />
# ideally, the framework could generate an admin module from the model to browse the database; see e.g. the admin module in Django.<br />
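<br />
The handle rule in the list above (a person's handle must not be usable in the family table) can be enforced in the schema itself. The following is a sketch under assumptions, not the agreed schema: the table names primary_object, person and family and the kind column are invented for illustration. It relies on SQLite foreign keys, which must be switched on per connection and need a reasonably recent SQLite.<br />

```python
import sqlite3

# Hypothetical schema: every handle is registered once, with its kind,
# in primary_object; each typed table can only reference its own kind.
SCHEMA = """
CREATE TABLE primary_object (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL,
    UNIQUE (handle, kind)
);
CREATE TABLE person (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL DEFAULT 'person' CHECK (kind = 'person'),
    FOREIGN KEY (handle, kind) REFERENCES primary_object (handle, kind)
);
CREATE TABLE family (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL DEFAULT 'family' CHECK (kind = 'family'),
    FOREIGN KEY (handle, kind) REFERENCES primary_object (handle, kind)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def family_insert_rejected(conn, handle):
    """Return True if the database refuses to add this handle as a family."""
    try:
        conn.execute("INSERT INTO family (handle) VALUES (?)", (handle,))
    except sqlite3.IntegrityError:
        return True
    return False


conn = open_db()
handle = "b247d7186567ff472ef"
conn.execute("INSERT INTO primary_object VALUES (?, ?)", (handle, "person"))
conn.execute("INSERT INTO person (handle) VALUES (?)", (handle,))

rejected = family_insert_rejected(conn, handle)
print(rejected)
```

Because the handle is the primary key of primary_object, it can be registered under one kind only, so the check does not depend on application code remembering to run it.<br />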
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
:# Family Tree manager needs to list the family tree type (bsddb, sqlite), on creation of new family tree user must choose the backend.<br />
:# User can import .gramps/gedcom files just as this is done with bsddb backend once family tree is set up.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
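<br />
The deferred loading described in the last item could look roughly like this. It is a sketch only: LazyPerson is a hypothetical stand-in for the real gen/lib Person class, the attributes table and its columns are assumed, and get_person_from_handle here is a free function taking a connection rather than the actual database method.<br />

```python
import sqlite3


class LazyPerson:
    """Hypothetical person object that loads attributes on first access."""

    __slots__ = ("_conn", "handle", "gramps_id", "_attributes")

    def __init__(self, conn, handle, gramps_id):
        self._conn = conn          # the object must know its database
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None    # not fetched yet

    @property
    def attributes(self):
        # The attributes table is only queried the first time this is read.
        if self._attributes is None:
            cur = self._conn.execute(
                "SELECT type, value FROM attributes "
                "WHERE person_handle = ? ORDER BY type",
                (self.handle,),
            )
            self._attributes = cur.fetchall()
        return self._attributes


def get_person_from_handle(conn, handle):
    """Fetch a person; this touches the people table only."""
    row = conn.execute(
        "SELECT handle, gramps_id FROM people WHERE handle = ?", (handle,)
    ).fetchone()
    return LazyPerson(conn, *row) if row else None


# Demo data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (handle TEXT PRIMARY KEY, gramps_id TEXT)")
conn.execute("CREATE TABLE attributes (person_handle TEXT, type TEXT, value TEXT)")
conn.execute("INSERT INTO people VALUES (?, ?)", ("h1", "I0000"))
conn.execute("INSERT INTO attributes VALUES (?, ?, ?)", ("h1", "Occupation", "Farmer"))

person = get_person_from_handle(conn, "h1")
loaded_before_access = person._attributes is not None
attrs = person.attributes
print(loaded_before_access, attrs)
```

The cost is visible in the constructor: each object carries a reference to its database, which is the coupling between gen/lib and the backend that the item above warns about.<br />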
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality, or to optimize better for SQL. E.g., in SQL one would probably have an attribute table. To find which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, it means looping over all persons, obtaining each person's attribute sub-table, and checking whether the attribute is present there. <br />
<br />
The above clearly indicates that the approach in the two backends is very different. The bsddb way will still work in SQL, though (as the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the above example, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py].<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
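<br />
A rough sketch of that prepare-then-test pattern follows. The class HasAttributeSql is hypothetical and much simpler than the real _HasAttributeBase rule; it takes a plain sqlite3 connection instead of a GRAMPS db object, and the attributes table is again an assumption. A set is used instead of a list so that the membership test stays cheap.<br />

```python
import sqlite3


class HasAttributeSql:
    """Hypothetical filter rule: one query up front, then cheap lookups."""

    def __init__(self, attr_type):
        self.attr_type = attr_type
        self.matching = set()

    def prepare(self, conn):
        # A single SELECT replaces the loop over every person in the tree.
        cur = conn.execute(
            "SELECT DISTINCT person_handle FROM attributes WHERE type = ?",
            (self.attr_type,),
        )
        self.matching = {row[0] for row in cur}

    def apply(self, conn, person_handle):
        # No database access here: just a membership test.
        return person_handle in self.matching


# Demo data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attributes (person_handle TEXT, type TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?)",
    [("h1", "Occupation", "Farmer"), ("h2", "Nickname", "Bob")],
)

rule = HasAttributeSql("Occupation")
rule.prepare(conn)
results = [rule.apply(conn, h) for h in ("h1", "h2")]
print(results)
```

A bsddb-backed version of the same rule would keep apply as the place where the work happens, so the two backends would still need some switch such as the support_sql attribute mentioned above.<br />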
<br />
<br />
[[Category:GEPS|S]]</div>AaronShttps://www.gramps-project.org/wiki/index.php?title=GEPS_010:_Relational_Backend&diff=15137GEPS 010: Relational Backend2009-03-25T20:32:34Z<p>AaronS: /* Database Abstraction Layer */ more notes</p>
<hr />
<div>This page is for the discussion of a proposed implementation of a SQL backend for GRAMPS. <br />
<br />
A proposed implementation is being developed in [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/plugins/Sql.py?view=markup trunk/src/plugins/Sql.py]. You can export most of GRAMPS through an SQL Export using the Export Assistant. (Currently, the selection options are ignored, and it will output everything.)<br />
<br />
SQL stands for "Structured Query Language" and is pronounced "sequel" (it is a joke: as it came after QUEL, it is its sequel). After you export your GRAMPS data into a file such as ''Untitled_1.sql'' using the above Exporter, then you can use SQL queries like:<br />
<br />
<pre><br />
$ sqlite3 Untitled_1.sql<br />
SQLite version 3.5.9<br />
Enter ".help" for instructions<br />
<br />
sqlite> .tables<br />
dates family names people repository<br />
events media notes places sources <br />
<br />
sqlite> .headers on<br />
.headers on<br />
<br />
sqlite> select * from people;<br />
handle|gramps_id|gender|death_ref_index|birth_ref_index|change|marker0|marker1|private<br />
b247d7186567ff472ef|I0000|1|-1|-1|1225135132|-1||0<br />
<br />
sqlite> select * from names where surname like "%Smith%";<br />
private|first_name|surname|suffix|title|name_type0|name_type1|prefix|patronymic|group_as|sort_as|display_as|call<br />
0|Test|Smith|||2|||||0|0|<br />
<br />
sqlite> .exit<br />
$<br />
</pre><br />
<br />
The current database in GRAMPS would require that you write some code to do this, and you'd need to know some details about the data.<br />
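<br />
With an SQL export, by contrast, the same lookup can be scripted in a few lines using Python's standard sqlite3 module. The following is only a minimal sketch: the function names are made up for illustration, and the demo table contains just the first_name and surname columns seen in the session above, not the full exported schema.<br />
<br />
```python
import sqlite3


def find_by_surname(conn, fragment):
    """Return (first_name, surname) rows whose surname contains `fragment`."""
    cur = conn.cursor()
    # Bind the pattern as a parameter instead of pasting it into the SQL text.
    cur.execute(
        "SELECT first_name, surname FROM names "
        "WHERE surname LIKE ? ORDER BY surname, first_name",
        ("%" + fragment + "%",),
    )
    return cur.fetchall()


def make_demo_db():
    """Build an in-memory stand-in for an exported file (two assumed columns only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (first_name TEXT, surname TEXT)")
    conn.executemany(
        "INSERT INTO names (first_name, surname) VALUES (?, ?)",
        [("Test", "Smith"), ("Ann", "Jones")],
    )
    conn.commit()
    return conn
```
<br />
Against a real export, one would presumably pass <code>sqlite3.connect("Untitled_1.sql")</code> to <code>find_by_surname</code> instead of the demo connection.<br />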
<br />
= SQL Backend =<br />
<br />
Currently, GRAMPS uses a Berkeley DB (BSDDB) database as its internal file format. While this is considerably better than, say, an XML format, the choice of BSDDB has a considerable number of drawbacks. This proposal explores the use of SQL as a replacement. This should allow anything from easy, single-file implementations (e.g., SQLite) to more complex and sophisticated client/server setups (e.g., MySQL).<br />
<br />
First, there are a number of facts related to this proposal:<br />
<br />
# BSDDB is deprecated as of Python 2.6 and removed from the standard distribution in Python 3.0<br />
# SQLite support (the sqlite3 module) has been part of the standard Python distribution since Python 2.5<br />
# BSDDB is not a relational database, but an embedded key/value store<br />
# BSDDB databases do not have schema or data-definitions. BSDDB requires all of the database structure logic to reside in code<br />
# BSDDB is a programmer's API<br />
# SQL is a declarative, independent abstraction layer<br />
# SQL engines can optimize queries (in low-level C), whereas with BSDDB the query logic is written in Python<br />
# All tables of an SQLite database reside in a single file<br />
<br />
Next, there are a number of claims that need to be tested:<br />
<br />
# An SQLite version of a GRAMPS BSDDB may be 4 times smaller<br />
# An SQLite version of a GRAMPS BSDDB may be faster<br />
## The files may be smaller<br />
## The smaller files may allow more into memory<br />
## More code would reside in C, rather than in Python<br />
## SQL Engines can perform query optimizations<br />
# Enterprise SQL versions of GRAMPS would allow people to create and manage much larger trees<br />
# An SQLite version of GRAMPS might allow people to create larger trees<br />
## Because we move all of the DB logic into SQL, we can focus on making GRAMPS stable with large databases<br />
# SQL code is simpler than the equivalent BSDDB code, because SQL is declarative/abstract and BSDDB is a low-level API<br />
<br />
Further implications:<br />
<br />
# A full-scale MySQL backend would be a trivial step from SQLite (although maybe harder to set up and maintain; but see Django)<br />
# Easy to allow multiple users in an SQLite database (uses file locking)<br />
# There is a lot of code that we have written dealing with BSDDB. All of it would have to be rewritten in SQL (on the other hand, a lot of code can be deleted, which will make GRAMPS easier to maintain and adapt)<br />
# We will have to develop SQL experts<br />
<br />
<pre><br />
It's good to see this discussion on GRAMPS; it is actually why I'm thinking of giving<br />
it another try, depending on how hard it is to implement this. Yes, I know it will be hard,<br />
but probably much easier and more productive than starting my own project. I'm a developer<br />
myself, and when it came time to evaluate GRAMPS the lack of a relational db backend was one<br />
of the main reasons I decided to keep looking.<br />
<br />
Don't discount MySQL in favor of SQLite. While I haven't tried it out yet, there is an embeddable<br />
version of MySQL which might offset some of SQLite's advantages. If a database abstraction<br />
layer is used, both could be easily supported. They both have their advantages and disadvantages.<br />
<br />
MySQL<br />
Advantages<br />
*far better tools for management and reporting<br />
*a true enterprise level database capable of handling serious loads<br />
*far more is built into the db, e.g. auto-incrementing fields, stored procedures and so on.<br />
(SQLite may not even have triggers, but I can't remember)<br />
*far more extensive user base and support.<br />
<br />
Disadvantages<br />
*install size (bloat)<br />
*an actual server to setup run and maintain.<br />
** there are tools that can do this automatically, though, and make the burden almost<br />
nonexistent for an end user. Also, the embeddable MySQL might be an option.<br />
*may be difficult to manage / share multiple databases. More difficult, but very doable.<br />
Maybe not even that difficult; it would just take some planning.<br />
<br />
SQLite<br />
Advantages<br />
*far easier to setup. just start writing to the file! no connection or user accounts.<br />
*smaller install (code) size.<br />
*easier for users to manage and share separate dbs<br />
*single file<br />
*good support.<br />
<br />
Disadvantage<br />
*while great for what it is, it's not an enterprise-level database<br />
*many "traditional" relational db features are lacking.<br />
*while tools exist, they aren't as fleshed out and solid as the MySQL ones.<br />
<br />
Personally I think SQLite makes more sense for genealogical software, but MySQL's<br />
tools and the fact that it's a "real" enterprise-level relational db are serious advantages.<br />
-- AaronS<br />
</pre><br />
<br />
== Transportable Trees ==<br />
<br />
From http://www.sqlite.org/onefile.html:<br />
<br />
'''Single-file Cross-platform Database'''<br />
<br />
''A database in SQLite is a single disk file. Furthermore, the file format is cross-platform. A database that is created on one machine can be copied and used on a different machine with a different architecture. SQLite databases are portable across 32-bit and 64-bit machines and between big-endian and little-endian architectures.''<br />
<br />
''The SQLite database file format is also stable. All releases of SQLite version 3 can read and write database files created by the very first SQLite 3 release (version 3.0.0) going back to 2004-06-18. This is "backwards compatibility". The developers promise to maintain backwards compatibility of the database file format for all future releases of SQLite 3. "Forwards compatibility" means that older releases of SQLite can also read and write databases created by newer releases. SQLite is usually, but not completely forwards compatible.''<br />
<br />
''The stability of the SQLite database file format and the fact that the file format is cross-platform combine to make SQLite database files an excellent choice as an Application File Format.''<br />
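<br />
In practice this means that sharing a tree could be as simple as copying one file. The sketch below illustrates that under stated assumptions: the people table with its two columns is a cut-down stand-in for a real schema, the file names are arbitrary, and the database is closed before it is copied so that nothing is left unwritten.<br />
<br />
```python
import os
import shutil
import sqlite3
import tempfile


def copy_tree(src_path, dst_path):
    """Copy a closed SQLite database by copying its single file."""
    shutil.copyfile(src_path, dst_path)


def count_people(path):
    """Open the database at `path` and count the rows in its people table."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
    finally:
        conn.close()


def demo():
    """Create a one-person database, copy the file, and read both copies."""
    workdir = tempfile.mkdtemp()
    original = os.path.join(workdir, "tree.db")
    duplicate = os.path.join(workdir, "tree_copy.db")
    conn = sqlite3.connect(original)
    # Hypothetical two-column table, for illustration only.
    conn.execute("CREATE TABLE people (handle TEXT PRIMARY KEY, gramps_id TEXT)")
    conn.execute("INSERT INTO people VALUES (?, ?)", ("b247d7186567ff472ef", "I0000"))
    conn.commit()
    conn.close()  # close before copying
    copy_tree(original, duplicate)
    result = (count_people(original), count_people(duplicate))
    shutil.rmtree(workdir)
    return result
```
<br />
Calling <code>demo()</code> returns the person count read from the original and from the copy, which should be the same.<br />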
<br />
<pre><br />
The single disk file of an SQLite db would be a major selling point for SQLite<br />
for genealogy software, since users share and compare dbs all the time.<br />
--Aarons<br />
</pre><br />
<br />
== Database Abstraction Layer ==<br />
<br />
I asked this question on [http://stackoverflow.com/questions/679806/what-are-the-viable-database-abstraction-layers-for-python StackOverflow].<br />
<br />
=== <strike> CouchDB </strike> ===<br />
http://code.google.com/p/couchdb-python/<br />
<br />
[http://pylonshq.com/ Pylons] took the time to support it.<br />
<br />
CouchDB is a document database, not a relational one, and this library is not a db abstraction layer.<br />
<br />
=== DB-API ===<br />
http://wiki.python.org/moin/DatabaseProgramming/<br />
<br />
Python has an API called DB-API (PEP 249) that makes it easier to move from one SQL-based DB to another.<br />
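<br />
DB-API standardizes the calls (connect, cursor, execute, fetchall, commit) but not the SQL dialect or the parameter placeholder, which each driver module announces through its <code>paramstyle</code> attribute. The sketch below shows one way code could stay driver-neutral for simple statements. It is an illustration only: the helper names and the people table are assumptions, and only the two positional styles are handled.<br />
<br />
```python
import sqlite3

# DB-API "paramstyle" values mapped to the placeholder they use for
# positional parameters (named styles are deliberately left out).
_PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def placeholder(module):
    """Return the positional placeholder used by a DB-API driver module."""
    return _PLACEHOLDERS[module.paramstyle]


def insert_person(conn, module, handle, gramps_id):
    """Insert one row using only calls defined by DB-API."""
    mark = placeholder(module)
    sql = "INSERT INTO people (handle, gramps_id) VALUES (%s, %s)" % (mark, mark)
    cur = conn.cursor()
    cur.execute(sql, (handle, gramps_id))
    conn.commit()


def all_gramps_ids(conn):
    """Return every gramps_id, sorted, again using only DB-API calls."""
    cur = conn.cursor()
    cur.execute("SELECT gramps_id FROM people ORDER BY gramps_id")
    return [row[0] for row in cur.fetchall()]
```
<br />
With sqlite3 the placeholder is <code>?</code>; a driver that reports the <code>format</code> style would get <code>%s</code> from the same helper.<br />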
<br />
=== <strike> Django </strike> ===<br />
http://www.djangoproject.com/<br />
<br />
Django also provides DB independence, but is geared towards web deployment:<br />
<br />
"Django developed its ORM (and template language) from scratch. While that may have been a pragmatic decision at the time, Python now has SQLAlchemy, a superior database layer that has gained a lot of momentum. '''Django’s in-house ORM lacks multiple database support''', and forces constraints on your database models (e.g. that every database table must have a single, integer primary key). If you choose Django, your project gains a near-inseparable dependency on Django’s ORM and database requirements." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
Django's DB abstraction probably isn't a good fit. While it is powerful, I doubt any projects are using it outside of Django.<br />
<br />
=== <strike> pydo </strike> ===<br />
[http://skunkweb.sourceforge.net/pydo.html pydo]<br />
<br />
Doesn't look viable: it appears to have few users and limited documentation, and it is tied to a web framework.<br />
<br />
=== SQLAlchemy ===<br />
http://www.sqlalchemy.org/<br />
<br />
*MySQL: Yes<br />
*SQLite: Yes<br />
*BSDDB: No. Could this be added, since SQLAlchemy works through a DB-API driver? I'm guessing it would be a lot of work, but it might be worth exploring.<br />
<br />
[http://www.sqlalchemy.org/docs/05/dbengine.html#supported-dbapis supported dbs]<br />
<br />
*ORM (Object Relational Mapper): yes, but it doesn't force it on you.<br />
*multiple database support?: maybe; need to confirm.<br />
<br />
*Viability: Last release was January 24, 2009. They seem to have an established development team and user base. The project appears to be 3 years old.<br />
<br />
"SQLAlchemy is designed to operate with a DB-API implementation built for a particular database" [http://www.sqlalchemy.org/docs/05/intro.html#api-reference source]<br />
<br />
"SQLAlchemy, widely considered to be the best Python ORM available. SQLAlchemy includes multiple database support and just about any crazy combination of database requirements needed, and it handles ORM very well — yet it also allows you to provide raw SQL as needed." [http://marcuscavanaugh.com/blog/python-web-framework-advice/ marcus cavanaugh 2009]<br />
<br />
=== SQLObject ===<br />
http://www.sqlobject.org/<br />
<br />
*MySQL: yes<br />
*SQLite: yes<br />
*BSDDB: no<br />
<br />
[http://www.sqlobject.org/SQLObject.html#requirements requirements]<br />
<br />
*ORM (Object Relational Mapper): yes, this is what SQLObject is all about.<br />
*multiple database support?: ?<br />
<br />
*Viability: Last release was 2008-12-08, with 7 developers on the project. I couldn't find old release dates, but the first bug was posted 2003-04-09. The link to the wiki is broken, which isn't the best sign, but we've all had those times before...<br />
<br />
[http://www.sqlobject.org/SQLObject.html#compared-to-other-database-wrappers comparison]<br />
<br />
[http://pylonshq.com/ Pylons] took the time to support it.<br />
<br />
=== Discussion ===<br />
I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. <br />
<br />
A full abstraction layer is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.<br />
<br />
<pre><br />
What you're looking for here is called a Database Abstraction Layer. They are<br />
indeed quite powerful and are exactly what you need. If you're going to bother<br />
switching the back end, don't waste your time by not using one. You'll kick<br />
yourself later if you don't. Just be careful which one you choose. I know<br />
that in PHP every web framework seems to have its own. I suspect the same<br />
for Python. Django has its own but allows for the use of others (if that<br />
tells you anything). It might be a place to check for alternatives. While<br />
their framework might be for websites, that shouldn't matter for the DB<br />
abstraction layer.<br />
<br />
What to look for in a db abstraction layer is which dbs it can use. SQLite and<br />
MySQL are musts; you may even find one that can talk to BSDDB, but probably not.<br />
Oracle and PostgreSQL are pluses that will probably never be used, but who knows<br />
what will happen in 5 or 10 years. Maybe Oracle will get fed up with MySQL and release<br />
its db as open source, charging for service. Stranger things have happened.<br />
Ease of use, readability and outer joins are also important. Don't worry too<br />
much about how complex the SQL queries it supposedly lets you create are, since<br />
complex queries through a db layer tend to be difficult to create, read and predict<br />
(i.e. subqueries and the like). Usually those queries are far easier to write as plain SQL.<br />
<br />
In my experience a db abstraction layer is good for most of the db I/O. However, for<br />
the complex stuff a sort of localization object (or even file) with named queries is a<br />
good bet. This would work similarly to how different languages are usually supported in<br />
projects, with a different object or file per db. I'd recommend an actual object with a<br />
function per query over a file of constants/variables, since some dbs might require a<br />
little more manipulation than others. Again, this would only be for the most complex<br />
queries. A good rule of thumb: if you have to start writing parts of the query as strings,<br />
move it to the db localization object. The db localization object isn't used for all queries,<br />
because you want to keep the number of queries you have to tweak across dbs to a minimum.<br />
--Aarons<br />
</pre><br />
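<br />
The "db localization object" suggested in the comment above could look roughly like the following sketch: one object per backend, one method per named query, with a shared default that a backend overrides only where it has to. Everything here is hypothetical, including the class names, the query name, and the people and event_ref tables; the MySQL variant is only a differently phrased query, not a statement about what MySQL requires.<br />
<br />
```python
import sqlite3


class BaseQueries(object):
    """One method per named complex query; subclasses override where a db differs."""

    def people_without_events(self):
        # Portable default, written with an outer join.
        return ("SELECT p.gramps_id FROM people p "
                "LEFT OUTER JOIN event_ref e ON e.person_handle = p.handle "
                "WHERE e.person_handle IS NULL ORDER BY p.gramps_id")


class SqliteQueries(BaseQueries):
    pass  # the portable default is assumed to be fine for SQLite


class MysqlQueries(BaseQueries):
    def people_without_events(self):
        # Hypothetical variant: same result, phrased with NOT EXISTS.
        return ("SELECT p.gramps_id FROM people p WHERE NOT EXISTS "
                "(SELECT 1 FROM event_ref e WHERE e.person_handle = p.handle) "
                "ORDER BY p.gramps_id")


QUERIES = {"sqlite": SqliteQueries(), "mysql": MysqlQueries()}


def run_named(conn, backend, name):
    """Look up the named query for `backend` and return its first column."""
    sql = getattr(QUERIES[backend], name)()
    cur = conn.cursor()
    cur.execute(sql)
    return [row[0] for row in cur.fetchall()]
```
<br />
Calling code then asks for a query by name, for example <code>run_named(conn, "sqlite", "people_without_events")</code>, and never contains backend-specific SQL itself.<br />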
<br />
== Discussions of BSDDB in Python ==<br />
<br />
BSDDB has had a hard time in Python. Python Developers have been wrestling with trying to keep it stable. Guido finally decided to separate BSDDB from the standard Python Distribution. See discussions:<br />
<br />
* http://jessenoller.com/2008/09/04/stirred-up-dem-bees-should-bsddb-be-removed-from-python/<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081357.html<br />
* http://mail.python.org/pipermail/python-dev/2008-July/081426.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082197.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082203.html<br />
* http://mail.python.org/pipermail/python-dev/2008-September/082244.html<br />
<br />
PEP 3108 marks BSDDB for removal:<br />
http://www.python.org/dev/peps/pep-3108/<br />
Removal from the standard library does not mean the end of development, however: the module will simply be released out of sync with the Python cycle. The home of pybsddb, which offers the bsddb3 package, is here: http://www.jcea.es/programacion/pybsddb.htm<br />
<br />
A sqlite shelve interface for Python:<br />
http://bugs.python.org/issue3783<br />
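<br />
To give an idea of what such an interface involves, here is a minimal sketch of a shelve-like, dictionary-style store on top of sqlite3. It is not the code from the issue linked above; the class name, table name and column names are made up for illustration, and values are simply pickled into a BLOB column.<br />
<br />
```python
import pickle
import sqlite3


class SqliteShelf(object):
    """Dictionary-style store that keeps pickled values in one SQLite table."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )

    def __setitem__(self, key, obj):
        blob = sqlite3.Binary(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))
        with self._conn:  # commit on success, roll back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
                (key, blob),
            )

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(bytes(row[0]))

    def __contains__(self, key):
        row = self._conn.execute(
            "SELECT 1 FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        return row is not None

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

    def close(self):
        self._conn.close()
```
<br />
A store like this keeps the key-to-pickled-object style of access while the data lives in an SQLite file; it does not by itself make the data queryable with SQL.<br />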
<br />
From http://www.sqlite.org/cvstrac/wiki?p=SqliteCompetitors:<br />
SQLite versus Berkeley DB:<br />
<br />
<pre><br />
Berkeley DB (BDB) is just the data storage layer - it does not<br />
support SQL or schemas. In spite of this, BDB is twice the size<br />
of SQLite. A comparison between BDB and SQLite is similar to a<br />
comparison between assembly language and a dynamic language like<br />
Python or Tcl. BDB is probably much faster if you code it<br />
carefully. But it is much more difficult to use and considerably<br />
less flexible.<br />
<br />
On the other hand BDB has very fine grained locking (although<br />
it's not very well documented), while SQLite currently has only<br />
database-level locking. -- fine grain locking is important for<br />
enterprise database engines, but much less so for embedded<br />
databases. In SQLite, a writer gets a lock, does an update, and<br />
releases the lock all in a few milliseconds. Other readers have<br />
to wait a few milliseconds to access the database, but is that<br />
really ever a serious problem?<br />
</pre><br />
<br />
== Comparing BSDDB to SQLite ==<br />
<br />
A project that justifies a switch from BSDDB to SQLite; see http://www.tribler.org/DatabaseMigration.<br />
<br />
Oracle's description of BSDDB; see http://www.oracle.com/database/docs/Berkeley-DB-v-Relational.pdf. Excerpt:<br />
<br />
<pre><br />
Berkeley DB Offers APIs, not Query Languages <br />
<br />
Berkeley DB was designed for software developers, by software<br />
developers. Relational database systems generally provide SQL access<br />
to the data that they manage, and usually offer some SQL abstraction,<br />
like ODBC or JDBC, for use in applications.<br />
</pre><br />
<br />
What BSDDB is not:<br />
<br />
http://pybsddb.sourceforge.net/ref/intro/dbisnot.html<br />
<br />
From previous GRAMPS discussions:<br />
<br />
http://mlblog.osdir.com/genealogy.gramps.devel/2005-02/msg00092.shtml<br />
<br />
From the GRAMPS archives:<br />
<pre><br />
> Now, sometimes we get a request for a major architectural change that we<br />
> will accept. A good example is the new database backend for the upcoming<br />
> GRAMPS 2.0. The request came in to support a real database backend so we<br />
> could support larger databases. We analyzed the request, and felt that<br />
> it matched the goals of the project and would provide a significant step<br />
> forward in the usability of the program. The result was a major redesign<br />
> effort that will soon be released.<br />
<br />
I think I and few others are the ones that impacted this decision. Having an <br />
850,000 person database tends to be deadly to the XML architecture that we <br />
were with. I've been the main person to test the integrity of the system <br />
with my Gedcom file importing. When I found that I couldn't import my file <br />
without extensive data loss, I came to Don and Alex and we all sought for <br />
solutions. We found that the XML interface was taking huge amounts of <br />
memory, and we looked for database backends that would handle the load. Don <br />
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been <br />
happy as a clam with the Gramps project, because I'm one step closer to <br />
killing Windows.<br />
<br />
I personally want to do away with it, but I need it for other applications. <br />
I've also come to the realization that both Windows and Linux are good, but <br />
in their own realms. I don't want this to become a huge flame war about <br />
Linux and Windows. so if you have other questions as to why I feel this way, <br />
email me.<br />
<br />
> So, would we accept a mySQL database backend? There is a good chance we<br />
> would (depending on the implementation), as long as it did not impact Aunt<br />
> Martha. We have even architected the backend to support this, since we<br />
> can see that higher end databases could provide additional functionality<br />
> such as versioning and multiuser support.<br />
<br />
We could accept mySQL because of this, but I agree with Don. If it negatively <br />
impacts the end user, why would we want to proceed with it? I have a friend <br />
that wondered about mySQL interaction, but he can see the impact that BSDDB <br />
has had on my database, and he has sided with me as well as the rest of the <br />
team. Not to say that this is not a possibility, but we need to remain <br />
focused on the tasks at hand.<br />
<br />
> So, in summary, the project is going in a direction that seems to meet<br />
> the needs of our users. If we changed directions, we might or might not<br />
> be able to reach a larger audience, but numbers are not our goal. We<br />
> fully support others submitting patches and other contributions, but<br />
> they will be weighed on how they match the goals of the project (and<br />
> most of the patches we've received to date do match the goals). If<br />
> someone wants us take the project in a different direction, we may or<br />
> may not be receptive depending if the direction matches our goals.<br />
> However, we will support your efforts if you decide to fork the project.<br />
> Who knows, maybe a remerge will occur in the future, or a forked project<br />
> will make us irrelevant.<br />
<br />
I agree with Don on this, numbers don't matter as long as the users are happy. <br />
Getting things appropriately nailed down and ready for the end user's use is <br />
what is paramount. After all, if there were no users, why would we even have <br />
a project with which to collaborate in the first place?<br />
<br />
We are here for the users, especially Aunt Martha, because of the fact that <br />
many people are just moving over to Linux and having something familiar to <br />
them, like a genealogical program is what matters to them. Making the <br />
transition to Linux is hard, don't get me wrong. But we are making it one <br />
step easier by not complicating the user's experience in their move.<br />
<br />
Like I said before, I'm just a bug finder. I'm not really a Python <br />
programmer, or anything, but I like to find bugs. Even if that's all I do on <br />
this project, I'm rather content. Everyone else that wants to port over to <br />
other toolkits and whatnot is free to do so.<br />
<br />
But also as an end user that's still a greenie to Linux in general, I can say <br />
that this program has helped my move over to Linux that much easier. Even if <br />
I have only contributed a little in the way of feedback (mostly from the <br />
end-user perspective).<br />
<br />
-Jason<br />
</pre><br />
<br />
From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:<br />
<br />
<pre><br />
Alex said:<br />
<br />
SQLite might be better or it might not, we haven't tried it. A great factor<br />
speaking for BSDDB is that it is supported by a standard Python module,<br />
bsddb. <br />
<br />
<br />
Don said:<br />
<br />
This is an important factor here - ease of setup and use. GRAMPS is<br />
difficult enough to get installed on some platforms (especially<br />
KDE-centric systems). Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort. What I've<br />
discovered is that GRAMPS is one of the first programs that a lot of<br />
new users want to get running - usually before they have a lot of<br />
Linux experience. So we can't make the barriers to entry too high.<br />
</pre><br />
<br />
<pre><br />
"Requiring someone to get an SQL database up and<br />
running to try out the program is probably too much effort." This simply isn't true of SQLite<br />
at all. The program would simply write to the db file. No server setup, no user accounts, no<br />
connection settings. Just a file name. Users wouldn't even know. The embedded version of MySQL<br />
may be similar, but I haven't tried it out. It might be true of regular MySQL, though. However, I believe<br />
it's possible to use scripts and/or code to manage launching and stopping the server. It might<br />
be possible to make it seamless for the user, but that would depend on the implementation.<br />
--AaronS<br />
</pre><br />
<br />
= What now? =<br />
== Create Object model ==<br />
<br />
Going over src/gen/lib/, create an object model of how GRAMPS uses and manipulates genealogy data.<br />
<br />
'''For this GEP to succeed it is extremely important that the experienced developers on the devel list agree with the object model'''<br />
<br />
== Select an SQL framework ==<br />
<br />
# finish research and pick a database abstraction layer.<br />
# finish research and pick a database.<br />
<br />
== Create models/tables ==<br />
<br />
# use the framework to set up a model of the database<br />
# generate the tables<br />
# create a dump of bsddb database in the sql database<br />
# validate that all things present in bsddb are present in the sql database<br />
# check validation rules. E.g., a handle should be unique: rules must be added to ensure that adding to the family table an object whose handle is already used by a person object is '''impossible''' at the database layer. Rules of this kind can be enforced structurally (a primary object table keyed on handle) or with explicit rules.<br />
# ideally, the framework could generate, from the model, an admin module to browse the database; see, e.g., the admin module in Django.<br />
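<br />
The handle rule in the list above can be sketched with a primary object table keyed on handle. This is only one possible layout, and all table and column names are assumptions. It relies on SQLite foreign key enforcement, which must be switched on per connection and needs SQLite 3.6.19 or later.<br />
<br />
```python
import sqlite3

# Hypothetical schema: every handle lives once in "object"; person and family
# rows must point at an object row of the matching kind.
SCHEMA = """
CREATE TABLE object (
    handle TEXT PRIMARY KEY,
    kind   TEXT NOT NULL,
    UNIQUE (handle, kind)
);
CREATE TABLE person (
    handle    TEXT PRIMARY KEY,
    kind      TEXT NOT NULL DEFAULT 'person' CHECK (kind = 'person'),
    gramps_id TEXT,
    FOREIGN KEY (handle, kind) REFERENCES object (handle, kind)
);
CREATE TABLE family (
    handle    TEXT PRIMARY KEY,
    kind      TEXT NOT NULL DEFAULT 'family' CHECK (kind = 'family'),
    gramps_id TEXT,
    FOREIGN KEY (handle, kind) REFERENCES object (handle, kind)
);
"""


def open_db():
    """Open an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, kind, handle, gramps_id):
    """Register the handle, then store the row in the table for its kind."""
    if kind not in ("person", "family"):
        raise ValueError("unknown kind: %r" % (kind,))
    with conn:  # commit on success, roll back on any error
        conn.execute("INSERT INTO object (handle, kind) VALUES (?, ?)",
                     (handle, kind))
        conn.execute("INSERT INTO %s (handle, gramps_id) VALUES (?, ?)" % kind,
                     (handle, gramps_id))
```
<br />
With this layout, reusing a person's handle for a family fails inside the database with an integrity error, whether the insert goes through <code>add_object</code> or straight into the family table.<br />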
<br />
== New db backend for GRAMPS ==<br />
<br />
# write an implementation of [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/base.py src/gen/db/base.py] to interface the DB abstraction layer with the rest of gramps. Gramps 3.x only has one implementation: [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/gen/db/dbdir.py src/gen/db/dbdir.py], but in branch22 a gedcom and a gramps xml implementation can be found (these have been deprecated).<br />
# once written, this can be added as an experimental backend to GRAMPS<br />
## The Family Tree Manager needs to list the family tree type (bsddb, sqlite); on creation of a new family tree, the user must choose the backend.<br />
## Once the family tree is set up, the user can import .gramps/GEDCOM files just as with the bsddb backend.<br />
# it will be very important to use slots in src/gen/lib to make this work. Obtaining a person via get_person_from_handle should only hit the person table. Only when the calling method needs attributes should the attribute table be hit. This requires attributes that remain undefined until the moment they are accessed. It also means that the gen/lib objects for SQL need to be aware of the database, as they need to know where to obtain these values. This looks like a huge amount of work to me, but definitely doable. Just rewriting gen/lib for an SQL data model might be easier, but that means rewriting the core of GRAMPS.<br />
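<br />
The deferred loading described in the last item could be sketched as follows. This is not the gen/lib code: the class names, the person and attribute tables and the <code>queries</code> log are all assumptions made for illustration, and only <code>get_person_from_handle</code> is a name taken from the text.<br />
<br />
```python
import sqlite3


class LazyPerson(object):
    """Person whose attribute list is fetched only on first access."""

    __slots__ = ("_db", "handle", "gramps_id", "_attributes")

    def __init__(self, db, handle, gramps_id):
        self._db = db                # the object must know its database
        self.handle = handle
        self.gramps_id = gramps_id
        self._attributes = None      # not loaded yet

    @property
    def attributes(self):
        if self._attributes is None:
            self._attributes = self._db.load_attributes(self.handle)
        return self._attributes


class SqlDb(object):
    """Hypothetical SQL backend; `queries` records which tables were hit."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = []

    def get_person_from_handle(self, handle):
        self.queries.append("person")
        row = self.conn.execute(
            "SELECT handle, gramps_id FROM person WHERE handle = ?", (handle,)
        ).fetchone()
        if row is None:
            return None
        return LazyPerson(self, row[0], row[1])

    def load_attributes(self, handle):
        self.queries.append("attribute")
        rows = self.conn.execute(
            "SELECT name, value FROM attribute "
            "WHERE person_handle = ? ORDER BY name", (handle,)
        ).fetchall()
        return [(name, value) for name, value in rows]
```
<br />
Fetching a person touches only the person table; the attribute table is read the first time <code>person.attributes</code> is used, and the result is cached on the object afterwards.<br />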
<br />
== Extending base.py ==<br />
<br />
Once an SQL backend is stable, base.py can be extended to offer extra functionality or to optimize better for SQL. For example, in SQL one would probably have an attribute table. To find out which persons have a specific attribute, SQL would select the matching rows from the attribute table and then look up the people. In bsddb, however, this means looping over all persons, obtaining the attribute sub-table of each person, and checking whether the attribute is present there.<br />
<br />
The above clearly shows that the two backends call for very different approaches. The bsddb approach will still work in SQL (the get_person method works, and speed should be comparable to bsddb if the deferred loading of values via slots described above is implemented). Nevertheless, a clear mechanism to optimize for SQL is needed. Continuing the example above, see [http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/src/Filters/Rules/_HasAttributeBase.py _HasAttributeBase.py].<br />
<br />
For SQL, one would use the prepare method to obtain all matching people in a list, then return True or False depending on whether the person is in this list. As db is passed in, db can have a support_sql attribute, and code can be written depending on this setting. This does not look ideal, though.<br />
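<br />
A minimal sketch of that idea, assuming a rule object with prepare and apply methods and a db object that may carry a support_sql flag. The classes below are simplified stand-ins, not the real _HasAttributeBase code, and the attribute table is hypothetical.<br />
<br />
```python
import sqlite3
from collections import namedtuple

# Hypothetical stand-ins for the real GRAMPS objects, for illustration only.
Person = namedtuple("Person", ["handle", "attributes"])


class SqlDb(object):
    support_sql = True

    def __init__(self, conn):
        self.conn = conn


class PlainDb(object):
    support_sql = False


class HasAttributeRule(object):
    """Match people that carry an attribute with the given name."""

    def __init__(self, attr_name):
        self.attr_name = attr_name
        self.matching = None  # set of handles, filled by prepare() on SQL backends

    def prepare(self, db):
        # SQL path: one query up front collects every matching handle.
        if getattr(db, "support_sql", False):
            cur = db.conn.cursor()
            cur.execute(
                "SELECT DISTINCT person_handle FROM attribute WHERE name = ?",
                (self.attr_name,),
            )
            self.matching = set(row[0] for row in cur.fetchall())

    def apply(self, db, person):
        if self.matching is not None:
            return person.handle in self.matching
        # bsddb-style path: inspect the attributes stored on the person itself.
        return any(name == self.attr_name for name, _value in person.attributes)
```
<br />
The branch on support_sql is confined to prepare, so apply stays cheap in both cases; it is still two code paths per rule, which is the drawback noted above.<br />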
<br />
<br />
[[Category:GEPS|S]]</div>AaronS