Talk:GEPS 010: Relational Backend

From Gramps
Revision as of 18:09, 26 March 2009 by AaronS (talk | contribs) (adding recomendations)
Jump to: navigation, search

I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends.

It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies.

It's good to see this discussion on gramps and is actually why I'm thinking of giving
it another try depending on how hard it is to implement this. Yes I know it will be hard
but probably much easier and productive than starting my own project. I'm  a developer my
self and when it came time to evaluate gramps the lack of a relational db backend was one
of the main reasons I decided to keep looking.

Don't discount MySQL over SQLite. While I haven't tried it out yet there is an embeddable
 version of MySQL which might overcome some of sqlites advantages. If a database abstraction
 layer is used both could be
 easily supported. They both have their advantages and disadvantages.

Personally I think SQLite makes more sense for genealogical software. but mysqls
tools and the fact that it's a "real" enterprise level relational db are serious advantages.
-- AaronS

What you looking for here is called a Database Abstraction Layer they are
indeed quite powerful and are exactly what you need. if your going to bother
switching the back end don't waste your time and not use one. you'll kick
yourself later if you don't. just be careful which one you choose. I know
that in php every web framework seems to have their own. I suspect the same
for python. Django has their own but allows for the use of others (if that
 tells you anything). might be a place to check for alternatives. While
 their framework might be for websites that shouldn't matter for the DB
 Abstraction layer.

What to look for in a db Abstraction Layer is which dbs it can use. sqlite and
 mysql are musts, you may even find one that can talk to BSDDB but probably not.
 Oracle and PostgreSQL
 are pluses but will probably never be used but who knows what will happen in 5
 or 10 yrs. who knows maybe oracle would get fed up with mysql and release the db
open source charging for service. stranger things have happened.
ease of use, readablity and outer joins are also important. don't worry too
much about how complex of sql queries its supposed to allow you to create since
complex queries through a db layer tend to be difficult to create, read and predict.
ie sub queries and the like. usually those queries are far easier to just build as a query.

in my experience a db abstraction layer is good for most of the db io. however, for
the complex stuff a sort of localization object (or even file) is a good bet with named
queries. this would work similar to how different languages are usually supported in
projects. with a different object or file per db. I'd recommend an actual object with a
function per query over a
file of constants/variables since some db's might require a little more manipulation than
others. again this would only be for the most complex queries. a good rule of thumb would
be if you had to start writing parts of the query as strings move it to the db localization
 object. This db localization object isn't used for all queries because you only want to have
to tweak the minimum amount of queries across dbs


Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code and developers strengths. Other people with more knowledge should be the ones making the decision. Also any decisions need to be revistable after we actually start coding in case they just don't work.

I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite.

sqlite. no server to manage and single file db's will make them easy to share and manage multiple dbs at the same time. also make merging simpler. will allow websites to be developed that will work directly from the db. As long as gramps doesn't switch focus to be some kind of mass user website for editing large trees I think sqlite will fit the bill.

sqlalchemy. this seems to have a large following and good documentation. It should allow us to support different db back ends easier in the future. at least some people think it's the best python orm available. it seems to provide good tools for when the ORM starts to get in the way.

Reasons I don't recommend the other options include:

MySQL. probably not as user friendly and since gramps isn't a client / server sort of program I don't think it's necessary.

DB-API. with sqlite3 It sounds as if the DB-API in practice doesn't support the changing of dbs as much as might be thought. If we commit to sqlite though this might be an option.

SQLObject. this seems like a viable alternative to sqlalchemy but slqlalchemy seems to have more documentation and user acceptance. Also the ORM layer might not step out of the way very nicely. the website says it will but I wasn't quite buying it from the examples.

Storm. while this project looks promising and may be easier to use than sqlalchemy it's only a year old and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.

Additional notes: I was originally advocating for database abstraction not an orm layer. I've never used a true orm and can't fully say how they work in practice. While I'm not solidly on the orm badwagon I do think an orm layer might do gramps some good. There will be situations where simply writing queries will be far easier. Our implementation model should take that into account. from the website sqlalchemy sounds like it will provide both abstraction and an orm and we'll be able to use both as the needs determine. While I don't fully agree with the severity of this article he does make some valid points. There is a reason that true object databases haven't caught on. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." article

--AaronS 00:52, 26 March 2009 (UTC)