Talk:GEPS 010: Relational Backend

From Gramps
Revision as of 18:33, 26 March 2009 by AaronS (talk | contribs) (refactoring)
Jump to: navigation, search

I tried to keep these in order and make a little more sense out of them but it's still a bit rough. --AaronS 18:33, 26 March 2009 (UTC)

Databases

I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. -- ?


It's good to see this discussion on gramps and is actually why I'm thinking of giving it another try depending on how hard it is to implement this. Yes I know it will be hard but probably much easier and productive than starting my own project. I'm a developer my self and when it came time to evaluate gramps the lack of a relational db backend was one of the main reasons I decided to keep looking.

Don't discount MySQL over SQLite. While I haven't tried it out yet there is an embeddable version of MySQL which might overcome some of sqlites advantages. If a database abstraction layer is used both could be easily supported. They both have their advantages and disadvantages.

Personally I think SQLite makes more sense for genealogical software. but mysqls tools and the fact that it's a "real" enterprise level relational db are serious advantages. -- AaronS


Database Abstraction

It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies. --?


What you looking for here is called a Database Abstraction Layer they are indeed quite powerful and are exactly what you need. if your going to bother switching the back end don't waste your time and not use one. you'll kick yourself later if you don't. just be careful which one you choose. I know that in php every web framework seems to have their own. I suspect the same for python. Django has their own but allows for the use of others (if that tells you anything). might be a place to check for alternatives. While their framework might be for websites that shouldn't matter for the DB Abstraction layer.

What to look for in a db Abstraction Layer is which dbs it can use. sqlite and mysql are musts, you may even find one that can talk to BSDDB but probably not. Oracle and PostgreSQL are pluses but will probably never be used but who knows what will happen in 5 or 10 yrs. who knows maybe oracle would get fed up with mysql and release the db open source charging for service. stranger things have happened. ease of use, readablity and outer joins are also important. don't worry too much about how complex of sql queries its supposed to allow you to create since complex queries through a db layer tend to be difficult to create, read and predict. ie sub queries and the like. usually those queries are far easier to just build as a query.

in my experience a db abstraction layer is good for most of the db io. however, for the complex stuff a sort of localization object (or even file) is a good bet with named queries. this would work similar to how different languages are usually supported in projects. with a different object or file per db. I'd recommend an actual object with a function per query over a file of constants/variables since some db's might require a little more manipulation than others. again this would only be for the most complex queries. a good rule of thumb would be if you had to start writing parts of the query as strings move it to the db localization object. This db localization object isn't used for all queries because you only want to have to tweak the minimum amount of queries across dbs --Aarons

Recomendations

Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code and developers strengths. Other people with more knowledge should be the ones making the decision. Also any decisions need to be revistable after we actually start coding in case they just don't work.

I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite.

sqlite. no server to manage and single file db's will make them easy to share and manage multiple dbs at the same time. also make merging simpler. will allow websites to be developed that will work directly from the db. As long as gramps doesn't switch focus to be some kind of mass user website for editing large trees I think sqlite will fit the bill.

sqlalchemy. this seems to have a large following and good documentation. It should allow us to support different db back ends easier in the future. at least some people think it's the best python orm available. it seems to provide good tools for when the ORM starts to get in the way.

Reasons I don't recommend the other options include:

MySQL. probably not as user friendly and since gramps isn't a client / server sort of program I don't think it's necessary.

DB-API. with sqlite3 It sounds as if the DB-API in practice doesn't support the changing of dbs as much as might be thought. If we commit to sqlite though this might be an option.

SQLObject. this seems like a viable alternative to sqlalchemy but slqlalchemy seems to have more documentation and user acceptance. Also the ORM layer might not step out of the way very nicely. the website says it will but I wasn't quite buying it from the examples.

Storm. while this project looks promising and may be easier to use than sqlalchemy it's only a year old and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.

Additional notes: I was originally advocating for database abstraction not an orm layer. I've never used a true orm and can't fully say how they work in practice. While I'm not solidly on the orm badwagon I do think an orm layer might do gramps some good. There will be situations where simply writing queries will be far easier. Our implementation model should take that into account. from the website sqlalchemy sounds like it will provide both abstraction and an orm and we'll be able to use both as the needs determine. While I don't fully agree with the severity of this article he does make some valid points. There is a reason that true object databases haven't caught on. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." article

--AaronS 00:52, 26 March 2009 (UTC)