GEPS 017: Flexible gen.lib Interface

6,297 bytes added, 13:51, 12 January 2010
This proposal would use an alternative gen.lib construction, that avoids these problems.
 
As further evidence that there are problems with the current approach in Gramps, it suffices to look at src/gui/views/treemodels. For example, eventmodel.py contains:
 
 def column_date(self, data):
     if data[COLUMN_DATE]:
         event = gen.lib.Event()
         event.unserialize(data)
         return DateHandler.get_date(event)
     return u''
 
In this code, data was obtained as raw data from the database:
 data = db.get_raw_event_data(handle)
The model needs to store COLUMN_DATE, the position of the date within the raw data, which couples the database table layout to the view implementation. An event object is only created when a date is present, but in that case the overhead of constructing the event cannot be avoided. This is a very costly operation, because the entire event is initialized, including for example EventType(), NoteBase(), ....
All this only to obtain the date contained in the event object.
= Possible Fixes =
# Use an Engine inside each object to retrieve data when necessary
# Replicate
:Replicating gen.lib has the benefit of having zero impact on the current gen.lib. However, it would require two separate code paths to maintain, and does nothing to address unnecessary unpickling in BSDDB. It also means gramps-connect and gramps proper will have no real code to share.
# Lazy Wrapper
:The lazy wrapper idea was shown to have some savings in postponing unserializing (see patch in bug report [2]). However, the requirement to wrap all data in lazy(), and the unintended side-effects, were too great a cost.
# Explicit delayed unpickling
:Just save the data of a substructure until you need to unserialize it. This is still based on pickling and is limiting in future approaches.
# Engine
:The best choice considered so far is to build an invisible engine into the gen.lib framework. This proposal would use an alternative gen.lib construction that avoids the problems listed in the introduction. We will detail it below.

== A gen.lib Engine ==
=== Introduction ===
This proposal would use an alternative gen.lib construction that avoids the problems listed in the introduction.
The core concept is that when using gen.lib on a database, an Engine object must be created, which will contain the methods needed to map database data to object attributes. All objects will have access to this Engine via a factory method. Furthermore, all compound gen.lib objects will understand the concept of delayed access. That is, the object is not fully initialized on init. When pieces that are not yet initialized are needed (for example the media list of a person object), the object first initializes this piece, then returns it.

This means it should provide:
# init of objects in one single call.
:* So Person() provides an empty Person object
:* Person(data) initializes an object, where data is the data about Person in the db which can be interpreted by an Engine object to set the attributes
:* Person(source=pers) remains possible, to duplicate an existing object
:* When using gen.lib on a database, one must set an Engine that gen.lib should use. The engine knows how data is present in the database, and what fields in the objects correspond to this
# objects only set attributes that have no processing overhead at init. Other attributes are set only when they are needed, at which time they are further unpacked or fetched from db, via the engine.
# unserialize/serialize are removed as methods of an object, and are moved to the engine
# get and set methods are removed, and replaced by attribute access and the property method to do the delayed access as needed
# gen.lib will obtain two engines to start with: one for bsddb, and one for a django backend.
## BsddbEngine will be pure software. The engine will contain all serialize/unserialize methods present now in the objects themselves.
## DjangoEngine will have a pointer to the django models. When, for example, a person object needs access to its media_list, the DelayedObj will call the DjangoEngine to obtain the media list, which will use the sql mediareference table to return the list of all MediaRef data

=== Suggested Implementation ===
==== No serialize/unserialize ====
Objects no longer have serialize/unserialize methods. These are present in the engine of a database that needs them, and only there. In practice, that means the bsddb engine. Example usage code on bsddb:

 def get_person_from_handle(self, handle):
     return Person(db.get_raw_person_data(handle))

The Person class will call the bsddb engine from the factory to unserialize this data. The engine will be stored to avoid calling the factory every time. So obj._engine will store the engine, and obj.engine makes it accessible. This is part of the DelayedAccess object API, from which all gen.lib objects will inherit.

To store data:

 def commit_person(self, person, ...):
     ....
     db_data = person.engine.person_serialize()
     ...

This works because engine is a bsddb engine, and hence the person_serialize method exists.

==== DelayedAccess ====
All gen.lib objects know the concept of delayed access, using an engine to obtain the not yet initialized data.

 class DelayAccessObj(object):
     """
     An object that supports delayed access of the data.
     gen.lib objects are large constructs. Depending on the storage backend
     one can create objects of which part of the data is not yet retrieved
     or constructed for performance reasons.
     On access of these parts, the data must be obtained or constructed.
     The DelayAccessObj provides the infrastructure to obtain this.
     It holds:
         1. an engine which is used to obtain the missing data.
     """
     def __init__(self):
         self._engine = EngineKeeper.get_instance().engine

Note that the above should be done with properties, so that _engine is only obtained when it is requested and still None. Note also that all gen.lib objects should perhaps use __slots__ to reduce memory footprint.

When attributes that are not yet initialized are needed, the engine is asked for the data. Take for example the marker attribute of a person, which is a MarkerType() object, in the code fragment:

 pers = db.get_person_from_handle(handle)
 print pers.marker

This initializes a Person. In the new setup, Person has its simple attributes set, and the rest is handled by delayed access. In essence, this means that pers.private is already set to True or False in the __init__ of Person, but pers.marker is a property. Simplified, we have a setup as:

 def __init__(self, data):
     DelayAccessObj.__init__(self)
     (self.private, self._marker, self._media_list) = self._engine.unpack_person(data)

For bsddb, we will have for example:
 self.private = False, self._marker = 1, self._media_list the raw tupled mediareference data
For django, with mediaref in another table:
 self.private = False, self._marker = 1, self._media_list = ('Person', handle)

The aim should be clear: each engine unpacks the data passed in a way that allows delayed access of the attribute. The bsddb engine uses only the tuple data passed by the database table. The django engine, however, sets media_list to the value needed to obtain a media list from the media reference table.

Next, pers.marker is called:

 @property
 def marker(self):
     if not isinstance(self._marker, MarkerType):
         # delayed retrieval of marker from the engine using the key
         self._marker = self._engine.get_markertype(self._marker)
     return self._marker
 
 @property
 def media_list(self):
     if not isinstance(self._media_list, list):
         # delayed retrieval of media list from the engine using the key
         self._media_list = self._engine.get_medialist(self._media_list)
     return self._media_list

So, as _marker is not initialized, the engine is used to obtain the marker from the data. The same holds for _media_list. Note that media_list returns a list of MediaRef objects, which will themselves use delayed access to unpack further as needed, so only minimal overhead is incurred.
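
The pieces above can be combined into a small runnable sketch. It is written for Python 3 and uses toy stand-ins: ToyBsddbEngine, the reduced MarkerType, the marker_builds counter and the tuple layout (private, marker, media references) are assumptions made for illustration, not part of gen.lib. The engine lookup is done through a property, as suggested in the note on DelayAccessObj.

```python
class MarkerType:
    """Reduced stand-in for gen.lib's MarkerType (hypothetical)."""
    def __init__(self, value):
        self.value = value


class ToyBsddbEngine:
    """Hypothetical engine for raw tuples: (private, marker, media refs)."""
    def __init__(self):
        self.marker_builds = 0  # how many MarkerType objects were built

    def unpack_person(self, data):
        # Cheap split only; nothing is turned into objects here.
        private, raw_marker, raw_media = data
        return private, raw_marker, raw_media

    def get_markertype(self, raw_marker):
        self.marker_builds += 1
        return MarkerType(raw_marker)

    def get_medialist(self, raw_media):
        # Assumption: a media reference stays a plain tuple in this sketch.
        return [tuple(ref) for ref in raw_media]


class EngineKeeper:
    """Minimal singleton that holds the engine gen.lib should use."""
    _instance = None

    def __init__(self):
        self.engine = None

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


class DelayAccessObj:
    _engine = None  # looked up on first use only

    @property
    def engine(self):
        if self._engine is None:
            self._engine = EngineKeeper.get_instance().engine
        return self._engine


class Person(DelayAccessObj):
    def __init__(self, data):
        (self.private, self._marker,
         self._media_list) = self.engine.unpack_person(data)

    @property
    def marker(self):
        if not isinstance(self._marker, MarkerType):
            # delayed construction, done once and then cached
            self._marker = self.engine.get_markertype(self._marker)
        return self._marker

    @property
    def media_list(self):
        if not isinstance(self._media_list, list):
            self._media_list = self.engine.get_medialist(self._media_list)
        return self._media_list


EngineKeeper.get_instance().engine = ToyBsddbEngine()
pers = Person((False, 1, (("m1", 0), ("m2", 1))))
```

In this sketch, creating pers builds no MarkerType at all; the first access to pers.marker builds it through the engine, and later accesses return the cached object. A django-style engine would only need to return a different placeholder from unpack_person, such as ('Person', handle), and resolve it in get_medialist.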
= References =
