Meaningful filenames
After thinking about the limits on how we can structure our files and folders (see Portable_Filenames), the next step is to develop a semantic controlled vocabulary.
Before launching too deep into this, let's look at what we want to achieve.
- Understandable filenames
- Computer readable filenames
- A system simple enough to remember
To be understandable, filenames need to use full words where appropriate.
To be computer readable, the parts need to be separated in a way that a script can easily recognise and, more importantly, in a way that would never occur in real language. So it would be no good to mark a name section with the word name if the word name can also appear elsewhere in the filename where it is not meant to be a marker.
To be simple enough to remember, the system should not be too complicated. After all, GRAMPS is meant to store the real information; this is just a supplement.
What name-parts do we need?
It would be nice if we could have files called
Marriage of Mary Angus Jones and Matthew Williams, 2nd Dec 1923 (William Angus is to Mary's right).jpg
But this meets only one of the criteria above, that of understandable filenames. How can a computer know who got married? What their surnames are? And so on. In any case, the limitations described in Portable_Filenames mean we can't have filenames like that. We have to drop the reliance on capitalisation, drop the spaces, drop the comma and drop the brackets. To be computer readable we need to separate the sections with a system of markers that indicate where the surname, event name and so on are.
So what sections do we want to be able to identify? Here's a basic list that should be enough for most situations. Remember that GRAMPS stores the more complex information; we're just trying to give a useful structure to our files.
- Surname
- Firstname
- Date
- Event type
- Place
- Source
- Note
GEDCOM based
This is a system contributed by Duncan Lithgow.
Tags
If we base a naming system on the 3- and 4-letter Lineage-Linked GEDCOM tag definitions used in the GEDCOM 5.5 standard, we have a good long list of tags to choose from. By limiting the GEDCOM tag list we can make the following shortlist (which does not include events):
- AUTH-- Author: "The name of the individual who created or compiled information."
- DATE-- Date
- EVEN-- Event: "A noteworthy happening related to an individual, a group, or an organization."
- GIVN-- Given name: "A given or earned name used for official identification of a person."
- NAME-- Name, use only if GIVN and SURN are not known: "A word or combination of words used to help identify an individual, title, or other item. More than one NAME line should be used for people who were known by multiple names."
- NOTE-- Note: "Additional information provided by the submitter for understanding the enclosing data."
- PLAC-- Place: "A jurisdictional name to identify the place or location of an event."
- REFN-- Reference: "A description or number used to identify an item for filing, storage, or other reference purposes."
- SOUR-- Source: "The initial or original material from which information was obtained."
- SURN-- Surname: "A family name passed on or used by members of a family."
- TITL-- Title: "A description of a specific writing or other work, such as the title of a book when used in a source context, or a formal designation used by an individual in connection with positions of royalty or other social status, such as Grand Duke."
Each marker ends with two hyphens (--). Two are needed because we can't rely on capitalisation to identify a marker: with a single hyphen, a surname like Besour-Jean could be read as beSOUR-Jean, and the system would think that SOUR- marks a source section.
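The Besour-Jean problem can be shown with a short Python sketch. It is only an illustration, not part of GRAMPS: the two regular expressions and the helper `tags_found` are hypothetical names, and the single-hyphen pattern exists purely to show what goes wrong.

```python
import re

# Shortlist of tags from the list above.
TAGS = ("AUTH", "DATE", "EVEN", "GIVN", "NAME", "NOTE",
        "PLAC", "REFN", "SOUR", "SURN", "TITL")
ALTERNATION = "|".join(TAGS)

# Hypothetical single-hyphen markers, shown only to illustrate the problem.
SINGLE_HYPHEN = re.compile(r"(%s)-" % ALTERNATION, re.IGNORECASE)
# Double-hyphen markers as described above.
DOUBLE_HYPHEN = re.compile(r"(%s)--" % ALTERNATION, re.IGNORECASE)

def tags_found(pattern, stem):
    """Return the tags a pattern recognises in a filename stem, upper-cased."""
    return [m.group(1).upper() for m in pattern.finditer(stem)]

# Matching is case-insensitive because capitalisation cannot be relied on.
stem = "surn--besour-jean"
print(tags_found(SINGLE_HYPHEN, stem))  # also picks up the "sour-" inside the surname
print(tags_found(DOUBLE_HYPHEN, stem))  # only the real surname marker
```

With two hyphens the surname survives intact, because a single hyphen inside a name is never long enough to complete a marker.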
Punctuation
For the filename to be parsed as meaningful text, I think we would also need some punctuation conventions:
- _ (underscore) represents a space
- __ (double underscore) represents a comma followed by a space
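As a minimal sketch, the two underscore rules can be applied in both directions with plain string replacement. The function names `decode_text` and `encode_text` are made up for this example, and lower-casing on the way in is an assumption based on the point that capitalisation cannot be relied on.

```python
def decode_text(value):
    """Turn the underscore conventions back into readable text."""
    # Replace the double underscore first; otherwise it would be read
    # as two single underscores and become two spaces.
    return value.replace("__", ", ").replace("_", " ")

def encode_text(text):
    """Turn readable text into a filename-safe value (assumed lower case)."""
    # Replace ", " first so the comma is not left behind.
    return text.lower().replace(", ", "__").replace(" ", "_")

print(decode_text("uk__england__london"))
print(encode_text("William Angus to right of Mary"))
```

The order of the replacements matters in both directions, which is the one detail a script has to get right.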
Source events
The GEDCOM 5.5 standard defines so few events as to be useless. The GRAMPS XML schema defines no events, as these can be created by the user. This all seems fair enough, since events are highly culture-based. The situations where I think a set of events should be defined are those which will be connected with source records. GEDCOM has a reasonable group of those, but they are heavily based in Western Christian culture. The solution must be language and culture dependent. Here's my list:
- marriage is for an actual marriage event and all the associated documentation, including possible divorce and separation documentation.
- birth is for the actual birth records, and also christening records
- death is for death records
- census is for census records
- civic is for military service records, and government records of any type
- health is for health records
Examples
An image file
File name:
EVEN--marriage_SURN--jones_GIVN--mary-jean_SURN--williams_GIVN--matthew_DATE--1923-12-02_NOTE--william_angus_to_right_of_mary.jpg
This could be parsed (by GRAMPS?) as the description:
Event: Marriage
Surname: Jones
Given name: Mary-jean
Surname: Williams
Given name: Matthew
Date: 2nd Dec, 1923
Note: William angus to the right of mary
or it could make the text:
Mary-jean Jones and Matthew Williams, marriage 2nd Dec 1923. (William angus to the right of mary)
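To make the parsing step concrete, here is a rough Python sketch of how a script might split such a filename into tagged fields. It is not GRAMPS code; `parse_filename` and `MARKER_RE` are hypothetical names, and the sketch assumes that markers only appear at the start of the name or directly after an underscore, as in the example filename.

```python
import re
from pathlib import PurePath

TAGS = ("AUTH", "DATE", "EVEN", "GIVN", "NAME", "NOTE",
        "PLAC", "REFN", "SOUR", "SURN", "TITL")
# Assumption: a marker sits at the start of the name or after an underscore.
MARKER_RE = re.compile(r"(?:^|_)(%s)--" % "|".join(TAGS), re.IGNORECASE)

def parse_filename(filename):
    """Return a list of (TAG, text) pairs in the order they appear."""
    stem = PurePath(filename).stem  # drop the extension
    matches = list(MARKER_RE.finditer(stem))
    fields = []
    for i, match in enumerate(matches):
        # A value runs from the end of its marker to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(stem)
        raw = stem[match.end():end]
        # Double underscore first, then single underscore.
        text = raw.replace("__", ", ").replace("_", " ")
        fields.append((match.group(1).upper(), text))
    return fields

name = ("EVEN--marriage_SURN--jones_GIVN--mary-jean_SURN--williams_"
        "GIVN--matthew_DATE--1923-12-02_NOTE--william_angus_to_right_of_mary.jpg")
for tag, text in parse_filename(name):
    print(tag, text)
```

The result is a list of pairs rather than a dictionary because tags such as SURN and GIVN repeat, and their order is what ties each given name to its surname.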
A source image
File name:
SOUR--census_PLAC--uk__england__london_DATE--1840-03-21_SURN--jones_GIVN--mary-jean.pdf
This could be parsed (by GRAMPS?) as the description:
Source: Census
Place: Uk, england, london
Date: 21st March, 1840
Surname: Jones
Given name: Mary-jean
or it could make the text:
Census, Place: Uk, england, london, on 21st March 1840. This is a source connected to Mary-jean Jones
A source text
File name:
SOUR--publication_TITL--the_jones_family_from_1735_AUTH--mary_jean_jones_DATE--1872.pdf
This could be read as the description:
Source: Publication
Title: The Jones Family from 1735
Author: Mary Jean Jones
Date: 1872
Or it could make the text:
"The Jones Family from 1735" by Mary Jean Jones, 1872
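Going the other way, a filename in this scheme could be assembled from a list of fields. The following Python sketch is one possible approach under a few assumptions: the name `build_filename` is hypothetical, values are lower-cased, and sections are joined with a single underscore as in the examples above.

```python
def build_filename(fields, extension):
    """Build a filename from (TAG, text) pairs and a file extension."""
    parts = []
    for tag, text in fields:
        # ", " becomes a double underscore, a space becomes a single one.
        value = text.lower().replace(", ", "__").replace(" ", "_")
        parts.append("%s--%s" % (tag.upper(), value))
    return "_".join(parts) + "." + extension

print(build_filename(
    [("SOUR", "Publication"),
     ("TITL", "The Jones Family from 1735"),
     ("AUTH", "Mary Jean Jones"),
     ("DATE", "1872")],
    "pdf",
))
```

The sketch does no validation, so characters that Portable_Filenames rules out would still need to be checked separately.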
GRAMPS ID based
This is another attempt by Duncan Lithgow to find a good system.
GRAMPS IDs use the first character to denote the type of item the ID refers to. This could be adapted to work in filenames.
Marker | Description | GRAMPS ID equivalent |
---|---|---|
P-- | place | P |
I-- | individual | I |
F-- | family | F |
E-- | event | E |
S-- | source | S |
O-- | media object | O |
R-- | repository | R |
N-- | note | N |
Extending this idea a bit with some more markers for first name (FN--), surname (SN--) and date (DT--), we could get a filename like:
E--marriage_SN--jones_FN--mary_angus_SN--williams_FN--matthew_DT--1923-12-02_N--william_angus_to_right_of_mary.jpg
This can store the same information as in the GEDCOM based schema.
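A parser for this variant could look much the same as one for the GEDCOM based markers. The Python sketch below is only an illustration: `parse_id_filename` and the `MARKERS` table are hypothetical names, reading DT-- as a date marker is an inference from the example filename, and markers are assumed to appear only at the start of the name or after an underscore.

```python
import re
from pathlib import PurePath

# Markers from the table above, plus the FN--, SN-- and DT-- extensions.
MARKERS = {
    "P": "place", "I": "individual", "F": "family", "E": "event",
    "S": "source", "O": "media object", "R": "repository", "N": "note",
    "FN": "firstname", "SN": "surname",
    "DT": "date",  # assumption: inferred from the example filename
}
# Two-letter markers are listed first so they are tried before single letters.
MARKER_RE = re.compile(r"(?:^|_)(FN|SN|DT|[PIFESORN])--", re.IGNORECASE)

def parse_id_filename(filename):
    """Return a list of (description, text) pairs in filename order."""
    stem = PurePath(filename).stem  # drop the extension
    matches = list(MARKER_RE.finditer(stem))
    fields = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(stem)
        text = stem[match.end():end].replace("__", ", ").replace("_", " ")
        fields.append((MARKERS[match.group(1).upper()], text))
    return fields

name = ("E--marriage_SN--jones_FN--mary_angus_SN--williams_FN--matthew_"
        "DT--1923-12-02_N--william_angus_to_right_of_mary.jpg")
for description, text in parse_id_filename(name):
    print(description + ":", text)
```

Because the markers are only one or two letters long, the double hyphen and the leading underscore carry even more of the burden of avoiding accidental matches here than in the GEDCOM based scheme.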