
Meaningful filenames

12,082 bytes added, 10 February
This article is about naming files in a meaningful way. Naturally, files should have unique names so that we don't end up with several files with the same or very similar names. This article takes file naming one step further by looking at how the file name itself can carry useful information about the file. For the limits on how files and folders can be named, see [[Portable_Filenames]].
= Why meaningful filenames =

If all names are kept unique, why also try to embed meaning in the file name itself? Here are some approaches to managing information about information which you might recognise:

* File names and directory hierarchies can help describe the contents. By placing a picture called ''second birthday.jpg'' in a directory called ''My son James'' we have stored data (the picture relates to James' second birthday) about the data (the picture of the birthday party).
* Data about data (called ''metadata'', see Wikipedia's ''Metadata'' entry) can also be stored inside the file it describes, for example:
** HTML, the language of web pages, uses tags like ''<span style="normalText">Example</span>''. Here the metadata describes the style of the text, i.e. ''Example'' is ''normalText''.
** EXIF (see Wikipedia's ''EXIF'' entry) is a way of storing metadata in image files, such as when the photo was taken and what type of camera was used.
* Database systems (Gramps is a database system for genealogy) can store a huge amount of data about data. They are very efficient at this job and very powerful.
** Google Search uses a database to remember what web pages are about, and tells you when you ask.

So why not use one of those options?

* EXIF is great, but only for some types of files (not supported in JPEG 2000, PNG, or GIF), and there are lots of different systems for different types of files. People are working hard to improve this situation all the time.
* HTML is great if you can store all your information as HTML files, but HTML files cannot contain other files, they just point to them. So we'd basically end up making a website about our files.
* A database is something we already use when we use Gramps. The Gramps database stores lots of information about the files and records it tracks. But Gramps does not store the actual file inside the database.
If the connection between Gramps and the data it is describing is broken, then the files are just files. They contain no more information than they did when you first ''imported'' them into Gramps.

This system of ''meaningful filenames'' has the following aims:

* Preserving enough metadata to give the file's content context without Gramps
* Creating file names normal people can understand, so they can see what the file is about without Gramps
* Creating file names which a computer can process easily, so files can be batch processed and metadata can be read directly from the file name without possible confusion
* Creating a system simple enough to use all the time for every file
To be understandable we need to be able to use full words where appropriate.
To be computer readable we need to separate the parts in a way which a script can easily recognise and, more importantly, in a way which would never occur in real language. So it would be no good to mark a ''name'' section with the word ''name'' if the word ''name'' can also appear somewhere in the file name where it is not meant to be a marker.
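The difference between a word marker and a marker that never occurs in real language can be sketched in a few lines of Python. This is only an illustration: the sample stems are made up, and the double dash is used because it is the separator this article settles on further down.

```python
def split_on_marker(stem, marker):
    """Split a file name stem at every occurrence of the marker."""
    return stem.split(marker)


# A real word used as a marker also matches inside other words.
word_parts = split_on_marker("surname_jones", "name")

# A marker that never occurs in ordinary language splits cleanly.
dash_parts = split_on_marker("marriage--jones--mary", "--")

print(word_parts)
print(dash_parts)
```

With the word marker the stem is cut in the middle of ''surname'', while the double dash only matches where it was deliberately placed.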
To be simple enough to remember the system should not be too complicated. After all, Gramps is meant to store the real information; this is just a supplement.

== What's in a name? ==
== What name-parts do we need? ==
It would be nice if we could have files called
Marriage of Mary Angus Jones and Matthew Williams, 2nd Dec 1923 (William Angus is to Mary's right).jpg
But this meets only one of the criteria above, that of ''understandable filenames''. How can a computer know who got married, what their surnames are, and so on? And anyway, because of the limitations described in [[Portable_Filenames]], we can't have file names like that. We have to drop the reliance on capitalisation, drop the spaces, drop the comma and drop the brackets. To be computer readable we need to separate the sections with a system of markers to indicate where the surname, event name, etc. are.
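The clean-up steps just listed (no capitals, no spaces, no commas, no brackets) can be sketched as a small Python helper. The function name <code>to_portable</code> and the exact set of allowed characters are assumptions made for this illustration; the authoritative rules are the ones on the ''Portable Filenames'' page.

```python
import re


def to_portable(text):
    """Reduce free text to lowercase a-z, digits and single underscores."""
    lowered = text.lower()
    # Drop commas, brackets, apostrophes and anything else outside
    # a-z, 0-9, space and underscore (an assumed character set).
    cleaned = re.sub(r"[^a-z0-9_ ]", "", lowered)
    # Turn each run of spaces/underscores into one underscore, so a
    # double underscore can never appear by accident.
    return re.sub(r"[ _]+", "_", cleaned).strip("_")


description = ("Marriage of Mary Angus Jones and Matthew Williams, "
               "2nd Dec 1923 (William Angus is to Mary's right)")
print(to_portable(description))
```

Note that this sketch simply drops non-ASCII letters such as ''ø''; transliterating them first (for example to ''soerensen'') would have to be done separately.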
So what sections do we want to be able to identify? Here's a basic list that should be enough for most situations. Remember that Gramps stores the more complex information; we're just trying to give a useful structure to our files.
* Surname
* Firstname
* Source
* Note
Some more important criteria. All file names:
* Must be unique
* Must have all necessary information
* Must have no more information than necessary
So if I find a file somewhere strange in my system, or if someone I sent a file to seven years ago says "that file you sent me - that's not Jean, it's her daughter", I know where my archive copy of that file will be.
= GEDCOM based =
== Source events ==
The GEDCOM 5.5 standard defines so few events as to be useless. The Gramps XML schema defines no events as these can be made by the user. This all seems fair enough since events are highly culture based. The situations where I think a set of events should be defined are those which will be connected with source records. GEDCOM has a reasonable group of those but they are heavily based in Western Christian culture. The solution must be language and culture dependent. Here's my list:
'''marriage''' is for an actual marriage event and all the associated documentation, including possible divorce and separation documentation.
'''health''' is for health records
== Examples ==

=== An event image file ===
File name:
This could be parsed (by Gramps?) as the description:
'''Event:''' Marriage
Mary-jean Jones and Matthew Williams, marriage 2nd Jan 1923. (William angus to the right of mary)
=== A source image file ===
File name:
This could be parsed (by Gramps?) as the description:
'''Source:''' Census
Census, Place: Uk, england, london, on 21st March 1840. This is a source connected to Mary-jean Jones
=== A source text ===
File name:
"The Jones Family from 1735" by Mary Jean Jones, 1872
== SWOT analysis ==

Over at Wikipedia there is a good explanation of a [http://en.wikipedia.org/wiki/SWOT_analysis SWOT analysis].

{| {{prettytable}}
! Aspect
! Strengths
! Weaknesses
! Opportunities
! Threats
|-
| File length
| Holds a lot of information
| All the information is already in the genealogy software
| Easily recognised. Easy to search for files with a certain [[Tag]]
|
|}
= Gramps ID based =

{{man note|This is another attempt by [[User:Duncan|Duncan Lithgow]] to find a good system. It is not finished so feel free to add comments and correct any obvious mistakes.}}

Here are the records we'll use as examples. They involve Mary Agnes Williams (daughter of John Williams and Anna Matthews). She married Anders Sørensen (son of Anders Sørensen and Anna ?) and they had a daughter Anna Sorensen; note the spelling change.

* Census record: mentioning her and her siblings and parents. It is from the 1810 census in the London parish of Dangerfield on Saint John's Road.
* Portrait: a hand drawn portrait of Mary, undated, assumed to be from before her marriage.
* House picture: her parents' Saint John's Road row house in London, from some time around the 1810s.
* Court record: Anders Sørensen was before the district court for drunk and unbecoming behaviour on January 3rd, 1820. Engelfield, London.
* Marriage certificate: She married Anders Sørensen, 2nd December 1823, in London.
* Wedding portrait: in the picture is Anders Sørensen's father, also called Anders Sørensen (on the back it says that the mother of Anders Sørensen (the son) is called Anna).
* Birth certificate: of Anna Sorensen (daughter of Mary and Anders) dated January 18th, 1824.
* Family tree: a hand written family tree called "The Dean family from 1735" by an Angus Dean written in 1972 which connects the families Dean and Williams.

== Justification ==

{{stub}}

== Aims ==

This system tries to meet the following aims:
* simple enough to remember
* just enough information, and no more
* all media for one family name is under one directory (portability for travel)
* all media for generating reports is under one directory (for portability)

== Record types ==

The record types tell us what the record is about. Gramps IDs use the first character to denote the type of item the ID refers to (''P'' for place, ''I'' for individual, ''F'' for family, ''E'' for event, ''S'' for source, ''O'' for media object, ''R'' for repository and ''N'' for note).
Sticking to something already thought out, and taking the types most relevant to stored records, these can be used as the following tags for record types:

* I-- Individual
* P-- Place
* E-- Event
* S-- Source (see also source types)

'''Question'''
* Records about repositories?
* Correspondence with family?
* What about records covering more than one type?
* What will happen on old 8+3 file systems?

== Record properties ==

Properties tell us just enough information to make the file name meaningful and recognisable, and split this information up so we can search for parts of it with our file manager. It's the what, where, when, why and how of what's in the record. By making all properties of each record compulsory we avoid extra tags like GN for given name and so on. We can see what a property is by where it is in the file name.

* family name is their surname before marriage, but including deed poll changes, MacArthur for example
* given name is their official first name
* uid is a unique identity, in this example the (original) Gramps ID of the media file
* source date is the date in ISO 8601 format when the information left the people or organisation responsible for it
* event date is the date in ISO 8601 format when the event occurred or started: YYYY-MM-DD, e.g. 2008-12-28
* event type is a noun describing the event, chosen from a list of event types, e.g. marriage
* title is the name of a document (book, letter, census) or object (gravestone, heirloom), e.g. williams__arthur_headstone
* source author name is the name of the person or organisation most responsible for the information, e.g. church_of_lds. For people always use family name first followed by two underscores (__)
* note is for notes. Names should always be family name first followed by a double underscore

== Naming structure ==

Now we can outline a single schema for all record types in which the following rules apply.
* File names are written directly to the file name, not copied from another program.
* File names start with a single capital letter representing their record type.
* Record properties are separated by two dashes (--). This cannot be used for anything else.
* Missing information is replaced by a single underscore (_).
* Names in notes should always be family name first and separated by two underscores, e.g. ''doe__john'', which can be represented as ''John Doe'' or ''Doe, John''.
* Place names should start with the largest geographical region followed by a double underscore before the next geographical region, e.g. ''oz__far_far_away__yellow_brick_road'', which can be represented as ''Oz, Far far away, Yellow brick road''.
* If the family name is unknown it must be replaced by an underscore. This will give three consecutive underscores (___), e.g. ''___john'' should always be interpreted as meaning ''[no record], John''.
* Event types should always be drawn from a list to avoid separate words being used for the same event type. (Maybe use the event list Gramps uses?)

# <record type>-- (I, P, E or S)
# <source type/event type>-- (needs expansion.)
# <1st person's family name[__2nd person's family name]>-- (two names for couples or families, alphabetical)
# <1st person's given name(s)[__2nd person's given name(s)]>-- (two names for couples, same order as for family names)
# <country code__region__city>-- (use as many divisions as needed)
# <date>-- (ISO date, YYYY-MM-DD)
# <note>-- (usually not needed)
# <uid> (a unique ID, possibly derived from the Gramps ID)

Here's a version of the naming structure for quick reference.

 Record_type--source_or_event_type--family_name__s--given_name__s--cc__city__place--YYYY-MM-DD--note--uid

== Examples ==

Using the records outlined in the beginning we would get the following file names.
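As a rough illustration of how names of this shape could be assembled and read back by a script, here is a minimal Python sketch. The function names, the dictionary keys and the idea of doing this in Python at all are assumptions made for this example; Gramps itself does not provide these helpers.

```python
SEP = "--"      # between record properties
SUBSEP = "__"   # between parts of one property (names, places)
MISSING = "_"   # placeholder for missing information

# Property order taken from the numbered naming structure above.
FIELDS = ("record_type", "kind", "family_names", "given_names",
          "place", "date", "note", "uid")
MULTI = ("family_names", "given_names", "place")  # may hold several parts


def build_name(record, ext):
    """Join the properties of `record` (a dict keyed by FIELDS) into a file name."""
    parts = []
    for field in FIELDS:
        value = record.get(field)
        if field in MULTI and value:
            value = SUBSEP.join(value)
        parts.append(value or MISSING)
    return SEP.join(parts) + "." + ext


def parse_name(filename):
    """Split a file name back into a dict of properties."""
    stem, _, ext = filename.rpartition(".")
    values = stem.split(SEP)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} properties, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for field in MULTI:
        record[field] = record[field].split(SUBSEP)
    record["ext"] = ext
    return record


court_record = {
    "record_type": "S",
    "kind": "court_record",
    "family_names": ["soerensen"],
    "given_names": ["anders"],
    "place": ["uk", "london", "engelfield"],
    "date": "1820-01-03",
    "note": "before_district_court",
    "uid": "00826",
}
name = build_name(court_record, "pdf")
print(name)
print(parse_name(name)["place"])
```

Because every property is compulsory, the parser can rely on position alone. One known gap in this sketch: an unknown family name inside a multi-part property (the ''___john'' case) is not split correctly by a plain <code>split("__")</code> and would need extra handling.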
(Please help complete this list of examples)

* Census record: ''S--census--matthews__williams--anna__john--uk__london__dangerfield__st_johns_rd--1810-_-_--_--00874.pdf''
* Court record: ''S--court_record--soerensen--anders--uk__london__engelfield--1820-01-03--before_district_court--00826.pdf''
* Marriage certificate: ''S--marriage_certificate--soerensen__williams--anders__mary_agnes--uk__london--1823-12-02--_--00864.pdf''
* Portrait: ''I--portrait--williams--mary_agnes--uk__london--1823-12-02--wedding_portrait--000967.jpg''

= Source based =

One possible shortcoming of using an event and/or individual based naming strategy is that it could "clash" with the relation between a filename and the source. This only applies to filenames that are a representation of a specific part of a source.

An example: having found the baptism record of Anna in page 51v of the 1843-1850 Baptism book of a certain Church we save the image (which displays pages 50v and 51f, i.e. it is an image of the "open book") and name it something like BAP--Anna--1850.png (just an example, any individual and role based mechanism will yield similar results). This works fine and allows one to easily extrapolate information from the file name. However, we later find that on the exact same page, but a few paragraphs below, we have the baptism record of another individual, from another part of the family tree. While we can simply use the original file, its name wouldn't convey the right information. We can duplicate the file, but that doesn't make much sense, especially since adding the file to the Source gallery would leave us with a duplicate there.

One way to deal with this is to use a purely source-based approach in naming the files.
The downside is that event and individual information can't be gleaned by looking at the file name - one would have to use Gramps itself to maintain the appropriate relations, which is after all part of the source referencing work that should be done. On the other hand, when talking about Sources that are books, it allows for easy grouping of content related to the same source, e.g. all the relevant pages of a certain book.

An example filename would be "PBL--BAP3--F51-52.png", where PBL is the short name of the source author and BAP3 the short name of the specific source. Longer, more descriptive filenames could be used by using full names instead of codes.

Obviously this method is limited in scope to some kinds of sources, and doesn't make sense for naming photos or documents that aren't part of a larger source (e.g. an ID card).

A really good article on why and how to implement this kind of naming system is ''Hierarchical Sources'' by Tony Proctor.

= See also =
* [[Organise your records]]
* Gramps User maillist thread: Organizing media files

= External links =
* File Naming Conventions for Digitally-stored Images
* Metadata at Wikipedia - data about data
* Meta tags at Wikipedia
* Ontology in computer science from Wikipedia
* Library science from Wikipedia
* File naming at
* File Naming / Organization Methods? from What do I know?
[[Category:Documentation|F]]
[[Category:Developers/General|F]]
[[Category:Media]]
