GEPS 018: Evidence style sources
Background
Many users, particularly if they aren't experienced researchers, may have difficulty abstracting the details from the wide variety of source types they encounter into the 4 fields that Gramps provides. Worse, the 4 fields aren't really adequate to capture all of the possible source information and redisplay it in well-formatted footnotes or endnotes in a report or reference links in a web page.
Elizabeth Shown Mills (1944- ) is an eminent American genealogist who has written extensively about collecting, analyzing, and citing evidence in genealogical research and publications, including the books "Evidence! Citation and Analysis for the Family Historian" and an expanded version, "Evidence Explained: Citing History Sources from Artifacts to Cyberspace". (The 1st edition was awarded the Library Journal 2007 Best Reference. A revised 3rd edition was released in 2018.) Her former faculty page, http://www4.samford.edu/schools/ighr/faculty/mills_e.html, is now a broken link; a copy is available as an Internet Archive page.
While most readers focus on the formats of the citations provided in the books, in reality every publisher has its own style guide, and none of them uses Evidence Explained. The real value in these books is Mills's explanation of how to analyze the evidence effectively and how to integrate the many pieces of evidence into a well-supported conclusion. (Mills is well known for taking the "reasonably exhaustive search" requirement of the BCG's Genealogical Proof Standard to the absolute limit.)
Citation styles are the concern of published material, and will differ both for the medium and for the publisher. So long as the necessary information of creator, title, enclosing work (for e.g. magazine or journal articles), publisher (if published) or repository (if not), date, and details (like page number) are available in the citation, the style isn't very important to the reader. Publishers want all of their publications to have a consistent style and issue style manuals to help authors prepare their work.
For a computer program like Gramps, the goals should be to collect all of the necessary information noted above in a way that is easy for users to enter, to support evidence analysis and comparison to create "proof arguments", and to link those proof arguments to the genealogical conclusions in the database.
Gramps's present data structure maps directly to the SOUR and SOUR_CITATION structures in the GEDCOM 5.5 standard, and the source entry form maps directly to the data structure. While it's possible to cram everything needed for a good citation into those four fields, parsing the information back out to actually create a citation is unnecessarily challenging.
Bibliography Data Formats
- BibTeX has emerged as a common format (for interchange at least) among bibliography and reference management tools and offers a much richer set of available fields.
- The U.S. Library of Congress has published the Metadata Object Description Schema, an XML schema for encoding library catalog data. That wouldn't be very interesting except that BibUtils uses it as an intermediate format for converting between a variety of bibliography file standards.
- Zotero, Mendeley, and Papers use Citation Style Language (CSL), an XML schema, at least as an import/export medium. (Zotero uses a relational database for its actual storage.)
- Thomson Reuters EndNote is easily the most popular commercial reference management program. It uses a proprietary file format which has nevertheless been reverse-engineered many times so that bibliographies can be easily exchanged between EndNote and other reference managers.
- Most of the major commercial genealogy programs use a proprietary relational schema for storage of citation data. These fall into three broad categories: binary (similar to GRAMPS's key/value schema, where a citation is composed of several records each having a key/value pair and the program's logic parses the keys to display the citation in the desired format), single-table (where a database tuple is defined which contains the maximum needed fields, each of which is assigned a value according to a parsing scheme in the program's logic), and multiple-table (where different citation types are stored in tables whose tuple schemas reflect the requirements of each type). As so often in programming, each has its costs and benefits.
Bibliography
Further Reading
Elizabeth Shown Mills has a website that includes sample text pages from Evidence Explained, sample QuickCheck Models, and a forum with exhaustive discussions of citation issues.
John Yates has, with Mills's permission, encoded the elements of the specific examples in Evidence Explained: Two Computer Ready Parametrizations of "Evidence Style" Historical Sources.
A simpler template system is "Simple Citations", see the templates.
See also:
- Cultural Issues in Citations
- Evidence Style
- How to Cite Public Records
- The open citation (OpCit) linking project
- Better way to cite online sources
- Recollection project
- SourceTemplates.org
- Citation-style-language (CSL)
- A model for Germany
- French case and sourcing, annotating and citations.
- Standardizing Sources and Citation Templates, Louis Kessler’s Behold Blog
Citations and bibliography search engines:
- GenWeb using WIKINDX3.
- Library of Congress
- Online Computer Library Center API
- Gallica
- French national library
As an example of discussions of use of Evidence Explained we may consider citation for the UK Census. Anatomy of a Census Page provides quite a good general illustration of the use of Class, Piece, Folio and Page (though it shows only the 1891 census). UK Census Citations shows how this scheme applies to other censuses.
In a RootsWeb mailing list, ESM proposes the following reference:
1861 census of England, Middlesex, Shoreditch, Haggerstone East, Haggerstone St. Mary, folio 5, page 4, household 16, William Loe; PRO HO 9/249, The National Archives, Kew, Surrey, UK; via Family History microfilm 542,590; imaged in the database "1861 England Census," _Ancestry.com_ (www.ancestry.com : 2 August 2009).
There is a subsequent extensive discussion that may (or may not?) suggest changing (part of) the reference to
Office for National Statistics, London; Census - England & Wales - 1861; RG09/3753 Folio 12 Page 17, Schedule No. 86; Durham, Newbottle, District 6; The National Archives, Kew, Richmond, Surrey, TW9 4DU.
In a separate discussion in the Evidence Explained forum, 'Ann' asks:
EE page 303 shows an example for the 1841 census. This is my start for my citation; 1841 census of England, Warwickshire, [city], [parish], folio 6, lines 5-15, William Vero household; digital image Ancestry.com, (http://www.ancestry.com : accessed 24 September 2004); citing PRO HO 107/1127/10
'AdrianB' proposes:
1841 census of England, Warwickshire, Atherston, William Vero household; digital image Ancestry.com, (http://www.ancestry.com : accessed 24 September 2004); citing The National Archives of the UK, HO 107 piece 1127 book 10, folio 6, p. 4, lines 5-15
(because the folio and page are part of the TNA reference) which ESM calls workable but queries whether there is enough geographical information, while 'Ann' comes up with the following reference:
1841 Census of England, Warwickshire, Atherston Township, Hemlingford Hundred, Mancetter Parish, Enumeration District (ED) 1, folio 6, p. 4, lines 5-15, Long Street, William Vero household; digital image, Ancestry.com, (http://www.ancestry.com : accessed 24 September 2004); citing PRO HO 107/1127/10.
(line breaks inserted for easy comparison).
Needed
For this we need:
- Fix of bug 2332: Allow reorder of Data in the Data tab of Source: make Data SourceAttribute, which have sourcetype [done] (available in Gramps 4.1)
- Convert Computer Ready Parametrizations of "Evidence Style" Historical Sources[1] to format usable in Gramps, so sourcetypes, source attribute types, ..., and business logic Evidence (no templates needed, all business logic) [partially done]
- Adapt GUI to allow Evidence style sources input. Is a database change needed? Don't think so at the moment.
Storing the data
- Data is stored as SrcAttribute (key,value) pairs in Source and Citation.
- To decide:
- In Source, do we keep "Author" and "Pub Info" ? These can be stored also in Source Attributes, and be extracted from them to show in an overview. There is already a type AUTHOR. As Pub Info goes to GEDCOM, this could be type GEDCOM_PUB_INFO. If present it is used, otherwise it is generated
- In Citation, do we keep "Date" and "Volume/Page". Like for Source, all can be in the Citation Attributes. We can store which attributes typically are Dates, and allow a Date Editor input. Storage would be plain text though. Is this a problem somewhere in the code ??
If we decide "yes" to above, then source and citation objects must be changed, upgrade must be done.
Also:
- In Source, what to do with Title? This becomes like the Description of an Event, somewhat redundant. It is a field in GEDCOM, though. Ideally, it is used if given, and otherwise generated from the source attributes.
- Abbreviation is for the storage in your _local_ archive, so as to allow easy retrieval. We need to make this clearer in the user interface.
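The "used if present, otherwise generated" idea above can be sketched in a few lines of Python. This is only an illustration, not Gramps code: `SourceSketch` is a hypothetical stand-in for the real Source object, and the attribute keys `PUBLICATION_TITLE`, `PUBLISHER` and `YEAR` are assumed names (only `AUTHOR` and `GEDCOM_PUB_INFO` are mentioned above).

```python
from dataclasses import dataclass, field


@dataclass
class SourceSketch:
    """Hypothetical stand-in for a Gramps Source; not the real class."""
    title: str = ""
    # Ordered (key, value) pairs, so the user's ordering is preserved.
    attributes: list = field(default_factory=list)

    def get_attr(self, key, default=""):
        # Return the first value stored under `key`, in list order.
        for k, v in self.attributes:
            if k == key:
                return v
        return default

    def effective_title(self):
        # Use the explicit title if given, otherwise generate one.
        if self.title:
            return self.title
        parts = [self.get_attr("AUTHOR"), self.get_attr("PUBLICATION_TITLE")]
        return ", ".join(p for p in parts if p)

    def gedcom_pub_info(self):
        # Use GEDCOM_PUB_INFO if present, otherwise generate it.
        explicit = self.get_attr("GEDCOM_PUB_INFO")
        if explicit:
            return explicit
        parts = [self.get_attr("PUBLISHER"), self.get_attr("YEAR")]
        return ", ".join(p for p in parts if p)


src = SourceSketch(attributes=[
    ("AUTHOR", "A. Author"),
    ("PUBLICATION_TITLE", "Some Title"),
    ("PUBLISHER", "Example Press"),
    ("YEAR", "2007"),
])
print(src.effective_title())
print(src.gedcom_pub_info())
```

With this shape the GEDCOM exporter only ever calls the two accessor methods, so it does not need to know whether a value was typed in or generated.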
GUI ideas
- Don't use a wizard (Nick)
- Benny:
- Instead of the tab 'General' for source and citation, we show the tab 'Overview', which would have only the few editable fields that make sense, and would then show the important things concisely.
- For a new citation/source, the user starts on a new 'Definition' (??) tab. Here he can give the source type. Setting a source type generates the fields needed as per the template definition. Note that some people have already asked for such a setup in some other editors, with an overview on existing (not new) objects with a nicer layout.
- I would like to enable some copy paste function though on the Definition tab. So, I would like to offer some mechanism to quickly copy paste or select existing parts of title/pub info (for users fixing imported gedcom or old gramps sources), and to import a bibtex and select fields from that. Perhaps a bottom part with buttons, or drag and drop to a top part with the actual fields? Need to try some GUI ideas for how to do this.
- In the Definition tab, if entry is in a table, columns for author and pub info can be added with a checkbox to indicate what to use. The idea here is that we don't need our old Title, Author and Pub Info, but we do need to make clear to users what would be exported to Gedcom, as that is important. In Overview this could be shown in a Gedcom section.
- Nick:
- have a "Preview" tab to show a preview of the Gedcom output as well as the F, S and L format
- can we not store dates as value that is date object for SrcAttribute?
In branch - testing of ideas
A Cited In tab shows all citations and where they are used:
From that tab, different citations can be loaded. The top part then becomes the citation editor. Not finished yet in following screenshot, citation template attributes still to be added.
A Template tab allows selection of templates, and generates fields needed. These fields are stored to the attributes as the user types. Following screenshot is not yet finished version. Default citation fields will be added, as well as short versions as needed.
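As a rough illustration of how selecting a template could generate the fields and store them to the attributes as the user types, here is a minimal Python sketch. The `TEMPLATES` structure, the `optional` flags and both function names are assumptions made for this example; the field keys for `ESM254` are taken from the mapping shown later on this page, and `PUB_INFO` is an assumed key name.

```python
# Hypothetical template definitions: template name -> ordered field specs.
TEMPLATES = {
    "GEDCOM": [
        {"key": "AUTHOR", "optional": True},
        {"key": "TITLE", "optional": False},
        {"key": "PUB_INFO", "optional": True},
    ],
    "ESM254": [
        {"key": "COMPILER", "optional": False},
        {"key": "TITLE", "optional": False},
        {"key": "TYPE", "optional": True},
        {"key": "WEBSITE", "optional": False},
    ],
}


def fields_for(template_name):
    """Return the field keys the editor should display, in template order."""
    return [spec["key"] for spec in TEMPLATES[template_name]]


def store_field(attributes, key, value):
    """Update the (key, value) attribute list in place as the user types."""
    for i, (existing_key, _) in enumerate(attributes):
        if existing_key == key:
            attributes[i] = (key, value)
            return
    attributes.append((key, value))


attrs = []
for key in fields_for("GEDCOM"):
    store_field(attrs, key, "")        # one empty entry field per template field
store_field(attrs, "TITLE", "1841 census of England")
print(attrs)
```

Because the attributes stay a plain ordered list, switching templates only changes which keys are displayed; values already entered under other keys are not lost.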
Working in the GEP 18 branch. To experiment with.
Ideas:
- a Cited in tab, showing in treeview objects, secondaryobjects that use the source.
Old data of GEP
Entering source information
There are three broad alternatives for entering source data into a program, and GRAMPS should support all three:
- Form based: The traditional keyboard data entry method. The fields can be fixed or flexible: The former is easier on the developers, the latter more helpful to users, especially if they are inexperienced.
- Import: GRAMPS should be able to import (and export) source data from regular reference managers like Zotero, Pybliographer, and BibTeX. The code in BibUtils could be adapted to speed development.
- Parsing: It's becoming more common for the large reference websites to provide a ready-made citation on the webpage along with the data or image being presented. (Google Books even offers to download it as a BibTeX file, but that's unfortunately not yet common). It would be very helpful if the user could just paste this citation into a block and GRAMPS took care of parsing it into the appropriate database fields. Experienced users might find typing into the parsing text-entry to be a faster way of inputting source data than using form based input.
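To give a feel for the parsing alternative, the following Python sketch pulls fields out of one simple book-style citation layout ("Author, Title (Place: Publisher, Year), page."). The regular expression, the function name and the field names are all assumptions for illustration; real pasted citations vary far more than this single pattern can handle, and a practical parser would need one pattern (or a proper grammar) per source type.

```python
import re

# Assumed layout: Author, Title (Place: Publisher, Year), page.
CITATION_RE = re.compile(
    r"^(?P<author>[^,]+),\s*"
    r"(?P<title>.+?)\s*"
    r"\((?P<place>[^:()]+):\s*(?P<publisher>[^,()]+),\s*(?P<year>\d{4})\)"
    r"(?:,\s*(?P<page>[^.]+))?\.?$"
)


def parse_citation(text):
    """Return a dict of recognised fields, or None if the text does not match."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        return None
    # Drop groups that did not take part in the match (e.g. a missing page).
    return {k: v.strip() for k, v in match.groupdict().items() if v}


pasted = ("Elizabeth Shown Mills, Evidence Explained "
          "(Baltimore: Genealogical Publishing Company, 2007), 303.")
print(parse_citation(pasted))
```

Returning `None` for unrecognised text lets the editor fall back to form-based entry instead of guessing.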
Further Discussion of Form-based Input
When the end user cites a source for information, they would be prompted with a window where they would select a main type and drill down through subtypes, as in the first few columns of the table presentation I've given. Once it is selected, the user will be prompted for the required (and perhaps optional) fields specific for that type of source reference.
The user would select the type of the source, and fill in the fields, for L (biblio list), F (full citation), and S (short citation) at citation time. The templates I've provided would be in pop up menus for the user to select.
- comment: popup is not very user friendly, better would be a wizard button on the source editor, this lets you define the source, asks for fields, and shows the automatic citation markup based on the templates at the bottom while user adds fields. On Save, all this data is saved in the attributes as needed. To investigate if a new field is needed on source editor.
- comment: The fields to input for the source are the same regardless of how the citation is formatted for output. It's the output template's job to select and order the fields, provide common abbreviations, and so on.
- comment: The formats that Mrs. Mills provides in Evidence Explained are examples, suitable for personal use. But she's published very widely as well as having edited a journal for many years and knows better than most that everyone has their own style. There should be a facility for customizing the output formats to suit the user or whoever is publishing the work.
Generating citation in reports
Then, when generating a report that contains citations, the mark up needs to be done on the fields according to the specifications in the table method or template method I've provided (e.g. substitute the variables, italicize, embed with the proper punctuation, etc.). Remove optional variables (and their punctuation) if the variable was not input. Remove privacy fields unless a privacy flag is turned on, so that things like home addresses and phone numbers of people aren't put in reports unless you "force" it.
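A minimal Python sketch of this mark-up step follows. The segment format used here (dicts with `key`, `before`, `after`, `italic`, `optional`, `private` and `text`) is invented for the example and is not the notation of the parametrizations referred to above; the `<i>` tags simply stand in for whatever italic mark-up the report backend uses.

```python
def render(segments, values, include_private=False):
    """Build a citation string from template segments and field values."""
    out = []
    for seg in segments:
        if "text" in seg:                      # literal punctuation
            out.append(seg["text"])
            continue
        if seg.get("private") and not include_private:
            continue                           # privacy field, flag not set
        value = values.get(seg["key"], "")
        if not value:
            if seg.get("optional"):
                continue                       # drop field and its punctuation
            value = "[" + seg["key"] + "]"     # visible gap for a required field
        if seg.get("italic"):
            value = "<i>" + value + "</i>"
        out.append(seg.get("before", "") + value + seg.get("after", ""))
    return "".join(out)


FULL = [
    {"key": "AUTHOR", "after": ", "},
    {"key": "TITLE", "italic": True},
    {"key": "EDITION", "before": ", ", "optional": True},
    {"key": "ADDRESS", "before": " (", "after": ")", "private": True},
    {"text": "."},
]

values = {"AUTHOR": "A. Author", "TITLE": "Some Title", "ADDRESS": "1 Main St"}
print(render(FULL, values))
print(render(FULL, values, include_private=True))
```

Attaching the punctuation to each segment, rather than placing it between segments, is what allows an omitted optional field to take its punctuation with it.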
And the first time a citation is encountered in a report, use the Full version (F). The second and succeeding times use the Short (S) version. And when a bibliography is called for, use the L (List) template for that.
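The first-use/subsequent-use rule can be sketched as a small tracker object that a report would consult for each footnote. The class and method names below are hypothetical, and the three forms are assumed to have been rendered already and stored under the keys "F", "S" and "L".

```python
class CitationTracker:
    """Choose Full (F) on first use, Short (S) afterwards, List (L) at the end."""

    def __init__(self):
        self._seen = []          # source ids in order of first citation

    def footnote(self, source_id, forms):
        # forms is a dict with pre-rendered "F", "S" and "L" strings.
        if source_id in self._seen:
            return forms["S"]
        self._seen.append(source_id)
        return forms["F"]

    def bibliography(self, all_forms):
        # Only sources actually cited in this report are listed.
        return sorted(all_forms[sid]["L"] for sid in self._seen)


forms = {
    "S0001": {"F": "A. Author, Some Title (Town: Example Press, 1999), 12.",
              "S": "Author, Some Title, 12.",
              "L": "Author, A. Some Title. Town: Example Press, 1999."},
}
tracker = CitationTracker()
print(tracker.footnote("S0001", forms["S0001"]))   # full form
print(tracker.footnote("S0001", forms["S0001"]))   # short form
print(tracker.bibliography(forms))
```

One tracker per report run keeps the state local, so generating a second report starts again with full citations.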
template definition
The templates would be stored in an internal database, as would the completed citations for storage and retrieval.
But, these would only be a (good) starting set. Part of the beauty of this parametrization is that the end user can use the language of the mark up in this table or template to define his own source style, punctuation, field quoted or italicized, etc. So in essence, any source output style can be accommodated, and is under full control of the end user. Evidence Style templates can be supplied as a starting set, not the only set. New Evidence Styles can be added, old ones deleted or modified, as the user wishes.
Proposed changes
User Interface
Instead of a one size fits all editor window, a template-controlled window (or wizard, but wizards get annoying for experienced users) would display a set of fields tailored to the selected source type.
A Multiline widget could be available for displaying the long-form citation as one filled in the fields; pasting or typing into the Multiline would be parsed and would change the field entries. This would speed entry for expert users and could be used by anyone to paste in citations provided by database websites like Ancestry.com.
Import/Export
Continued support of GEDCOM and GrampsXML is of course a given. A BibTeX, MODS, or Citation Style Language import/export of source data would allow easy interchange with bibliography managers like Zotero or EndNote.
Storage
Several approaches are available for storage of this more elaborate source data:
- Keep the existing GEDCOM-based arrangement, mapping the elements other than Author, Title, Id, and Abbreviation into a fixed-format string in Pub Info. This has the advantage of maintaining easy GEDCOM export, but packing and parsing the pub-info string carries a computational cost.
- Keep the existing GEDCOM-based arrangement, leaving the Pub Info string empty and storing everything beyond Author and Title in key-value tuples in a separate table.
- Change the database to accommodate a fixed number of fields, the data for which are controlled by a template description. Fields would be assigned by priority, with a certain number being common to all (e.g., Title and Creator). This model affords the easiest querying and fastest record assembly (many fewer joins), but wastes database space when not all fields are required for a source type.
- Change to a pure binary-relational data structure, where each source datum is stored in a named tuple and the structure and mapping of each format is controlled by a template table. This is on the one hand the most flexible and storage-efficient, but on the other greatly complicates querying, as multiple joins are required to construct a record.
- Some combination of the last two, where the most common fields (perhaps Source-Type, Title, Creator/Agency, Publisher/Periodical, Location/URI, and Date) are stored in a single record and the remaining are stored in a separate key/value table, perhaps the same one as used by the Data tab in the Source Editor window, though flagged so as not to appear there.
- Store a formatted string. This is space-efficient and eminently flexible, but requires formatting the string on storage and parsing it for any manipulation.
The present author favors a combination fixed table with mapped common fields and an auxiliary key/value table for additional elements which are less commonly used. An important consideration is the ability of users to fashion their own source types: Mrs. Mills has provided templates for those types she thinks researchers are most likely to encounter, but could not possibly have provided for every available source-type.
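The favored combination can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the data layout under assumed table and column names (`source`, `source_attr`); it does not describe how Gramps actually stores its data. The piece and book values come from the 1841 census example earlier on this page.

```python
import sqlite3

# Common fields stored in one fixed record, per the list above.
COMMON_FIELDS = ("source_type", "title", "creator", "publisher", "location", "date")


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source (
            id INTEGER PRIMARY KEY,
            source_type TEXT, title TEXT, creator TEXT,
            publisher TEXT, location TEXT, date TEXT
        );
        CREATE TABLE source_attr (      -- auxiliary key/value table
            source_id INTEGER NOT NULL REFERENCES source(id),
            key TEXT NOT NULL,
            value TEXT NOT NULL
        );
        """
    )
    return conn


def add_source(conn, common, extra):
    cols = ", ".join(COMMON_FIELDS)
    marks = ", ".join("?" for _ in COMMON_FIELDS)
    cur = conn.execute(
        f"INSERT INTO source ({cols}) VALUES ({marks})",
        [common.get(name) for name in COMMON_FIELDS],
    )
    conn.executemany(
        "INSERT INTO source_attr (source_id, key, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, k, v) for k, v in extra.items()],
    )
    return cur.lastrowid


def load_source(conn, source_id):
    cols = ", ".join(COMMON_FIELDS)
    row = conn.execute(
        f"SELECT {cols} FROM source WHERE id = ?", (source_id,)
    ).fetchone()
    record = dict(zip(COMMON_FIELDS, row))
    extra = conn.execute(
        "SELECT key, value FROM source_attr WHERE source_id = ?", (source_id,)
    )
    record["attributes"] = dict(extra.fetchall())
    return record


conn = open_store()
sid = add_source(conn,
                 {"source_type": "Census", "title": "1841 census of England"},
                 {"PIECE": "1127", "BOOK": "10"})
print(load_source(conn, sid))
```

Common queries (by title, creator or date) touch only the fixed table, while user-defined source types can add any number of keys to the auxiliary table without a schema change.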
Additionally, some source-types require additional information which doesn't fit well into a database field format. One example is records in private hands: Mrs. Mills recommends including the provenance of the record showing that it indeed indicates the individuals claimed. Such a provenance is best recorded in a source note. She also stresses the importance of a detailed evaluation of the source by the researcher, and this evaluation is also best recorded in a source note.
Proposed Report changes
Reports use the new citation style, using templates to build the citation.
gramps-project:geps/gep-018-evidence-style implementation
In summer/autumn 2013, Benny implemented Evidence Explained templates in Gramps. There was much controversy about this implementation.
As I (Tim Lyons a.k.a. kulath) recall, I was particularly concerned about the added complexity of the EE templates, and wanted something that was more flexible. In particular, I wanted the user to be able to choose either the EE templates, or just the GEDCOM attributes, or any other templates that they wanted.
I therefore extended Benny’s implementation so that the templates were read in and stored in the user’s Family Tree (see github GEP-018). In the end, this was never used, and it seems that bitrot means that it does not properly run now, but I have captured some screenshots from it running on an old Linux system.
This allowed complete flexibility in citation attributes.
- Gramps could be run with no fixed attributes, or with just the standard GEDCOM ones, or with EE attributes, or with user defined attributes, or with any combination.
- The attributes were stored in the Family Tree database, so that if you merged trees, the appropriate attributes were available.
- Citations could be output (for reports etc.) using built-in code, and a proof of concept was implemented for output using citeproc.py.
- citeproc.py implements CSL, and can output in a variety of different formats, for example the Chicago Manual of Style.
- Attribute definitions could include mappings from the attributes to the fields needed by citeproc/CSL.
The EE templates are actually a bit of a crock; Mills has somewhat reinvented the wheel with various templates that are just arbitrarily different from other citation manuals. She says that “Evidence Explained guides us through a maze of sources not covered by other citation manuals”. That might have been true at the time she started her work, but she has never revised it to take account of more recent guidance from ‘other citation manuals’. However, the prototype I implemented allowed EE as an option.
Screenshots from the implementation follow.
Add a source. The EE templates are available in this Family Tree, but the user has chosen the basic GEDCOM template. The top section shows how the input information will appear in three different reference outputs. This is incomplete because the citation information has not been added. As the user enters data the reference display updates. For a GEDCOM source, the information to be entered is just the Author, Title and Publication information. There is also an option to input a short form of the Author, which is used in the short footnote form of output.
As above with some data entered.
Add a citation for the GEDCOM source. This dialogue box is very busy because it includes both the source and the citation information. Later, Gramps was changed so that only a brief mention of the source was displayed. Again, the user has chosen the GEDCOM template. The top part again shows how the citation reference will appear in three different forms of output. The Name is for use inside Gramps, for example in list views; the value is by default set from citation fields, but can be set manually. The Confidence is the normal citation confidence field. The ID is the internal Gramps ID.
The next section contains the various citation fields. Unfortunately the screen resolution (1280 x 800) is insufficient to show the whole window. In this case though it just contains the Date and Page fields. There is also an optional short Date and Page field.
The lower section repeats the information from the Source
Here the user has chosen one of the EE templates. This source contains many different fields.
Here the user has chosen one of the EE templates. This screenshot only shows two of the Citation Fields, Civil Division and Page, but there are also Person of Interest, Date Accessed and Credit Line (in addition to optional short versions).
The lower half of this dialogue box shows how the various fields will be combined for GEDCOM output.
One of the important points about this implementation is that the user has the ability to use templates other than the EE ones. Here the user has not chosen any templates. (IIRC In this implementation the templates are chosen by placing the relevant files in the “scrtemplates” folder in the plugins directory. This would work well if the user wanted to choose templates from the plugins repository. However, this is just one possible approach). Once the user selects to add a source, if necessary, the built in GEDCOM templates are added.
The “Unrecognised Template” choice is just a placeholder.
This is the corresponding citation entry dialogue box for GEDCOM only.
———
A proof of concept was implemented to show how this implementation could interface with the CSL processor citeproc to generate citations according to the desired referencing format.
The EE template 'ESM254' would have the following mapping to convert the EE attributes to CSL attributes.
amap = {
    'type' : 'chapter',
    'author' : '%(COMPILER)s',
    'title' : '%(TITLE)s',
    'edition' : '%(TYPE)s',
    'container_author' : '%(WEBSITE_CREATOR/OWNER)s',
    'container_title' : '%(WEBSITE)s',
    'url' : '%(URL_(DIGITAL_LOCATION))s',
    'locator' : '%(ITEM_OF_INTEREST)s; %(CREDIT_LINE)s',
    'accessed' : '%(DATE_ACCESSED)s',
    'page' : '1-7'
}
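Since the mapping values are ordinary Python `%`-format strings keyed by EE attribute name, applying the mapping could look roughly like the sketch below. The function name `to_csl`, the shortened copy of the mapping, the sample attribute values and the choice to treat missing attributes as empty strings are all assumptions for illustration; this is not the code from the proof of concept.

```python
from collections import defaultdict

# Shortened copy of the ESM254 mapping shown above.
AMAP = {
    'type': 'chapter',
    'author': '%(COMPILER)s',
    'title': '%(TITLE)s',
    'container_author': '%(WEBSITE_CREATOR/OWNER)s',
    'url': '%(URL_(DIGITAL_LOCATION))s',
    'locator': '%(ITEM_OF_INTEREST)s; %(CREDIT_LINE)s',
    'page': '1-7',
}


def to_csl(amap, ee_attrs):
    """Fill each CSL field by %-formatting its pattern with the EE attributes."""
    # Assumption: an attribute the user has not entered becomes an empty string.
    values = defaultdict(str, ee_attrs)
    return {csl_field: pattern % values for csl_field, pattern in amap.items()}


ee_attrs = {
    'COMPILER': 'A. Compiler',
    'TITLE': 'Some Database',
    'WEBSITE_CREATOR/OWNER': 'Example Owner',
    'URL_(DIGITAL_LOCATION)': 'http://example.org',
    'ITEM_OF_INTEREST': 'p. 4',
}
for csl_field, value in to_csl(AMAP, ee_attrs).items():
    print(csl_field, '=', value)
```

Patterns without any `%(...)s` placeholder, such as `'chapter'`, pass through unchanged, and one pattern may combine several EE attributes, as `locator` does.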
References
- Original Users Mailing list discussion: Evidence Explained Style Sources
- ISO 2709/MARC support?