Become a Better Researcher

Our research problems are unique, and our genealogy software, to be useful, must be flexible enough to match our respective problems and our respective methods. The Master Genealogist is that software, but power and flexibility have a downside. The more options a program has, the more decisions the user must make. This year, the Tri-Valley TMG User Group will explore those options and make some of those personal decisions. Would you like to play along with us? Do each month's assignment, and if you like, e-mail it to us at: tvtmg.chair@L-AGS.org. We'll post some of the completed assignments on this blog each month. Let's hear it for choices!

Sunday, May 4, 2014

Commentary on the Subject of Data Integrity

The subject of data integrity and TMG came up in private conversations a few times in the past month. I'm probably not alone in thinking that data integrity in a database is simply that database's ability to preserve all data in its correct place. Wikipedia has a more thorough explanation. This phrase from the article, combined with recent experiences with a website generated by Second Site, triggered one of the discussion topics we had at the April meeting.

"The overall intent of any data integrity technique is the same: ensure data is recorded exactly as intended ...  and upon later retrieval, ensure the data is the same as it was when it was originally recorded. In short, data integrity aims to prevent unintentional changes to information." [emphasis added]

Maintaining data integrity while allowing users to do almost anything they wish when entering or reporting that data must be a very difficult balancing act, and I am very grateful that Bob Velke and the TMG developers attempt it. I really like doing things my way! Still, one TMG feature has bothered me since its introduction: the Age sentence variable ([A] or [AE]). In the Sample database, this variable appears in the sentences of two tags: the Census tag and the Death tag. It seems like an innocuous feature, but I think it leads to a loss of data integrity. Consider this fictitious example.
  • According to town records, a Jonathan Hornbuckle was born in Anytown on 12 September 1792.
  • A Jonathan Hornbuckle was buried in Anytown Cemetery, and the date of death on his tombstone is 14 October 1869.
  • You enter the birth and death dates in your TMG database and print a Journal report.
  • You have not edited the TMG default sentence structure, which is, "[P] died <[D]> <[L]> <[A]>."
Here's the resulting sentence output: "Jonathan Hornbuckle died on 14 October 1869 at age 77." TMG has calculated the age at death by subtracting the date of birth from the date of death. What's the problem? In my fictitious example, the tombstone also said he was "in his 68th year," that is, 67 years old. Something is wrong somewhere! Either a date is wrong, the age at death is wrong, or ... these records refer to two different Jonathan Hornbuckles. Conflicting information is often the inspiration for more thorough research and careful analysis. Using the Age variable in a sentence disguises the conflict and ignores important information. It's an unintentional change to information. Someone unfamiliar with the Hornbuckle family, reading this sentence, would have no way of knowing that the age at death was calculated rather than taken from a source.
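The arithmetic behind the conflict is simple enough to sketch. The Python below is an illustration only, not TMG's own code: the function names are hypothetical, it assumes the Age variable reports completed years (which matches the 77 in the example), and it reads "in his 68th year" as 67 completed years, the usual meaning of that phrase.

```python
from datetime import date

def age_in_years(birth: date, death: date) -> int:
    """Completed years between birth and death (hypothetical helper)."""
    years = death.year - birth.year
    # Subtract one if the birthday had not yet arrived in the year of death.
    if (death.month, death.day) < (birth.month, birth.day):
        years -= 1
    return years

def age_from_nth_year(nth_year: int) -> int:
    """Assumes 'in his 68th year' means 67 completed years."""
    return nth_year - 1

birth = date(1792, 9, 12)   # from the town records
death = date(1869, 10, 14)  # from the tombstone

calculated = age_in_years(birth, death)  # what the Age variable would print
stated = age_from_nth_year(68)           # what the tombstone actually says

print(f"calculated age: {calculated}")
print(f"stated age:     {stated}")
if calculated != stated:
    print(f"conflict: the two ages are {calculated - stated} years apart")
```

Run as written, it reports 77 from the dates and 67 from the tombstone. The last comparison is the point: a sentence that prints only the calculated figure never surfaces that ten-year gap.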

My subconscious aversion to the Age variable was brought to my conscious attention while studying a Second Site website about a family I've been researching. The website has a lot of good features, but the author left the Age variable in the TMG sentence structures. As a result, it is not apparent that some of the individuals have been misidentified, and some incorrect relationships are obscured. If the ages at death stated in the records had been included in the sentences, the conflicting information would have been obvious.

My recommendation? Remove the Age variable from every sentence structure.
  • Open the Tag Type List.
  • Highlight the Death (or Census) tag.
  • Click "Edit" to open the Tag Type Definition screen.
  • Select the "Roles and Sentences" tab.
  • Remove <[A]> (or <[AE]>) from any sentence structure that uses it.
  • Click "OK."
Data integrity is not just a database requirement. It's a research requirement, too.
