The Challenges of Fitting TEI Tags to a Medieval Dictionary Manuscript

When I started planning my digital edition of a 15th-century Latin-German dictionary, I did not expect that choosing the correct XML tags would be almost as much of a challenge as deciding which phenomena to encode.

I want the transcription to be as globally understandable and easily re-usable as possible, so I decided to follow the TEI P5 guidelines and take all the tags required from their repertoire, instead of inventing my own. Doing so narrowed down the number of tags considerably but still gave me plenty of well-tested options to choose from.

However, I did not want the TEI’s guidelines for dictionaries (“dictionaries module”) to influence my decision on which phenomena or structures to encode. So instead of looking up what the TEI suggests as dictionary-specific elements or categories, I first assessed the segmentation as it is marked in the manuscript itself. The markers are usually very clear and consistent; for example, the abbreviation “.r.” (reperitur) always indicates a cross-reference to another dictionary entry, the abbreviation “.t.” (theutonice) is always followed by a German translation, and “inde” indicates a following derivation. Furthermore, these abbreviations are almost always rubricated. Consequently, I assigned each distinguishable explanation a self-created, not necessarily TEI-compliant XML tag. Then, I started an encoding test run.

After a thorough evaluation and some economising adjustments to the encoding practice, I now feel that I have a solid understanding of the structure of the dictionary entries and know what to expect from the manuscript, so it’s the right time to replace the work-in-progress tags with proper TEI tags.

1. Eleven Types of Explanations

For the test run, I encoded each explanation with the <sense> tag, and put their respective function in the @type attribute. Doing so helped me to get an organised overview of which explanations I would like to (and reasonably could) distinguish in the entries, without having to anticipate hierarchies or relations between the explanations. Ultimately, I distinguished the following eleven types of explanations:

  • etymology, definition, cross-reference, grammatical information,
  • compound, derivation, variant,
  • spelling,
  • translation, example, bible reference.

In my understanding, all eleven types are hierarchically equal, but the following discussion will illustrate why I listed them in this particular order.
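
A minimal sketch of what such a flat work-in-progress encoding could look like, using the entry “augur” that is discussed in the next section. The @type values and the <form>/<orth> wrapper around the lemma are assumptions made for illustration; they are not taken from the actual test files.

```xml
<!-- Test-run style: every explanation is a <sense>, its function sits in @type -->
<!-- The @type values and the <form>/<orth> wrapper are hypothetical -->
<entry>
  <form><orth>augur</orth></form>
  <sense type="translation">.t. vogel wicker</sense>
  <sense type="etymology">ab avis et garrio</sense>
  <sense type="derivation">inde <expan><ex>augu</ex>rium</expan> et <expan><ex>aug</ex>ari</expan></sense>
</entry>
```

Because all explanations sit side by side as siblings, no hierarchy between them has to be decided at this stage.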

2. Tags That Fit Perfectly

The first group of explanations has a top-level equivalent in the dictionaries module and can therefore be easily replaced: etymology, definition, cross-reference and grammatical information. See the following examples:

1. <etym> (etymology)

“fortune teller”, Cod. Guelf. 720 Helmst., 31v

augur .t. vogel wicker ab avis et garrio inde ‘augu’rium et ‘aug’ari

<etym>ab avis et garrio</etym>

2. <def> (definition)

“destroy”, Cod. Guelf. 720 Helmst., 3r

abrogo -as id est in toto deleo

<def>id est in toto deleo</def>

3. <xr> (cross-reference)

“choleric”, Cod. Guelf. 720 Helmst., 61v

collicus et ‘coll’ericus .r. colon

<xr corresp="#fkl_bxg_v5"><sItem ref="#r">reperitur</sItem> colon</xr>

The @corresp attribute refers to the @xml:id of the target entry “colon” through an automatically generated, unique ID. The <sItem> is a placeholder that will be replaced as soon as I can confirm whether the “.r.” is to be expanded as “require” or as “reperitur”. Target entry:

“colic”, Cod. Guelf. 720 Helmst., 61v

colon […] inde colicus et colericus […]

<entry xml:id="fkl_bxg_v5">
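
Put side by side, the link between the two entries can be sketched as follows. The ID, the @corresp value and the <xr> content come from the examples above; the enclosing <div> and the <form>/<orth> wrappers are assumptions added only to make the fragment well-formed, and the remaining content of both entries is left out.

```xml
<div type="dictionary">
  <!-- Target entry: carries the automatically generated xml:id -->
  <entry xml:id="fkl_bxg_v5">
    <form><orth>colon</orth></form>
    <!-- further explanations omitted -->
  </entry>

  <!-- Referring entry: @corresp points at the target's xml:id with a leading # -->
  <entry>
    <form><orth>collicus</orth></form>
    <xr corresp="#fkl_bxg_v5"><sItem ref="#r">reperitur</sItem> colon</xr>
  </entry>
</div>
```

The leading “#” marks the value as a pointer to an ID within the same document, so the cross-reference stays resolvable even if the wording of the target lemma is later normalised.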


4. <gram> (grammatical information)

“publicly”, Cod. Guelf. 720 Helmst., 68v

coram preposicio vel adverbium id est presencialiter

<gram>preposicio vel adverbium</gram>

<gram> is allowed only as a child of <gramGrp> (grammatical information group). I would prefer the <gramGrp> element to be optional in the dictionaries module, as it is not always necessary, but I have decided to use it every time for the sake of consistency.
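
As a sketch, the wrapped form of the example above would look like this. Treating “id est presencialiter” as a <def> and wrapping the lemma in <form>/<orth> are assumptions made here for illustration.

```xml
<entry>
  <form><orth>coram</orth></form>
  <!-- <gramGrp> is used even though it holds a single <gram> -->
  <gramGrp>
    <gram>preposicio vel adverbium</gram>
  </gramGrp>
  <!-- assumed: the "id est" phrase is encoded as a definition -->
  <def>id est presencialiter</def>
</entry>
```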

3. Explanations Sharing a Tag

The second group of explanations, however, is not so easily replaced. The dictionaries module presets a hierarchy of explanations by allowing only a few top-level elements (or explanations) within an entry. Unfortunately, this does not correspond to the hierarchy in the manuscript. For example, the module subsumes both compounds and derivations under the vague tag <re> (related entry), which is defined as the top-level element for “a lexical item related to the headword, such as a compound phrase or derived form” (cf. TEI). This means the differentiation between the two types of explanation is shifted from tag level to the subordinate attribute level. I dislike this shift because it suggests a lower rank in comparison to other explanations that are represented as top-level tags, such as the definition. Additionally, if derivations and compounds have to be subordinate to one tag instead of being represented by their own individual tags, such a shared tag should at least be linguistically plausible, such as “word formation” instead of “related entry”. Unfortunately, no such tag exists. While a derivation inarguably refers back to the lemma, which would justify the terminology “related entry”, the same can be said for any explanation, such as a definition or an etymological explanation, because explanations that are not nested always refer directly back to the lemma.

See the entry “astrum” for the first two explanations of the second group as proposed by the TEI:

5. <re type="derivation"> (related entry – derivation),

6. <re type="compound"> (related entry – compound)

“star”, Cod. Guelf. 720 Helmst., 29v

astrum id est stella inde ‘astra’lis ‘astra’le et astrius ‘astri’a ‘astri’um ad astra pertinens et componitur astronimus ‘astro’logus ‘astrolog’a ‘astrolog’um ‘astrolog’ia et ‘astro’labium

<re type="derivation">inde <expan><ex>astra</ex>lis</expan> <expan><ex>astra</ex>le</expan> et astrius <expan><ex>astri</ex>a</expan> <expan><ex>astri</ex>um</expan></re>

<re type="compound">et componitur astronimus <expan><ex>astro</ex>logus</expan> <expan><ex>astrolog</ex>a</expan> <expan><ex>astrolog</ex>um</expan> <expan><ex>astrolog</ex>ia</expan> et <expan><ex>astro</ex>labium</expan></re>

The clustering of expansions encoded with <expan> and <ex> is typical of derivation and compound explanations.

The group of explanations that I labelled “variant” is rather diverse. It contains forms that follow the lemma and are marked as variants of it by the indicators “vel”, “et” or “idem”. These are usually either orthographic variants (“agates vel achates” (agate)) or morphological variants (“appendix et appendicius -a -um idem” (supplement)); some are borderline synonyms. Discriminating exactly between these types of variants is not the object of this research and would go beyond the manuscript’s own categorisation, so I have decided against encoding them separately. But which TEI tag fits this diverse category? The suggested <oVar> (orthographic variant) falls short, because it disregards morphological and synonymic variants. The <def> (definition) is suggested as a general element to encode the “wide variety of different ways” (cf. TEI) in which the meaning of a word can be explained, including, for example, synonyms. However, is a variant a “definition”? Does it explain the meaning of the lemma? I do not believe so. For example, the neuter variant for almond “amigdolum” does not actually explain the meaning of the masculine lemma “amigdolus”.

In my opinion, the tag that comes closest to accommodating all types of variants is again the <re> (related entry). This, of course, puts variants in the same subcategory as derivations and compounds, but this can be justified by the fact that the explanations are closely related and sometimes even interchangeable. For example, diminutives are usually marked as derivations, but in the entry “catella” (female whelp) a diminutive is marked as a variant instead: “et catenula idem”. Therefore:

7. <re type="variant"> (related entry – variant)

“almond”, Cod. Guelf. 720 Helmst., 15r

amigdolus et ‘amigdo’lum .t. eyn mandele inde ‘amigdo’linus -a -um

<re type="variant">et <expan><ex>amigdo</ex>lum</expan></re>

4. Semantically Insufficient Tags

Orthographic explanations are a difficult category. They explain certain aspects of the orthography, for example in the entry “alpha”: “potest scribi per ph quia grecum vel per f indifferenter” (“it can be spelled either with ph because it’s Greek or with f”). But they never represent a specific orthographic variant of the lemma (e.g. “alfa”), so again <oVar> (orthographic variant) is not applicable. Nor do they explain the meaning of the lemma or give grammatical information, so <def> (definition) and <gram> (grammatical information) are not applicable either. They sometimes do explain how the spelling changed etymologically, for example in “adimere […] ab ad et emo mutando e in i” (“to take away, steal […] from ad (towards) and emo (buy) and the e is altered to i”), but such explanations are not consistent. What makes this category particularly difficult is that, apart from indicating words like “scribi” or “mutando”, they are not regularly marked by a distinct indicator. In addition, they are often nested and can therefore refer back to either the lemma or any subordinated form such as a derivation or an etymological explanation, thus ruling out <re> (related entry).

I have concluded that the semantics of the existing TEI tags would have to be bent too much to accommodate the nature of the manuscript’s spelling explanations. As a result, I have decided to make use of the customisation option, which is after all “a central aspect of TEI usage and the Guidelines are designed with customization in mind” (cf. TEI). I’ve therefore created the tag <spell> (spelling) for any kind of spelling explanation or phrase at any hierarchical level:

8. <spell> (spelling)

“alpha”, Cod. Guelf. 720 Helmst., 10v

alpha est prima litera grecorum et valet principium inde alphabetum a quo ‘alphab’eticus ‘alphabetic’a ‘alphabetic’um et potest scribi per ph quia grecum p vel per f indifferenter

<spell>et potest scribi per ph quia grecum <del>p</del> vel per f indifferenter</spell>
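
The customisation itself is not shown in this post. As a rough sketch, a new element like <spell> could be declared in a TEI ODD along the following lines. The schema identifier, the namespace URI and the choice of class membership are all assumptions for illustration; in particular, model.entryPart may need to be supplemented with other classes if <spell> is to be allowed at every nesting level.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="engelhus" start="TEI">
  <!-- Modules pulled into the customised schema -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="dictionaries"/>

  <!-- New, non-TEI element; the namespace URI is a placeholder -->
  <elementSpec ident="spell" ns="http://example.org/ns/engelhus" mode="add">
    <desc>contains an explanation of the spelling of the lemma or of a subordinate form</desc>
    <classes>
      <!-- assumed: global attributes, and allowed wherever entry parts are allowed -->
      <memberOf key="att.global"/>
      <memberOf key="model.entryPart"/>
    </classes>
    <content>
      <!-- allows text mixed with phrase-level elements such as <del> or <expan> -->
      <macroRef key="macro.paraContent"/>
    </content>
  </elementSpec>
</schemaSpec>
```

Declaring the element in a separate namespace keeps it clearly distinguishable from the standard TEI vocabulary.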

5. Medieval vs. Modern Understanding of Citation

Now on to the final challenge: translations and examples. According to the TEI documentation, “top-level constituents of dictionary entries are” for example “etymology”, “translations into another language” or “examples” (cf. TEI). But whereas the module provides the obvious equivalent <etym> for etymological explanations, it is not as rigorous for translations and examples. Instead, those are subsumed under <cit> (citation), which “may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example” (cf. TEI). This means that, just as with derivations and compounds, translations and examples are not provided with their own tag, but are subsumed under a superordinate. This superordinate “citation” includes the compulsory child <quote> (quotation) and preferably a <bibl> (bibliographic citation) for the source. But the modern concept of citation, consisting of a verbatim quotation and a detailed bibliographical reference, cannot be applied to medieval dictionary manuscripts. It was common practice for lexicographers to copy each other’s works and then to rearrange and enrich the material. Therefore, strictly speaking, almost everything in the dictionary is a possible quote, including the lemmata. In fact, many of the examples are probably quotes, but a source is never given, and without thorough research it is impossible to tell, and to encode accordingly, which ones are quotes and which ones are the author’s own creations. The same difficulty applies to the German translations. Furthermore, it can be assumed that the scribes sometimes replaced the German words if the translation provided was foreign to their respective dialect. Neither the examples nor the translations can therefore be labelled “citation” or “quotes” (including a reference to a source) in the modern sense.

Also, since the translations play an important role in both the manuscript and my research, subordinating them under a semantically ill-fitting superordinate tag does not give them the attention they merit. Unfortunately, no general tags for “translation” or “example” exist in any of the modules, leaving me unable to borrow them.
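
For comparison, this is roughly what the <cit>-based encoding suggested by the dictionaries module would look like for a translation and for an example. The Latin and German text is taken from the entries “abrotanum” and “clava” in this post; the @type values follow common TEI usage, the @xml:lang value is the one used elsewhere in this post, and the handling of the “.t.” indicator is left out because the post does not say how it would be treated here.

```xml
<!-- Standard dictionaries-module approach (not adopted in this edition) -->

<!-- A translation wrapped as a citation with a compulsory quotation -->
<cit type="translation" xml:lang="germ">
  <quote>everritte</quote>
</cit>

<!-- An example wrapped the same way; no <bibl> because no source is given -->
<cit type="example">
  <quote>unde clava ferit clavis aperit clavus duo iungit</quote>
</cit>
```

In both cases the wording ends up inside <quote>, which is exactly the implication of a verbatim quotation that the manuscript evidence does not support.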

Taking all this into consideration, I have decided that the tags provided by the TEI to mark translations and examples do not reflect the findings in the manuscript and their significance, and I therefore created the tags <trans> (translation) and <exmpl> (example):

9. <trans> (translation)

“southernwood”, Cod. Guelf. 720 Helmst., 3r

abrotanum arba .t. everritte

<trans xml:lang="germ"><sItem ref="#t">theutonice</sItem> everritte</trans>

Again, the <sItem> is a placeholder until “theutonice” is confirmed. The @xml:lang discriminates the German translations from Latin translations such as in the following example, where Latin is used to explain a Greek lemma (they are always marked by the word “latine”):

“(poison of) aconite, wolfsbane et al.”, Cod. Guelf. 720 Helmst., 6r

aconitum grece id est venenum latine

<trans xml:lang="lat">id est venenum latine</trans>

10. <exmpl> (example)

“club, stick”, Cod. Guelf. 720 Helmst., 59r

clava ‘clav’us et ‘clav’is diffrunt unde clava ferit clavis aperit clavus duo iungit

<exmpl>unde clava ferit clavis aperit clavus duo iungit</exmpl>

Examples are usually mnemonics. They are always underlined in red and start with the rubricated words “unde” or “versus”. The function of this example is to discriminate between the words clava (club), clavis (key) and clavus (nail): “a club beats, a key opens, a nail connects two [things]”.

The biblical references are those that come closest to being quotes, at least semantically. But in the manuscript they are only indirect references to a bible verse; they never actually quote anything. Again, <cit> (citation) does not fit. The TEI tag suggested for references, on the other hand, is <ref> (reference), which “defines a reference to another location, possibly modified by additional text or comment” (cf. TEI). This is much more suitable. The biblical references are therefore treated as references and encoded as such:

11. <ref type="biblical"> (reference – biblical)

“aloe”, Cod. Guelf. 720 Helmst., 12v

aloa vel aloes arbor suavis vel ungentum ponitur Iohannis decimo nono

<ref type="biblical">Iohannis decimo nono</ref>

This is probably a reference to John 19,39 “et Nicodemus qui venerat ad Iesum nocte primum ferens mixturam murrae et aloes quasi libras centum”. A normalised reference to the passage can be added through the @cRef (canonical reference) attribute at a later date.
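
A sketch of how the normalised reference might be added later. The form of the @cRef value is a placeholder: the TEI leaves the canonical reference scheme to the project, which would have to declare how such values are to be interpreted in the header.

```xml
<!-- The @cRef value "John 19:39" is a hypothetical notation, not a fixed TEI format -->
<ref type="biblical" cRef="John 19:39">Iohannis decimo nono</ref>
```

The manuscript wording stays untouched as element content, while the attribute carries the editorial identification of the verse.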

6. Conclusion

To conclude: Although the TEI module for dictionaries offers a wide variety of appropriate tags, some adjustments are unavoidable in order to capture all research-specific aspects of the manuscript in a sensible way. It is evident that the structure and hierarchy in the module are intended for modern dictionaries and that it can be difficult to apply this hierarchy to a medieval manuscript. Fortunately, the customisation option makes it possible for me to stick to the guidelines as closely as possible by giving me space to make the necessary adjustments to inapplicable tags.

References: TEI P5 Guidelines, chapter 9 (Dictionaries)


About the Engelhus-Vokabular

WHAT Is the Object of Research?

My research intends to produce a digital edition of a 15th-century dictionary called “vocabularius quadriidiomaticus”, “Vokabular”, or “Glossar” (there is, as yet, no universally recognised name), based on two out of 19 surviving manuscript copies. It contains lemmata in both Latin and Greek (using the Latin alphabet), followed by a multitude of explanations, such as definitions, translations into Middle Low German, examples of use, derivations and grammatical information. The dictionary was intended for advanced learners of Latin.

WHO Is the Author?

The author is Dietrich Engelhus (ca. 1362-1434), a chronicler, theologian and schoolmaster from Einbeck, Germany. In addition to compiling teaching books, such as this dictionary and an encyclopaedia called “promptus”, he is well-known to scholars for his world chronicle and theological works.

WHY Are the Manuscripts Important?

What makes the two manuscripts of the dictionary – Cod. Guelf. 720 Helmst. and Cod. Guelf. 956 Helmst. – so fascinating are the circumstances of their composition. It is highly likely that they were dictated to two students at the same time as part of their education. The manuscripts’ unusually detailed colophons and indicators in the text support this assumption. The colophons mention not only the scribes’ names (Ludolf Oldendorp and Hermann von Hildesheim), but also indicate a completion date (24th August 1444) and even the exact completion time (“hora tercia post prandium”, in the third hour after the midday meal). Furthermore, they suggest that it was a certain Konrad Sprink who dictated the dictionary to the students. Editing, encoding and comparing the two manuscripts will therefore provide an insight into their lexicographic and linguistic peculiarities as well as into the educational circumstances under which they were produced.