How an Ignoramus Reads “XPC”

In the entry “crisma” (Engl. chrism) we get an explanation of how the abbreviation “XPC” for “Christus” came into being. This is exciting, because it’s very unusual for the dictionary to provide information on the etymology and use of abbreviations.

Cod. Guelf. 956 Helmst., 68v

crisma grece unxio latine .t. kresem inde venit grece cristus xpc hoc est Cristus quod scribitur grece in breviatura que tribus grecis literis que sunt c r s et quia sunt similes nostris x p c ideo ignari dicunt Cristus x p c esse scriptum literis latinis cum sint grece litere scilicet cappa cappa res sima inde Cristianus -a -um et ‘Cristi’anitas et Cristianismus quasi Cristianorum mos

Translation: “[…] xpc that is Christus which is spelled in Greek as an abbreviation with three Greek letters which are c r s. And because they are similar to our [letters] x p c, those who are unaware of this think that Christus, [abbreviated as] xpc, is spelled in Latin letters, whereas they are [actually] the Greek letters kappa rho sigma […]”

It seems as if neither of the two scribes knew that an orthographic explanation for the xpc-abbreviation would follow when they started writing the entry: Ms956 spelled the word “cristus” in full before crossing it out and replacing it with the xpc-abbreviation, whereas Ms720 did use an abbreviation, but a different one: the x for Christus followed by the common abbreviation for the ending -us.

Furthermore, the orthographic explanation obviously didn’t influence the scribes’ own spelling habits much, because in the following entries Ms720 continues to use the x+us abbreviation instead of xpc, and Ms956 usually spells any form of Christus in full and doesn’t use abbreviations at all.

The Challenges of Fitting TEI Tags to a Medieval Dictionary Manuscript

When I started planning my digital edition of a 15th-century Latin-German dictionary, I did not expect that choosing the correct XML tags would be almost as much of a challenge as deciding which phenomena to encode.

I want the transcription to be as globally understandable and easily re-usable as possible, so I decided to follow the TEI P5 guidelines and take all the tags required from their repertoire, instead of inventing my own. Doing so narrowed down the number of tags considerably but still gave me plenty of well-tested options to choose from.

However, I did not want the TEI’s guidelines for dictionaries (the “dictionaries module”) to influence my decision on which phenomena or structures to encode. So instead of looking up what the TEI suggests as dictionary-specific elements or categories, I first assessed the segmentation as it is marked in the manuscript itself. The markers are usually very clear and consistent; for example, the abbreviation “.r.” (reperitur) always indicates a cross-reference to another dictionary entry, the abbreviation “.t.” (theutonice) is always followed by a German translation, and “inde” introduces a derivation. Furthermore, these abbreviations are almost always rubricated. Consequently, I assigned each distinguishable explanation a self-created, not necessarily TEI-compliant XML tag. Then, I started an encoding test run.
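To make the marker-based segmentation concrete, here is a minimal Python sketch. It is not the workflow used for the edition. It only assumes that an entry has already been transcribed as plain text and that the three markers named above stand as separate words. The function name `segment` and the labels are hypothetical, and "head" simply means everything before the first marker.

```python
import re

# Markers named in the post; the labels are illustrative, not the edition's tags.
MARKERS = {".r.": "cross-reference", ".t.": "translation", "inde": "derivation"}

# Match a marker only when it stands as a whole whitespace-delimited token.
MARKER_RE = re.compile(r"(?<!\S)(\.r\.|\.t\.|inde)(?!\S)")


def segment(entry: str) -> list[tuple[str, str]]:
    """Split a transcribed entry into (label, text) pairs at each marker."""
    # re.split with one capturing group yields: text, marker, text, marker, ...
    parts = MARKER_RE.split(entry)
    segments = [("head", parts[0].strip())]
    for marker, text in zip(parts[1::2], parts[2::2]):
        segments.append((MARKERS[marker], text.strip()))
    return segments


for label, text in segment("amigdolus et amigdolum .t. eyn mandele inde amigdolinus -a -um"):
    print(f"{label}: {text}")
```

Explanations without a fixed marker, such as spelling remarks, would not be caught by a rule like this and still need manual judgement.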

After a thorough evaluation and some economising adjustments to the encoding practice, I now feel that I understand the structure of the dictionary entries well and know what to expect from the manuscript, so it’s the right time to replace the work-in-progress tags with proper TEI tags.

1. Eleven Types of Explanations

For the test run, I encoded each explanation with the <sense> tag, and put their respective function in the @type attribute. Doing so helped me to get an organised overview of which explanations I would like to (and reasonably could) distinguish in the entries, without having to anticipate hierarchies or relations between the explanations. Ultimately, I distinguished the following eleven types of explanations:

  • etymology, definition, cross-reference, grammatical information,
  • compound, derivation, variant,
  • spelling,
  • translation, example, bible reference.

In my understanding, all eleven types are hierarchically equal, but the following discussion will illustrate why I listed them in this particular order.

2. Tags That Fit Perfectly

The first group of explanations has a top-level equivalent in the dictionaries module and can therefore be easily replaced: etymology, definition, cross-reference and grammatical information. See the following examples:

1. <etym> (etymology)

“fortune teller”, Cod. Guelf. 720 Helmst., 31v

augur .t. vogel wicker ab avis et garrio inde ‘augu’rium et ‘aug’ari

<etym>ab avis et garrio</etym>

2. <def> (definition)

“destroy”, Cod. Guelf. 720 Helmst., 3r

abrogo -as id est in toto deleo

<def>id est in toto deleo</def>

3. <xr> (cross-reference)

“choleric”, Cod. Guelf. 720 Helmst., 61v

collicus et ‘coll’ericus .r. colon

<xr corresp="#fkl_bxg_v5"><sItem ref="#r">reperitur</sItem> colon</xr>

The @corresp refers to the @xml:id of the target entry “colon” through an automatically generated, unique ID. The <sItem> is a placeholder that will be replaced as soon as I can confirm whether the “.r.” is to be expanded as “require” or as “reperitur”. Target entry:

“colic”, Cod. Guelf. 720 Helmst., 61v

colon […] inde colicus et colericus […]

<entry xml:id="fkl_bxg_v5">
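Because the link between @corresp and @xml:id is purely mechanical, it can be checked automatically. The following Python sketch assumes a file in which every entry is an <entry> with an @xml:id and every cross-reference is an <xr> with a @corresp of the form "#id". The sample document, the ID "hypothetical_id" and the function name `dangling_refs` are made up for illustration and are not taken from the edition.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature of the edition: one target entry, one referring entry.
SAMPLE = """<body>
  <entry xml:id="fkl_bxg_v5">colon</entry>
  <entry xml:id="hypothetical_id">collicus
    <xr corresp="#fkl_bxg_v5">reperitur colon</xr>
  </entry>
</body>"""


def dangling_refs(root: ET.Element) -> list[str]:
    """Return every @corresp value that does not point at an existing entry."""
    known_ids = {entry.get(XML_ID) for entry in root.iter("entry")}
    return [
        xr.get("corresp", "")
        for xr in root.iter("xr")
        if xr.get("corresp", "").removeprefix("#") not in known_ids
    ]


print(dangling_refs(ET.fromstring(SAMPLE)))
```

An empty list means that every cross-reference in the sample resolves to an entry.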


4. <gram> (grammatical information)

“publicly”, Cod. Guelf. 720 Helmst., 68v

coram preposicio vel adverbium id est presencialiter

<gram>preposicio vel adverbium</gram>

<gram> is allowed only as a child of <gramGrp> (grammatical information group). I would prefer the <gramGrp> element to be optional in the dictionaries module, as it is not always necessary, but I have decided to use it every time for the sake of consistency.

3. Explanations Sharing a Tag

The second group of explanations, however, is not so easily replaced. The dictionaries module presets a hierarchy of explanations by allowing only a few top-level elements (or explanations) within an entry. Unfortunately, this does not correspond to the hierarchy in the manuscript. For example, the module subsumes both compounds and derivations under the vague tag <re> (related entry), which is defined as the top-level element for “a lexical item related to the headword, such as a compound phrase or derived form” (cf. TEI). This means the differentiation between the two types of explanation is shifted from tag level to the subordinate attribute level. I dislike this shift because it suggests that derivations and compounds rank below other explanations, such as the definition, which are represented as top-level tags. Additionally, if derivations and compounds have to be subordinate to one tag instead of being represented by their own individual tags, such a shared tag should at least be linguistically plausible, such as “word formation” instead of “related entry”. Unfortunately, no such tag exists. While a derivation inarguably refers back to the lemma, which would justify the terminology “related entry”, the same can be said for any explanation, such as a definition or an etymological explanation, because explanations that are not nested always refer directly back to the lemma.

See the entry “astrum” for the first two explanations of the second group as proposed by the TEI:

5. <re type="derivation"> (related entry – derivation),

6. <re type="compound"> (related entry – compound)

“star”, Cod. Guelf. 720 Helmst., 29v

astrum id est stella inde ‘astra’lis ‘astra’le et astrius ‘astri’a ‘astri’um ad astra pertinens et componitur astronimus ‘astro’logus ‘astrolog’a ‘astrolog’um ‘astrolog’ia et ‘astro’labium

<re type="derivation">inde <expan><ex>astra</ex>lis</expan> <expan><ex>astra</ex>le</expan> et astrius <expan><ex>astri</ex>a</expan> <expan><ex>astri</ex>um</expan></re>

<re type="compound">et componitur astronimus <expan><ex>astro</ex>logus</expan> <expan><ex>astrolog</ex>a</expan> <expan><ex>astrolog</ex>um</expan> <expan><ex>astrolog</ex>ia</expan> et <expan><ex>astro</ex>labium</expan></re>

The clustering of expansions encoded with <expan> and <ex> is typical of derivation and compound explanations.
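The plain-text transcriptions in this post appear to mark the supplied letters with single quotation marks (‘astra’lis), which corresponds to <expan><ex>astra</ex>lis</expan> in the encoding. Assuming that this convention holds throughout, the conversion could be sketched in Python as follows. The function name `mark_expansions` is hypothetical, and the sketch does not escape XML special characters, which the Latin examples here do not contain.

```python
import re

# ‘…’ (U+2018 / U+2019) encloses the supplied letters; the letters that
# follow directly are the part the scribe actually wrote.
SUPPLIED = re.compile(r"‘([^’]+)’(\w*)")


def mark_expansions(text: str) -> str:
    """Turn ‘astra’lis into <expan><ex>astra</ex>lis</expan>."""
    return SUPPLIED.sub(r"<expan><ex>\1</ex>\2</expan>", text)


print(mark_expansions("inde ‘astra’lis ‘astra’le et astrius ‘astri’a ‘astri’um"))
```

Words without quotation marks, such as “astrius”, pass through unchanged.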

The group of explanations that I labelled “variant” is a bit diverse. It contains forms that follow the lemma and are marked as variants of the lemma by the indicators “vel”, “et” or “idem”. This group usually contains either orthographic variants (“agates vel achates” (agate)) or morphological variants (“appendix et appendicius -a -um idem” (supplement)); sometimes they are borderline synonyms. The exact discrimination between these types of variants is not the object of this research and would go beyond the manuscript’s categorisation, so I have decided against encoding them separately. But which TEI tag fits this diverse category? The suggested <oVar> (orthographic variant) falls short, because it disregards morphological and synonymic variants. The <def> (definition) is suggested as a general element to encode the “wide variety of different ways” (cf. TEI) in which the meaning of a word can be explained, including, for example, synonyms. However, is a variant a “definition”? Does it explain the meaning of the lemma? I do not believe so. For example, the neuter variant for almond “amigdolum” does not actually explain the meaning of the masculine lemma “amigdolus”.

In my opinion, the tag that comes closest to accommodating all types of variants is again the <re> (related entry). This, of course, puts variants in the same subcategory as derivations and compounds, but this can be justified by the fact that the explanations are closely related and sometimes even interchangeable. For example, diminutives are usually marked as derivations, but in the entry “catella” (female whelp) a diminutive is marked as a variant instead: “et catenula idem”. Therefore:

7. <re type="variant"> (related entry – variant)

“almond”, Cod. Guelf. 720 Helmst., 15r

amigdolus et ‘amigdo’lum .t. eyn mandele inde ‘amigdo’linus -a -um

<re type="variant">et <expan><ex>amigdo</ex>lum</expan></re>

4. Semantically Insufficient Tags

Orthographic explanations are a difficult category. They explain certain aspects of the orthography, for example in the entry “alpha”: “potest scribi per ph quia grecum vel per f indifferenter” (“it can be spelled either with ph because it’s Greek or with f”). But they never represent a specific orthographic variant of the lemma (e.g. “alfa”), so again <oVar> (orthographic variant) is not applicable. They are also not definitions that explain the meaning of the lemma or give grammatical information, so <def> (definition) and <gram> (grammatical information) are not applicable either. They sometimes do explain how the spelling changed etymologically, for example in “adimere […] ab ad et emo mutando e in i” (“to take away, steal […] from ad (towards) and emo (buy) and the e is altered to i”), but such explanations are not consistent. What makes this category particularly difficult is that, apart from indicator words like “scribi” or “mutando”, they are not regularly marked by a distinct indicator. In addition, they are often nested and can therefore refer back to either the lemma or any subordinated form such as a derivation or an etymological explanation, thus ruling out <re> (related entry).

I have concluded that the semantics of the existing TEI tags would have to be bent too much to accommodate the nature of the manuscript’s spelling explanations. As a result, I have decided to make use of the customisation option, which is after all “a central aspect of TEI usage and the Guidelines are designed with customization in mind” (cf. TEI). I’ve therefore created the tag <spell> (spelling) for any kind of spelling explanation or phrase at any hierarchical level:

8. <spell> (spelling)

“alpha”, Cod. Guelf. 720 Helmst., 10v

alpha est prima litera grecorum et valet principium inde alphabetum a quo ‘alphab’eticus ‘alphabetic’a ‘alphabetic’um et potest scribi per ph quia grecum p vel per f indifferenter

<spell>et potest scribi per ph quia grecum <del>p</del> vel per f indifferenter</spell>

5. Medieval vs. Modern Understanding of Citation

Now on to the final challenge: translations and examples. According to the TEI documentation, “top-level constituents of dictionary entries are” for example “etymology”, “translations into another language” or “examples” (cf. TEI). But whereas the module provides the obvious equivalent <etym> for etymological explanations, it is not as rigorous for translations and examples. Instead, those are subsumed under <cit> (citation), which “may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example” (cf. TEI). This means that – just as with derivations and compounds – translations and examples are not provided with their own tag, but are subsumed under a superordinate. This superordinate “citation” includes the compulsory child <quote> (quotation) and preferably a <bibl> (bibliographic citation) for the source. But the modern concept of citation – consisting of a verbatim quotation and a detailed bibliographical reference – cannot be applied to late medieval dictionary manuscripts. It was common practice for lexicographers to copy each other’s works and then to rearrange and enrich the material. Therefore, strictly speaking, almost everything in the dictionary is a possible quote, including the lemmata. In fact, many of the examples are probably quotes, but a source is never given, and without thorough research it is impossible to tell, and thus to encode, which ones are quotes and which ones are the author’s own creations. The same difficulty applies to the German translations. Furthermore, it can be assumed that the scribes sometimes replaced the German words if the translation provided was foreign to their respective dialect. Neither the examples nor the translations can therefore be labelled “citation” or “quotes” (including a reference to a source) in the modern sense.

Also, since the translations play an important role in both the manuscript and my research, subordinating them under a semantically ill-fitting superordinate tag does not give them the attention they merit. Unfortunately, no general tags for “translation” or “example” exist in any of the modules, leaving me unable to borrow them.

Taking all this into consideration, I have decided that the tags provided by the TEI to mark translations and examples do not reflect the findings in the manuscript and their significance, and I therefore created the tags <trans> (translation) and <exmpl> (example):

9. <trans> (translation)

“southernwood”, Cod. Guelf. 720 Helmst., 3r

abrotanum arba .t. everritte

<trans xml:lang="germ"><sItem ref="#t">theutonice</sItem> everritte</trans>

Again, the <sItem> is a placeholder until “theutonice” is confirmed. The @xml:lang distinguishes the German translations from Latin translations, as in the following example, where Latin is used to explain a Greek lemma (these are always marked by the word “latine”):

“(poison of) aconite, wolfsbane et al.”, Cod. Guelf. 720 Helmst., 6r

aconitum grece id est venenum latine

<trans xml:lang="lat">id est venenum latine</trans>

10. <exmpl> (example)

“club, stick”, Cod. Guelf. 720 Helmst., 59r

clava ‘clav’us et ‘clav’is diffrunt unde clava ferit clavis aperit clavus duo iungit

<exmpl>unde clava ferit clavis aperit clavus duo iungit</exmpl>

Examples are usually mnemonics. They are always underlined in red and start with the rubricated words “unde” or “versus”. The function of this example is to discriminate between the words clava (club), clavis (key) and clavus (nail): “a club beats, a key opens, a nail connects two [things]”.

The biblical references are those that come closest to being quotes, at least semantically. But in the manuscript they are only indirect references to a Bible verse; they never actually quote anything. Again, <cit> (citation) does not fit. The suggested TEI tag for references, on the other hand, is <ref> (reference), and it “defines a reference to another location, possibly modified by additional text or comment” (cf. TEI). This is much more suitable. The biblical references are therefore treated as references and encoded as such:

11. <ref type="biblical"> (reference – biblical)

“aloe”, Cod. Guelf. 720 Helmst., 12v

aloa vel aloes arbor suavis vel ungentum ponitur Iohannis decimo nono

<ref type="biblical">Iohannis decimo nono</ref>

This is probably a reference to John 19,39 “et Nicodemus qui venerat ad Iesum nocte primum ferens mixturam murrae et aloes quasi libras centum”. A normalised reference to the passage can be added through the @cRef (canonical reference) attribute at a later date.

6. Conclusion

To conclude: Although the TEI module for dictionaries offers a wide variety of appropriate tags, some adjustments are unavoidable in order to capture all research-specific aspects of the manuscript in a sensible way. It is evident that the structure and hierarchy in the module are intended for modern dictionaries and that it can be difficult to apply this hierarchy to a medieval manuscript. Fortunately, the customisation option makes it possible for me to stick to the guidelines as closely as possible by giving me space to make the necessary adjustments to inapplicable tags.
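The eleven decisions discussed above amount to a lookup table from the test-run types to the final tags. As a rough illustration only, the replacement step could be sketched in Python like this. The exact spelling of the @type values in the test run is an assumption, as is the function name `retag`. The tag names on the right-hand side are the ones chosen in this post.

```python
import xml.etree.ElementTree as ET

# Assumed test-run @type value -> (final tag, attributes added to it).
TAG_MAP = {
    "etymology": ("etym", {}),
    "definition": ("def", {}),
    "cross-reference": ("xr", {}),
    "derivation": ("re", {"type": "derivation"}),
    "compound": ("re", {"type": "compound"}),
    "variant": ("re", {"type": "variant"}),
    "spelling": ("spell", {}),       # custom element
    "translation": ("trans", {}),    # custom element
    "example": ("exmpl", {}),        # custom element
    "bible reference": ("ref", {"type": "biblical"}),
}


def retag(entry: ET.Element) -> ET.Element:
    """Rename every <sense type="..."> inside the entry, in place."""
    for sense in list(entry.iter("sense")):
        kind = sense.attrib.pop("type")
        if kind == "grammatical information":
            # <gram> has to sit inside <gramGrp>, so wrap the content.
            gram = ET.Element("gram")
            gram.text, sense.text = sense.text, None
            for child in list(sense):
                sense.remove(child)
                gram.append(child)
            sense.tag = "gramGrp"
            sense.append(gram)
        else:
            sense.tag, extra = TAG_MAP[kind]
            sense.attrib.update(extra)
    return entry


entry = ET.fromstring(
    '<entry><sense type="grammatical information">preposicio vel adverbium</sense>'
    '<sense type="definition">id est presencialiter</sense></entry>'
)
print(ET.tostring(retag(entry), encoding="unicode"))
```

Attributes other than @type, such as @corresp or @xml:lang, are left untouched by this sketch.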

References: TEI P5 Guidelines (9 dictionaries module)

About the Engelhus-Vokabular

WHAT Is the Object of Research?

My research intends to produce a digital edition of a 15th-century dictionary called “vocabularius quadriidiomaticus”, “Vokabular”, or “Glossar” (there is, as yet, no universally recognised name), based on two out of 19 surviving manuscript copies. It contains lemmata in both Latin and Greek (using the Latin alphabet), followed by a multitude of explanations, such as definitions, translations into Middle Low German, examples of use, derivations and grammatical information. The dictionary was intended for advanced learners of Latin.

WHO Is the Author?

The author is Dietrich Engelhus (ca. 1362-1434), a chronicler, theologian and schoolmaster from Einbeck, Germany. In addition to compiling teaching books, such as this dictionary and an encyclopaedia called “promptus”, he is well-known to scholars for his world chronicle and theological works.

WHY Are the Manuscripts Important?

What makes the two manuscripts of the dictionary – Cod. Guelf. 720 Helmst. and Cod. Guelf. 956 Helmst. – so fascinating are the circumstances of their composition. It is highly likely that they were dictated to two students at the same time as part of their education. The manuscripts’ unusually detailed colophons and indicators in the text support this assumption. The colophons mention not only the scribes’ names (Ludolf Oldendorp and Hermann von Hildesheim), but also indicate a completion date (24th August 1444) and even the exact completion time (“hora tercia post prandium” – in the third hour after the midday meal). Furthermore, they suggest that it was a certain Konrad Sprink who dictated the dictionary to the students. Editing, encoding and comparing the two manuscripts will therefore provide an insight into their lexicographic and linguistic peculiarities as well as into the educational circumstances under which they were produced.

Blasphemy = Women’s Stupidity?

What makes a dictionary entry noteworthy? In this case it’s a questionable etymological explanation that nowadays causes a raised eyebrow, at the least.

In the entry blas (stupid) the lemma is followed, as usual, by derivations. The first one is comparatively unexciting: blasfemus, which is translated into Middle Low German as gotschender vel bespotter – a blasphemer.

But then the second derivation blasfemia – blasphemy in the theological sense – is not translated; instead it’s explained as “women’s stupidity, because women’s talk is like that”, and is then translated as “tattle”. Although it’s not explicitly mentioned in the entry, that definition is clearly influenced by the pseudo-etymology blas-femina.

A serious attempt at etymology or merely a pun?

Cod. Guelf. 956 Helmst., 41v

Blas grece stultus latine inde blasfemus -a -um theutonice eyn gotschender vel bespotter et blasfemia quasi stulticia mulierum quia muliebre est sic loqui theutonice vlok vel honsprake inde [blasfem]are

Translation: Blas, Greek, stupid in Latin, thus blasfemus, -a, -um, eyn gotschender or bespotter (blasphemer) in German. And blasfemia, quasi women’s stupidity, because women’s talk is like that, is vlok (curse, swear) or honsprake (tattle) in German, thus [blasfem]are.

Dictating Dictionaries as a Teaching Method

Dictating an entire dictionary to students as a teaching method may seem odd from a modern point of view, but in the late Middle Ages it was common practice and served both as part of the teaching process and as a means to provide a copy of the dictionary for the student’s personal use.

My research focusses on two manuscripts from the 15th century called the Engelhus-Glossar, Engelhus-Vokabular or vocabularius quadriidiomaticus. Not only do they transmit the same dictionary, but it can also be assumed from their almost identical colophons that they were both dictated by the same teacher and finished on exactly the same day in a school in Hannover.

This first blog entry is to give an example of a few methods and indicators that support the assumption that these two manuscripts were indeed written from dictation and not merely copied.

1. Dictation Errors

For the first example see the entry Acom(m)entaris (commenter):

Cod. Guelf. 720 Helmst., 5r
Cod. Guelf. 956 Helmst., 12r

Acom(m)entaris nomen indeclinabile id est scriptor vel notarius ponitur secundo regum octavo similiter ista et sunt indeclinabilia et communis generis

It seems unlikely that two students copying a written original would both copy the same crossed-out parts of text instead of omitting them in their own manuscripts. What’s more likely is that every now and then the baccalarius (Konrad Sprink) who dictated the dictionary to the students misread his own manuscript or made a dictation error, which the students then had to correct in their texts. In this case both scribes wrote “ista”, crossed it out and replaced it with “et” (the abbreviation that looks like the number seven). Seeing as the first scribe, Ludolf Oldendorp, squeezed the “et” in at the very end of the line and the second, Hermann von Hildesheim, wrote it above the crossed-out “ista” for lack of space, it can be assumed that the following word “sunt” had already been dictated when the error was noticed.

For the next example see the entry Afficere (to affect):

Cod. Guelf. 720 Helmst., 7v
Cod. Guelf. 956 Helmst., 12v

Afficere ab ad et facere equivocatur unde afficit inpo informat cupit punit[funit(!)] hec tria signat […]

Both scribes wrote “inpo” and then crossed it out to replace it with “informat”. The faulty word breaks off mid-word, so it can be assumed that, in contrast to the first example, the mistake was noticed immediately. A possible explanation would be that in the teacher’s manuscript the “f” in “informat” looked too much like a “p” and he misread it. Whereas Hermann started the whole word anew, it’s hard to tell if Ludolf decided to keep the partly crossed-out “in” and just added the rest of the word or didn’t realise that an “in” was supposed to be a part of the corrected form.
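Once both manuscripts are encoded, shared corrections like these could be searched for systematically. The following Python sketch assumes that every crossed-out passage is wrapped in <del>, as in the encoding examples for the edition. The two snippets are simplified, hypothetical stand-ins for the encoded entries, and the function names are made up.

```python
import xml.etree.ElementTree as ET


def deletions(snippet: str) -> list[str]:
    """Return the text of every <del> in one witness, in document order."""
    root = ET.fromstring(snippet)
    return ["".join(d.itertext()).strip() for d in root.iter("del")]


def shared_deletions(a: str, b: str) -> list[str]:
    """Deletions of witness a that also occur, as the same string, in witness b."""
    in_b = set(deletions(b))
    return [text for text in deletions(a) if text in in_b]


# Simplified stand-ins for the entry Afficere in the two manuscripts.
ms720 = "<entry>unde afficit <del>inpo</del> informat cupit</entry>"
ms956 = "<entry>unde afficit <del>inpo</del> informat cupit</entry>"
print(shared_deletions(ms720, ms956))
```

A comparison by exact string is crude. It would miss cases where the two scribes broke off at different points in the same word, so the results are at best a list of candidates for closer inspection.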

2. Pressure of Time

A very important factor with dictation is time, or rather the lack thereof. Writing from dictation means there is only a limited amount of time to think about what one is writing, to contemplate unclear passages or to make changes to what’s already been written. In this respect, gaps in an entry strongly suggest that a scribe intended to add something at a later date but never got around to doing it. In other cases, as in the following example, the scribe did get round to filling the gap but had miscalculated the amount of space he would need for the missing words. See Apo (from, since, re-):

Cod. Guelf. 720 Helmst., 20v

Apo grece id est re vel retro latine inde apocalipsis id est revelacio de quo aco apocalipsare item apo grece id est a vel ab latine

Taking into consideration the visibly compressed script, I assume that Ludolf followed the dictation up to “revelacio de quo aco”, left a small gap so that he could later correct the long word he had just started to misspell, and continued with “item apo”. When he then tried to fill in “apocalipsare”, he must have realised that he had left too little space and had to squeeze the letters in to make it fit before “item”. This need to add or correct parts at a later date can only be explained by dictation, because when copying a written text there is no need to keep on writing before a part is completed or corrected.

Differences between the two manuscripts in the entry Ciconia (stork) can also be attributed to there not being enough time to think during dictation. See:

Cod. Guelf. 720 Helmst., 56r
Cod. Guelf. 956 Helmst., 54v

Whereas Ludolf wrote a long article with explanation and German equivalent, “Ciconia avis est theutonice eyn edeber”, Hermann’s manuscript transmits a much shorter version: “Ciconia theutonice stork”. It can be assumed that the scribes had to translate the German equivalent that was being read out into their own dialect if they didn’t know it or if it was uncommon. Hermann might therefore not have wanted to use the word “edeber” for this animal because it was foreign to his dialect, and then took so long to think of the correct form in his dialect that he didn’t manage to write down the entire article in time.

3. Spelling Variants

By far the most prominent differences between the two manuscripts are spelling mistakes or spelling variants. Many of them are hard to explain by misreading but are easily explained by mishearing and phonological idiosyncrasies. The most common are variants between letters representing the same or similar sounds such as d/t (capud/caput), f/ph (feon/pheon), c/s (scirpeus/cirpeus or serpens/cerpens), c/ch (abbacus/abbachus), mn/mpn (calumnia/calumpnia) or an initial h (oralogium/horalogium). In the German words the variations are even more noticeable and are most probably due to the scribes’ different dialectal backgrounds: walvis/walfisch, scorsten/schorsteyn, opper/offer.
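To get a first overview of which letters alternate, the paired forms can be reduced to the part in which they differ. The Python sketch below does this by stripping the shared beginning and ending of each pair. The function name `differing_core` is hypothetical, the pairs are given in the order in which they are listed above (not attributed to a particular manuscript), and the approach only works for forms that differ in a single place.

```python
from collections import Counter


def differing_core(a: str, b: str) -> tuple[str, str]:
    """Strip the shared prefix and suffix; return what is left of each form."""
    start = 0
    while start < min(len(a), len(b)) and a[start] == b[start]:
        start += 1
    a, b = a[start:], b[start:]
    end = 0
    while end < min(len(a), len(b)) and a[-1 - end] == b[-1 - end]:
        end += 1
    return a[: len(a) - end], b[: len(b) - end]


pairs = [
    ("capud", "caput"),
    ("feon", "pheon"),
    ("abbacus", "abbachus"),
    ("calumnia", "calumpnia"),
    ("oralogium", "horalogium"),
]
tally = Counter(differing_core(a, b) for a, b in pairs)
for (left, right), count in tally.items():
    print(f"{left or '-'} / {right or '-'}: {count}")
```

Such a tally is purely graphic. It reports abbacus/abbachus and oralogium/horalogium alike as an added “h”, so grouping the results into categories such as c/ch or initial h remains a philological decision.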

I think these examples support the assumption that the two manuscripts really were written from dictation.