
What we are working on: an AI-assisted candidate space for medieval abbreviations

Why Cappelli, Abbreviationes and MUFI supply verifiable candidates in HistoriaMP, not automatic truths.

HistoriaMP is developing an analysis layer that does not smooth over medieval abbreviations straight away. Image finding, sign form, reference comparison and uncertainty assessment produce a candidate space, not a finished finding.

At HistoriaMP we are working on an analysis environment for historical manuscripts that does not simply turn an image into a finished text.

Medieval sources in particular do not consist only of letters in the modern sense. They contain abbreviation signs, ligatures, special forms, damaged passages, scribal habits and visual details that can be decisive for a reliable reading.

Our goal is therefore not:

Image in - text out.

Our goal is a traceable path:

Image finding - sign form - reference comparison - candidate formation - review - finding

The central idea behind this is:

A candidate is not a finding.

A possible reading does not become reliable because it sounds linguistically smooth. It becomes reliable when the path by which it arose remains traceable.

Cappelli, Abbreviationes and MUFI: three different layers

An important building block of our work is the integration of classical and digital aids. Cappelli, Abbreviationes and MUFI are especially interesting here.

Cappelli's Lexicon abbreviaturarum is one of the classical reference works for Latin and Italian abbreviations in manuscripts, charters and inscriptions. It helps with the question of which abbreviation forms are historically attested and which expansions might be possible.

Abbreviationes is a digital database of medieval Latin abbreviations. It can speed up research and make comparison forms accessible.

MUFI, the Medieval Unicode Font Initiative, works on another layer. It is not primarily about the meaning of a sign, but about its digital representation: how can medieval special signs, ligatures or particular abbreviation signs be represented so that they are not simply resolved into modern plain text and thereby made invisible?

Put simply:

  • Cappelli helps with historical abbreviation knowledge.
  • Abbreviationes helps with digital comparison search.
  • MUFI helps with controlled sign representation.

This separation is decisive. A visible sign, a possible expansion and a digital encoding are three different things.
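
As a minimal sketch of this separation, the three things can be kept in three separate types. All class and field names here are hypothetical and not part of HistoriaMP. The sign description, the expansion and the code point U+A751 are chosen only for illustration and are not quoted from Cappelli, Abbreviationes or a MUFI recommendation.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedSign:
    """What is visible in the image: a description, not an interpretation."""
    description: str


@dataclass(frozen=True)
class PossibleExpansion:
    """One possible reading, together with the reference that suggests it."""
    text: str
    source: str


@dataclass(frozen=True)
class SignEncoding:
    """A digital representation of the sign itself, independent of its meaning."""
    character: str

    @property
    def code_point(self) -> str:
        return f"U+{ord(self.character):04X}"

    @property
    def unicode_name(self) -> str:
        # Falls back to a placeholder for characters without a Unicode name,
        # for example characters in the Private Use Area.
        return unicodedata.name(self.character, "<no Unicode name>")


# Illustrative values only (assumptions, not data from the reference works).
sign = ObservedSign("p-like form with a stroke through the descender")
expansion = PossibleExpansion(text="per", source="illustrative reference entry")
encoding = SignEncoding("\ua751")

print(sign.description)
print(expansion.text, "from", expansion.source)
print(encoding.code_point, encoding.unicode_name)
```

Because the encoding carries no reading and the expansion carries no code point, neither can silently stand in for the other.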

What we are developing from this

We are working to make these three layers usable as a structured reference layer inside HistoriaMP.

A dedicated AI layer is intended to evaluate Cappelli, Abbreviationes and MUFI together. The goal is not to immediately produce a final reading. Instead, a candidate space should emerge.

Technically, we think of this layer as a combination of visual sign analysis, rule-based prefilters, reference-based retrieval, ranking and uncertainty assessment.
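
How these components could fit together is sketched below in Python. This is a toy illustration under stated assumptions, not the HistoriaMP implementation: the reference entries are invented placeholders, the similarity scores stand in for the output of a visual sign analysis, and the 0.2 margin is an arbitrary threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReferenceEntry:
    source: str
    base_letter: str
    mark: str
    expansion: str


# Invented placeholder entries, not transcribed from the real works.
REFERENCES = [
    ReferenceEntry("Cappelli", "p", "stroke through descender", "per"),
    ReferenceEntry("Abbreviationes", "p", "flourish", "pro"),
    ReferenceEntry("Cappelli", "q", "stroke through descender", "quam"),
]

Ranked = List[Tuple[float, ReferenceEntry]]


def prefilter(entries: List[ReferenceEntry], base_letter: str) -> List[ReferenceEntry]:
    """Rule-based prefilter: keep only entries with the observed base letter."""
    return [e for e in entries if e.base_letter == base_letter]


def rank(entries: List[ReferenceEntry], mark_scores: Dict[str, float]) -> Ranked:
    """Order entries by the visual similarity score of their mark, best first."""
    scored = [(mark_scores.get(e.mark, 0.0), e) for e in entries]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def uncertainty(ranked: Ranked, margin: float = 0.2) -> str:
    """Label the candidate space by how clearly the best hit leads the next one."""
    if not ranked:
        return "open"
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    return "medium" if ranked[0][0] - runner_up >= margin else "high"


# Hypothetical scores from a visual comparison of the observed mark.
mark_scores = {"stroke through descender": 0.62, "flourish": 0.55}

ranked = rank(prefilter(REFERENCES, "p"), mark_scores)
for score, entry in ranked:
    print(f"{entry.expansion} ({entry.source}), similarity {score:.2f}")
print("uncertainty:", uncertainty(ranked))
```

The ranking keeps every surviving candidate instead of returning only the best one, which is what turns a prediction into a candidate space.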

For an observed sign form or abbreviation, possible expansions, comparison hits and digital sign representations are brought together. This creates several verifiable candidates that can later be assessed in context.

AI therefore does not become the final authority. It becomes a structuring tool between image finding, reference knowledge and the later review decision.

A simplified example

Assume that a p-like sign with an abbreviation stroke appears in a manuscript.

An automatic system could turn this directly into a word: depending on the model, perhaps "per", "pro" or another linguistically plausible form.

HistoriaMP would not smooth over this step immediately. Instead, the system could generate several candidates:

| Candidate | Proposal | Support | Status | Uncertainty | Note |
|---|---|---|---|---|---|
| A | per | comparable abbreviation form in Cappelli | possible | medium | Sign form broadly fits, contextual review required |
| B | pro | possible comparison hit in Abbreviationes | possible | high | Sign form could fit, linguistic context not yet sufficiently checked |
| C | preserved abbreviation sign following MUFI/Unicode logic | controlled sign representation | diplomatic representation | open | Meaning not yet resolved, sign form should first be preserved |

The system therefore does not produce a finished text, but documented possibilities.

Every candidate remains verifiable. It remains visible which reference was used, which uncertainty exists and whether the candidate is a possible expansion or only a controlled sign representation.

Uncertainty as part of the analysis

Uncertainty in historical manuscripts is not an error. It is part of the material.

A candidate may be uncertain because the image location is damaged, the sign form resembles several abbreviations, the reference situation remains ambiguous, the linguistic context allows several variants or the scribe's hand differs from known comparison examples.

Uncertainty should therefore not be hidden in HistoriaMP. It should remain visible, describable and verifiable.

A candidate is therefore not only a word proposal. It is a small package of image reference, reference relation, possible meaning, sign representation and uncertainty status.
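
Such a package could be modelled as a small immutable record. The following Python sketch is one possible shape, not the HistoriaMP data model: every type and field name, the image identifier, the pixel box and the code point are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Kind(Enum):
    EXPANSION = "possible expansion"
    SIGN_REPRESENTATION = "controlled sign representation"


class Uncertainty(Enum):
    MEDIUM = "medium"
    HIGH = "high"
    OPEN = "open"


@dataclass(frozen=True)
class ImageReference:
    image_id: str
    box: Tuple[int, int, int, int]  # x, y, width, height in pixels


@dataclass(frozen=True)
class Candidate:
    label: str
    kind: Kind
    image: ImageReference          # image reference
    reference: str                 # reference relation
    reading: Optional[str]         # possible meaning; None while unresolved
    sign_representation: str       # how the sign itself is encoded
    uncertainty: Uncertainty       # uncertainty status

    def is_expansion(self) -> bool:
        return self.kind is Kind.EXPANSION


# Invented example values; both candidates point at the same image location.
location = ImageReference(image_id="example-page", box=(120, 340, 28, 41))

candidate_a = Candidate(
    "A", Kind.EXPANSION, location, "Cappelli", "per", "\ua751", Uncertainty.MEDIUM
)
candidate_c = Candidate(
    "C", Kind.SIGN_REPRESENTATION, location, "MUFI/Unicode", None, "\ua751",
    Uncertainty.OPEN,
)

for c in (candidate_a, candidate_c):
    print(c.label, c.kind.value, c.reading, c.uncertainty.value)
```

Keeping `reading` optional lets a candidate preserve the sign form without committing to a meaning.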

How this should become visible

In the work interface, candidates should not simply appear as finished text. They should become visible as verifiable options.

One conceivable view would place several candidates next to each other: with image excerpt, marked sign form, possible reading, reference source, uncertainty level, justification, alternative proposals and review status.

A user should therefore not only see:

The system proposes per.

But:

The system proposes per because the sign form is comparable with a Cappelli reference. At the same time, there is a possible Abbreviationes hit for pro. The MUFI/Unicode representation initially preserves the abbreviation sign diplomatically. The passage still needs review.
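
A justification of this kind could be assembled from the candidates themselves rather than written by hand. The Python sketch below assumes each candidate is a plain dictionary with the hypothetical keys `reading` and `source`; the wording of the sentences is illustrative only.

```python
from typing import Dict, List


def explain(candidates: List[Dict[str, str]]) -> str:
    """Build a short justification that names every candidate and its source."""
    if not candidates:
        return "No candidates yet. The passage remains in need of review."
    first, *others = candidates
    parts = [
        f"The system proposes {first['reading']} because the sign form is "
        f"comparable with a {first['source']} reference."
    ]
    for other in others:
        parts.append(
            f"There is also a possible {other['source']} hit for {other['reading']}."
        )
    # The closing sentence is always added: explaining is not deciding.
    parts.append("The passage remains in need of review.")
    return " ".join(parts)


text = explain(
    [
        {"reading": "per", "source": "Cappelli"},
        {"reading": "pro", "source": "Abbreviationes"},
    ]
)
print(text)
```

The alternatives are listed in the same sentence flow as the first proposal, so a user never sees one reading without its competitors.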

This is the difference between an output and a traceable analysis.

Target model

In simplified form, the intended working logic can be described like this:

Image finding
- sign form / glyph / abbreviation sign
- comparison with Cappelli, Abbreviationes and MUFI
- AI-assisted candidate space
- context review
- uncertainty assessment
- review decision
- checked finding

Context review includes not only sentence context, but also grammar, word position, scribal hand and comparison forms within the same manuscript.
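
The working logic above can be sketched as an ordered sequence of stages in which no step may be skipped. The Python example below is a simplified assumption about how such an ordering might be enforced; the `Stage` and `Workflow` names are hypothetical.

```python
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    IMAGE_FINDING = 1
    SIGN_FORM = 2
    REFERENCE_COMPARISON = 3
    CANDIDATE_SPACE = 4
    CONTEXT_REVIEW = 5
    UNCERTAINTY_ASSESSMENT = 6
    REVIEW_DECISION = 7
    CHECKED_FINDING = 8


class Workflow:
    """Tracks which stages a passage has gone through, in order."""

    def __init__(self) -> None:
        self.history: List[Stage] = [Stage.IMAGE_FINDING]

    @property
    def current(self) -> Stage:
        return self.history[-1]

    def advance(self, target: Stage) -> None:
        # Only the directly following stage is allowed; no shortcuts
        # from a candidate straight to a checked finding.
        if target != self.current + 1:
            raise ValueError(
                f"cannot move from {self.current.name} to {target.name}"
            )
        self.history.append(target)


workflow = Workflow()
workflow.advance(Stage.SIGN_FORM)
workflow.advance(Stage.REFERENCE_COMPARISON)
print([stage.name for stage in workflow.history])
```

The recorded history is the traceable path: it shows not only where a passage stands, but how it got there.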

Classical palaeography, digital aids and AI are therefore not played off against one another. They are integrated into a shared, verifiable analysis architecture.

  • Cappelli provides historical abbreviation knowledge.
  • Abbreviationes provides digital comparison possibilities.
  • MUFI supports controlled sign representation.
  • AI structures candidates from these layers.
  • Review later decides what becomes reliable.

Conclusion

With HistoriaMP we are working on an analysis environment that does not prematurely turn historical manuscripts into smooth text.

An important part of this work is the construction of an AI-assisted candidate space for medieval abbreviations and special signs.

The goal is an analysis in which it remains visible what was observed, which references were used, which alternatives are possible and where uncertainty remains.

Or briefly:

HistoriaMP should not only help with reading. HistoriaMP should show how a reading comes into being.

Tools and sources mentioned

Adriano Cappelli: Lexicon abbreviaturarum
Classical reference work for Latin and Italian abbreviations in manuscripts, charters and inscriptions. The work first appeared in 1899 and was later revised and expanded several times.
Abbreviationes
Electronic database of medieval Latin abbreviations. It supports digital research and candidate formation when working with abbreviated Latin texts.
MUFI - Medieval Unicode Font Initiative
Initiative for encoding and representing special medieval characters, ligatures, abbreviations and sign forms for digital editions and scholarly text processing.
HTR - Handwritten Text Recognition
Automatic handwriting recognition. For HistoriaMP it is interesting as a possible candidate layer, but not as a final finding.
LLM - Large Language Model
Language model for analyzing, explaining and generating text. Useful for HistoriaMP as an aid, but not as a replacement for image finding, reference review and source-critical assessment.

Short summary

HistoriaMP is developing an AI-assisted candidate space for medieval abbreviations. Cappelli, Abbreviationes and MUFI are used as separate reference layers: historical abbreviation knowledge, digital comparison search and controlled sign representation. A candidate is not yet a finding. A reading becomes reliable only when image reference, reference relation, uncertainty and review decision are documented in a traceable way.

Frequently asked questions

What is a candidate space?

A candidate space is a structured collection of possible readings, sign representations and reference hits. It does not replace review, but makes alternatives visible.

Why is a candidate not a finding?

A candidate may be plausible, but remains a proposal. A finding must be bound to a visible image location, a documented reference and a verifiable decision.
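
The three conditions can be expressed as a simple check. This Python sketch is an illustration under assumptions: the `Proposal` type and its field names are hypothetical, and a real review decision would carry more than a short note.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proposal:
    reading: str
    image_location: Optional[str] = None
    reference: Optional[str] = None
    decision: Optional[str] = None  # for example a reviewer's note


def missing_for_finding(proposal: Proposal) -> List[str]:
    """List what is still missing before the proposal may count as a finding."""
    required = {
        "image location": proposal.image_location,
        "documented reference": proposal.reference,
        "review decision": proposal.decision,
    }
    return [name for name, value in required.items() if not value]


def is_finding(proposal: Proposal) -> bool:
    return not missing_for_finding(proposal)


plausible = Proposal(reading="per", reference="Cappelli")
print(is_finding(plausible), missing_for_finding(plausible))
```

A plausible reading with a reference but without an image location and a decision stays a proposal, however smooth it sounds.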

What role do Cappelli, Abbreviationes and MUFI play?

Cappelli supports historical abbreviation knowledge, Abbreviationes digital comparison search and MUFI controlled sign representation. These layers must not be mixed.

Project context

This article belongs to the methodological development of HistoriaMP. More on the project's position, its limits and how to get in touch is available on the project page.
