The story behind HistoriaMP

I simply wanted to know what was really written there.

A personal question about historical sources became a pipeline that is meant not simply to read, but to show how a reading comes into being.

AI may help. But the source decides.

The beginning

Not as a software project. But out of dissatisfaction.

HistoriaMP did not begin with a research proposal, a working group or a plan to build a platform. It began with an interest in the Templars, and with the growing feeling that, for all the accounts I read, I remained surprisingly far from the actual source.

One reads a great deal about sources. But at some point one asks: where is the evidence, actually? Where does the manuscript end, and where does interpretation, smoothing or transmission begin?

The question was not complicated. It was only persistent: What is really written there?

Not: what was later made from it? Not: which modern narrative fits well? But: what can actually be seen, checked and justified in the concrete manuscript?

The first AI test

Then I simply gave a manuscript image to an AI.

The first step was not a fully developed technical plan. It was an experiment: feed in a manuscript image and see what happens.

The first results were fascinating. The AI recognized structures, described pages and generated readings that seemed surprisingly plausible. For someone without a large research team, without a ready-made training corpus and without institutional infrastructure, that was a real opening.

But that was exactly where the problem lay.

The sobering point

Plausible is not the same as evidenced.

An AI can formulate convincingly. It can close gaps, fill in from context, smooth over damaged passages and generate, from a few visible traces, a text that sounds so good that one almost forgets to check it against the image.

With historical manuscripts, that is precisely what is dangerous. A beautiful text is not yet a reliable text. A linguistically fitting reading is not yet source evidence.

The problem was not that AI could do nothing. The problem was that it could often do too much, and seemed too certain while doing it.

The turning point

A transcription attempt became a pipeline.

The workflow "image in, text out" was methodologically too weak. It concealed exactly the part that matters most with historical sources: the path from visible evidence to reading.

So the question changed. No longer: can the AI read this? Instead: how do I force a system to justify its reading?

Early modules must not read. They may only observe. Layout must not be confused with meaning. A glyph must not be turned too quickly into a letter. A minim cluster must not automatically become a word. The source comes first. The text comes later.
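The separation between observing and reading can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual HistoriaMP code: the names `Observation`, `Reading` and `check_reading`, and all fields and values, are hypothetical. The idea it shows is that a reading is only accepted if it points back to something an earlier module recorded as visible on the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Something visible on the page, described without interpreting it.

    Hypothetical structure, for illustration only.
    """
    obs_id: str
    region: tuple[int, int, int, int]  # x, y, width, height in image pixels
    kind: str                          # e.g. "line", "glyph", "minim_cluster"


@dataclass(frozen=True)
class Reading:
    """A proposed text, tied to the observations that support it."""
    text: str
    evidence: tuple[str, ...]          # obs_ids this reading relies on
    confidence: float                  # 0.0 to 1.0, stated rather than hidden


def check_reading(reading, observations):
    """Return a list of problems; an empty list means the reading is traceable."""
    known = {o.obs_id for o in observations}
    problems = []
    if not reading.evidence:
        problems.append("reading cites no visible evidence")
    for obs_id in reading.evidence:
        if obs_id not in known:
            problems.append(f"unknown observation: {obs_id}")
    if not 0.0 <= reading.confidence <= 1.0:
        problems.append("confidence outside 0..1")
    return problems


# Early stage: only observations, no text at all (made-up example values).
observations = [
    Observation("o1", (120, 340, 48, 22), "minim_cluster"),
    Observation("o2", (170, 338, 15, 24), "glyph"),
]

# Later stage: readings must justify themselves against those observations.
grounded = Reading("in", ("o1",), 0.4)
ungrounded = Reading("in nomine domini", (), 0.95)

print(check_reading(grounded, observations))
print(check_reading(ungrounded, observations))
```

In this sketch the fluent, confident reading is the one that gets flagged, because it cites nothing on the page, while the short, uncertain one passes because its path back to the image is explicit.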

Why LLM instead of only HTR?

Anyone who cannot read a script cannot simply train a system on it either.

Classical HTR (handwritten text recognition) systems have their value. But anyone who wants to read an unknown historical hand faces precisely this difficulty: they cannot yet read it reliably, and training an HTR system meaningfully requires many correct examples.

That is paradoxical: one trains a system so that it can read the script - but to train it, one must already be able to read the script to a considerable extent.

That is why the LLM path is important for HistoriaMP. Not because LLMs are automatically right, but because they can provide an entry point before a specific training corpus exists, provided they are methodologically controlled.

Why public?

The problem is not only mine.

HistoriaMP could have remained a private tool. But many people are interested in historical documents they cannot read for themselves: old letters, family sources, archival material, church documents, chronicles or digitized manuscripts.

Not everyone has palaeographic training. Not everyone can train an HTR system. Not everyone can build a technical pipeline. Even so, the interest can be real.

That is why HistoriaMP is intended, in the long term, to become accessible as a portal: not as a quick AI toy, but as a serious path to the source.

The simple centre

I wanted to know what is written in a source.

That question became a project. The project became a pipeline. The pipeline is intended to become a portal. And the portal may open access for many people who do not merely want to admire historical documents, but want to understand them.

Not as a shortcut past scholarship, but as a path to the source, with scholarly standards, visible uncertainty and verifiable readings.
