Limits & open problems

HistoriaMP does not pretend that AI can do everything.

This page will document the real problems in LLM-assisted manuscript analysis and how HistoriaMP deals with them.

Why this page?

The problems are part of the method.

HistoriaMP is not based on the belief that LLMs can simply read historical manuscripts reliably. On the contrary: many central architectural decisions were made because these systems have real limits.

Anyone familiar with the subject should be able to see here that the problems have not been overlooked. They are the reason the pipeline is built so carefully.

Digital preprocessing is never neutral

The Xerox case shows why it matters how a digital image came into being.

In 2013, David Kriesel showed that certain Xerox scan copiers could produce digital documents that looked clean and were still wrong: the devices' JBIG2 compression could replace image patches with similar-looking ones, so that digits were silently swapped in scans. The decisive methodological point is this: the error did not begin with OCR, but in the image data itself.

For HistoriaMP this is an important reminder. With manuscripts, too, a digital output must not be confused with the source. Scans, compression, segmentation, OCR, HTR and AI outputs are processing steps. They can help, but they can also change, conceal or create apparent certainty.

That is why HistoriaMP documents not only results, but also the path toward them.
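One way to picture "documenting the path" is a log in which every processing step records a hash of the exact bytes it received and the bytes it produced. The sketch below is only an illustration under that assumption; the names `Step`, `record_step` and `chain_is_unbroken` are hypothetical and do not describe HistoriaMP's actual implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One processing step and the exact bytes it consumed and produced."""
    name: str            # illustrative label, e.g. "compress" or "segment"
    input_sha256: str
    output_sha256: str


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_step(name: str, data_in: bytes, data_out: bytes) -> Step:
    return Step(name, sha256_hex(data_in), sha256_hex(data_out))


def chain_is_unbroken(steps: list[Step]) -> bool:
    """True if every step's input is exactly the previous step's output."""
    return all(
        prev.output_sha256 == nxt.input_sha256
        for prev, nxt in zip(steps, steps[1:])
    )


# Placeholder byte strings standing in for real image data.
raw = b"raw scan bytes"
compressed = b"compressed bytes"
cropped = b"cropped line image"

log = [
    record_step("compress", raw, compressed),
    record_step("segment", compressed, cropped),
]
print(chain_is_unbroken(log))
```

A log like this cannot say whether a step changed the content for better or worse. It only makes visible that a change happened, and where, which is the point of the Xerox example.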

In development

Topics documented here

Image resolution & downsampling

Why internal image reduction is problematic for primary sources.

Glyphs & signs

Why a visible stroke is not automatically a modern letter.

Minim clusters

Why dense groups of strokes are especially vulnerable to false precision.

Abbreviations

Why visible abbreviation signs and editorial expansion must remain separate.

Unicode & character encoding

Why not every historical sign can be represented stably in a modern character set.

Phantom readings

Why plausible readings that are not supported visually must be blocked.
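The separation named under "Abbreviations" can be pictured as a record that stores what is visible on the page and the editorial expansion in different fields, so that neither overwrites the other. This is a minimal sketch, not HistoriaMP's data model: `Token`, `diplomatic` and `normalized` are hypothetical names, and the example word (a suspension stroke over "dns" expanded to "dominus") is chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    visible: str                       # what is on the page, including the sign
    expansion: Optional[str] = None    # editorial expansion, stored separately
    expanded_by: Optional[str] = None  # who or what proposed the expansion


def diplomatic(tokens: List[Token]) -> str:
    """Text as written: abbreviation signs are kept, nothing is expanded."""
    return " ".join(t.visible for t in tokens)


def normalized(tokens: List[Token]) -> str:
    """Reading text: uses the expansion where an editor supplied one."""
    return " ".join(t.expansion or t.visible for t in tokens)


tokens = [
    # "n" followed by U+0304 COMBINING MACRON stands in for the stroke.
    Token(visible="dn\u0304s", expansion="dominus", expanded_by="editor"),
    Token(visible="noster"),
]

print(diplomatic(tokens))
print(normalized(tokens))
```

Because the expansion lives in its own field, it can be revised or removed later without touching the record of what was actually seen.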

Provisional position

The answer is not trust, but control.

HistoriaMP addresses these problems through segmentation, visual-basis references, uncertainty marking, separate artifacts, review steps, validators and quality control.
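How a validator might combine visual-basis references with uncertainty marking can be sketched as follows. Everything here is an assumption made for illustration: the `Reading` type, the bounding-box format, the three status labels and the 0.8 threshold are hypothetical and are not taken from HistoriaMP.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Bounding box in source-image pixels: (x, y, width, height).
Region = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Reading:
    text: str
    region: Optional[Region]   # where on the image the reading is visible
    confidence: float          # 0.0 to 1.0, as reported by the reading step


def validate(reading: Reading, min_confidence: float = 0.8) -> str:
    """Return 'blocked', 'uncertain' or 'accepted' for one reading."""
    # A reading without a visual basis is blocked, however plausible it is.
    if reading.region is None:
        return "blocked"
    _, _, width, height = reading.region
    if width <= 0 or height <= 0:
        return "blocked"
    # A reading with a visual basis but low confidence is kept and flagged.
    if reading.confidence < min_confidence:
        return "uncertain"
    return "accepted"


print(validate(Reading("et", (10, 20, 30, 12), 0.95)))
print(validate(Reading("gratia", None, 0.99)))
print(validate(Reading("dei", (50, 20, 28, 12), 0.40)))
```

The order of the checks is the design choice worth noting: a missing visual basis blocks a reading before confidence is even considered, so a fluent but unsupported guess cannot pass on a high score alone.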

This page will later describe in detail which obstacles currently exist, which approaches are being tested and where certainty is deliberately not claimed.