Method and technical implementation
The source remains the measure. The text is a hypothesis. The evidence lies in the image.
From image finding to verifiable reading
This page moves from the methodological principle, through the levels of analysis and the technical implementation, to artifacts, the limits placed on AI, and quality control.
The source remains the measure
HistoriaMP does not treat historical manuscripts as images from which text should be extracted as quickly as possible. The text is not the beginning of the analysis, but its late result.
The analysis begins with the visible source: the manuscript image, its surface, its zones, its line structure, its character forms, its damage, its gaps and its uncertainties.
A reading is not simply asserted in HistoriaMP. It has to be justified.
The text is a hypothesis. The source is the evidence.
HistoriaMP therefore does not first ask: What is written there? It first asks: What is actually visible in the image? Only after that come analysis, transcription, variant checking, interpretation and communication.
Why this separation is necessary
Historical manuscripts are not ordinary images of text. They contain damaged signs, ambiguous stroke groups, abbreviations, later additions, marginal signs, changes of colour, corrections, material traces and areas whose function cannot immediately be determined.
Classical OCR (optical character recognition) or HTR (handwritten text recognition) systems often aim for a smooth result: image → text.
HistoriaMP works differently: image → finding → structure → segment → line → glyph → minim cluster → abbreviation → reading → consensus → quality control.
The goal is not the fastest transcription. The goal is a reading whose derivation remains verifiable. A smooth answer can be scientifically problematic if it no longer shows which passage was read securely, which passage was uncertain and which decision rests on interpretation.
Uncertainty is not an error. It is a result.
From finding to reading
HistoriaMP separates work on a manuscript into five levels. These levels must not be mixed.
01 Observation
At this level, only what is visible in the image is described: writing areas, line structures, margin areas, colour differences, damage, isolated character forms, dense stroke groups, weak areas and possible segmentation risks.
At this level, nothing is read. A red form is initially only a red form. A larger character is initially only a larger visible form. A marginal sign is initially only an isolated visible finding in the margin.
02 Analysis
In analysis, visible structures are ordered: areas, zones, line courses, forms near the margin, possible boundary risks and critical sign areas.
The same restraint applies here: analysis is not yet transcription. A zone is not automatically main text, an area of different colour is not automatically a rubric, and a larger form is not automatically an initial.
03 Transcription
Only when a visible finding has been sufficiently documented can a reading be proposed. It remains bound to image crop, coordinates, glyph findings, recognized uncertainties, variants and earlier analysis artifacts.
A diplomatic transcription should reproduce the finding as closely to the source as possible. If a sign is unclear, the uncertainty must remain visible. If a minim cluster permits several readings, it must not automatically become a smooth word.
04 Interpretation
Interpretation begins only after the source-bound reading. It asks about meaning, linguistic classification, variants, unresolved passages and editorial decisions.
This level must not retroactively change the finding. A linguistically plausible addition is not visible proof; a historical expectation does not replace the source.
05 Communication
The last level is an understandable presentation for present-day readers. Here a text can be explained, translated or contextualized. This presentation may be more readable than the diplomatic transcription, but it must remain recognizable as presentation. Three kinds of output therefore stay distinct:
- Finding-oriented analysis: What is visible?
- Diplomatic or critical reading: What can be read from the finding?
- Readable explanation or translation: How can the content be understood today?
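The rule that the five levels must not be mixed can be illustrated with a small sketch. The names `Level`, `may_read` and `may_write` are assumptions made for this example, and the access rule shown (read downward, write only at one's own level) is one possible interpretation, not HistoriaMP's documented behaviour.

```python
from enum import IntEnum


class Level(IntEnum):
    """The five levels in their fixed order (names are illustrative)."""
    OBSERVATION = 1
    ANALYSIS = 2
    TRANSCRIPTION = 3
    INTERPRETATION = 4
    COMMUNICATION = 5


def may_read(actor: Level, artifact: Level) -> bool:
    # A level may consult its own artifacts and those of earlier levels.
    return artifact <= actor


def may_write(actor: Level, artifact: Level) -> bool:
    # A level may only create or change artifacts of its own level,
    # so interpretation can never rewrite an observation after the fact.
    return artifact == actor
```

Under this assumption, `may_write(Level.INTERPRETATION, Level.OBSERVATION)` is false: a linguistically plausible addition cannot alter what was recorded as visible.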
Technical implementation
HistoriaMP is designed as a modular research environment. Its technical implementation does not follow the pattern of a classical OCR or HTR system in which an image is directly converted into text.
A model does not simply generate text. A pipeline creates traceable analysis artifacts from which a justified reading can later emerge.
Infrastructure and run system
HistoriaMP distinguishes between the infrastructure and the scholarly analysis pipeline. The infrastructure handles image intake, file checking, storage, hash generation, run creation, segmentation, coordinate management and artifact storage.
It does not decide what is written in a manuscript: it does not read, transcribe or interpret. Its task is to provide a stable, reproducible and checkable working basis.
A run documents one concrete analysis process. It can contain the source image, technical image data, hashes, segments, coordinates, model inputs, module results, intermediate artifacts, checking reports and later reading or variant states.
The aim is to keep it traceable which image was used, which image version was analyzed, which module produced which statement and which uncertainty was marked.
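As a minimal sketch, a run could be represented as a record that collects references to everything produced along the way. The names `Run`, `ArtifactRef`, `add_artifact` and `by_module` are hypothetical and chosen only for illustration; the real storage layout is not described on this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass(frozen=True)
class ArtifactRef:
    """Reference to one stored artifact of a run (hypothetical structure)."""
    kind: str          # e.g. "source_image", "segment", "module_result"
    sha256: str        # hash of the stored bytes
    produced_by: str   # module name, or "intake" for the uploaded image


@dataclass
class Run:
    """One documented analysis process (illustrative only)."""
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    artifacts: list[ArtifactRef] = field(default_factory=list)

    def add_artifact(self, kind: str, sha256: str, produced_by: str) -> ArtifactRef:
        ref = ArtifactRef(kind, sha256, produced_by)
        self.artifacts.append(ref)
        return ref

    def by_module(self, module: str) -> list[ArtifactRef]:
        """Answer the question: which module produced which artifact?"""
        return [a for a in self.artifacts if a.produced_by == module]


run = Run()
run.add_artifact("source_image", "ab" * 32, "intake")
run.add_artifact("module_result", "cd" * 32, "layout")
```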
Image integrity and coordinate spaces
Technical checking of the image source comes before any analysis of content. A digital image is not automatically a neutral source, because upload, scaling, compression, format conversion, cropping, padding, segmentation or model preprocessing can lie between manuscript and model input.
For that reason, the image version itself becomes an artifact. HistoriaMP should record file format, image dimensions, file size, hash, colour space, metadata, derived image versions, segments, crops, model input versions and the transformation chain between source image and analysis image.
A model statement initially applies only to the concrete image version that the model actually received. It is not automatically a finding on the original image.
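The idea of treating each image version as an artifact can be sketched with content hashes and a parent link. This is an illustration under stated assumptions: `ImageVersion`, `register_version` and `chain_to_source` are invented names, the transformation is recorded as a free-text string, and only `hashlib.sha256` from the Python standard library is a real API.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageVersion:
    sha256: str                   # hash of exactly these image bytes
    width: int
    height: int
    derived_from: Optional[str]   # sha256 of the parent; None for the source
    transform: Optional[str]      # e.g. "scale 0.5"; None for the source


def register_version(data: bytes, width: int, height: int,
                     parent: Optional[ImageVersion] = None,
                     transform: Optional[str] = None) -> ImageVersion:
    # A derived version must say how it was derived; a source must not.
    if (parent is None) != (transform is None):
        raise ValueError("parent and transform must be given together")
    digest = hashlib.sha256(data).hexdigest()
    return ImageVersion(digest, width, height,
                        parent.sha256 if parent else None, transform)


def chain_to_source(version: ImageVersion,
                    registry: dict[str, ImageVersion]) -> list[str]:
    """Return the transformations from the source image to this version."""
    steps: list[str] = []
    while version.derived_from is not None:
        steps.append(version.transform)
        version = registry[version.derived_from]
    return list(reversed(steps))


# Placeholder bytes stand in for real image files.
src = register_version(b"source-bytes", 4000, 6000)
half = register_version(b"scaled-bytes", 2000, 3000, src, "scale 0.5")
crop = register_version(b"crop-bytes", 512, 512, half, "crop 100,200,512,512")
registry = {v.sha256: v for v in (src, half, crop)}
```

With such a chain, a statement made about `crop` can be shown to concern a half-size crop rather than the original image.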
No coordinate without a coordinate space.
A marking at an x and y position is meaningful only if it is clear which image it refers to: source image, segment, crop, model input or percentage coordinates relative to a specific image version.
If an image has been scaled, cropped or padded, a new image space is created. Every finding must therefore state which image artifact it belongs to, what dimensions that artifact had, how it was derived and whether a return to the source image is possible.
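A minimal sketch of this rule keeps every point attached to the image space it was measured in. The names `CoordinateSpace`, `Point` and `to_source` are assumptions for this example, and the sketch covers only the simple case of a crop followed by uniform scaling. Padding or lossy steps would need their own handling, and some may not be invertible at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoordinateSpace:
    """An image space derived from the source by crop, then uniform scale."""
    artifact_id: str
    width: int
    height: int
    offset_x: float = 0.0   # crop origin, in source pixels
    offset_y: float = 0.0
    scale: float = 1.0      # artifact pixels per source pixel


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    space: CoordinateSpace   # a point never travels without its space


def to_source(p: Point, source: CoordinateSpace) -> Point:
    """Map a point from its own space back into the source image space."""
    s = p.space
    if not (0 <= p.x <= s.width and 0 <= p.y <= s.height):
        raise ValueError("point lies outside its declared image space")
    return Point(s.offset_x + p.x / s.scale, s.offset_y + p.y / s.scale, source)


source = CoordinateSpace("source", 4000, 6000)
# A 1000 x 1000 source region starting at (1000, 2000), reduced to 500 x 500.
crop = CoordinateSpace("crop-01", 500, 500, offset_x=1000, offset_y=2000, scale=0.5)
mapped = to_source(Point(100, 50, crop), source)   # lands at (1200.0, 2100.0)
```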
Conservative segmentation
HistoriaMP avoids deciding at the outset which image areas are important or unimportant. Even apparently empty areas can contain weak traces, erasures, bleed-through, marginal signs, later additions, stains, material changes or damaged forms.
Technical segmentation should prevent information loss. Segments are generated so that they cover the image area completely. Overlapping segments help prevent sign forms from being cut off at segment boundaries.
Segmentation protects the finding. It does not decide it.
The segment data contain coordinates, dimensions and references to the respective image position. This allows every later module finding to be traced back to a concrete crop.
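Complete coverage with overlap can be sketched as a simple tiling. The functions `tile_starts` and `segments` and the chosen tile sizes are assumptions for this example; the page does not specify how HistoriaMP actually computes its segments.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start positions along one axis so that tiles cover [0, length)."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)   # last tile sits flush with the far edge
    return starts


def segments(width: int, height: int, tile: int, overlap: int) -> list[dict]:
    """Overlapping segments, each with its position in the source image."""
    segs = []
    for y in tile_starts(height, tile, overlap):
        for x in tile_starts(width, tile, overlap):
            segs.append({
                "x": x, "y": y,
                "w": min(tile, width - x),
                "h": min(tile, height - y),
            })
    return segs


segs = segments(1000, 700, tile=400, overlap=100)
```

Because the last tile on each axis is placed flush with the edge, no strip at the margin is dropped, which matters for marginal signs and weak traces near the border.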
Modules and artifacts
HistoriaMP works with specialized analysis modules rather than with one universal model. In simplified form, the pipeline can be described as:
Image integrity → Layout → Segment → Line → Glyph → Minim cluster → Abbreviation → Transcription → Consensus → Quality control
Each module has a limited task. An early source module may only describe visible image features; later modules can process layout, character forms, stroke clusters, abbreviation findings, transcription proposals or quality checks.
If a language or vision model is used, its statement must be bound to a concrete input artifact: image or segment, input dimensions, possible scaling or compression, hash, source image and coordinate space.
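One way to enforce this binding is to make a model statement impossible to store without its input record. The following is a sketch under assumptions: `InputArtifact`, `ModelStatement` and all field names are invented for illustration, and the model and prompt identifiers are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputArtifact:
    sha256: str               # hash of the bytes the model actually received
    width: int
    height: int
    source_sha256: str        # hash of the source image it derives from
    coordinate_space: str     # id of the image space the input lives in
    preprocessing: tuple[str, ...] = ()   # e.g. ("scale 0.5", "jpeg q=85")


@dataclass(frozen=True)
class ModelStatement:
    module: str
    model: str
    prompt_version: str
    text: str
    input: InputArtifact

    def __post_init__(self) -> None:
        if len(self.input.sha256) != 64 or len(self.input.source_sha256) != 64:
            raise ValueError("statement must reference full SHA-256 hashes")
        if not self.input.coordinate_space:
            raise ValueError("statement must name its coordinate space")

    def applies_to_source(self) -> bool:
        # Only an untransformed input makes the statement a direct
        # statement about the source image.
        return self.input.sha256 == self.input.source_sha256


artifact = InputArtifact("a" * 64, 512, 512, "b" * 64, "crop-01", ("scale 0.5",))
statement = ModelStatement("layout", "example-model", "v3",
                           "two dense stroke groups near the left margin", artifact)
```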
The artifact principle is the core of the method. Not only results are stored, but also image and input data, segment coordinates, layout findings, line structures, glyph findings, minim clusters, abbreviation findings, variant lists, uncertainty reports, transcription proposals and quality checks.
A later user should see not only: This is the text here. They should also see: Why was this passage read this way? Which visible finding supports the reading? Where is it secure? Where does it remain open?
Glyphs, minim clusters and abbreviations
A central difficulty of historical manuscripts lies at the level of character forms. In early analysis, a glyph is not yet a secure letter. A stroke cluster is not yet a word. An abbreviation is not yet an expanded reading.
Especially critical are minim clusters (runs of short vertical strokes that can be read as different combinations of letters such as i, m, n and u), signs above or below the line, isolated dots, abbreviation strokes, ligatures, damaged glyphs, closely connected forms, colour changes, signs near the margin and unclear additional forms.
HistoriaMP can mark such places as visual finding artifacts before a reading is generated. This level does not ask: What is written there? It asks: Where in the image is there a visual form that could become critical for a later reading?
Such findings can be stored with coordinates, segment reference, image hash and uncertainty status. Only after that can it be checked whether the passage is readable, whether several readings are possible or whether it must remain open.
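A visual finding of this kind could be stored as a record that deliberately has no field for a reading. The names `VisualFinding`, `Status` and `needs_review`, and the four status values, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UNCHECKED = "unchecked"   # marked, not yet examined for readability
    READABLE = "readable"
    AMBIGUOUS = "ambiguous"   # several readings are possible
    OPEN = "open"             # must remain undecided


@dataclass
class VisualFinding:
    """A marked location in the image. There is intentionally no 'reading'."""
    kind: str                         # e.g. "minim_cluster", "abbreviation_stroke"
    segment_id: str
    image_sha256: str
    bbox: tuple[int, int, int, int]   # x, y, width, height in segment pixels
    status: Status = Status.UNCHECKED


def needs_review(findings: list[VisualFinding]) -> list[VisualFinding]:
    """Everything that is not clearly readable stays on the review list."""
    return [f for f in findings if f.status is not Status.READABLE]


cluster = VisualFinding("minim_cluster", "seg-03", "a" * 64, (120, 44, 38, 20))
ligature = VisualFinding("ligature", "seg-03", "a" * 64, (10, 44, 30, 22),
                         Status.READABLE)
```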
Candidates are not readings
For special signs, abbreviations or historical glyph forms, reference spaces such as Unicode, MUFI (the Medieval Unicode Font Initiative) or internal glyph registers may become important. They supply candidates, not automatic decisions.
A candidate is not a reading. A code point is not proof. A font character is not a manuscript finding.
The technical sequence remains: visible finding → internal glyph description → candidate space → checking → decision → transcription. The final decision requires a visible basis and, where necessary, human review.
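The sequence can be sketched as a record in which candidates and the decision are kept apart, and a decision is refused without a stated visible basis. `Candidate`, `GlyphDecision` and `decide` are hypothetical names; the two code points are real Unicode characters from the Latin Extended-D block and serve only as sample candidates.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    codepoint: str   # e.g. "U+A751"
    label: str
    register: str    # "Unicode", "MUFI" or "internal"


@dataclass
class GlyphDecision:
    finding_id: str
    description: str                       # internal description of the form
    candidates: list[Candidate] = field(default_factory=list)
    chosen: Optional[Candidate] = None     # stays None until decided
    basis: Optional[str] = None
    reviewed_by: Optional[str] = None

    def decide(self, candidate: Candidate, visible_basis: str,
               reviewer: Optional[str] = None) -> None:
        if candidate not in self.candidates:
            raise ValueError("decision must pick from the recorded candidates")
        if not visible_basis.strip():
            raise ValueError("a decision needs a stated visible basis")
        self.chosen = candidate
        self.basis = visible_basis
        self.reviewed_by = reviewer


per = Candidate("U+A751", "p with stroke through descender", "Unicode")
pro = Candidate("U+A753", "p with flourish", "Unicode")
decision = GlyphDecision("f-17", "p-like stem with an added stroke", [per, pro])
# Listing candidates decides nothing: decision.chosen is still None here.
decision.decide(per, "horizontal stroke crosses the descender in crop f-17",
                reviewer="editor-1")
```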
AI under control
AI models can describe historical manuscript images, recognize structures, make transcription proposals and explain texts. For a primary source, this strength is also a risk: a model completes patterns, orders unclear material, formulates plausibly and generates fluent answers.
In historical manuscripts this can produce false precision: an unclear stroke becomes a letter, a damaged area becomes a word, a minim cluster becomes a smooth reading or a gap is closed by linguistic expectation.
HistoriaMP therefore deliberately limits AI. Early modules must not read, transcribe, assign a function, interpret meaning or complete damaged passages. They should observe, describe and mark uncertainty.
The prompt is not an end in itself. It is a methodological brake: it defines what a model should do, but also what it must not yet claim in this step.
A prompt alone is not a scholarly method. Method emerges only through clear module roles, suitable model choice, documented image versions, versioned instructions, controlled output formats, stored artifacts, validators, human review and quality control.
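A validator for a controlled output format might look like the following sketch. The JSON keys, the lists `ALLOWED_KEYS` and `FORBIDDEN_KEYS` and the function `validate_observation_output` are assumptions for illustration; they show how an early module's output could be rejected when it claims more than its role allows.

```python
import json

# Hypothetical output contract for an observation-only module.
ALLOWED_KEYS = {"segment_id", "observations", "uncertainty"}
FORBIDDEN_KEYS = {"reading", "transcription", "translation", "function", "meaning"}


def validate_observation_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output is accepted."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    keys = set(data)
    problems = []
    for key in sorted(keys & FORBIDDEN_KEYS):
        problems.append(f"forbidden key for this module: {key}")
    for key in sorted(keys - ALLOWED_KEYS - FORBIDDEN_KEYS):
        problems.append(f"unexpected key: {key}")
    for key in sorted(ALLOWED_KEYS - keys):
        problems.append(f"missing key: {key}")
    return problems


ok = '{"segment_id": "s1", "observations": ["red form, upper left"], "uncertainty": "high"}'
bad = '{"segment_id": "s1", "observations": [], "uncertainty": "low", "reading": "dominus"}'
```

Here `validate_observation_output(ok)` returns an empty list, while `bad` is rejected because an observation module has supplied a reading.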
For HistoriaMP, the best model is not always the one that recognizes the most. Often the better model is the one that can reliably say: The evidence is not sufficient here.
What the method is meant to prevent
HistoriaMP is a safeguard against false precision. It is meant to prevent smooth text from emerging without sufficient visible evidence, uncertain passages from being silently smoothed over, and colour, size and position from being interpreted too quickly as function.
It is also meant to prevent four further failures: models adding what is linguistically, historically or editorially expected; small marginal signs or weak traces being lost through segmentation; findings being transferred into the wrong image space; and statements about reduced model images being accepted untested as statements about the original image.
Quality control and human review
The pipeline ends not only with the output of a reading, but with a comparison of the analysis levels.
Quality control can check whether a critical glyph location was taken into account in the later text, whether a visible abbreviation form was silently expanded, whether an uncertain minim cluster was read too confidently, whether coordinates and model input are documented and whether uncertainties were correctly carried forward.
If a finding is not sufficiently secured, the reading can be marked, deferred or blocked. Not every passage has to be decided. Sometimes the cleanest scholarly answer is: The evidence is not sufficient for a secure reading.
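The checks named above can be sketched as a small gate over each span of a proposed reading. Which failed check leads to which outcome is an assumption of this example, as are the names `ReadingSpan`, `Verdict` and `check_span`; the page only states that a reading can be marked, deferred or blocked.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    MARK = "mark"
    DEFER = "defer"
    BLOCK = "block"


@dataclass(frozen=True)
class ReadingSpan:
    text: str
    finding_uncertain: bool        # was the underlying finding flagged uncertain?
    marked_uncertain: bool         # does the reading still show that uncertainty?
    has_input_record: bool         # are coordinates and model input documented?
    expansion_marked: bool = True  # False if an abbreviation was expanded silently


def check_span(span: ReadingSpan) -> Verdict:
    if not span.has_input_record:
        return Verdict.BLOCK    # no documented basis: nothing can be verified
    if span.finding_uncertain and not span.marked_uncertain:
        return Verdict.DEFER    # uncertainty was lost on the way to the text
    if not span.expansion_marked:
        return Verdict.MARK     # keep the reading, but flag the silent expansion
    return Verdict.ACCEPT


smoothed = ReadingSpan("minimum", finding_uncertain=True,
                       marked_uncertain=False, has_input_record=True)
verdict = check_span(smoothed)
```

In this sketch an uncertain minim cluster that arrives in the text as a confident word is deferred rather than accepted.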
HistoriaMP is not conceived as a fully automatic truth machine. The pipeline can mark findings, propose readings, compare variants and make risks visible. Scholarly evaluation, however, remains a task for human review.
The technical implementation therefore does not serve to replace source-critical work, but to structure it.
Three presentation levels for users
HistoriaMP should be understandable both for scholarly work and for interested users. The platform therefore needs several presentation levels.
- Finding level: shows image crop, coordinates, segment, glyph finding, uncertainty and variants. It remains strictly bound to the visible finding.
- Reading level: shows a diplomatic or critical reading. It can make abbreviations, variants and uncertainties visible, but remains bound to the analysis.
- Explanation level: communicates the content for present-day readers. It can translate, explain and contextualize, but it must not replace the finding level.
Summary
HistoriaMP implements its method through a modular, artifact-based pipeline. Image versions are documented, coordinates remain traceable, segments are generated conservatively, models work within defined roles, glyph-level risks are marked and readings remain bound to visible evidence.
This does not produce a classical OCR output, but a documented chain from digital image finding to justified reading. Where the finding is clear, a reading can be justified. Where the finding is uncertain, uncertainty remains visible. Where the source does not provide enough evidence, the system must not claim more.
HistoriaMP does not generate historical truth.
HistoriaMP documents the path from visible finding to verifiable reading.
The source remains the measure. The text is a hypothesis. The evidence lies in the image.
