Glossary

Terms of source-bound manuscript analysis.

This glossary explains central terms from HistoriaMP, digital palaeography, AI-assisted image analysis, OCR/HTR, MUFI, Unicode, glyph analysis, image integrity and artifact-based pipeline methodology.

Why this glossary?

The terms separate image finding, technical encoding and reading.

HistoriaMP does not treat historical manuscripts as a pure OCR task. The terms on this page describe a pipeline in which visible evidence, segmentation, glyph findings, uncertainty, model input and later reading are documented separately.

158 terms from project methodology, pipeline, image analysis and Digital Humanities.

Basics & Digital Humanities | Image, Coordinates & Segmentation | Glyphs, Signs & Reading | Evidence, Uncertainty & Edition | Modules & Pipeline | Infrastructure & Workbench

Basics & Digital Humanities

Derivation chain

The documented sequence of all image versions from the source image through crops, segments, scaling or model inputs. It shows which concrete image version a finding rests on.
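
A derivation chain can be pictured as image versions that each point to their parent. The following is a minimal sketch, not HistoriaMP's actual data model: the `ImageVersion` record, its field names and the example IDs are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageVersion:
    """One image version in a derivation chain (hypothetical record)."""
    version_id: str
    operation: str                   # e.g. "source", "crop", "scale"
    parent_id: Optional[str] = None  # None marks the source image


def derivation_chain(versions: dict[str, ImageVersion], version_id: str) -> list[str]:
    """Return the version IDs from the source image down to `version_id`."""
    chain: list[str] = []
    current: Optional[str] = version_id
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in derivation chain at {current}")
        chain.append(current)
        current = versions[current].parent_id
    return list(reversed(chain))


versions = {
    v.version_id: v
    for v in [
        ImageVersion("img-source", "source"),
        ImageVersion("img-crop-01", "crop", parent_id="img-source"),
        ImageVersion("img-model-01", "scale", parent_id="img-crop-01"),
    ]
}
print(derivation_chain(versions, "img-model-01"))
# ['img-source', 'img-crop-01', 'img-model-01']
```

A finding that records `img-model-01` as its image version can then be traced back step by step to the source image.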

Bounding Box

A rectangular coordinate area that marks a zone, segment or finding in the image.

Codex Memory

A planned knowledge system that can store recurring glyph variants, abbreviation forms, scribe profiles, layout patterns and typical readings.

Debugging capability

The system property that makes later error tracing possible because runs, artifacts, prompts, segments and results remain stored.

Digital Humanities

A research field that connects digital methods with humanities questions. HistoriaMP positions itself in this field as a source-bound manuscript analysis platform.

Edge Risk

A visible risk at image edges, such as cut-off sign forms, markings near margins or unclear edge areas that must be protected for later analysis.

Finding

A single technical or visual observation, such as a conspicuous small form above a line, a stroke near the edge or a dense minim cluster.

HTR

Short for Handwritten Text Recognition. Classical HTR systems typically go directly from image to text. HistoriaMP instead proceeds from image to finding to structure to reading.

AI-assisted analysis

The use of AI models within a controlled pipeline. AI output is treated not as unsupported truth, but as a verifiable analytical contribution.

Layout data

Structural information about the page, such as text zones, image areas, line spaces, margins, segment boundaries or visible groupings.

Readable version

A more understandable text version for general users. It is explicitly an interpretive layer and not identical with the diplomatic transcription.

LMM

Large Multimodal Model. A model that can process image and text inputs. In HistoriaMP an LMM may analyze, but must not replace the first finding layer.

Marginalia

Signs, notes or markings near the margin. In HistoriaMP they are first treated as visible margin findings before a function is assigned.

Multi-layer text model

The separation between diplomatic transcription, critical reading and readable version. Each layer has a different function, and the layers must not be mixed.

Neutral Data Preparation Layer

The server's neutral technical data preparation. It stores, segments and manages data, but does not interpret manuscript content.

Palaeography

The scholarly study of historical writing forms. HistoriaMP touches palaeographic questions, but strictly separates visible finding from later classification.

Phantom Guard Rule

The rule that no words or signs may be reconstructed if no visible basis for them exists in the image.

Result Aggregation

The merging of results from multiple segments, modules or model runs.

Rubricator Analysis

A planned analysis module for entries that differ in color or form and may relate to a rubricator hand. Such an assignment still has to be evidenced.

Scientific Analysis Layer

The scholarly analysis layer of HistoriaMP. It is separated from the technical infrastructure.

Structured analysis

The decomposition of a manuscript page into verifiable layers such as layout, segment, glyph, minim, abbreviation, reading and quality control.

Unsupported Expansion

An expanded abbreviation or supplemented reading without sufficient visual basis.

Variant analysis

The comparison of competing readings, manuscript findings or transcription proposals.

Image, Coordinates & Segmentation

Image finding

Everything that can be visibly observed in the concrete image: surface, sign forms, spacing, color differences, damage, margin traces or conspicuous markings.

Image derivative

A derived image version, for example a scaled image, crop, segment, compressed image or input prepared for a model.

Image integrity

The technical check of whether an image file is complete, unambiguously registered and robust enough for a particular analysis.

Image source

The concrete image file or image version to which a finding refers. In HistoriaMP it must be clear whether a finding arose on the original image, on a segment or on a model input.

Image validation

The technical check of an uploaded image file, such as file type, image size, readability, pixel count and processability.

BBox Percent

A bounding box in percentage coordinates. It describes a position relative to the respective image area rather than as an absolute pixel position.
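
As a minimal sketch of how a percentage box relates to pixels, the following converts a box into pixel edges of a given reference area. The `BBoxPercent` type and the `to_pixels` helper are hypothetical names, and the rounding policy is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BBoxPercent:
    """Bounding box in percent of a reference image area (hypothetical type)."""
    x: float       # left edge, 0-100
    y: float       # top edge, 0-100
    width: float   # 0-100
    height: float  # 0-100


def to_pixels(box: BBoxPercent, area_width: int, area_height: int) -> tuple[int, int, int, int]:
    """Convert to (left, top, right, bottom) in pixels of the reference area."""
    left = round(box.x / 100 * area_width)
    top = round(box.y / 100 * area_height)
    right = round((box.x + box.width) / 100 * area_width)
    bottom = round((box.y + box.height) / 100 * area_height)
    return left, top, right, bottom


box = BBoxPercent(x=10.0, y=25.0, width=50.0, height=5.0)
print(to_pixels(box, 2000, 3000))  # (200, 750, 1200, 900)
```

The same percentage box yields different pixel values for a different reference area, which is why the reference area has to be recorded together with the box.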

Coordinate Integrity Layer

A checking layer that ensures every coordinate is bound to a defined image space and can be correctly traced back to original, segment or model input.

Coordinate Integrity Rule

The rule that no coordinate may be used without stating its coordinate space. A coordinate is valid only within a specific image version.
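
One way to make this rule hard to violate is to let every coordinate carry its coordinate space and to refuse operations across spaces. This is a sketch under assumptions: the `Coordinate` type and the space labels such as `"original"` or `"segment:seg-0007"` are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:
    """A point that is only valid inside its named coordinate space."""
    x: float
    y: float
    space: str  # assumed labels, e.g. "original" or "segment:seg-0007"

    def __post_init__(self) -> None:
        if not self.space.strip():
            raise ValueError("a coordinate must state its coordinate space")


def offset(a: Coordinate, b: Coordinate) -> tuple[float, float]:
    """Return the offset from a to b, refusing to mix coordinate spaces."""
    if a.space != b.space:
        raise ValueError(f"cannot combine coordinates from {a.space!r} and {b.space!r}")
    return b.x - a.x, b.y - a.y


a = Coordinate(10, 20, "original")
b = Coordinate(15, 28, "original")
print(offset(a, b))  # (5, 8)
```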

Crop

A controlled image excerpt. Crops can be used for detail checks, but must be documented as their own image artifacts.

Grid Preview

A visual preview of the image with an overlaid grid. It helps check image areas and tiles in the workbench.

Grid System

A grid-based approach to image division. In HistoriaMP it is a debugging and viewer tool, not the actual scholarly segmentation decision.

Image Cache

A technical temporary storage of image data to avoid repeated loading of large files.

Image Registry

A registry that stores the hash, run ID, filename and timestamp of each image. It helps recognize identical images and keep analysis histories traceable.
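
The idea can be sketched with a content hash as the lookup key. The class below is an in-memory illustration only: the choice of SHA-256, the `ImageRegistry` class and its method names are assumptions, not the project's implementation.

```python
from __future__ import annotations

import hashlib
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Hex digest that identifies the exact byte content of an image file."""
    return hashlib.sha256(data).hexdigest()


class ImageRegistry:
    """In-memory registry keyed by content hash (hypothetical sketch)."""

    def __init__(self) -> None:
        self._entries: dict[str, list[dict]] = {}

    def register(self, data: bytes, filename: str, run_id: str) -> dict:
        record = {
            "hash": sha256_of(data),
            "run_id": run_id,
            "filename": filename,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        self._entries.setdefault(record["hash"], []).append(record)
        return record

    def runs_for(self, data: bytes) -> list[str]:
        """Run IDs in which byte-identical image content was registered."""
        return [r["run_id"] for r in self._entries.get(sha256_of(data), [])]


registry = ImageRegistry()
registry.register(b"same bytes", "folio_12r.png", "run-a")
registry.register(b"same bytes", "copy_of_folio.png", "run-b")
print(registry.runs_for(b"same bytes"))  # ['run-a', 'run-b']
```

Because the key is the content hash, the renamed copy is still recognized as the same image.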

Input Fidelity

The degree to which a model input still corresponds to the registered source file. Scaling, compression or cropping can change input fidelity.

Layout-based segmentation

A segmentation that follows visible structures of the source, not only a technical grid.

Manuscript image

The digital image version of a historical manuscript. For HistoriaMP the decisive point is which concrete image version was analyzed.

Mapping to Original

The tracing of a finding from segment, crop or model input back to its position in the registered source image.
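
For a segment that was cut out of the source image and possibly rescaled, mapping back amounts to undoing the scale and adding the segment's offset. The sketch below assumes exactly that simple case (no rotation or padding); `SegmentPlacement` and its fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentPlacement:
    """Where a segment sits in the registered source image (hypothetical)."""
    segment_id: str
    x: int              # left edge of the segment in source pixels
    y: int              # top edge of the segment in source pixels
    scale: float = 1.0  # segment pixels per source pixel


def map_to_original(placement: SegmentPlacement, seg_x: float, seg_y: float) -> tuple[float, float]:
    """Map a point from segment coordinates back to source-image coordinates."""
    return (
        placement.x + seg_x / placement.scale,
        placement.y + seg_y / placement.scale,
    )


placement = SegmentPlacement("seg-0007", x=400, y=1200, scale=0.5)
print(map_to_original(placement, 50, 20))  # (500.0, 1240.0)
```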

Material zone

A visible area of the source that mainly shows surface, damage, stains, margins or empty spaces. It is not automatically excluded as meaningless.

Model Input Manifest

A record of which image version a model actually received: hash, dimensions, crop, scaling, compression and coordinate space.
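
A manifest of this kind could be stored as a small JSON document next to the model input. The following is a sketch only: the field names, the `build_manifest` helper and the example values are assumptions, not the project's schema.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInputManifest:
    """What a model actually received (hypothetical field names)."""
    sha256: str
    width: int
    height: int
    crop: Optional[tuple[int, int, int, int]]  # left, top, right, bottom in the parent space
    scale: float
    compression: str
    coordinate_space: str


def build_manifest(data: bytes, width: int, height: int, **details) -> ModelInputManifest:
    """Hash the exact bytes sent to the model and attach the technical details."""
    return ModelInputManifest(
        sha256=hashlib.sha256(data).hexdigest(), width=width, height=height, **details
    )


manifest = build_manifest(
    b"bytes actually sent to the model",
    width=512,
    height=512,
    crop=(0, 0, 1024, 1024),
    scale=0.5,
    compression="jpeg-q85",
    coordinate_space="model_input:img-model-01",
)
manifest_json = json.dumps(asdict(manifest), indent=2)
print(manifest_json)
```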

Model input

The concrete image or text version actually passed to an AI model. It is not automatically identical with the original source.

Model input artifact

The stored version of a model input including technical metadata. It makes later model claims verifiable.

Overlap

The deliberate overlap of neighboring segments. It prevents signs or structures from being cut off at segment boundaries.

Overlap segmentation

A segmentation strategy in which image segments overlap. Critical image areas therefore appear in several context windows.
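
A minimal sketch of overlapping segmentation, assuming square tiles of a fixed size and a fixed minimum overlap in pixels. The function names and parameters are illustrative; the last tile in each direction is placed flush with the image edge so that the whole image is covered.

```python
from __future__ import annotations


def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of tiles along one axis, covering [0, length) completely."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # final tile ends exactly at the image edge
    return starts


def overlapping_segments(width: int, height: int, tile: int, overlap: int) -> list[tuple[int, int, int, int]]:
    """Return segments as (x, y, w, h) in source-image pixels, row by row."""
    w, h = min(tile, width), min(tile, height)
    return [
        (x, y, w, h)
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


print(tile_starts(1000, 400, 100))  # [0, 300, 600]
print(len(overlapping_segments(1000, 700, 400, 100)))  # 6
```

With these values, a stroke at x = 350 lies inside both the first tile (0 to 400) and the second (300 to 700), so it is never seen only at a cut edge.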

Segment

A controlled image area that is stored for analysis purposes and can later be traced back to its original position.

Segment Cache

A planned technical cache of segments for more efficient processing.

Segment Metadata

Metadata for a segment, such as ID, file, x/y position, width and height. This metadata enables mapping back to the original image.

Segment Queue

A planned queue for systematic processing of individual segments by modules.

Segment reference

Information about which segment a finding belongs to and how that segment is positioned in the source image.

Segmentation

The division of an image into analyzable areas. In HistoriaMP segmentation is methodologically critical because early losses can distort later readings.

Tile

A rectangular grid excerpt of an image. Tiles serve technical orientation and the viewer, but are not necessarily the primary scholarly analysis basis.

Tile Explorer

A workbench tool with which individual grid excerpts can be inspected.

Zone

A visually distinguishable area in the image, such as a larger writing-like area, a margin area, a color-different area or a material zone.

Zone Detection

The detection of visible zones within a manuscript image. In HistoriaMP zones should not be functionally overinterpreted.

Glyphs, Signs & Reading

Abbreviation

An abbreviation is a historical shortened form in a manuscript. In HistoriaMP it is not silently expanded, but first treated as a visible finding and only then checked as a possible reading.

Justified reading

A reading that is not merely asserted, but can be traced back to concrete visual evidence, segments, glyph findings, variants and uncertainties.

Diplomatic transcription

A transcription that stays close to the source and does not silently smooth signs, abbreviations, uncertainties and visible special features.

Glyph

A visible sign form in the manuscript image. A glyph is first a form in the image and not automatically a modern letter or Unicode codepoint.

Glyph Evidence Comparator

A planned checking instance that compares technical glyph findings with MUFI/Unicode candidates, model outputs and transcription claims.

Glyph ID

An internal identifier for a documented glyph form. It separates the visible form from later readings, Unicode assignments or font renderings.

Glyph lens

An upstream visual control layer that marks critical glyphic anomalies before a model turns them into text.

Critical reading

A prepared reading in which abbreviations and editorial decisions are made visible. It stands between diplomatic transcription and readable version.

Reading

A text hypothesis derived from findings. In HistoriaMP a reading must be traceable to visible evidence and documented uncertainty.

Ligature

A connected or fused sign form. Ligatures can lead to misinterpretations if they are resolved into separate letters too early.

Minim

A short vertical stroke in historical writing forms. Several minims can form clusters that are difficult to distinguish.

Minim Cluster Rule

The rule that minim clusters must not be automatically interpreted or supplemented through linguistic plausibility.

Minim cluster

A dense group of similar stroke forms where several readings may be possible. Minim clusters are among the central error sources in historical transcription.

MUFI

The Medieval Unicode Font Initiative. For HistoriaMP, MUFI is a reference and encoding space, but not proof of a reading.

MUFI/Unicode candidate space

A list of possible sign or codepoint candidates after a documented glyph finding. Candidates are hints, not final decisions.
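
A candidate space can be represented as a plain list of codepoints with their status. The sketch below uses Python's standard `unicodedata` module; the `describe_candidates` helper and the chosen example characters are assumptions for illustration. Characters that MUFI places in the Private Use Area have no Unicode name, so a fallback label is passed to `unicodedata.name`.

```python
from __future__ import annotations

import unicodedata


def describe_candidates(chars: list[str]) -> list[dict]:
    """List codepoint candidates for one glyph finding; none is a decision."""
    return [
        {
            "codepoint": f"U+{ord(ch):04X}",
            "name": unicodedata.name(ch, "(no Unicode name)"),
            "status": "candidate",
        }
        for ch in chars
    ]


# Illustrative candidates for a p-like form with an additional stroke.
candidates = describe_candidates(["p", "\uA751", "\uA753"])
for entry in candidates:
    print(entry["codepoint"], entry["name"], entry["status"])
```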

MUFI lens

An automated visual finding instance that marks critical special forms and prepares possible encoding spaces without transcribing by itself.

Token

A unit close to the reading that arises late in the pipeline. A token may be stabilized only after its visual basis has been documented.

Token Boundary Rule

The rule that word or token boundaries may be assumed only where visible separations or sufficiently documented findings exist.

Transcription variant

A possible reading or transcription version that can coexist with other variants as long as the image finding does not force a clear decision.

Transparent transcription

A transcription that discloses its basis: image location, segment, glyph finding, uncertainty and alternative readings.

Evidence, Uncertainty & Edition

Analysis artifact

A stored intermediate result of the pipeline, for example segment data, glyph findings, variant lists or uncertainty reports. Analysis artifacts make the path to the reading traceable.

Artifact-based analysis

A method in which not only a finished text is produced, but every relevant intermediate step remains available as a verifiable artifact.

Artifact Browser

A planned interface with which stored analysis artifacts of a run can be searched, checked and compared.

Artifact system

The area of HistoriaMP in which all analysis results, metadata, segment information and checking findings are stored in structured form.

Auditability

The possibility of critically checking a reading later: which image location, segment, glyph finding and uncertainty led to this reading?

Basic Mode

A simplified usage mode for users who mainly need a readable output. Unlike Research Mode, it does not necessarily show all analysis layers in detail.

Finding

An observable property of the source or of an image segment. A finding is not yet interpretation and not a final reading.

Finding artifact

A stored visual or technical finding, such as a marked conspicuous glyph form with coordinates, segment reference and uncertainty status.
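
Such an artifact could be stored as a small JSON record. This is a sketch with invented field names, IDs and an assumed uncertainty vocabulary; it does not reproduce HistoriaMP's artifact schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FindingArtifact:
    """A stored visual finding (hypothetical field names)."""
    finding_id: str
    description: str
    bbox_percent: tuple[float, float, float, float]  # x, y, width, height
    coordinate_space: str
    segment_ref: str
    uncertainty: str  # assumed vocabulary, e.g. "clear" or "uncertain"


finding = FindingArtifact(
    finding_id="finding-0042",
    description="conspicuous small form above the line",
    bbox_percent=(41.5, 12.0, 3.0, 2.5),
    coordinate_space="segment:seg-0007",
    segment_ref="seg-0007",
    uncertainty="uncertain",
)
serialized = json.dumps(asdict(finding), ensure_ascii=False, indent=2)
print(serialized)
```

Note that the record describes a form and its location; it contains no reading.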

Finding layer

An analysis layer that documents visible properties before transcription or interpretation is derived from them.

Documented uncertainty

Uncertainty that is explicitly marked rather than hidden. It is an analysis result and not a system error.

Editorial practice

Scholarly work on textual transmission in which readings, variants, interventions and decisions are documented traceably.

Evidence

The concrete basis of a statement. In HistoriaMP, evidence primarily means a visible, documented image finding.

Evidence comparison

The comparison between technical image finding, model claim, transcription proposal and later output.

Source

The authoritative basis of the analysis. In HistoriaMP the source is not the generated text, but the documented image basis.

Source-bound analysis

An analysis in which every claim must be traced back to the concrete source or a documented image version.

Research Mode

A detailed usage mode that makes the complete analysis pipeline, artifacts, variants and uncertainties visible.

Silent Normalization

Silent normalization occurs when an uncertain or special image finding is smoothed in the result without the uncertainty remaining visible.

Uncertainty report

An artifact that documents where and why the analysis is uncertain.

Uncertainty marker

A visible marker for uncertain readings or findings, for example `⟦...??⟧`.
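
The sketch below shows how such markers could be written and found again in a transcription line. It assumes that the uncertain text sits between `⟦` and `??⟧`, which is one possible interpretation of the example above; the helper names and the sample line are invented.

```python
from __future__ import annotations

import re

# Assumed marker shape: opening bracket, uncertain text, "??", closing bracket.
MARKER = re.compile(r"⟦(.*?)\?\?⟧")


def mark_uncertain(text: str) -> str:
    """Wrap an uncertain reading in the marker."""
    return f"⟦{text}??⟧"


def uncertain_spans(transcription: str) -> list[str]:
    """Return every marked uncertain reading in a transcription line."""
    return MARKER.findall(transcription)


line = f"in {mark_uncertain('nomine')} domini"
print(line)                   # in ⟦nomine??⟧ domini
print(uncertain_spans(line))  # ['nomine']
```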

Visual Basis Ref

A reference to the concrete visual basis of a reading, such as a segment, glyph finding or documented image area.

Visual Priority Rule

The rule that visible evidence has priority over linguistic, historical or statistical plausibility.

Scholarly traceability

The ability to check an analysis not only as a result, but as a documented path from source to reading.

Coordinate space

The defined image space in which a coordinate is valid. Scaling, cropping, padding, segmentation or model preprocessing each create a new coordinate space.

Reading as hypothesis

The methodological principle that a reading is not treated as fact, but as a justified, verifiable proposal based on visible evidence.

False precision

The impression of a secure, smooth reading although the visible finding does not sufficiently support this certainty.

Phantom reading

A reading that seems plausible but is not sufficiently bound to visible evidence.

Silent smoothing

The unnoticed transformation of uncertain, damaged or ambiguous places into apparently secure forms or words.

Functional interpretation

The assignment of a function such as rubric, initial, comment or correction. In HistoriaMP it must not be derived from color, size or position alone.

Context reconstruction

A supplement based on linguistic, historical or editorial expectation. It must not replace visible evidence.

Model input confusion

The error of treating a statement about a reduced, scaled or otherwise altered model image, without checking, as a statement about the source image.

Human review

The expert checking of findings, variants, uncertainties and readings by a reviewing person. It remains part of the method.

Candidate space

A controlled space of possible signs, codepoints, glyph forms or abbreviation interpretations. A candidate is not yet a reading.

Methodological brake

A deliberately limiting rule or prompt structure that prevents a model from reading, interpreting or smoothing uncertainty too early.

Finding layer view

The display layer on which image excerpt, coordinates, segments, glyph findings, variants and uncertainties remain visible close to the source.

Reading layer

The display layer of a diplomatic or critical reading that remains bound to finding artifacts and uncertainties.

Explanation layer

The mediating display layer for present-day readers. It may explain and translate, but must not replace the finding layer.

Validator

A checking mechanism that controls schema, vocabulary, coordinate reference, artifact references or scholarly risks.
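
As an illustration, a validator for a reading record might check required keys, a controlled uncertainty vocabulary, the coordinate space and the references to finding artifacts. The key names and the vocabulary below are assumptions made for this sketch.

```python
from __future__ import annotations

ALLOWED_UNCERTAINTY = {"clear", "uncertain", "illegible"}  # assumed vocabulary
REQUIRED_KEYS = {"reading", "coordinate_space", "visual_basis_refs", "uncertainty"}


def validate_reading(record: dict, known_finding_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    missing = REQUIRED_KEYS - set(record)
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems: list[str] = []
    if not record["coordinate_space"]:
        problems.append("coordinate space is not stated")
    if record["uncertainty"] not in ALLOWED_UNCERTAINTY:
        problems.append(f"unknown uncertainty value: {record['uncertainty']!r}")
    refs = record["visual_basis_refs"]
    if not refs:
        problems.append("reading has no visual basis")
    unknown = [ref for ref in refs if ref not in known_finding_ids]
    if unknown:
        problems.append(f"unknown finding references: {unknown}")
    return problems


record = {
    "reading": "nomine",
    "coordinate_space": "segment:seg-0007",
    "visual_basis_refs": [],
    "uncertainty": "uncertain",
}
print(validate_reading(record, {"finding-0042"}))
# ['reading has no visual basis']
```

A reading without any visual basis reference is reported rather than silently accepted, which mirrors the Phantom Guard Rule.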

Hash value

A technical check value for a file or image version. It helps uniquely recognize inputs and document analysis paths.

Modules & Pipeline

Abbreviation Analyzer

An analysis module for abbreviation forms. It should examine visible signs, abbreviation strokes or additional forms without prematurely converting them into modern expanded forms.

Analysis pipeline

The stepwise processing of a manuscript source from image checking through layout, segments, glyphs, minim clusters and abbreviations to justified reading and quality control.

Consensus Engine

A module that compares several findings or reading proposals. The goal is not majority at any price, but a justified decision with documented uncertainty.
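
The principle "no majority at any price" can be illustrated with a deliberately strict comparison of reading proposals. This sketch compares strings only and leaves out the evidence that a real decision would rest on; the function, its `min_support` threshold and the example readings are assumptions.

```python
from __future__ import annotations

from collections import Counter


def consensus(proposals: list[str], min_support: float = 1.0) -> dict:
    """Accept a reading only if its support reaches `min_support`.

    Otherwise all variants are kept and the place stays undecided.
    """
    if not proposals:
        return {"status": "no_proposals", "reading": None, "support": 0.0, "variants": []}
    counts = Counter(proposals)
    top, top_count = counts.most_common(1)[0]
    support = top_count / len(proposals)
    variants = sorted(counts)
    if support >= min_support:
        return {"status": "agreed", "reading": top, "support": support, "variants": variants}
    return {"status": "undecided", "reading": None, "support": support, "variants": variants}


print(consensus(["uinum", "uinum", "uinum"])["status"])  # agreed
print(consensus(["uinum", "iuuim", "uinum"]))
# status 'undecided': both variants are kept instead of taking the 2:1 majority
```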

Glyph Analyzer

A module for examining individual visible sign forms. It should record graphic features before a reading is derived from them.

Glyph Fingerprint Engine

A planned module for analyzing recurring glyph forms. In the long term it can help identify scribe profiles or formal patterns within a codex.

Graphic Segmentation Engine

A planned module for separating different visual areas, such as text, illustration, ornament, margin area or other graphic structures.

Grid Engine

A technical system that divides an image into rectangular grid areas. It mainly serves orientation, visualization and technical control.

Image Integrity & Input Fidelity Analyzer

An upstream module that documents image files, hashes, dimensions, formats, metadata, derivatives and model-input versions. It does not read or interpret text.

Image Normalization Engine

A planned module for controlled image preparation, such as rotation, contrast, perspective or other technical corrections.

Layout Analyzer

A module for examining page structure: visible areas, line arrangement, text zones, margin areas and structural separations.

Line Structure Analyzer

A module for analyzing line structures, line courses, spacing, interruptions and problematic transitions.

M00

The upstream module for image integrity, input fidelity and coordinate integrity. It checks the technical robustness of the image basis.

M01 Source Analyzer

The first analysis module of the pipeline. It describes only visible properties of the source and does not generate transcription.

M02 Layout Analyzer

A module for analyzing the visible page and layout structure.

M03 Segment Engine

A module or system area for controlled division into relevant analysis areas, such as line, word or glyph areas.

M04 Glyph Analyzer

A module for analyzing individual visible glyph forms.

M04A Minim Analyzer

A module for examining minim structures and dense stroke groups.

M04B Abbreviation Analyzer

A module for analyzing visible abbreviation forms and possible abbreviations.

M04X Glyph Anomaly & MUFI Candidate Lens

A planned visual lens for conspicuous glyph areas and possible MUFI/Unicode candidates. It generates findings, not finished readings.

M05 Transcription Engine

A module for creating a diplomatic or source-bound transcription on the basis of documented findings.

M06 Consensus Engine

A module for comparing competing readings and findings.

M07 Quality Control

A module for checking structure, consistency, uncertainties, visual basis and possible errors.

Minim Analyzer

A module for analyzing minim structures. It should prevent dense stroke groups from being reconstructed too quickly into secure words.

Module

A specialized pipeline step with a clearly limited task, such as image checking, layout analysis, glyph analysis, transcription or quality control.

Module orchestration

The coordinated execution of several modules in a defined pipeline.
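
A minimal sketch of orchestration, assuming each module is a function that reads the artifacts of earlier modules and returns its own artifact. The module IDs follow the glossary, but the function bodies and artifact contents are placeholders.

```python
from __future__ import annotations

from typing import Callable

Artifacts = dict[str, dict]
Module = Callable[[Artifacts], dict]


def run_pipeline(modules: list[tuple[str, Module]]) -> Artifacts:
    """Run modules in the given order and keep every result as an artifact."""
    artifacts: Artifacts = {}
    for module_id, module in modules:
        artifacts[module_id] = module(artifacts)
    return artifacts


def m00_image_integrity(artifacts: Artifacts) -> dict:
    # Placeholder: a real module would check hashes, dimensions and formats.
    return {"image_ok": True}


def m01_source_analyzer(artifacts: Artifacts) -> dict:
    # Refuse to describe the source if the image basis was not confirmed.
    if not artifacts["M00"]["image_ok"]:
        raise RuntimeError("image basis not confirmed by M00")
    return {"findings": ["small form above a line"]}


result = run_pipeline([("M00", m00_image_integrity), ("M01", m01_source_analyzer)])
print(list(result))  # ['M00', 'M01']
```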

Pipeline bias

A distortion that arises when early technical or interpretive assumptions influence later results. HistoriaMP tries to reduce this through separated modules and artifacts.

Quality Control Engine

A checking module for consistency, visual basis, uncertainties, error sources and possible impermissible smoothing.

Segment Engine

The system for creating image segments. In HistoriaMP it ensures complete image coverage and protects against information loss.

Source Analyzer

The module that checks the source for visible properties without reading text or interpreting meaning.

Text Region Detection Engine

A planned module for detecting different text areas or visual zones within a manuscript page.

Transcription Engine

A module for creating a transcription. In HistoriaMP it must not silently smooth uncertain places.

Variant Analysis Engine

A planned module for comparing multiple manuscripts, readings or transmission variants.

Infrastructure & Workbench

FastAPI

The Python web framework on which HistoriaMP's server infrastructure is based.

Infrastructure

The technical layer of HistoriaMP: upload, storage, validation, segmentation, artifact management, API and module orchestration. It does not interpret manuscript content.

OpenCV

An image-processing library that can be used in HistoriaMP for technical tasks such as segmentation, edge analysis or image operations.

Prompt Library

A library of versioned module prompts. It makes traceable which instruction a module worked with.
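
The following sketch shows one way to keep prompts versioned and immutable, with a hash that a module result can record. The `PromptLibrary` class, its methods and the sample prompt text are hypothetical.

```python
from __future__ import annotations

import hashlib


class PromptLibrary:
    """Versioned prompts keyed by (module ID, version); hypothetical sketch."""

    def __init__(self) -> None:
        self._prompts: dict[tuple[str, str], str] = {}

    def add(self, module_id: str, version: str, text: str) -> None:
        key = (module_id, version)
        if key in self._prompts:
            raise ValueError(f"{module_id} {version} already exists; add a new version")
        self._prompts[key] = text

    def get(self, module_id: str, version: str) -> dict:
        """Return the prompt together with a hash to store in the module result."""
        text = self._prompts[(module_id, version)]
        return {
            "module_id": module_id,
            "version": version,
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            "text": text,
        }


library = PromptLibrary()
library.add("M01", "1.0", "Describe only visible properties. Do not transcribe.")
entry = library.get("M01", "1.0")
print(entry["module_id"], entry["version"], entry["sha256"][:12])
```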

Run

An isolated analysis execution. Each upload creates its own run with its own directory structure, image data, segments, module results and logs.

RUN_ID

The unique identifier of an analysis run. It connects image, segments, artifacts and module results.

Run isolation

The principle that each analysis is stored completely in its own run directory. This keeps analyses reproducible and separated from each other.
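
A sketch of run isolation on the file system, assuming a random run ID and a fixed set of subdirectories. The directory names and the `create_run` helper are illustrative and not the project's actual layout.

```python
from __future__ import annotations

import tempfile
import uuid
from pathlib import Path

RUN_SUBDIRS = ("image", "segments", "results", "logs")  # assumed layout


def create_run(base_dir: Path) -> tuple[str, Path]:
    """Create a fresh, isolated run directory and return (run_id, run_dir)."""
    run_id = uuid.uuid4().hex
    run_dir = base_dir / run_id
    run_dir.mkdir(parents=True, exist_ok=False)  # never reuse an existing run
    for name in RUN_SUBDIRS:
        (run_dir / name).mkdir()
    return run_id, run_dir


with tempfile.TemporaryDirectory() as tmp:
    run_id, run_dir = create_run(Path(tmp))
    print(sorted(p.name for p in run_dir.iterdir()))
    # ['image', 'logs', 'results', 'segments']
```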

Run Locking

A technical safeguard intended to prevent parallel write access to the same run.
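
One common way to implement such a safeguard is an exclusively created lock file. The sketch below is an assumption about how it could work, not the project's mechanism; `run_lock` and `RunLockedError` are invented names, and a real system would also need a policy for stale locks left behind by a crashed process.

```python
from __future__ import annotations

import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


class RunLockedError(RuntimeError):
    """Raised when another writer already holds the lock for a run."""


@contextmanager
def run_lock(run_dir: Path):
    """Hold an exclusive lock file inside the run directory while writing."""
    lock_path = run_dir / ".lock"
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RunLockedError(f"run is already locked: {run_dir}") from None
    try:
        os.write(fd, str(os.getpid()).encode("ascii"))
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


with tempfile.TemporaryDirectory() as tmp:
    with run_lock(Path(tmp)):
        print("writing artifacts while the run is locked")
```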

Server infrastructure

The technical basis of HistoriaMP: FastAPI server, upload, storage, runs, segmentation, APIs and artifact management.

Trace System

A tracking system that uses run ID and trace run ID to make analysis paths, module steps and results reproducible.

TRACE_RUN_ID

An identifier for tracing individual pipeline steps or execution paths within a run.

Web Workbench

The working interface of HistoriaMP. It is used for upload, image display, grid and segment views, module control and result review.