When the model input is not the source

Why image scaling in AI-assisted image analysis must be documented

An AI statement about an image initially applies only to the concrete image version the system actually processed. For detail-critical analysis, image scaling, crops and derived files must be documented.

Anyone who uploads an image to an AI system often assumes, without saying so, that the model sees exactly the file that was uploaded.

This assumption is understandable, but methodologically dangerous.

Between the original image file and the image actually processed by an AI model, several technical steps may intervene: upload, compression, resizing, format conversion, internal optimization or automatic reduction for model processing.

The model may therefore analyze not the original image, but a derived version of it.

For many everyday applications this is harmless. If the task is to recognize a building, a landscape, a product or a broad image scene, a reduced image version is often sufficient.

For detail-critical analysis, however, the difference is decisive.

Then the question is not only:

What does the AI recognize?

But before that:

Which image version did the AI actually see?

A concrete example

In one test, a high-resolution manuscript image was passed to an AI system. The starting file measured 2393 × 3434 pixels, about 8.22 megapixels.

The model, however, did not receive this full image size. It received roughly 1070 × 1536 pixels, about 1.64 megapixels.

The linear scaling factor was therefore about 0.447. Both axes were reduced to less than half their original length. In terms of area, only about one fifth of the original pixel count remained.

This does not mean that 80 percent of image meaning was automatically lost. It does mean that roughly 80 percent of the original sampling positions were no longer present in the processed image version.
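The arithmetic behind these figures can be checked in a few lines. The following is a minimal sketch that uses only the pixel dimensions quoted above; the variable names are chosen here for illustration.

```python
# Worked arithmetic for the example above (dimensions taken from the text).
src_w, src_h = 2393, 3434      # starting file
in_w, in_h = 1070, 1536        # approximate size the model received

src_pixels = src_w * src_h
in_pixels = in_w * in_h

linear_factor = in_w / src_w          # scaling along one axis
area_ratio = in_pixels / src_pixels   # share of the pixel count that remains

print(f"source:  {src_pixels / 1e6:.2f} MP")
print(f"input:   {in_pixels / 1e6:.2f} MP")
print(f"linear:  {linear_factor:.3f}")
print(f"area:    {area_ratio:.3f}")
print(f"missing: {1 - area_ratio:.1%} of the original sampling positions")
```

The area ratio is roughly the square of the linear factor, which is why halving both axes leaves far less than half of the pixels.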

For coarse image content this may be secondary. For fine lines, small gaps, weak contrasts, damaged structures, handwriting, abbreviation marks and tiny annotations, it is central.

Image scaling is not neutral

Reducing an image is not simply displaying the same image at a smaller size.

When a large image is reduced to a smaller version, several original pixels must be combined into fewer new pixels. This process is called resampling.

Resampling can change visible detail:

  • thin lines become weaker or disappear
  • small dots merge with the background
  • fine contrasts are smoothed
  • damaged structures look cleaner than they actually are
  • edges and transitions lose sharpness
  • small signs or markings become ambiguous
  • minimal tonal differences can disappear into noise

To the human eye, the reduced image may still look the same. Technically, however, it is no longer the same file. Methodologically, it is no longer the same visual evidence.
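How a thin line weakens can be shown with a toy example. The sketch below assumes a simple box filter (plain averaging of neighbouring pixels) on a single row of grey values. Real resamplers use other kernels, and which kernel a given AI system applies is not known here, so this illustrates the effect rather than any specific pipeline.

```python
def box_downsample(row, factor):
    """Average each run of `factor` neighbouring pixels into one output pixel."""
    usable = len(row) - len(row) % factor   # drop an incomplete trailing run
    return [
        sum(row[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]

# A bright background (255) with a one-pixel dark hairline (0) at index 5.
row = [255] * 12
row[5] = 0

reduced = box_downsample(row, 4)

contrast_before = max(row) - min(row)          # full black-on-white contrast
contrast_after = max(reduced) - min(reduced)   # hairline diluted by its neighbours

print(reduced)
print(contrast_before, contrast_after)
```

The hairline does not vanish, but it turns from a black stroke into a light grey one. On a real page with paper texture and noise, such a weakened stroke can become indistinguishable from the background.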

What large AI providers do technically

AI systems must transform image data into a form the model can process. They therefore use internal limits for resolution, patch count, token budget or maximum image dimensions.

For certain vision models, OpenAI uses a patch-based procedure in which images are divided into 32 × 32 pixel patches. Depending on model and detail level, limits apply to the number of patches and maximum image dimensions. If a limit is exceeded, the image is proportionally reduced.

Anthropic also documents image-size limits for Claude models. Claude Opus 4.7 supports a long image edge of up to 2576 pixels. Earlier or other Claude models use 1568 pixels as a limit for the long edge.
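What a proportional reduction to such a limit looks like can be sketched as follows. This is not any provider's actual implementation: the function names, the rounding and the order of operations are assumptions made for illustration. Only the 32 pixel patch size and the 1568 pixel long-edge limit are taken from the text above.

```python
import math

def fit_long_edge(width, height, max_long_edge):
    """Proportionally reduce (width, height) so the longer edge fits the limit."""
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height, 1.0          # no reduction needed
    scale = max_long_edge / long_edge
    # Rounding to whole pixels is an assumption; real systems may floor instead.
    return round(width * scale), round(height * scale), scale

def patch_count(width, height, patch=32):
    """Number of patch x patch tiles needed to cover the image."""
    return math.ceil(width / patch) * math.ceil(height / patch)

w, h, scale = fit_long_edge(2393, 3434, 1568)
print(w, h, round(scale, 3))
print(patch_count(2393, 3434), "patches before,", patch_count(w, h), "after")
```

The result need not match the 1070 × 1536 pixels observed in the test above, because the limit that applied in that case is not stated. The point is only that a page-sized scan can exceed such a limit by a wide margin and is then reduced without any action by the user.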

The central point is not that such systems are bad. On the contrary: such limits are technically understandable because image processing requires compute and token budget.

The decisive point is different: the default state is not automatically fidelity to the original. Without deliberate precautions, analysis may operate on a reduced image version.

Pixel loss is not identical with information loss

Precision is necessary here. If an image version contains only 20 percent of the original pixel count, this does not automatically mean that 80 percent of the relevant information is lost.

Low-frequency structures often remain well preserved:

  • page margins
  • columns
  • large image areas
  • layout
  • broad forms
  • general image composition
  • larger writing zones

The situation is different with high-frequency detail:

  • hairlines
  • fine writing
  • small dots
  • abbreviation signs
  • tildes
  • diacritical signs
  • fine corrections
  • erasures
  • thin connecting strokes
  • minimal tonal differences
  • tiny security features
  • damaged or faded structures

These details often occupy only a few pixels. If the image is reduced, they may disappear, be smoothed or be replaced by new artifacts.

When is a reduced image version sufficient?

A reduced image version may be entirely sufficient for broad image description, object recognition, layout analysis, page composition, color impression, visual orientation, sorting image material and selecting relevant areas.

Even for documents or manuscripts, a reduced overview can be useful. It can help identify columns, margins, larger damage, image zones or conspicuous areas. For these tasks, overview is more important than the last detail.

When does scaling become critical?

Scaling becomes critical whenever the smallest visual differences carry meaning. This applies to historical manuscripts, damaged documents, technical drawings, medical images, material analysis, microscopy, maps, seals, signatures, security features, fine printing details and forensic image analysis.

In such cases, automatic scaling can change the basis of analysis. A model may still give a plausible answer.

But the methodological question is:

What does this answer rest on?

On the original?

Or on a reduced, smoothed, compressed or otherwise changed copy?

This distinction determines whether an analysis is traceable, verifiable and citable.

The problem of false precision

The most dangerous thing is not that a model sees nothing in a reduced image. Often it still sees a great deal.

The danger is that the model can produce a very certain answer from an uncertain visual basis. This creates false precision.

A detail may be visible in the original. In the reduced version it may be blurred or no longer unambiguous.

The model still decides clearly.

The answer sounds precise.

But the visual basis was no longer precise.

For simple everyday applications this is usually not a problem. For scholarly, technical, medical, legal or archival applications it is a methodological error.

In those contexts, a plausible answer is not enough.

It must be possible to trace which file, which resolution, which processing state and which concrete model input a statement is based on.

The model input is not automatically the source

The central methodological statement is:

An AI statement about an image initially applies only to the concrete image version that the model actually received.

It does not automatically apply to the original.

Therefore, one has to distinguish between:

  1. original object
  2. digital master file
  3. stored source file
  4. edited working copy
  5. crop or segment
  6. upload version
  7. input actually processed by the model
  8. internally scaled or optimized image version

Only one of these levels is the immediate analysis input of the model. A model statement initially refers to that input. Everything else must be documented and justified.

Chain of custody: image provenance must remain traceable

The problem is not only technical. It is methodological. In archives, museums, libraries and scholarly editions, the distinction between original, reproduction and edited version is fundamental.

A reproduction must not be treated silently as the original. An edited version must not be cited as if it were the source itself. The same applies to AI analysis.

If an image was scaled, compressed, cropped or otherwise changed before analysis, this change must be documented. Not because AI becomes useless, but because only then is it clear what was actually analyzed.

What clean image analysis should document

Serious AI-assisted image analysis should store not only the result, but also the analysis input.

At minimum, the following metadata are useful:

  • file name
  • file format
  • width in pixels
  • height in pixels
  • pixel count
  • file size
  • SHA-256 hash
  • color space
  • ICC profile, if available
  • EXIF data, if available
  • source of the file
  • time of import
  • processing steps
  • scaling factor against the master
  • compression status
  • AI model used
  • API endpoint or upload route used
  • detail setting, if available
  • file or image version actually analyzed
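One way to keep such metadata together is a small record per analyzed image version. The following is a minimal sketch: the class name, the field names and all example values are hypothetical, and only a subset of the fields listed above is included.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

@dataclass
class AnalysisInputRecord:
    """Metadata for one image version that was passed to a model."""
    file_name: str
    file_format: str
    width_px: int
    height_px: int
    file_size_bytes: int
    sha256: str
    source: str
    imported_at: str                        # ISO 8601 timestamp
    model: str
    upload_route: str
    detail_setting: Optional[str] = None
    master_width_px: Optional[int] = None   # needed for the scaling factor
    processing_steps: List[str] = field(default_factory=list)

    @property
    def pixel_count(self) -> int:
        return self.width_px * self.height_px

    @property
    def scale_vs_master(self) -> Optional[float]:
        if not self.master_width_px:
            return None                     # scaling against the master is unknown
        return self.width_px / self.master_width_px

    def to_json(self) -> str:
        data = asdict(self)
        data["pixel_count"] = self.pixel_count
        data["scale_vs_master"] = self.scale_vs_master
        return json.dumps(data, indent=2)

# All values below are hypothetical placeholders.
record = AnalysisInputRecord(
    file_name="page_001_input.png",
    file_format="PNG",
    width_px=1070,
    height_px=1536,
    file_size_bytes=2_400_000,
    sha256="(hash of the input file, computed separately)",
    source="derived from registered master",
    imported_at="2025-01-01T12:00:00Z",
    model="(model name)",
    upload_route="(API endpoint or upload route)",
    master_width_px=2393,
    processing_steps=["resize"],
)
print(record.to_json())
```

Storing the record next to the model's answer means that a later reader can see at a glance whether a statement was made about the master or about a reduced copy.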

The hash is especially important. A SHA-256 hash is a digital fingerprint of a file's bytes. If even a single byte changes, the hash will, for all practical purposes, change as well. This makes it possible to test whether two files are byte-for-byte identical or merely look similar.
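Computing such a hash needs nothing beyond the Python standard library. The sketch below reads the file in chunks so that large scans do not have to fit into memory; the function names and the chunk size are choices made here for illustration.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_file(path_a, path_b):
    """True only if both files have the same SHA-256 hash."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Note that the hash covers the file's bytes, not its appearance: re-saving an image in another format or with other metadata produces a different hash even if the picture looks unchanged. That is exactly the property needed to tell a master from a derivative.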

A practical rule for sensitive image analysis

A simple rule is:

An AI statement about an image initially applies only to the concrete image file that the model actually received.

If the analyzed file is not identical with the registered original or master, it must be treated as a derived image version.

Possible status values include, for example:

  • original_or_master_input
  • derived_image
  • resized_input
  • cropped_input
  • compressed_input
  • unknown_fidelity_input
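The rule and the status values can be combined in a small classifier. This is a sketch under stated assumptions: the status strings come from the list above, while the function, the comparison by hash and the step names "resize", "crop" and "compress" are hypothetical.

```python
from enum import Enum

class InputStatus(Enum):
    ORIGINAL_OR_MASTER = "original_or_master_input"
    DERIVED = "derived_image"
    RESIZED = "resized_input"
    CROPPED = "cropped_input"
    COMPRESSED = "compressed_input"
    UNKNOWN_FIDELITY = "unknown_fidelity_input"

# Hypothetical step names mapped to the more specific status values.
_STEP_STATUS = {
    "resize": InputStatus.RESIZED,
    "crop": InputStatus.CROPPED,
    "compress": InputStatus.COMPRESSED,
}

def classify_input(master_sha256, input_sha256, known_step=None):
    """Classify a model input by comparing its hash with the master's hash."""
    if master_sha256 is None or input_sha256 is None:
        # Without both hashes, identity with the master cannot be shown.
        return InputStatus.UNKNOWN_FIDELITY
    if input_sha256 == master_sha256:
        return InputStatus.ORIGINAL_OR_MASTER
    # Hashes differ: the input is a derivative, with or without a known step.
    return _STEP_STATUS.get(known_step, InputStatus.DERIVED)
```

The conservative default matters: if the hash of the actual model input cannot be obtained, for example because an upload route rescales internally, the input is classified as unknown fidelity rather than assumed to be the master.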

This is not excessive caution.

It is the basic condition for traceability.

The right division of labor: overview and detail

A useful workflow separates overview from detail. For overview, a reduced full-page image may be sufficient: where are relevant areas, columns, margins, conspicuous zones or damage?

For detail analysis, however, one should work with controlled crops from the best available source file: small image areas, high effective resolution, separate hash, documented origin, clear relation to the master and no uncontrolled reduction.

Instead of letting a large page be reduced uncontrollably, one generates targeted crops from the master. The model then receives less area but more relevant detail information.
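Generating such crops systematically can be sketched as a tiling of the master. The function below only computes crop rectangles. The tile size of 1024 pixels and the overlap of 128 pixels are hypothetical values that would have to be chosen below whatever limit the model route in use imposes.

```python
def crop_boxes(width, height, tile, overlap):
    """Return (left, top, right, bottom) boxes that cover the whole image.

    Every box is at most tile x tile pixels, so each crop can be passed on
    at the master's own resolution. Neighbouring boxes overlap so that a
    detail lying on a tile border appears whole in at least one crop.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than tile")
    step = tile - overlap

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)    # last tile sits flush with the edge
        return positions

    return [
        (left, top, min(left + tile, width), min(top + tile, height))
        for top in starts(height)
        for left in starts(width)
    ]

boxes = crop_boxes(2393, 3434, tile=1024, overlap=128)
print(len(boxes), "crops; first:", boxes[0], "last:", boxes[-1])
```

The boxes use the (left, top, right, bottom) order that Pillow's Image.crop also takes, so they could be handed to such a library directly. The cropping itself is left out here to keep the sketch free of dependencies. Each resulting crop file would then be stored and hashed separately, with its box recorded as the relation to the master.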

Consequences for manuscripts and historical documents

For historical manuscripts, this distinction is especially important. A reduced overview can be useful for layout, page structure and orientation. For palaeographic detail, it is often insufficient.

Particularly affected are:

  • individual glyphs
  • minim clusters
  • abbreviation signs
  • nasal bars
  • superscript letters
  • small corrections
  • erasures
  • marginal notes
  • damaged or faded areas
  • diplomatic readings

A diplomatic reading should represent the concrete visible finding as precisely as possible. A fully reliable diplomatic reading therefore cannot arise from an uncontrolled reduced image version.

It can at most provide a provisional reading that still has to be verified.

The actual finding must be checked against the appropriate image input.

Example of a useful analysis matrix

A simple matrix can define which analysis level is allowed with which image quality:

Analysis level           | Derived full view            | Scaled model input           | Unknown image fidelity
Layout / page structure  | allowed                      | allowed                      | with reservation
Columns / larger zones   | allowed                      | allowed                      | with reservation
Broad writing impression | allowed                      | allowed                      | with reservation
Glyph analysis           | only on high-resolution crop | only on high-resolution crop | blocked
Minim clusters           | only on high-resolution crop | only on high-resolution crop | blocked
Abbreviation signs       | only on high-resolution crop | only on high-resolution crop | blocked
Diplomatic reading       | not sufficient               | not sufficient               | blocked
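Such a matrix can also be enforced in software as a plain lookup. The sketch below encodes the matrix as given; the key names and the choice to treat unknown combinations as blocked are assumptions made here.

```python
ALLOWED = "allowed"
WITH_RESERVATION = "with reservation"
CROP_ONLY = "only on high-resolution crop"
NOT_SUFFICIENT = "not sufficient"
BLOCKED = "blocked"

# Column order: derived full view, scaled model input, unknown image fidelity.
INPUT_KINDS = ("derived_full_view", "scaled_model_input", "unknown_fidelity")

_ROWS = {
    "layout": (ALLOWED, ALLOWED, WITH_RESERVATION),
    "columns": (ALLOWED, ALLOWED, WITH_RESERVATION),
    "broad_writing_impression": (ALLOWED, ALLOWED, WITH_RESERVATION),
    "glyph_analysis": (CROP_ONLY, CROP_ONLY, BLOCKED),
    "minim_clusters": (CROP_ONLY, CROP_ONLY, BLOCKED),
    "abbreviation_signs": (CROP_ONLY, CROP_ONLY, BLOCKED),
    "diplomatic_reading": (NOT_SUFFICIENT, NOT_SUFFICIENT, BLOCKED),
}

MATRIX = {level: dict(zip(INPUT_KINDS, row)) for level, row in _ROWS.items()}

def rule_for(level, input_kind):
    """Look up the rule; unknown levels or input kinds are treated as blocked."""
    return MATRIX.get(level, {}).get(input_kind, BLOCKED)

print(rule_for("layout", "scaled_model_input"))
print(rule_for("glyph_analysis", "unknown_fidelity"))
```

A pipeline could call rule_for before sending a request and attach the returned rule to the stored result, so that a reading produced on an insufficient input is marked as such from the start.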

This matrix does not prevent analysis.

It only prevents a finding from claiming more precision than the input provides.

Better workflow for AI-assisted image analysis

For sensitive image analyses, a controlled workflow is recommended:

  1. store the original or master file
  2. calculate the SHA-256 hash
  3. record technical metadata
  4. create working copies in a controlled way
  5. derive crops or segments deliberately from the master
  6. store and hash every derivative separately
  7. let the model work only with documented inputs
  8. always connect the result to the concrete input
  9. verify uncertain places on higher-resolution crops
  10. link analysis level to image quality
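The core of steps 6 to 8 is that every derivative knows its parent and every result knows its input. The following minimal sketch keeps this in an in-memory registry keyed by hash. The registry layout, the role and step labels and the placeholder byte strings standing in for real image files are all hypothetical.

```python
import hashlib

registry = {}   # sha256 hex digest -> provenance record

def register(data, role, parent=None, step=None):
    """Hash the given bytes and record where they came from."""
    digest = hashlib.sha256(data).hexdigest()
    registry[digest] = {"role": role, "parent": parent, "step": step}
    return digest

def trace(digest):
    """Walk from an input back to the master, returning (role, step) pairs."""
    chain = []
    while digest is not None:
        record = registry[digest]
        chain.append((record["role"], record["step"]))
        digest = record["parent"]
    return chain

# Placeholder bytes stand in for the contents of real image files.
master = register(b"master image bytes", role="master")
crop = register(b"crop image bytes", role="crop",
                parent=master, step="crop 0,0,1024,1024")

# A model result is stored together with the hash of its concrete input.
result = {"input_sha256": crop, "statement": "provisional reading, to be verified"}

print(trace(result["input_sha256"]))
```

In practice the registry would be persisted, for example as one JSON record per file, but the principle stays the same: a statement without a resolvable input hash cannot be traced back to the master.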

This keeps the essentials traceable: which file was analyzed; whether it was the original, a crop, a scaled version or a compressed version; which image quality the model received; and which statements are reliable and which must be blocked or verified.

Conclusion

Image scaling is not a side issue. It determines what an AI analysis is actually based on.

An AI model does not automatically see the original image. It sees the input that has been technically passed to it and internally processed.

If that input was scaled, compressed, cropped or otherwise changed, this must be documented. For broad analysis this may be unproblematic. For detail-critical tasks it is central.

The decisive question is therefore not only:

What does the AI recognize?

But before that:

Which image version did the AI actually see?

Only when this question has been answered can AI-assisted image analysis be traceable, verifiable and robust.

The model input is not automatically the source.

And exactly this distinction is the precondition for serious work with AI image analysis.

Frequently asked questions

Why is the model input not automatically the source?

Because an AI system often processes a technically prepared image version, for example a scaled, compressed or cropped version of the source file.

Why is image scaling critical for historical manuscripts?

Because small signs, abbreviation strokes, dots, erasures and fine contrasts can be smoothed or changed during scaling.

What should be documented?

At minimum: file name, image size, file format, hash, processing steps, model used and the image version actually analyzed.

Project context

This article is part of the methodological development of HistoriaMP. More on the project's position, its limits and how to get in touch is available on the project page.

About HistoriaMP · Contact