Disclaimer

Last reviewed on April 24, 2026

The site offers a tool for extracting structured information from PDF documents, along with a set of guides on the same subject. This page describes the limits readers should keep in mind when relying on either.

Extraction is heuristic

The extractor reads the text layer of a PDF and applies pattern-matching to find titles, authors, identifiers, and citations. It will sometimes:

Pick a running header or a journal name as the title.
Mistake an author's affiliation, or a co-author of a cited paper, for an author of the current document.
Miss a DOI that is rendered as a hyphenated string broken across two lines.
Split a citation list incorrectly when entries are not visually separated.
Produce no output at all from a scanned PDF that has no text layer (in which case OCR is needed first).
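
As a rough illustration of the line-broken DOI case above, the Python sketch below shows how a simple pattern can truncate an identifier at a line break, and how rejoining hyphenated breaks before matching can recover it. This is not the site's extractor: the pattern is a deliberately simplified assumption, the function names are hypothetical, and the DOI in the sample is made up.

```python
import re

# Simplified DOI pattern, assumed for illustration only; real DOIs
# allow more variation than this expression captures.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def find_dois(text: str) -> list[str]:
    """Return every substring of text that matches the simplified pattern."""
    return DOI_PATTERN.findall(text)


def rejoin_hyphen_breaks(text: str) -> str:
    """Join a line ending in a hyphen to the line after it, keeping the hyphen."""
    return re.sub(r"-\n\s*", "-", text)


# Made-up text layer in which a DOI is split after a hyphen.
sample = "doi: 10.1234/example-\nstudy.2020.001"

print(find_dois(sample))                        # ['10.1234/example-']
print(find_dois(rejoin_hyphen_breaks(sample)))  # ['10.1234/example-study.2020.001']
```

The rejoining step keeps the hyphen, but the text layer alone cannot show whether that hyphen belongs to the identifier or was added when the line was broken. That ambiguity is one reason extracted identifiers still need checking by a person.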

The output should be reviewed before it is added to a reference manager or quoted in writing. The tool is an aid for working faster, not a substitute for human editorial judgement on records that need to be accurate.

Guides are general information

The articles published here describe formats, identifiers, and tooling in general terms. They are not personalised advice. Decisions about how to manage a particular publication, dataset, or institution's records should be made with the relevant professional input — a librarian, an information officer, a thesis adviser — not by treating a public guide as the final word.

External links

The site links to external standards bodies, vendor documentation, and reference projects. Those resources are linked because they were judged useful at review time. The site has no control over their contents and does not endorse everything they may publish in future. Broken or outdated links can be reported via the contact page.

No professional advice

Nothing on the site constitutes legal, academic, medical, or any other kind of professional advice. The privacy and terms pages describe legal commitments the site itself makes; they are not legal advice for visitors.

Liability

Use of the tool and the site is at your own risk. The fuller liability terms appear on the terms of use page.