Bibliographic Metadata
- Document title
- Author names
- Publication year
- Abstract text
- Keywords
- Page count
Extract PDF Metadata & Citations in Your Browser
Maximum file size: 50 MB
Transform your PDF data into the format you need, with everything processed locally in your browser.
Structured data with all metadata & citations
Clean UTF-8 text file (.txt)
Formatted DOCX document
GitHub-flavored Markdown (.md)
BibTeX citations for LaTeX (.bib)
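As a rough illustration of what a .bib export involves, here is a minimal TypeScript sketch. It assumes a record that carries the title, author names, publication year, abstract, and keywords listed under Bibliographic Metadata. The `PdfRecord` type, the `toBibtex` function, and the family-name-plus-year citation key are hypothetical and are not this tool's actual code. For simplicity, every entry is typed `@article`.

```typescript
// Hypothetical record shape; field names are illustrative, not the tool's schema.
interface PdfRecord {
  title: string;
  authors: string[]; // each name written as "Given Family"
  year?: number;
  abstract?: string;
  keywords?: string[];
}

// Escape a few characters that LaTeX treats specially.
// Braces, backslashes, and accented characters are NOT handled here.
function escapeBibtex(value: string): string {
  return value.replace(/([&%$#_])/g, "\\$1");
}

// Hypothetical key scheme: first author's family name plus year, e.g. "doe2024".
function citationKey(rec: PdfRecord): string {
  const parts = (rec.authors[0] ?? "").trim().split(/\s+/);
  const family = (parts[parts.length - 1] ?? "").toLowerCase().replace(/[^a-z0-9]/g, "");
  return `${family || "anon"}${rec.year ?? ""}`;
}

// Render one record as a BibTeX @article entry, skipping fields that are absent.
function toBibtex(rec: PdfRecord): string {
  const fields: [string, string][] = [["title", rec.title]];
  if (rec.authors.length > 0) fields.push(["author", rec.authors.join(" and ")]);
  if (rec.year !== undefined) fields.push(["year", String(rec.year)]);
  if (rec.keywords?.length) fields.push(["keywords", rec.keywords.join(", ")]);
  if (rec.abstract) fields.push(["abstract", rec.abstract]);
  const body = fields
    .map(([name, value]) => `  ${name} = {${escapeBibtex(value)}}`)
    .join(",\n");
  return `@article{${citationKey(rec)},\n${body}\n}`;
}

console.log(toBibtex({ title: "An Example Title", authors: ["Jane Doe", "John Smith"], year: 2024 }));
// @article{doe2024,
//   title = {An Example Title},
//   author = {Jane Doe and John Smith},
//   year = {2024}
// }
```

A fuller exporter would probably also choose the entry type from the document, protect capitalization in titles, and handle non-ASCII names for classic BibTeX. This sketch only escapes a handful of LaTeX special characters.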
Your PDFs never leave your computer. All processing happens locally in your browser.
No server round-trips. Processing runs in web workers using modern browser APIs, so there is no upload step to wait on.
Install as a PWA and process PDFs anywhere, even without an internet connection.
No sign-up, no API limits, no upload quotas. Built on the open-source PDF.js and pdf-lib libraries.
Practical guides on how PDF metadata is structured, how citation parsers actually work, and how to turn extracted records into a clean bibliography.
Where metadata lives inside a PDF, how heuristic extractors find titles and authors, and the failure modes worth knowing about.
How a parser splits a references section, labels each field, and avoids guessing at the ambiguous bits.
The persistent identifiers that make a citation resolvable: DOI, arXiv, PubMed, ORCID, ISBN, ISSN.
What to do when a PDF has no text layer: tools, accuracy expectations, and privacy considerations.
The architectural choices behind a privacy-first PDF extractor, from PDF.js to service workers.
Reference pages for BibTeX, RIS, CSL-JSON, Markdown, and DOCX, with notes on when to pick each.
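Several of the persistent identifiers mentioned above, DOIs in particular, can often be picked out of extracted text with a regular expression. The minimal TypeScript sketch below uses a pattern adapted from the one Crossref has suggested for modern DOIs. The `findDois` name and the trailing-punctuation rules are assumptions made for illustration, and older or unusual DOIs may not match.

```typescript
// Hypothetical helper, not part of the tool: collect DOI-like strings from text.
// Pattern adapted from the one Crossref has suggested for modern DOIs:
// "10." + a 4 to 9 digit registrant code + "/" + a suffix of common characters.
const DOI_PATTERN = /\b10\.\d{4,9}\/[-._;()\/:A-Z0-9]+/gi;

function findDois(text: string): string[] {
  const found = new Set<string>();
  for (const match of text.matchAll(DOI_PATTERN)) {
    // Trim sentence punctuation the suffix pattern can swallow, e.g. a final ".".
    let doi = match[0].replace(/[.,;:]+$/, "");
    // Drop a closing parenthesis that belongs to the surrounding prose.
    if (doi.endsWith(")") && !doi.includes("(")) doi = doi.slice(0, -1);
    // DOI names are case-insensitive, so normalize before de-duplicating.
    found.add(doi.toLowerCase());
  }
  return Array.from(found);
}

console.log(findDois("See doi:10.1000/XYZ123. Also (https://doi.org/10.1000/182)."));
// → [ '10.1000/xyz123', '10.1000/182' ]
```

The trimming is a trade-off. A DOI whose suffix legitimately ends in a period or an unmatched parenthesis would be truncated, which may be acceptable for a first pass over extracted references.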
Last reviewed on April 24, 2026