Maximum file size: 50MB

What Gets Extracted Automatically

🔍

Bibliographic Metadata

Document title
Author names
Publication year
Abstract text
Keywords
Page count

🔗

Identifiers & Links

DOI (Digital Object Identifier)
URLs from references
Citation identifiers
ISBN numbers
arXiv IDs
PubMed IDs
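
As a rough illustration of how identifiers like these can be pulled out of extracted text, here is a minimal sketch using regular expressions. The function names (`collect`, `extractIdentifiers`) and the patterns themselves are assumptions made for this example, not the tool's actual code; real-world DOIs and PubMed references come in more shapes than these patterns cover.

```typescript
// Illustrative sketch only: names and patterns are assumptions, not the tool's implementation.
interface Identifiers {
  dois: string[];
  arxivIds: string[];
  pubmedIds: string[];
}

// Collect one capture group from every match of a global regex, without duplicates.
function collect(text: string, pattern: RegExp, group: number): string[] {
  const found: string[] = [];
  pattern.lastIndex = 0; // global regexes keep state between exec() calls
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    // Drop trailing punctuation that usually belongs to the sentence, not the identifier.
    const value = match[group].replace(/[.,;:)\]]+$/, "");
    if (found.indexOf(value) === -1) {
      found.push(value);
    }
  }
  return found;
}

function extractIdentifiers(text: string): Identifiers {
  return {
    // DOIs start with "10.", a registrant code, a slash, then a suffix.
    dois: collect(text, /\b10\.\d{4,9}\/[^\s"<>]+/g, 0),
    // New-style arXiv IDs written as "arXiv:YYMM.NNNNN" with an optional version.
    arxivIds: collect(text, /\barXiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)/gi, 1),
    // PubMed IDs written as "PMID: 12345678".
    pubmedIds: collect(text, /\bPMID:?\s*(\d{1,8})\b/gi, 1),
  };
}

const sample =
  "See https://doi.org/10.1000/xyz123. Also arXiv:2101.00001v2 and PMID: 12345678.";
const ids = extractIdentifiers(sample);
console.log(ids);
```

Matching on explicit prefixes such as `arXiv:` and `PMID` keeps false positives down, at the cost of missing identifiers that appear without a label.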

📚

Citations & References

Full bibliography list
Citation parsing
Reference year detection
DOI extraction per citation
Structured JSON export
Raw text preservation

Multiple Export Formats

Export your extracted PDF data as JSON, plain text, Word, Markdown, or BibTeX, all processed locally in your browser

📊

JSON

Structured data with all metadata & citations

📄

Plain Text

Clean UTF-8 text file (.txt)

📝

Word

Formatted DOCX document

📋

Markdown

GitHub-flavored Markdown (.md)

📚

BibTeX

Citation format for LaTeX (.bib)
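
To give a sense of what a BibTeX export involves, here is a minimal sketch that turns one extracted record into a `.bib` entry. The `CitationRecord` shape, the `toBibtex` and `escapeBibtex` helpers, and the choice of the generic `@misc` entry type are all assumptions for illustration; the tool's actual record structure and output may differ.

```typescript
// Illustrative sketch only: the record shape and helper names are hypothetical.
interface CitationRecord {
  key: string;        // citation key, e.g. "doe2020"
  title: string;
  authors: string[];  // each in "Last, First" form
  year?: number;
  doi?: string;
}

// Escape characters that LaTeX treats specially inside field values.
function escapeBibtex(value: string): string {
  return value.replace(/([&%$#_])/g, "\\$1");
}

function toBibtex(record: CitationRecord): string {
  const fields: string[] = [];
  fields.push("  title = {" + escapeBibtex(record.title) + "}");
  if (record.authors.length > 0) {
    // BibTeX separates multiple authors with the word "and".
    const authors = record.authors.map((name) => escapeBibtex(name)).join(" and ");
    fields.push("  author = {" + authors + "}");
  }
  if (record.year !== undefined) {
    fields.push("  year = {" + record.year + "}");
  }
  if (record.doi) {
    // Assumption: the DOI is written unescaped, as most styles expect.
    fields.push("  doi = {" + record.doi + "}");
  }
  return "@misc{" + record.key + ",\n" + fields.join(",\n") + "\n}";
}

console.log(
  toBibtex({
    key: "doe2020",
    title: "Cats & Dogs",
    authors: ["Doe, Jane", "Roe, Richard"],
    year: 2020,
    doi: "10.1000/xyz123",
  })
);
```

Optional fields are simply omitted when missing, so a partially extracted record still produces a valid entry.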

Why Choose Browser-First PDF Tools?

🔒

100% Private

Your PDFs never leave your computer. All processing happens locally in your browser.

⚡

Lightning Fast

No server round-trips. Processing runs on your own device using modern browser APIs and web workers.

📡

Works Offline

Install as a PWA and process PDFs anywhere, even without an internet connection.

🆓

Free to Use

No sign-up, no API limits, no upload quotas. Built on the open-source PDF.js and pdf-lib libraries.

Learn more about PDF extraction

Practical guides on how PDF metadata is structured, how citation parsers actually work, and how to turn extracted records into a clean bibliography.

🧭

PDF metadata extraction

Where metadata lives inside a PDF, how heuristic extractors find titles and authors, and the failure modes worth knowing about.

📚

Citation parsing

How a parser splits a references section, labels each field, and keeps its hands off the ambiguous bits.

🔗

DOIs and identifiers

The persistent identifiers that make a citation resolvable: DOI, arXiv, PubMed, ORCID, ISBN, and ISSN.

🖨️

Scanned PDFs and OCR

What to do when a PDF has no text layer: tools, accuracy expectations, and privacy considerations.

🧩

Browser-side processing

The architectural choices behind a privacy-first PDF extractor, from PDF.js to service workers.

🗂️

Export formats

Reference pages for BibTeX, RIS, CSL-JSON, Markdown, and DOCX, with notes on when to pick each.
