Markdown export
The Markdown export turns an extracted PDF into a single .md file with sensible headings and paragraph breaks. It is intended for cases where the body text matters as much as the citation record — for example, when you want to take the paper into a notebook, a wiki, or a static-site generator and annotate it.
What the file contains
From top to bottom:
- An H1 with the extracted title.
- A line of italicised authors, if any were detected.
- A "Metadata" section listing DOI, year, page count, and any identifiers that were found.
- An "Abstract" section, if one was extracted.
- A "Full text" section with the body of the PDF, broken into paragraphs.
- A "References" section listing the parsed citations.
Headings use ATX-style markers, as in GitHub-flavoured Markdown: # for the title, ## for the major sections.
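To check which of these sections made it into a given export, the heading lines can be listed with a few lines of Python. This is a minimal sketch: the `outline` helper and the sample document are hypothetical, and the sample's metadata and reference formatting are assumptions rather than the tool's exact output.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def outline(markdown_text: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for every heading line, in document order."""
    headings = []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


# Hypothetical export shaped like the list above (formatting details assumed).
sample = """# A Study of Things

*A. Author, B. Author*

## Metadata

- DOI: 10.0000/example

## Abstract

We study things.

## Full text

First paragraph.

## References

1. Smith, J. (2023). Things.
"""

for level, title in outline(sample):
    print(level, title)
```

A missing "Abstract" or "References" entry in the result is a quick signal that the extractor did not find that part of the PDF.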
What it does not contain
The export deliberately does not try to reconstruct the original PDF's typography. It does not preserve column layout, figure placement, or table formatting. Tables, in particular, will appear as a wall of cell text rather than as a Markdown table; the heuristics needed to recover a usable table from arbitrary PDF text fragments are out of scope for the in-browser tool.
Inline citations within the body text are preserved as plain text — for example, "[12]" or "(Smith, 2023)" — but they are not linked to entries in the References section. If you need that, post-process the file with a small script that matches each citation marker to its entry.
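One possible shape for such a script is sketched below. It handles numeric markers only and assumes that the References section starts at a "## References" heading and that each entry begins with its number followed by a full stop ("12. ..."); check both assumptions against your own export. The function name `link_numeric_citations` and the `ref-N` anchor names are illustrative.

```python
import re

REFS_HEADING = re.compile(r"^## References[ \t]*$", re.M)
MARKER = re.compile(r"\[(\d+)\]")           # numeric markers such as [12]
ENTRY = re.compile(r"^(\d+)\.[ \t]+", re.M)  # ASSUMPTION: entries look like "12. ..."


def link_numeric_citations(markdown_text: str) -> str:
    """Link [n] markers in the body to anchors added to the reference entries."""
    heading = REFS_HEADING.search(markdown_text)
    if heading is None:
        return markdown_text  # no References section, nothing to link to

    body = markdown_text[:heading.start()]
    refs = markdown_text[heading.start():]
    known = set(ENTRY.findall(refs))

    def link(match: re.Match) -> str:
        number = match.group(1)
        if number not in known:
            return match.group(0)  # leave markers without an entry untouched
        return f"[[{number}]](#ref-{number})"

    def anchor(match: re.Match) -> str:
        number = match.group(1)
        return f'{number}. <a id="ref-{number}"></a>'

    return MARKER.sub(link, body) + ENTRY.sub(anchor, refs)


example = "See [1] and [7].\n\n## References\n\n1. Smith, J. (2023). Things.\n"
print(link_numeric_citations(example))
```

Author-year markers such as "(Smith, 2023)" need a different matching rule (surname plus year against each entry) and are left out of this sketch. The script is not idempotent, so run it once on a fresh export.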
Useful pipelines
- Notebooks. Drop the .md into Obsidian, Logseq, or a Jupyter Notebook to read and annotate the paper alongside your own notes.
- Static-site generators. Hugo, Jekyll, Eleventy, MkDocs, and Astro all read Markdown directly. The headings produced by the export sit naturally inside their default templates.
- Pandoc. Pipe through Pandoc to produce HTML, EPUB, DOCX, or PDF: pandoc paper.md -o paper.html. With a citation processor and a .bib file, Pandoc can also resolve inline citations into full references, once the plain-text markers have been rewritten in Pandoc's citation syntax (for example, [@smith2023]).
- Diffing. Markdown diffs cleanly in version control. If you are tracking the evolution of a paper across versions, exporting each version to Markdown and committing them to a repository gives you readable diffs that side-by-side PDFs do not.
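In a repository, the version-control system produces the diff for you. To preview the same kind of comparison outside a repository, Python's standard difflib module is enough. The sketch below is illustrative: the helper name `markdown_diff`, the file labels, and the two sample versions are made up.

```python
import difflib


def markdown_diff(old_text: str, new_text: str,
                  old_name: str = "v1.md", new_name: str = "v2.md") -> str:
    """Return a unified diff between two Markdown exports as one string."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)


# Two hypothetical exports of the same paper.
v1 = "# Title\n\nWe study ten things.\n"
v2 = "# Title\n\nWe study twelve things.\n"
print(markdown_diff(v1, v2))
```

Because the export writes the body as separate paragraphs, a changed sentence shows up as a changed paragraph line rather than as a change to the whole document.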
Tips for cleaner output
- If the extractor missed the title and used a header instead, edit the H1 by hand before processing further. Most downstream tools use that line as the document's primary identifier.
- Strip running page headers and footers with a quick search-and-replace if you see them repeated through the body.
- For long papers, split the export at the H2 boundaries to give each section its own file; this is friendlier to wikis and static-site generators that prefer one page per concept.
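The last two tips can be scripted. The following is a rough sketch, not part of the tool: both function names are hypothetical, the repeat threshold of three is an arbitrary assumption, and the header-stripping heuristic will also remove any legitimate line that happens to recur that often, so review the result.

```python
from collections import Counter


def strip_repeated_lines(markdown_text: str, min_repeats: int = 3) -> str:
    """Drop non-blank, non-heading lines that occur at least min_repeats times."""
    lines = markdown_text.splitlines()
    counts = Counter(line.strip() for line in lines if line.strip())
    kept = [
        line for line in lines
        if not line.strip()                     # keep blank lines
        or line.startswith("#")                 # keep headings
        or counts[line.strip()] < min_repeats   # keep lines that rarely repeat
    ]
    return "\n".join(kept) + "\n"


def split_at_h2(markdown_text: str) -> dict[str, str]:
    """Map each H2 title to its section text, heading line included.

    Text before the first H2 is stored under the key 'front-matter'.
    If two sections share a title, the later one overwrites the earlier.
    """
    sections: dict[str, str] = {}
    title = "front-matter"
    buffer: list[str] = []
    for line in markdown_text.splitlines(keepends=True):
        if line.startswith("## "):
            sections[title] = "".join(buffer)
            title = line[3:].strip()
            buffer = []
        buffer.append(line)
    sections[title] = "".join(buffer)
    return sections


# Hypothetical export used only to show the result.
sample = "# Title\n\n## Abstract\n\nShort.\n\n## Full text\n\nBody.\n"
for name, text in split_at_h2(sample).items():
    print(name, len(text))
```

Each value in the returned dictionary can then be written to its own file, named however your wiki or site generator expects.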
When Markdown is the wrong choice
If the goal is to file the citation in a reference manager, use BibTeX, RIS, or CSL-JSON instead. Markdown carries the human-readable representation but loses the field-level structure that a reference manager needs. The two outputs complement each other: the structured export feeds your bibliography, the Markdown export feeds your reading and annotation workflow.