CSL-JSON as an interchange format

Last reviewed on April 24, 2026

CSL-JSON is the input format used by Citation Style Language (CSL) processors. A CSL-JSON document is a JSON array of items, and any CSL-compliant engine can render each item into a formatted citation according to a chosen style. Of the formats the extractor produces, it is the most structured and the easiest to consume programmatically.

Anatomy of an item

[
  {
    "id": "smith2023something",
    "type": "article-journal",
    "title": "Something about pdf extraction",
    "author": [
      {"family": "Smith", "given": "Jane"},
      {"family": "Doe",   "given": "John"}
    ],
    "container-title": "Journal of Document Engineering",
    "volume": "12",
    "issue": "3",
    "page": "101-120",
    "issued": {"date-parts": [[2023, 5, 14]]},
    "DOI": "10.1234/jde.2023.0012"
  }
]

Each object has an id (the equivalent of a BibTeX citation key), a type drawn from a fixed list, and a set of variables such as title, author, and issued. The schema is published by the CSL project and is stable enough that records produced today should continue to render correctly with future style files.
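Because a CSL-JSON document is an ordinary JSON array, the standard json module is enough to read it. The sketch below indexes items by id, much as one would look up a BibTeX key. The names CSL_TEXT and index_by_id are illustrative only and are not part of the extractor or of any CSL library.

```python
import json

# A trimmed copy of the sample record above, held as a string for the demo.
CSL_TEXT = """
[
  {
    "id": "smith2023something",
    "type": "article-journal",
    "title": "Something about pdf extraction",
    "author": [
      {"family": "Smith", "given": "Jane"},
      {"family": "Doe", "given": "John"}
    ],
    "issued": {"date-parts": [[2023, 5, 14]]},
    "DOI": "10.1234/jde.2023.0012"
  }
]
"""


def index_by_id(csl_text):
    """Parse a CSL-JSON string and return a dict mapping item id to item."""
    items = json.loads(csl_text)
    if not isinstance(items, list):
        raise ValueError("CSL-JSON must be an array of items")
    # ids are converted to str so that numeric and string ids share one key type.
    return {str(item["id"]): item for item in items}


items = index_by_id(CSL_TEXT)
record = items["smith2023something"]
print(record["type"], record["DOI"])
```

Keying on id mirrors how a citation processor resolves a cite to its record; a real pipeline would probably also want to detect duplicate ids, which this sketch silently overwrites.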

Item types

CSL has a richer type vocabulary than BibTeX. Frequently used members include article-journal, book, chapter, paper-conference, thesis, report, and webpage.

Names

Each name is an object with family and given fields, plus optional suffix, non-dropping-particle, and dropping-particle fields. For an institution, use the literal form, a single string, so that a CSL processor does not try to split it into a personal name. Two examples:

{"family": "van der Waals", "given": "Johannes"}
{"literal": "World Health Organization"}
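Code that consumes name objects has to handle both shapes. The following is a minimal sketch of a display helper, assuming a simple "given, particles, family, suffix" order; display_name is a hypothetical function, and its output is a plain label, not what a CSL style would produce.

```python
def display_name(name):
    """Return a simple one-line label for a CSL name object."""
    # Institutional names carry a single literal string and are used as-is.
    if "literal" in name:
        return name["literal"]
    # Personal names: join whichever fields are present, in reading order.
    parts = [
        name.get("given"),
        name.get("dropping-particle"),
        name.get("non-dropping-particle"),
        name.get("family"),
        name.get("suffix"),
    ]
    return " ".join(part for part in parts if part)


print(display_name({"family": "van der Waals", "given": "Johannes"}))
print(display_name({"literal": "World Health Organization"}))
```

Checking for literal first matters: a consumer that only looks at family and given would render an institution as an empty string.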

Dates

Dates are objects too. The date-parts array holds either a single date, such as [[2023, 5, 14]], or a range given as two inner arrays, start then end. Year-only ([[2023]]) and year-month ([[2023, 5]]) dates are valid. For dates that cannot be parsed into a year, month, day structure there is a literal form: {"literal": "Spring 2023"}.
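The date shapes described above can be handled with one small function. This is a sketch under stated assumptions: describe_date is a hypothetical helper, the ISO-style output with "/" between range endpoints is a choice made for the example, and int() is applied because date parts may arrive as numeric strings as well as numbers.

```python
def describe_date(date):
    """Turn a CSL date object into a short readable string."""
    # Unparseable dates are stored verbatim under "literal".
    if "literal" in date:
        return date["literal"]

    def fmt(parts):
        # parts is [year], [year, month], or [year, month, day].
        numbers = [int(p) for p in parts]
        year, rest = numbers[0], numbers[1:]
        return "-".join([f"{year:04d}"] + [f"{n:02d}" for n in rest])

    # One inner array is a single date; two inner arrays form a range.
    return "/".join(fmt(parts) for parts in date["date-parts"])


print(describe_date({"date-parts": [[2023, 5, 14]]}))
print(describe_date({"date-parts": [[2023, 5, 14], [2023, 5, 20]]}))
print(describe_date({"date-parts": [[2023]]}))
print(describe_date({"literal": "Spring 2023"}))
```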

Why CSL-JSON tends to win for interchange

Where it is less natural

Validating output

The CSL project publishes a JSON Schema for CSL-JSON. Running the extractor's output through a validator is a cheap way to catch format-level mistakes before they reach a citation processor. Most failures come from putting a value in the wrong shape (a string where an array of name objects is expected, for instance), and the schema flags those clearly.
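To illustrate the kind of shape error involved, here is a lightweight pre-check written with the standard library only. It is a stand-in for illustration, not a replacement for validating against the published schema: shape_problems is a hypothetical helper, and NAME_VARIABLES and DATE_VARIABLES cover only a small subset of the variables CSL defines.

```python
# Illustrative subsets; the real schema defines many more variables.
NAME_VARIABLES = ("author", "editor", "translator")
DATE_VARIABLES = ("issued", "accessed")


def shape_problems(item):
    """Return a list of human-readable shape problems found in one item."""
    problems = []
    # Every item needs an id and a type.
    for key in ("id", "type"):
        if key not in item:
            problems.append(f"missing required field: {key}")
    # Name variables must be arrays of objects, never bare strings.
    for key in NAME_VARIABLES:
        if key in item:
            value = item[key]
            if not isinstance(value, list) or not all(
                isinstance(name, dict) for name in value
            ):
                problems.append(f"{key} must be an array of name objects")
    # Date variables must be objects (date-parts or literal form).
    for key in DATE_VARIABLES:
        if key in item and not isinstance(item[key], dict):
            problems.append(f"{key} must be a date object")
    return problems


bad_item = {"id": "x1", "type": "book", "author": "Jane Smith", "issued": "2023"}
for problem in shape_problems(bad_item):
    print(problem)
```

A full JSON Schema validator reports the same class of error with a path into the document, which is usually the better choice once a schema-validation library is available.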