The work has been done. Getting to it is the hard part.

Across universities, libraries, and cultural institutions around the world, an enormous amount of careful, scholarly work has gone into digitising humanity's original source texts. Ancient manuscripts, historical translations, sacred texts from every tradition. This is painstaking, funded, expert work and it deserves recognition.

Much of this material is in the public domain. In many cases the institutions themselves have explicitly released their digitised editions under CC0, the most permissive possible dedication, because they want these texts used freely. The scholarship is open. The intention is generosity.

The content is free. Getting to it programmatically is still harder than it should be.

These collections are built for scholarly reading and academic research. They are presented through library portals, encoded in TEI XML designed for digital humanities workflows, and split across paginated HTML views. For the audience they were built for, that is exactly right. For a developer building a Bible app, a seminary student writing a study tool, a church adding a verse to its website, or an AI system that needs structured text, there is a significant gap between what exists and what is usable.

OriginsAPI exists to close that gap. Not by replicating the scholarly work that has already been done, but by building a consumption layer on top of it. Clean JSON. Simple REST endpoints. An embeddable widget that requires no technical knowledge. The same public domain texts, in the format that modern builders actually work with.
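
To make the gap concrete, here is a minimal sketch of the kind of flattening a consumption layer performs: turning TEI-style XML into plain JSON records. It is an illustration only, not OriginsAPI's actual pipeline or response format. The sample fragment, the `book`/`chapter`/`ab` nesting, the `tei_to_records` helper, and the output field names are all assumptions; real TEI editions differ widely in how they mark up books, chapters, and verses.

```python
import json
import xml.etree.ElementTree as ET

# The standard TEI namespace; the prefix "tei" is just a local alias.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Illustrative fragment only. Real editions vary in structure and element choice.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="book" n="Genesis">
      <div type="chapter" n="1">
        <ab n="1">In the beginning God created the heaven and the earth.</ab>
        <ab n="2">And the earth was without form, and void;<lb/> and darkness
          was upon the face of the deep.</ab>
      </div>
    </div>
  </body></text>
</TEI>"""


def tei_to_records(xml_text):
    """Flatten book/chapter/verse markup into a list of plain dicts (hypothetical shape)."""
    root = ET.fromstring(xml_text)
    records = []
    for book in root.iterfind(".//tei:div[@type='book']", TEI_NS):
        for chapter in book.iterfind("tei:div[@type='chapter']", TEI_NS):
            for verse in chapter.iterfind("tei:ab", TEI_NS):
                # itertext() collects text on both sides of inline elements
                # such as <lb/>; split/join normalises the whitespace.
                text = " ".join("".join(verse.itertext()).split())
                records.append({
                    "book": book.get("n"),
                    "chapter": int(chapter.get("n")),
                    "verse": int(verse.get("n")),
                    "text": text,
                })
    return records


records = tei_to_records(SAMPLE_TEI)
print(json.dumps(records, indent=2))
```

The wording of each verse passes through untouched; only the markup around it and the line-break whitespace change.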

Raw text. No interpretation.

The decision to serve only raw, unmodified source text is deliberate and important. These texts carry deep meaning for billions of people across many traditions. Adding interpretive layers, harmonising translations, or making editorial choices about what a passage means is not something a data API should do. That work belongs to scholars, theologians, and communities who have the depth and context to do it responsibly.

OriginsAPI serves what the text says, in the translation or manuscript tradition you request, exactly as it was written. Nothing added. Nothing smoothed out. No modern paraphrase. No theological position embedded in the data.

On the paid tier, we offer AI-generated lexicons and cross-references as interpretive tools. These are clearly marked as such and come with explicit disclaimers. Linguistic analysis is inherently interpretive. We provide it as a starting point for study, not as a definitive scholarly resource. The difference between raw text and interpretive tooling is a line we take seriously.

Data quality is not an afterthought.

Public domain digitisation is painstaking work, but even the most careful OCR process leaves errors. When the source material is an edition printed in 1611, a Hebrew codex, or a Greek New Testament, a misread character is not just a typo. It can change the meaning of a word, a verse, or a theological argument.

To be direct about where things stand: the texts on OriginsAPI right now are live but not fully verified. This is a one-person project and verification across hundreds of thousands of records takes time. The process below is being applied progressively. Some texts have been reviewed in full, others partially, others not yet. That work is ongoing.

What we will never do is apply a correction quietly. Every change made to the database goes through the same process regardless of how obvious it seems, and every change is permanently logged.

  1. Source ingestion

    Public domain texts are ingested from established digitised sources, cleaned of HTML artifacts and formatting inconsistencies, and imported into the database in their original form.

  2. Multi-AI cross-verification

    Multiple AI systems independently review each text and flag potential digitisation errors: encoding issues, truncated verses, duplicate words, OCR substitutions, and character set anomalies. Systems are chosen for their strengths with specific languages and scripts.

  3. Human review and approval

    Every flagged item goes to human review before any correction is applied. Findings are verified against authoritative reference sources. Disagreements between AI systems are examined and resolved. No correction is ever applied without a human making the final call.

  4. Audited corrections

    Every correction is logged with the original value, the corrected value, the reference it applies to, and the reason for the change. The audit trail is permanent. Every edit ever made to the database has a full record of what changed, when, and why.
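
The flagging and audit steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the production system: a cheap rule-based `flag_issues` check stands in for the AI review, and the SQLite schema, the `GEN.1.1` reference format, and every function and column name are hypothetical. The sketch shows two properties the process relies on: a flag is only a candidate for human review, and a correction is never written without its audit record.

```python
import re
import sqlite3
from datetime import datetime, timezone


def flag_issues(text):
    """Return candidate digitisation problems. Flags are for human review, not auto-fixes."""
    flags = []
    if re.search(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE):
        flags.append("duplicate_word")
    if "\ufffd" in text:  # Unicode replacement character left by a bad decode
        flags.append("encoding_replacement_char")
    if text != text.strip() or "  " in text:
        flags.append("whitespace")
    return flags


def open_db():
    """Create an in-memory database with a text table and an audit table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE verses (ref TEXT PRIMARY KEY, text TEXT NOT NULL)")
    db.execute(
        """CREATE TABLE corrections (
               id INTEGER PRIMARY KEY,
               ref TEXT NOT NULL,
               original TEXT NOT NULL,
               corrected TEXT NOT NULL,
               reason TEXT NOT NULL,
               approved_by TEXT NOT NULL,
               applied_at TEXT NOT NULL)"""
    )
    return db


def apply_correction(db, ref, corrected, reason, approved_by):
    """Apply a human-approved correction and log it in the same transaction."""
    if not approved_by:
        raise ValueError("a human reviewer must approve every correction")
    with db:  # commits on success, rolls back if anything inside raises
        row = db.execute("SELECT text FROM verses WHERE ref = ?", (ref,)).fetchone()
        if row is None:
            raise KeyError(ref)
        db.execute(
            "INSERT INTO corrections (ref, original, corrected, reason, approved_by, applied_at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (ref, row[0], corrected, reason, approved_by,
             datetime.now(timezone.utc).isoformat()),
        )
        db.execute("UPDATE verses SET text = ? WHERE ref = ?", (corrected, ref))


# Usage: a verse with a duplicated word is flagged, reviewed, then corrected.
db = open_db()
bad = "In the the beginning God created the heaven and the earth."
db.execute("INSERT INTO verses VALUES (?, ?)", ("GEN.1.1", bad))
flags = flag_issues(bad)
apply_correction(
    db, "GEN.1.1",
    corrected="In the beginning God created the heaven and the earth.",
    reason="duplicate word introduced during digitisation",
    approved_by="reviewer@example.org",
)
audit_rows = db.execute("SELECT ref, original, corrected, reason FROM corrections").fetchall()
```

Writing the log entry and the text update inside one transaction means a correction cannot land without its audit record, which is the guarantee step 4 describes. A real duplicate-word rule would also need care with intentional repetition in the source text, which is one reason a human makes the final call.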

If you find an error in a text, please report it through our contact page. Reports go directly into the verification queue and are reviewed before any correction is applied.

Every culture. Every tradition. Every original text.

We started with the Bible because the public domain digitisation work for biblical texts is the most mature and the demand from developers is the highest. Twelve translations across English, Latin, Greek, and Hebrew are live now, with more being verified.

But the Bible is only the beginning. The vision for OriginsAPI is to become the definitive accessible source for humanity's original texts across every tradition. If it is a significant original source text, if it is in the public domain, and if good digitisation work exists or can be done, it belongs here.

Bible: 12 translations live. English, Latin, Greek, Hebrew.
Septuagint (LXX): Greek Old Testament. In preparation.
Quran: Arabic source and translations. Planned.
Vedas: Sanskrit source and translations. Planned.
Mythology: Greek, Roman, Norse, Egyptian. Planned.
Etymology: Historical linguistic source texts. Planned.

Each category gets its own verification process before anything goes live. We would rather take the time to get it right than publish data we are not confident in.

Keeping this free.

OriginsAPI is designed to stay free for anyone who wants raw access to public domain texts. The infrastructure is built to make that possible without the costs that typically force APIs to add gates, rate limits, and paywalls.

But free infrastructure does not mean zero cost. Verification takes time. Expanding to new categories takes time. Maintaining data quality across hundreds of thousands of records takes time. If OriginsAPI has been useful to you and you want to help it grow, a contribution goes directly toward that work.