When to use this API
When you need to extract named entities from unstructured text and link them to a knowledge graph. DBpedia Spotlight takes a prose string and identifies mentions of people, places, and organizations — then resolves each mention to a canonical DBpedia URI backed by Wikipedia's editorial corpus. What makes it non-obviously useful is the disambiguation layer: "Apple" in a sentence about quarterly earnings lands on dbpedia.org/resource/Apple_Inc., not the fruit, because the API scores candidates against surrounding context. For domain-limited NER tasks (medical records, legal documents), a specialized model will outperform it; Spotlight shines when you want Wikipedia-scale general knowledge linking without standing up your own NLP pipeline.
Linking free text to structured knowledge
"What entities are mentioned in this news summary, and where can I learn more?" /annotate is the full pipeline: spot entity surface forms, map to DBpedia candidates, disambiguate to a single URI per mention. No auth, no key — pass the text and a confidence threshold.
curl -H "Accept: application/json" \
"https://api.dbpedia-spotlight.org/annotate?text=Berlin+is+the+capital+of+Germany&confidence=0.5" | head -c 10000
{
"@text": "Berlin is the capital of Germany",
"@confidence": "0.5",
"@policy": "whitelist",
"Resources": [
{
"@URI": "http://dbpedia.org/resource/Berlin",
"@support": "96203",
"@types": "Wikidata:Q515,Schema:City,DBpedia:City",
"@surfaceForm": "Berlin",
"@offset": "0",
"@similarityScore": "0.9987",
"@percentageOfSecondRank": "7.28E-4"
},
{
"@URI": "http://dbpedia.org/resource/Germany",
"@support": "187027",
"@types": "Wikidata:Q6256,Schema:Country,DBpedia:Country",
"@surfaceForm": "Germany",
"@offset": "25",
"@similarityScore": "0.9908",
"@percentageOfSecondRank": "5.55E-3"
}
]
}
The @support value is the count of Wikipedia articles that mention this entity — 187,027 for Germany, 96,203 for Berlin. It's the prior-probability backbone of the disambiguation model: entities with higher support win contested matches, which is why a plain mention of "Paris" without any context lands on the French capital rather than Paris, Texas. The @percentageOfSecondRank of 0.00073 for Berlin tells you the runner-up captured less than 0.1% of the top score — unambiguous by any measure. (Yes, the probe example is deliberately dull; the real test of this API is on sentences where entity boundaries are unclear or common nouns blur into proper ones.)
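Every value in the response above arrives as a string, including the counts and scores. A minimal sketch of turning the Resources array into typed records might look like the following. The function name, the output field names, and the max_second_rank cutoff are illustrative assumptions, not part of the API.

```python
import json

# Trimmed copy of the /annotate response shown above.
SAMPLE = json.loads("""
{
  "@text": "Berlin is the capital of Germany",
  "Resources": [
    {"@URI": "http://dbpedia.org/resource/Berlin", "@support": "96203",
     "@surfaceForm": "Berlin", "@offset": "0",
     "@similarityScore": "0.9987", "@percentageOfSecondRank": "7.28E-4"},
    {"@URI": "http://dbpedia.org/resource/Germany", "@support": "187027",
     "@surfaceForm": "Germany", "@offset": "25",
     "@similarityScore": "0.9908", "@percentageOfSecondRank": "5.55E-3"}
  ]
}
""")


def parse_resources(response, max_second_rank=0.1):
    """Convert the string-valued fields of each resource to numbers.

    max_second_rank is a hypothetical cutoff chosen for illustration;
    it is not an API parameter.
    """
    entities = []
    # .get() is used on the assumption that "Resources" may be absent
    # when nothing was linked.
    for res in response.get("Resources", []):
        second_rank = float(res["@percentageOfSecondRank"])
        entities.append({
            "uri": res["@URI"],
            "surface_form": res["@surfaceForm"],
            "offset": int(res["@offset"]),
            "support": int(res["@support"]),
            "similarity": float(res["@similarityScore"]),
            "second_rank": second_rank,
            # True when the runner-up scored well below the winner.
            "unambiguous": second_rank < max_second_rank,
        })
    return entities


entities = parse_resources(SAMPLE)
for e in entities:
    print(e["surface_form"], e["support"], e["second_rank"], e["unambiguous"])
```

The 0.1 cutoff is arbitrary; a real pipeline would tune it against sentences where the link is known to be contested.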
The text mentions two entities: Berlin (linked to dbpedia.org/resource/Berlin, typed as a DBpedia City) and Germany (linked to dbpedia.org/resource/Germany, typed as a DBpedia Country). Both were disambiguated with high confidence: the nearest alternative for Berlin scored less than 0.1% of Berlin's own score.
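The curl call in this section can be reproduced with the Python standard library. This is a sketch under the assumptions already stated here (public endpoint, GET with text and confidence query parameters, JSON requested through the Accept header); build_request and fetch are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.dbpedia-spotlight.org"  # public endpoint used above


def build_request(endpoint, text, confidence=None):
    """Build a GET request for /annotate, /candidates or /spot."""
    params = {"text": text}
    if confidence is not None:
        params["confidence"] = confidence
    # urlencode escapes the text and joins the parameters with "&".
    url = f"{BASE_URL}/{endpoint}?{urllib.parse.urlencode(params)}"
    # Ask for JSON explicitly instead of relying on the default format.
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch(endpoint, text, confidence=None, timeout=10):
    """Send the request and decode the JSON body (performs network I/O)."""
    req = build_request(endpoint, text, confidence)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Building the request does not touch the network; fetch() does.
req = build_request("annotate", "Berlin is the capital of Germany", 0.5)
print(req.full_url)
```

Keeping request construction separate from the network call makes the URL easy to inspect and lets the same helper serve all three endpoints.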
Handling ambiguous entity references in text
"The text says 'Apple' — does this mean the company or the fruit?" /candidates runs spotting and candidate mapping but stops short of committing to a single URI. What comes back is the top candidate per surface form along with the prior and contextual scores that a full disambiguation step would have used — making it the right endpoint when you want to audit the API's reasoning before acting on a link.
curl -H "Accept: application/json" \
"https://api.dbpedia-spotlight.org/candidates?text=Apple+is+a+technology+company+in+California" | head -c 10000
{
"annotation": {
"@text": "Apple is a technology company in California",
"surfaceForm": [
{
"@name": "Apple",
"@offset": "0",
"resource": {
"@label": "Apple Inc.",
"@uri": "Apple_Inc.",
"@priorScore": "7.86E-5",
"@contextualScore": "0.4041",
"@finalScore": "0.9965",
"@support": "18834",
"@percentageOfSecondRank": "0.0028"
}
},
{
"@name": "California",
"@offset": "33",
"resource": {
"@label": "California",
"@uri": "California",
"@priorScore": "8.36E-4",
"@contextualScore": "0.1728",
"@finalScore": "0.9999",
"@support": "200194",
"@percentageOfSecondRank": "2.70E-5"
}
}
]
}
}
The @priorScore for Apple Inc. is 7.86×10⁻⁵, near zero, because without context, "Apple" in the Wikipedia corpus heavily favors the fruit. The @finalScore of 0.9965 is what happens after the phrase "technology company" shifts the posterior. This prior-vs-final gap is the signal to watch when building a pipeline that needs to know how much to trust a linked entity: a high final score that leans on a comparatively high prior (California's 8.36×10⁻⁴ is roughly ten times Apple's) says less about the context than a high final score that overcame a low prior.
The word "Apple" in that text refers to Apple Inc. with 99.65% confidence: context ("technology company") overrode a prior that would have favored the fruit. California resolved with near-certainty. Both URIs are relative to dbpedia.org/resource/.
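One way to make the prior-vs-final gap concrete is to express it in orders of magnitude. The sketch below reads the /candidates response shown above, expands the relative URIs, and computes log10(final / prior) per surface form. The "lift" measure and the function name are illustrative assumptions, not something the API reports.

```python
import json
import math

# Trimmed copy of the /candidates response shown above.
SAMPLE = json.loads("""
{
  "annotation": {
    "@text": "Apple is a technology company in California",
    "surfaceForm": [
      {"@name": "Apple", "@offset": "0",
       "resource": {"@uri": "Apple_Inc.", "@priorScore": "7.86E-5",
                    "@contextualScore": "0.4041", "@finalScore": "0.9965"}},
      {"@name": "California", "@offset": "33",
       "resource": {"@uri": "California", "@priorScore": "8.36E-4",
                    "@contextualScore": "0.1728", "@finalScore": "0.9999"}}
    ]
  }
}
""")

# /candidates returns URIs relative to this prefix.
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def context_lift(response):
    """Report how far context moved each candidate above its prior."""
    rows = []
    for sf in response["annotation"]["surfaceForm"]:
        res = sf["resource"]
        prior = float(res["@priorScore"])
        final = float(res["@finalScore"])
        rows.append({
            "name": sf["@name"],
            "uri": RESOURCE_PREFIX + res["@uri"],
            "prior": prior,
            "final": final,
            # Hypothetical measure: orders of magnitude gained from context.
            "lift": math.log10(final / prior),
        })
    return rows


rows = context_lift(SAMPLE)
for row in rows:
    print(f'{row["name"]}: prior={row["prior"]:.2e} '
          f'final={row["final"]:.4f} lift={row["lift"]:.2f}')
```

On this sample the lift for Apple comes out about one order of magnitude larger than the lift for California, which matches the reading that context did more of the work for Apple.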
Finding entity mentions without disambiguation overhead
"I just want to know which phrases in this text could be entity names; I'll link them myself." /spot runs only the first stage: it identifies surface forms without mapping them to DBpedia URIs. It is faster, and useful when you're piping the output to your own entity-linking model downstream.
curl -H "Accept: application/json" \
"https://api.dbpedia-spotlight.org/spot?text=London+is+a+major+city+in+Europe" | head -c 10000
{
"annotation": {
"@text": "London is a major city in Europe",
"surfaceForm": [
{ "@name": "London", "@offset": "0" },
{ "@name": "city", "@offset": "18" },
{ "@name": "Europe", "@offset": "26" }
]
}
}
"city" appears as a spotted surface form alongside "London" and "Europe" — a common noun, not a proper one. Spotlight's spotting phase works by dictionary lookup against Wikipedia anchor texts, and "city" is a Wikipedia anchor somewhere in the corpus, so it gets flagged. This means /spot output is substantially noisier than /annotate output; the disambiguation step that /spot skips is also the step that filters low-confidence common-noun matches. Use /spot only if you need the raw candidate set for downstream processing, not as a fast alternative to /annotate.
The spotted surface forms in that text are "London" (offset 0), "city" (offset 18), and "Europe" (offset 26). Note that "city" is a common noun flagged via Wikipedia anchor lookup; it would likely be filtered out by the full /annotate pipeline.
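If /spot output feeds a downstream step, some of the common-noun noise can be trimmed locally. The sketch below keeps only surface forms that start with an uppercase letter. That rule is a crude assumption for illustration: it is not what Spotlight does internally, and it will misfire on sentence-initial common nouns and on languages with different capitalization rules. The as_list helper is a defensive assumption in case a single match arrives as an object rather than a list.

```python
import json

# Copy of the /spot response shown above.
SAMPLE = json.loads("""
{
  "annotation": {
    "@text": "London is a major city in Europe",
    "surfaceForm": [
      {"@name": "London", "@offset": "0"},
      {"@name": "city", "@offset": "18"},
      {"@name": "Europe", "@offset": "26"}
    ]
  }
}
""")


def as_list(value):
    """Wrap a lone object in a list; treat a missing value as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def likely_proper_nouns(response):
    """Keep surface forms whose first character is uppercase."""
    spots = as_list(response["annotation"].get("surfaceForm"))
    return [
        (s["@name"], int(s["@offset"]))
        for s in spots
        if s["@name"][:1].isupper()
    ]


kept = likely_proper_nouns(SAMPLE)
text = SAMPLE["annotation"]["@text"]
for name, offset in kept:
    # The offset is a character index into the original text.
    print(name, offset, text[offset:offset + len(name)])
```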
Pitfalls
- The API returns XML by default. Always pass Accept: application/json as a request header or you'll receive XML with no warning. This isn't prominently documented.
- /annotate uses a different response shape from /candidates and /spot. /annotate returns a top-level Resources array; the other two return an annotation wrapper containing a surfaceForm array. A single response parser will break on one or the other.
- Most JSON keys have an @ prefix (@URI, @surfaceForm, @offset, @support); in the responses above only the container keys (Resources, annotation, surfaceForm, resource) lack it. The prefix resembles JSON-LD keywords, although prefixed scalar fields inside unprefixed containers look more like XML attributes mapped to JSON. Either way, expect "@URI", not "uri".
- The public api.dbpedia-spotlight.org endpoint is shared infrastructure. Under load it is slow and occasionally unresponsive. For any production use, the project ships self-hosted Docker images.
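The two response shapes and the @ prefix can be hidden behind one small normalizer. The following is a sketch based only on the three sample responses in this document; strip_at, as_list, mentions, and the output field names are hypothetical, and taking the first candidate is an assumption in case a surface form ever carries a list of them.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def strip_at(obj):
    """Recursively drop a leading '@' from dictionary keys."""
    if isinstance(obj, dict):
        return {(k[1:] if k.startswith("@") else k): strip_at(v)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [strip_at(v) for v in obj]
    return obj


def as_list(value):
    """Wrap a lone object in a list; treat a missing value as empty."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def mentions(response):
    """Return one flat record per mention for any of the three endpoints."""
    clean = strip_at(response)
    if "Resources" in clean:  # /annotate shape: absolute URIs
        return [
            {"surface_form": r["surfaceForm"],
             "offset": int(r["offset"]),
             "uri": r["URI"]}
            for r in as_list(clean["Resources"])
        ]
    out = []  # /candidates and /spot shape: annotation wrapper
    for spot in as_list(clean.get("annotation", {}).get("surfaceForm")):
        candidates = as_list(spot.get("resource"))  # absent for /spot
        uri = (RESOURCE_PREFIX + candidates[0]["uri"]) if candidates else None
        out.append({"surface_form": spot["name"],
                    "offset": int(spot["offset"]),
                    "uri": uri})
    return out


# Minimal stand-ins for the three responses shown earlier.
ANNOTATE = {"Resources": [{"@URI": "http://dbpedia.org/resource/Berlin",
                           "@surfaceForm": "Berlin", "@offset": "0"}]}
CANDIDATES = {"annotation": {"surfaceForm": [
    {"@name": "Apple", "@offset": "0", "resource": {"@uri": "Apple_Inc."}}]}}
SPOT = {"annotation": {"surfaceForm": [{"@name": "city", "@offset": "18"}]}}

for sample in (ANNOTATE, CANDIDATES, SPOT):
    print(mentions(sample))
```

Mapping every endpoint to the same record shape means downstream code only has to check whether uri is None to tell a linked mention from a bare spot.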
One-line summary for the user
I can identify named entities in free text — people, places, organizations — and link each one to a DBpedia (Wikipedia-backed) URI with a disambiguation confidence score, using an unauthenticated GET.