How this database works
A reader’s guide to every section of an entity page — Summary, Completeness, Claims, Mentioned by, and Sources — and how to read each block critically.
The Grant County corpus is structured: every factual statement on every page traces back to a specific paragraph in a specific cited primary source. This page explains what each block means and how to evaluate it.
The vocabulary
The database has five primitives. Once you know these, every page on the site reads the same way.
Entity: a subject with its own page (a person, place, org, event, or thing). Every entity has a stable ID (e.g. p_pilar-perez_7f1656) and a canonical name.
Source: an ingested document, such as a newspaper, court file, census, or book. Each source has been chunked into paragraphs and embedded for search. Sources are the only place facts come from.
Chunk: a single paragraph from a source. Every Claim cites at least one chunk so you can read the original prose the fact came from.
Claim: one factual statement in the form subject · predicate · object, extracted from one or more chunks. Claims have a confidence score and may link to other entities.
Story: an AI-generated narrative drawn from an entity’s claims and source chunks. Every factual sentence in a story carries a [chunk:N] citation marker; unsourced sentences are automatically redacted. Stories come in modes: Short, Deep History, Family, Treasure Research, Timeline, Dramatic, For Kids, and Audio.
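As a rough mental model, the five primitives can be pictured as five record types that point at each other. The sketch below is illustrative only: the class and field names are assumptions chosen for readability, not the database's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    source_id: str
    title: str
    source_type: str  # e.g. "newspaper", "court record", "census"


@dataclass
class Chunk:
    chunk_id: int
    source_id: str  # the Source this paragraph was cut from
    text: str


@dataclass
class Entity:
    entity_id: str  # stable ID, e.g. "p_pilar-perez_7f1656"
    canonical_name: str
    entity_type: str  # person, place, org, event, or thing


@dataclass
class Claim:
    subject_id: str
    predicate: str
    object_text: str
    confidence: float  # score between 0 and 1
    chunk_ids: List[int] = field(default_factory=list)  # at least one in practice
    object_entity_id: Optional[str] = None  # set when the object is another entity


@dataclass
class Story:
    entity_id: str
    mode: str  # e.g. "Short", "Timeline"
    text: str  # factual sentences carry [chunk:N] markers


# The example claim used later on this page; the chunk IDs are made up.
perez_claim = Claim(
    subject_id="p_pilar-perez_7f1656",
    predicate="convicted_of",
    object_text="murder in the first degree",
    confidence=0.92,
    chunk_ids=[101, 102],
)
```

The chain to keep in mind is Story and Claim, then Chunk, then Source: anything you read in a story or a claim should be traceable down that chain.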
Summary
- Where it comes from: stored on the entity row directly (the summary column). It is the only block on an entity page that is not algorithmically derived from claims; a human editor writes it for readability.
- Why it’s above Completeness: it gives the reader the “who is this” answer in plain English before they decide whether to read further.
- Caveat: Summary is a curated convenience layer. For load-bearing facts, always cross-reference Claims and the cited Sources below.
Completeness
The score combines seven signals. Each signal is in one of three states:
- ✅ met — the signal is satisfied (e.g. summary exists, ≥3 sourced claims).
- ○ open — the signal is missing but achievable; the panel shows a next-step button.
- ❌ failed — an automated attempt was made and didn’t produce a result (e.g. story generation 500’d, Wikidata match scored below the auto-link threshold).
The seven signals are:
- Editor summary present
- Sourced claims (≥3)
- Multiple primary sources
- Coordinates (places only)
- Operating / life dates
- Wikidata authority link
- Published story
The grade is a research-readiness signal, not a verdict on the entity’s historical importance. A Grade C page with three cited newspaper claims is more authoritative than a Grade A page would be without sources — completeness is a measure of this database, not of the past.
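One way to picture the panel is as a tally of signal states. The sketch below is a minimal illustration under stated assumptions: the signal keys are invented names for the seven signals listed above, unreported signals are treated as open, and the coordinates signal is skipped for non-places. How the tally maps to a letter grade is not described on this page, so the sketch stops at the counts.

```python
from enum import Enum
from typing import Dict


class SignalState(Enum):
    MET = "met"
    OPEN = "open"
    FAILED = "failed"


# Hypothetical keys for the seven signals listed above.
SIGNALS = [
    "editor_summary",
    "sourced_claims",
    "multiple_primary_sources",
    "coordinates",  # places only
    "dates",
    "wikidata_link",
    "published_story",
]


def tally(states: Dict[str, SignalState], is_place: bool) -> Dict[str, int]:
    """Count met / open / failed signals, skipping coordinates for non-places."""
    counts = {"met": 0, "open": 0, "failed": 0}
    for name in SIGNALS:
        if name == "coordinates" and not is_place:
            continue
        # Assumption: a signal with no recorded state counts as open.
        state = states.get(name, SignalState.OPEN)
        counts[state.value] += 1
    return counts


# A person page with a summary, enough sourced claims, and a failed story run.
person_counts = tally(
    {
        "editor_summary": SignalState.MET,
        "sourced_claims": SignalState.MET,
        "published_story": SignalState.FAILED,
    },
    is_place=False,
)
print(person_counts)  # {'met': 2, 'open': 3, 'failed': 1}
```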
Claims
Each claim follows the pattern subject · predicate · object. For example:
Pilar Perez · convicted_of · murder in the first degree
confidence: 0.92 · sources: 2
How to read a claim
- Object link. If the object is another entity in the database (a person, place, or thing) the value is a clickable link to that entity’s page. If it’s free text (a date, a number, a phrase), it’s plain text.
- Date. The trailing date is the when of the claim — when the event happened, not when the source was published.
- Confidence badge. A 0–1 score combining source quality, source count, and extraction certainty. Hover the badge for the breakdown.
- Cited from. Always links to the exact source page, scrolled to the exact chunk the claim was extracted from. Read the original paragraph before treating the claim as load-bearing.
- Contextual flags. Yellow flags like needs_source mean an attempt to extract a citation didn’t land; treat the claim as a working hypothesis until a source is attached.
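The reading rules above can be sketched as two small checks. This is an illustration, not the site's rendering code: DisplayedClaim, object_kind, and reading_advice are hypothetical names, and the only flag used is needs_source, which comes from this page.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DisplayedClaim:
    subject: str
    predicate: str
    object_text: str
    confidence: float  # the 0-1 score shown on the badge
    source_count: int = 0  # number of cited sources
    object_entity_id: Optional[str] = None  # set only when the object is an entity
    flags: Set[str] = field(default_factory=set)


def object_kind(claim: DisplayedClaim) -> str:
    """Entity objects render as links; free-text objects render as plain text."""
    return "link" if claim.object_entity_id else "text"


def reading_advice(claim: DisplayedClaim) -> str:
    """Apply the caveats from the list above to a single claim."""
    if "needs_source" in claim.flags or claim.source_count == 0:
        return "working hypothesis: no source attached yet"
    return "read the cited chunk before relying on this claim"


perez = DisplayedClaim(
    "Pilar Perez", "convicted_of", "murder in the first degree",
    confidence=0.92, source_count=2,
)
unsourced = DisplayedClaim(
    "Pilar Perez", "hypothetical_predicate", "some free text",
    confidence=0.4, flags={"needs_source"},
)

print(object_kind(perez))         # text
print(reading_advice(perez))      # read the cited chunk before relying on this claim
print(reading_advice(unsourced))  # working hypothesis: no source attached yet
```

Note that a high confidence score does not change the advice: even a well-sourced claim is worth checking against the original paragraph.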
Mentioned by
The Claims section above lists every claim in which this entity is the subject. Mentioned by lists every claim on another entity’s page in which this entity is the object. The two together form the entity’s full graph neighborhood.
For example, on the page for Pilar Perez:
The claim itself lives on Deputy Hall’s page; it shows up here as an inbound mention because Pilar Perez is the object. Click through to read the source-cited claim in its original context.
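The split between Claims and Mentioned by amounts to filtering one claim list in two directions. The sketch below assumes a flat list of claims with hypothetical field names; the Deputy Hall ID and the predicate on that claim are placeholders, since this page does not state them.

```python
from typing import List, NamedTuple, Optional, Tuple


class Claim(NamedTuple):
    subject_id: str
    predicate: str
    object_text: str
    object_entity_id: Optional[str] = None  # set when the object is an entity


def neighborhood(entity_id: str, claims: List[Claim]) -> Tuple[List[Claim], List[Claim]]:
    """Return (outbound, inbound): claims made by the entity and claims about it."""
    outbound = [c for c in claims if c.subject_id == entity_id]
    inbound = [
        c for c in claims
        if c.object_entity_id == entity_id and c.subject_id != entity_id
    ]
    return outbound, inbound


PEREZ = "p_pilar-perez_7f1656"
HALL = "p_deputy-hall_000000"  # placeholder ID, not a real one

all_claims = [
    Claim(PEREZ, "convicted_of", "murder in the first degree"),
    Claim(HALL, "hypothetical_predicate", "Pilar Perez", object_entity_id=PEREZ),
]

claims_section, mentioned_by = neighborhood(PEREZ, all_claims)
print(len(claims_section), len(mentioned_by))  # 1 1
print(mentioned_by[0].subject_id)              # p_deputy-hall_000000
```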
Sources
Source types you’ll see
- Newspaper — period press from the Silver City Enterprise, Grant County Herald, Southwest Sentinel, etc. Often the primary source for events, deaths, court cases.
- Court record — case files, judgments, jury records. Highest authority for legal events.
- Census — federal and territorial census enumerations. Primary for residence, household composition, occupation.
- Government report — mining-inspector reports, BIA records, military post returns.
- Book / monograph — published histories. Treat as secondary; check whether the book cites a primary source.
- Map / topo sheet — USGS historic topos, GLO plats. Primary for place location and boundaries.
How to read the source list
- Each card links to that source’s own page, which shows the full text, the chunks the database extracted, and which other entities the source mentions.
- Year + page number are shown when available. For digitized newspapers we link directly to the page image when the host (Chronicling America, NM Digital Newspaper Project) provides one.
- If a source has a URL, the card title links out to the canonical record. Inbound chunk-level citations from claims always stay on-site.
Tell Me the Story
When stories exist for an entity, links appear in the chip row near the top of the page (“📖 Tell Me the Story:”). Every factual sentence in a story ends with a [chunk:N] citation marker pointing back to the exact paragraph the fact came from. Any sentence the model produced without a valid citation is redacted before the story is published — you should never see an uncited claim in a story.
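The redaction rule can be sketched as a filter over sentences. This is a minimal illustration under assumptions: the story is already split into sentences, a citation is valid when every [chunk:N] number in the sentence refers to a known chunk, and redacted sentences are simply dropped. The function name and the chunk IDs are hypothetical.

```python
import re
from typing import List, Set

# Matches the [chunk:N] marker described above and captures N.
CITATION = re.compile(r"\[chunk:(\d+)\]")


def redact_uncited(sentences: List[str], valid_chunk_ids: Set[int]) -> List[str]:
    """Keep only sentences whose citation markers all point at known chunks."""
    kept = []
    for sentence in sentences:
        cited = [int(n) for n in CITATION.findall(sentence)]
        # No marker at all, or a marker pointing at an unknown chunk: redact.
        if cited and all(n in valid_chunk_ids for n in cited):
            kept.append(sentence)
    return kept


draft = [
    "A sentence with a valid citation. [chunk:3]",
    "A sentence the model produced with no citation.",
    "A sentence citing a chunk that does not exist. [chunk:99]",
]
published = redact_uncited(draft, valid_chunk_ids={3, 4})
print(published)  # ['A sentence with a valid citation. [chunk:3]']
```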
Confidence and flags
Confidence badges appear at two scopes: entity-level (next to the type pill, top of page) and claim-level (next to each individual statement). A high entity-level score means most of the underlying claims are well-sourced; a low score means cross-check carefully.
Contextual flags expand on the badge: needs_source (no citation yet), lost_place (place no longer exists), low_coord (coordinates were inferred, not surveyed).