@happyvertical/smrt-facts
Distributed knowledge base with 3-zone semantic deduplication, evolution chains, provenance tracking, and confidence scoring.
Overview
smrt-facts provides a distributed knowledge base where facts are atomic units of knowledge with provenance tracking. Facts evolve through predecessor-successor chains (linked by previousFactId), undergo 3-zone semantic reconciliation (thresholds at 0.85 and 0.60) to prevent duplicates, and carry confidence scores computed from source credibility, recency, and corroboration.
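The zone routing can be sketched as a small pure function. This is a minimal illustration that assumes only the documented thresholds (0.85 and 0.60), the documented result actions, and the documented fallback to a branch when disambiguation fails. The names classifyZone, resolveAction, and the disambiguate callback are hypothetical and are not part of the package API.

```typescript
// Hypothetical helpers for illustration; not exported by smrt-facts.
type ReconcileZone = 'auto-merge' | 'ai-disambiguation' | 'new-fact';
type ReconcileAction = 'created' | 'merged' | 'branched';

const AUTO_MERGE_THRESHOLD = 0.85; // documented upper threshold
const NEW_FACT_THRESHOLD = 0.6; // documented lower threshold

/** Map the best similarity score against existing facts to a zone. */
function classifyZone(similarity: number): ReconcileZone {
  if (similarity >= AUTO_MERGE_THRESHOLD) return 'auto-merge';
  if (similarity >= NEW_FACT_THRESHOLD) return 'ai-disambiguation';
  return 'new-fact';
}

/**
 * Turn a zone into an action. `disambiguate` stands in for the model call
 * and may throw; a failure falls back to 'branched' rather than 'merged'.
 */
function resolveAction(
  similarity: number,
  disambiguate: () => ReconcileAction,
): ReconcileAction {
  switch (classifyZone(similarity)) {
    case 'auto-merge':
      return 'merged';
    case 'new-fact':
      return 'created';
    case 'ai-disambiguation':
      try {
        return disambiguate();
      } catch {
        return 'branched';
      }
  }
}
```

Falling back to a branch keeps both statements available for later review, whereas a mistaken merge would discard the distinction between them.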
Tenancy
Fact uses @TenantScoped({ mode: 'optional' }) with a nullable tenantId. Global facts (tenantId = null) are visible to every tenant, which is useful for shared reference knowledge. Use findWithGlobals(tenantId) to retrieve tenant and global facts in one query.
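As a rough illustration of the visibility rule, the sketch below filters an in-memory list the way a tenant-plus-global lookup would. FactRow and visibleTo are hypothetical names used only for this example; the package's own findWithGlobals(tenantId) runs against the database, and its implementation is not shown here.

```typescript
// Illustrative in-memory model; not the package's query implementation.
interface FactRow {
  id: string;
  tenantId: string | null; // null marks a global fact
}

/** Return facts owned by `tenantId` plus global facts (tenantId === null). */
function visibleTo(rows: FactRow[], tenantId: string): FactRow[] {
  return rows.filter((row) => row.tenantId === tenantId || row.tenantId === null);
}

const rows: FactRow[] = [
  { id: 'f1', tenantId: 'acme' },
  { id: 'f2', tenantId: null },
  { id: 'f3', tenantId: 'globex' },
];

// 'acme' sees its own fact and the global one, but not the fact owned by 'globex'.
const visible = visibleTo(rows, 'acme').map((row) => row.id);
console.log(visible); // [ 'f1', 'f2' ]
```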
Installation
npm install @happyvertical/smrt-facts
Quick Start
import {
  Fact, FactCollection,
  FactSource, FactSourceCollection,
  FactSubject, FactSubjectCollection,
} from '@happyvertical/smrt-facts';
// `db` is assumed to be an already-configured database handle.
// Create a fact with provenance
const facts = await FactCollection.create({ db });
const fact = await facts.create({
  textRefined: 'The Eiffel Tower is 330 meters tall',
  type: 'measurement',
  domain: 'landmarks',
  status: 'active',
});
// Attach a source with credibility score
const sources = await FactSourceCollection.create({ db });
await sources.create({
  factId: fact.id,
  sourceUrl: 'https://example.com/eiffel-tower',
  sourceTitle: 'Tourism Board',
  credibility: 0.9,
});
// Recalculate confidence from all sources
await facts.recalculateConfidence(fact.id);
// 3-zone semantic reconciliation
const result = await facts.reconcile({
  rawInput: 'The Eiffel Tower stands 330m tall',
  type: 'measurement',
  domain: 'landmarks',
  source: { sourceUrl: 'https://another-source.com', credibility: 0.8 },
});
// result.action: 'created' | 'merged' | 'branched'
// Evolution: branch creates a successor linked via previousFactId
const child = await facts.branch(fact.id, {
  textRefined: 'The Eiffel Tower is 330 meters tall including the antenna',
}, 'correction');
// Walk evolution
const chain = await facts.getEvolutionChain(child.id); // root → current
const latest = await facts.getLatestInChain(fact.id); // highest-confidence leaf
const tree = await facts.getEvolutionTree(fact.id); // BFS all descendants
// Entity briefing: all facts for a given entity
// `placeId` is assumed to be the ID of an existing Place entity.
const briefing = await facts.getEntityBriefing('Place', placeId);
Core Models
Fact
class Fact extends SmrtObject {
  textRefined: string // Cleaned knowledge statement
  textRaw: string // Original, unprocessed input
  type: string // assertion / observation / measurement / definition / relationship / event / opinion / prediction
  domain: string
  status: string // pending / active / disputed / superseded / archived / retracted
  confidence: number // 0-1, computed from sources
  sourceCount: number // number of attached sources
  previousFactId: string // @foreignKey('Fact'): evolution chain link (NOT a structural parent)
  evolutionType: string // original / correction / refinement / contradiction / extension / merge
  // Auto-generated embeddings (@smrt embeddings config):
  // fields: ['textRefined'], provider: 'auto', autoGenerate: true,
  // combinedField: { name: 'full_context', template: '{textRefined}\n\nType: {type}\nDomain: {domain}' }
}
FactSource (Provenance)
class FactSource extends SmrtObject {
  factId: string
  sourceUrl: string
  sourceTitle?: string
  sourceType?: string
  credibility: number // 0-1
  extractedAt?: Date
}
FactSubject (Polymorphic Entity Link)
class FactSubject extends SmrtObject {
  factId: string // @foreignKey('Fact')
  entityType: string // e.g. 'Place', 'Person'
  entityId: string // Plain string ID -- NO FK (cross-package)
  role: string // SubjectRole: subject / object / source / location / participant / related (default 'subject')
  // conflictColumns: ['fact_id', 'entity_type', 'entity_id']
}
FactContent (Content Junction)
class FactContent extends SmrtObject {
  factId: string // @foreignKey('Fact')
  contentId: string // Plain string ID to smrt-content -- NO FK (cross-package)
  relationship: string // FactContentRelationship: extracted_from / referenced_in / supports / contradicts / related (default 'extracted_from')
  // conflictColumns: ['fact_id', 'content_id', 'relationship']
}
FactEvidence (provenance spans)
FactSource is the coarse source summary; FactEvidence records the
concrete excerpt/span/artifact that supports (or contradicts) a fact. Each evidence row carries
a verdict status and the quote + locator it was drawn from, so a claim's support can
be traced back to specific text.
// @TenantScoped({ mode: 'optional' })
// conflictColumns: ['fact_id', 'evidence_key']
class FactEvidence extends SmrtObject {
  factId: string // @foreignKey('Fact', { required: true })
  evidenceKey: string // stable per-fact dedup key (required)
  status: FactEvidenceStatus // 'supports' | 'contradicts' | 'unclear' | 'irrelevant' | 'invalid'
  sourceKind: string // e.g. 'article' | 'transcript' | 'dataset'
  sourceId: string // plain string ref (cross-package)
  sourceUrl: string
  sourceTitle: string
  quote: string // the supporting excerpt
  locator: string // page / timestamp / selector
  extractionMethod: string // how the span was obtained
  confidence: number // 0-1
}
Claim extraction pipeline
FactCollection exposes an AI-assisted pipeline for turning unstructured text into
reviewable fact candidates and for checking whether a claim is actually supported. These methods
call the configured AI client's message() with a fact-extraction prompt (resolved
through smrt-prompts, so tenants can override it). They do not persist anything: callers review,
reconcile, link, or discard the returned candidates per their own workflow.
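Because nothing is persisted, the caller decides what happens to each candidate. The sketch below shows one possible review step under stated assumptions: the candidate shape mirrors the documented FactExtractionCandidate fields (the metadata type is a guess), while the triage function and the 0.7 cutoff are hypothetical policy choices and not package defaults.

```typescript
// Local mirror of the documented candidate shape so the sketch is self-contained.
interface FactExtractionCandidate {
  statement: string;
  type?: string;
  sourceExcerpt?: string;
  confidence?: number;
  metadata?: Record<string, unknown>; // assumed type; the source does not specify it
}

// Hypothetical review policy: this cutoff is an assumption for the example.
const MIN_CONFIDENCE = 0.7;

/** Split candidates into ones ready to reconcile and ones a human should check. */
function triage(candidates: FactExtractionCandidate[]) {
  const accepted: FactExtractionCandidate[] = [];
  const needsReview: FactExtractionCandidate[] = [];
  for (const candidate of candidates) {
    // Require a non-empty excerpt so every accepted fact can be traced to text.
    const hasExcerpt = Boolean(candidate.sourceExcerpt?.trim());
    const confident = (candidate.confidence ?? 0) >= MIN_CONFIDENCE;
    (hasExcerpt && confident ? accepted : needsReview).push(candidate);
  }
  return { accepted, needsReview };
}
```

Accepted candidates could then be passed to reconcile() with candidate.statement as rawInput, while the rest are queued for manual review or discarded.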
Two extractors
The two extractors are intentionally separate: extractCandidatesFromText finds
evidence-backed facts in a source, while extractArticleClaims finds the
material claims an article draft itself needs to justify.
// 1) Extract atomic facts from source material (agenda, minutes, transcript…)
const candidates = await facts.extractCandidatesFromText(sourceText, {
  domain: 'civic',
  sourceType: 'minutes',
  maxFacts: 12, // default 12
  // allowedTypes: ['assertion', 'measurement', ...]
});
// 2) Extract the claims an article makes (what the draft must justify)
const claims = await facts.extractArticleClaims(articleBody, {
  domain: 'civic',
  maxFacts: 24, // default 24
});
// FactExtractionCandidate:
// { statement, type?, sourceExcerpt?, confidence?, metadata? }
Assessing claim support
Given a claim and candidate facts (each optionally carrying FactEvidence), assessClaimSupport classifies whether the claim holds. It returns a status plus the
fact/evidence ids that matched and a rationale, which together form the audit trail for a fact-check.
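One way a caller might act on the returned status is sketched below. The assessment shape mirrors the documented FactClaimSupportAssessment fields; EditorialAction and nextStep are hypothetical and describe an example editorial policy, not behavior built into the package.

```typescript
// Local mirror of the documented assessment shape so the sketch is self-contained.
type ClaimSupportStatus = 'supported' | 'unsupported' | 'contradicted' | 'needs_review';

interface FactClaimSupportAssessment {
  status: ClaimSupportStatus;
  matchedFactIds: string[];
  matchedEvidenceIds: string[];
  rationale: string;
  confidence?: number;
}

// Hypothetical editorial outcomes; not part of smrt-facts.
type EditorialAction = 'publish' | 'add-citation' | 'rewrite' | 'escalate';

/** Map a fact-check verdict to the next step in an example editorial workflow. */
function nextStep(assessment: FactClaimSupportAssessment): EditorialAction {
  switch (assessment.status) {
    case 'supported':
      // A "supported" verdict with no matched facts leaves nothing to audit.
      return assessment.matchedFactIds.length > 0 ? 'publish' : 'escalate';
    case 'unsupported':
      return 'add-citation';
    case 'contradicted':
      return 'rewrite';
    case 'needs_review':
      return 'escalate';
  }
}
```

Storing the rationale and matched ids alongside the chosen action keeps the decision reviewable later.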
const assessment = await facts.assessClaimSupport(claim, candidateFacts);
// FactClaimSupportAssessment:
// {
//   status: 'supported' | 'unsupported' | 'contradicted' | 'needs_review',
//   matchedFactIds: string[],
//   matchedEvidenceIds: string[],
//   rationale: string,
//   confidence?: number,
// }
3-Zone Semantic Reconciliation
// Similarity zones:
//
//   >= 0.85          Auto-merge (same fact, update metadata)
//   0.60 to < 0.85   AI disambiguation (model decides merge / branch / create)
//   < 0.60           New fact (no match)
//
// If AI disambiguation fails, defaults to BRANCH (safer than merge).
const result = await facts.reconcile({
  rawInput: 'New fact text to reconcile',
  type: 'assertion',
  domain: 'science',
  source: { sourceUrl: 'https://source.com', credibility: 0.85 },
});
// Confidence formula (clamped 0-1):
//   base              0.5
//   + source volume   (max 0.3)
//   + avg credibility (max 0.2)
//   + recency         (max 0.1, decays over 10 days)
//   + corroboration   (max 0.1)
Evolution Chains
All evolution traversals use a visited Set for cycle detection, so circular chains won't
blow the stack.
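A minimal sketch of the visited-Set idea follows. It assumes facts are held in an in-memory Map and that a root fact has a null previousFactId; both are assumptions made for illustration. FactNode and evolutionChain are hypothetical names, not the package's implementation.

```typescript
// Minimal in-memory stand-in for stored facts; only the link field matters here.
interface FactNode {
  id: string;
  previousFactId: string | null; // assumed: null marks the root of a chain
}

/** Follow previousFactId links from `startId` to the root; returns root → start order. */
function evolutionChain(facts: Map<string, FactNode>, startId: string): FactNode[] {
  const visited = new Set<string>();
  const chain: FactNode[] = [];
  let currentId: string | null = startId;

  // Stop at the root, or as soon as an id repeats (a cycle).
  while (currentId !== null && !visited.has(currentId)) {
    visited.add(currentId);
    const node: FactNode | undefined = facts.get(currentId);
    if (!node) break; // dangling link: stop at the last known fact
    chain.push(node);
    currentId = node.previousFactId;
  }
  return chain.reverse();
}
```

The loop is iterative and visits each id at most once, so even a chain that points back at itself terminates after a bounded number of steps.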
await facts.getEvolutionChain(factId); // root → current (linear ancestry)
await facts.getLatestInChain(rootId); // highest-confidence leaf
await facts.getEvolutionTree(rootId); // BFS all descendants
Gotchas
- Embedding failures are non-fatal: they are caught by try/catch and fail silently, so they don't block fact creation
- Metadata auto-stringify: the constructor JSON.stringifys objects; getters return parsed objects
- AI disambiguation fallback: if the model fails, the resolver defaults to branch (safer than merge)
- Optional tenancy with nullable tenantId: use findWithGlobals(tenantId)
Best Practices
DOs
- Use reconcile() to prevent duplicate facts on ingest
- Attach sources with credibility scores so confidence stays meaningful
- Use evolution chains for corrections and refinements (never silently overwrite)
- Call recalculateConfidence() after adding new sources
- Use findWithGlobals(tenantId) to include global facts
DON'Ts
- Don't skip reconciliation when ingesting facts; doing so creates duplicates
- Don't manually set confidence; use recalculateConfidence()
- Don't modify metadata fields directly; use the getter/setter helpers
- Don't try to model circular evolution chains intentionally; traversals will short-circuit
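The advice against hand-setting confidence follows from how the score is derived from sources. The sketch below works through the documented formula (base 0.5 plus capped bonuses, clamped to 0-1). Only the base and the caps come from the documentation; how each term scales below its cap is an assumption made for this example, and SourceInput and sketchConfidence are hypothetical names. It is not the implementation behind recalculateConfidence().

```typescript
// Illustrative only: term scaling below each documented cap is assumed.
interface SourceInput {
  credibility: number; // 0-1
  ageDays: number; // days since the source was extracted
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

function sketchConfidence(sources: SourceInput[]): number {
  if (sources.length === 0) return 0.5; // base only

  // Assumed: 0.1 per source, capped at the documented 0.3.
  const volume = Math.min(0.3, sources.length * 0.1);

  // Assumed: mean credibility scaled into the documented 0-0.2 range.
  const meanCredibility =
    sources.reduce((sum, s) => sum + clamp(s.credibility, 0, 1), 0) / sources.length;
  const credibility = meanCredibility * 0.2;

  // Assumed: linear decay of the freshest source from 0.1 to 0 over 10 days.
  const freshestAge = Math.min(...sources.map((s) => s.ageDays));
  const recency = 0.1 * clamp(1 - freshestAge / 10, 0, 1);

  // Assumed: 0.05 per source beyond the first, capped at the documented 0.1.
  const corroboration = Math.min(0.1, (sources.length - 1) * 0.05);

  return clamp(0.5 + volume + credibility + recency + corroboration, 0, 1);
}
```

Under these assumptions, a single fresh source with credibility 0.9 yields roughly 0.88, and three fresh, fully credible sources hit the clamp at 1. A manually assigned value would drift from these inputs the next time sources change.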