@happyvertical/smrt-facts

Distributed knowledge base with 3-zone semantic deduplication, evolution chains, provenance tracking, and confidence scoring.

v0.29.34 · Semantic Dedup · Evolution Chains · Confidence

Overview

smrt-facts provides a distributed knowledge base where facts are atomic units of knowledge with provenance tracking. Facts evolve through parent-child chains, undergo 3-zone semantic reconciliation (0.85 / 0.60 thresholds) to prevent duplicates, and carry confidence scores computed from source credibility, recency, and corroboration.

Tenancy

Fact uses @TenantScoped({ mode: 'optional' }) with a nullable tenantId. Global facts (tenantId = null) are visible to every tenant, which is useful for shared reference knowledge. Use findWithGlobals(tenantId) to retrieve tenant and global facts in one query.
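
The visibility rule can be pictured with a small in-memory filter. This is only a sketch of the semantics described above, not the package's query: ScopedFact, visibleToTenant, and the sample rows are hypothetical names introduced for illustration.

```typescript
// Hypothetical stand-in for a tenant-scoped fact row (not the real Fact class).
interface ScopedFact {
  id: string;
  tenantId: string | null; // null marks a global fact
  textRefined: string;
}

// A tenant sees its own rows plus every global row.
function visibleToTenant(rows: ScopedFact[], tenantId: string): ScopedFact[] {
  return rows.filter((row) => row.tenantId === tenantId || row.tenantId === null);
}

const rows: ScopedFact[] = [
  { id: 'f1', tenantId: null, textRefined: 'Water boils at 100 °C at sea level' },
  { id: 'f2', tenantId: 'acme', textRefined: 'Acme HQ is in Springfield' },
  { id: 'f3', tenantId: 'globex', textRefined: 'Globex HQ is in Cypress Creek' },
];

// The 'acme' tenant sees the global fact f1 and its own fact f2, but not f3.
const acmeView = visibleToTenant(rows, 'acme').map((row) => row.id);
console.log(acmeView);
```

A tenant with no facts of its own would still see the global rows under this rule.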

Installation

bash
npm install @happyvertical/smrt-facts

Quick Start

typescript
import {
  Fact, FactCollection,
  FactSource, FactSourceCollection,
  FactSubject, FactSubjectCollection,
} from '@happyvertical/smrt-facts';

// Create a fact with provenance
const facts = await FactCollection.create({ db });
const fact = await facts.create({
  textRefined: 'The Eiffel Tower is 330 meters tall',
  type: 'measurement',
  domain: 'landmarks',
  status: 'active',
});

// Attach a source with credibility score
const sources = await FactSourceCollection.create({ db });
await sources.create({
  factId: fact.id,
  sourceUrl: 'https://example.com/eiffel-tower',
  sourceTitle: 'Tourism Board',
  credibility: 0.9,
});

// Recalculate confidence from all sources
await facts.recalculateConfidence(fact.id);

// 3-zone semantic reconciliation
const result = await facts.reconcile({
  rawInput: 'The Eiffel Tower stands 330m tall',
  type: 'measurement',
  domain: 'landmarks',
  source: { sourceUrl: 'https://another-source.com', credibility: 0.8 },
});
// result.action: 'created' | 'merged' | 'branched'

// Evolution: branch creates a successor linked via previousFactId
const child = await facts.branch(fact.id, {
  textRefined: 'The Eiffel Tower is 330 meters tall including the antenna',
}, 'correction');

// Walk evolution
const chain = await facts.getEvolutionChain(child.id); // root → current
const latest = await facts.getLatestInChain(fact.id);  // highest-confidence leaf
const tree = await facts.getEvolutionTree(fact.id);    // BFS all descendants

// Entity briefing: all facts for a given entity
const briefing = await facts.getEntityBriefing('Place', placeId);

Core Models

Fact

typescript
class Fact extends SmrtObject {
  textRefined: string         // Cleaned knowledge statement
  textRaw: string             // Original, unprocessed input
  type: string                // assertion / observation / measurement / definition / relationship / event / opinion / prediction
  domain: string
  status: string              // pending / active / disputed / superseded / archived / retracted
  confidence: number          // 0-1, computed from sources
  sourceCount: number         // number of attached sources
  previousFactId: string      // @foreignKey('Fact'): evolution chain link (NOT a structural parent)
  evolutionType: string       // original / correction / refinement / contradiction / extension / merge

  // Auto-generated embeddings (@smrt embeddings config):
  //   fields: ['textRefined'], provider: 'auto', autoGenerate: true,
  //   combinedField: { name: 'full_context', template: '{textRefined}\n\nType: {type}\nDomain: {domain}' }
}

FactSource (Provenance)

typescript
class FactSource extends SmrtObject {
  factId: string
  sourceUrl: string
  sourceTitle?: string
  sourceType?: string
  credibility: number         // 0-1
  extractedAt?: Date
}

FactSubject (Polymorphic Entity Link)

typescript
class FactSubject extends SmrtObject {
  factId: string              // @foreignKey('Fact')
  entityType: string          // e.g. 'Place', 'Person'
  entityId: string            // Plain string ID -- NO FK (cross-package)
  role: string                // SubjectRole: subject / object / source / location / participant / related (default 'subject')

  // conflictColumns: ['fact_id', 'entity_type', 'entity_id']
}

FactContent (Content Junction)

typescript
class FactContent extends SmrtObject {
  factId: string              // @foreignKey('Fact')
  contentId: string           // Plain string ID to smrt-content -- NO FK (cross-package)
  relationship: string        // FactContentRelationship: extracted_from / referenced_in / supports / contradicts / related (default 'extracted_from')

  // conflictColumns: ['fact_id', 'content_id', 'relationship']
}

FactEvidence (provenance spans)

FactSource is the coarse source summary; FactEvidence records the concrete excerpt/span/artifact that supports (or contradicts) a fact. Each evidence row carries a verdict status and the quote + locator it was drawn from, so a claim's support can be traced back to specific text.

typescript
// @TenantScoped({ mode: 'optional' })
// conflictColumns: ['fact_id', 'evidence_key']
class FactEvidence extends SmrtObject {
  factId: string              // @foreignKey('Fact', { required: true })
  evidenceKey: string         // stable per-fact dedup key (required)
  status: FactEvidenceStatus  // 'supports' | 'contradicts' | 'unclear' | 'irrelevant' | 'invalid'
  sourceKind: string          // e.g. 'article' | 'transcript' | 'dataset'
  sourceId: string            // plain string ref (cross-package)
  sourceUrl: string
  sourceTitle: string
  quote: string               // the supporting excerpt
  locator: string             // page / timestamp / selector
  extractionMethod: string    // how the span was obtained
  confidence: number          // 0-1
}
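
The conflictColumns pair suggests that writing evidence twice with the same factId and evidenceKey updates the existing row rather than adding a second one. The sketch below models that assumed upsert behavior in memory; EvidenceRow and EvidenceStore are hypothetical helpers, not part of the package.

```typescript
type FactEvidenceStatus = 'supports' | 'contradicts' | 'unclear' | 'irrelevant' | 'invalid';

// Trimmed, hypothetical row shape: only the fields needed to show the dedup key.
interface EvidenceRow {
  factId: string;
  evidenceKey: string;
  status: FactEvidenceStatus;
  quote: string;
  locator: string;
}

// In-memory store keyed on the same pair as conflictColumns (assumed upsert semantics).
class EvidenceStore {
  private rows = new Map<string, EvidenceRow>();

  upsert(row: EvidenceRow): 'inserted' | 'updated' {
    const key = JSON.stringify([row.factId, row.evidenceKey]);
    const outcome = this.rows.has(key) ? 'updated' : 'inserted';
    this.rows.set(key, row);
    return outcome;
  }

  forFact(factId: string): EvidenceRow[] {
    return Array.from(this.rows.values()).filter((row) => row.factId === factId);
  }
}

const store = new EvidenceStore();
const firstWrite = store.upsert({
  factId: 'fact-1', evidenceKey: 'minutes-p4', status: 'unclear',
  quote: 'The motion carried 5 to 2.', locator: 'page 4',
});
// Same (factId, evidenceKey): the verdict is revised in place, no duplicate row.
const secondWrite = store.upsert({
  factId: 'fact-1', evidenceKey: 'minutes-p4', status: 'supports',
  quote: 'The motion carried 5 to 2.', locator: 'page 4',
});
console.log(firstWrite, secondWrite, store.forFact('fact-1').length);
```

Under this assumption, a stable evidenceKey is what makes re-running an extraction idempotent for a given fact.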

Claim extraction pipeline

FactCollection exposes an AI-assisted pipeline for turning unstructured text into reviewable fact candidates and for checking whether a claim is actually supported. These methods call the configured AI client's message() with a fact-extraction prompt (resolved through smrt-prompts, so tenants can override it). They do not persist anything: callers review, reconcile, link, or discard the returned candidates per their own workflow.

Two extractors

The two extractors are intentionally separate: extractCandidatesFromText finds evidence-backed facts in a source, while extractArticleClaims finds the material claims an article draft itself needs to justify.

typescript
// 1) Extract atomic facts from source material (agenda, minutes, transcript…)
const candidates = await facts.extractCandidatesFromText(sourceText, {
  domain: 'civic',
  sourceType: 'minutes',
  maxFacts: 12,                 // default 12
  // allowedTypes: ['assertion', 'measurement', ...]
});

// 2) Extract the claims an article makes (what the draft must justify)
const claims = await facts.extractArticleClaims(articleBody, {
  domain: 'civic',
  maxFacts: 24,                 // default 24
});

// FactExtractionCandidate:
// { statement, type?, sourceExcerpt?, confidence?, metadata? }
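
Because the extractors return candidates rather than saved facts, the caller decides what happens next. One possible review gate is sketched below. The candidate shape follows the comment above (the metadata type is a guess), while triageCandidates and its 0.7 threshold are hypothetical choices, not package behavior.

```typescript
// Shape taken from the FactExtractionCandidate comment; metadata typing is assumed.
interface FactExtractionCandidate {
  statement: string;
  type?: string;
  sourceExcerpt?: string;
  confidence?: number;
  metadata?: Record<string, unknown>;
}

interface Triage {
  accepted: FactExtractionCandidate[];
  needsReview: FactExtractionCandidate[];
}

// Hypothetical policy: a candidate skips manual review only if it carries
// a non-empty source excerpt and meets the confidence threshold.
function triageCandidates(candidates: FactExtractionCandidate[], minConfidence = 0.7): Triage {
  const triage: Triage = { accepted: [], needsReview: [] };
  for (const candidate of candidates) {
    const hasExcerpt = (candidate.sourceExcerpt ?? '').trim().length > 0;
    const confident = (candidate.confidence ?? 0) >= minConfidence;
    if (hasExcerpt && confident) {
      triage.accepted.push(candidate);
    } else {
      triage.needsReview.push(candidate);
    }
  }
  return triage;
}

const sampleCandidates: FactExtractionCandidate[] = [
  { statement: 'Council approved the budget', sourceExcerpt: 'Motion carried.', confidence: 0.9 },
  { statement: 'The vote was unanimous', confidence: 0.95 },               // no excerpt
  { statement: 'Work starts in May', sourceExcerpt: 'May start', confidence: 0.4 }, // low confidence
];

const triaged = triageCandidates(sampleCandidates);
console.log(triaged.accepted.length, triaged.needsReview.length);
```

Accepted candidates would then typically go through reconcile(), while the rest wait for a human decision.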

Assessing claim support

Given a claim and candidate facts (each optionally carrying FactEvidence), assessClaimSupport classifies whether the claim holds. It returns a status plus the fact/evidence ids that matched and a rationale, which together form the audit trail for a fact-check.

typescript
const assessment = await facts.assessClaimSupport(claim, candidateFacts);

// FactClaimSupportAssessment:
// {
//   status: 'supported' | 'unsupported' | 'contradicted' | 'needs_review',
//   matchedFactIds: string[],
//   matchedEvidenceIds: string[],
//   rationale: string,
//   confidence?: number,
// }
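
How an assessment is acted on is left to the caller. As an illustration only, the sketch below maps the four statuses to an editorial decision. The assessment shape mirrors the comment above; decideAction, EditorialAction, and the 0.75 threshold are hypothetical.

```typescript
type ClaimSupportStatus = 'supported' | 'unsupported' | 'contradicted' | 'needs_review';

// Shape taken from the FactClaimSupportAssessment comment above.
interface FactClaimSupportAssessment {
  status: ClaimSupportStatus;
  matchedFactIds: string[];
  matchedEvidenceIds: string[];
  rationale: string;
  confidence?: number;
}

type EditorialAction = 'publish' | 'hold_for_review' | 'block';

// Hypothetical policy: contradicted claims are blocked; supported claims pass only
// with at least one matched evidence id and enough confidence; the rest are held.
function decideAction(assessment: FactClaimSupportAssessment, minConfidence = 0.75): EditorialAction {
  if (assessment.status === 'contradicted') return 'block';
  if (assessment.status !== 'supported') return 'hold_for_review';
  const hasEvidence = assessment.matchedEvidenceIds.length > 0;
  const confident = (assessment.confidence ?? 0) >= minConfidence; // missing confidence counts as 0
  return hasEvidence && confident ? 'publish' : 'hold_for_review';
}

const supportedClaim: FactClaimSupportAssessment = {
  status: 'supported',
  matchedFactIds: ['fact-1'],
  matchedEvidenceIds: ['ev-1'],
  rationale: 'Minutes record the vote.',
  confidence: 0.9,
};

console.log(decideAction(supportedClaim));
```

Treating a missing confidence as zero is a deliberately conservative choice in this sketch; a different workflow might hold only when the status itself is uncertain.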

3-Zone Semantic Reconciliation

typescript
// Similarity zones:
//
//   >= 0.85   Auto-merge (same fact, update metadata)
//   0.60-0.85 AI disambiguation (model decides merge / branch / create)
//   <  0.60   New fact (no match)
//
// If AI disambiguation fails, defaults to BRANCH (safer than merge).

const result = await facts.reconcile({
  rawInput: 'New fact text to reconcile',
  type: 'assertion',
  domain: 'science',
  source: { sourceUrl: 'https://source.com', credibility: 0.85 },
});

// Confidence formula (clamped 0-1):
//   base 0.5
//   + source volume (max 0.3)
//   + avg credibility (max 0.2)
//   + recency (max 0.1, decays over 10 days)
//   + corroboration (max 0.1)
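
The formula above gives the base and the cap for each term but not the exact curve behind each one. The sketch below fills those in with assumptions that are labeled in the comments (0.1 per source, linear recency decay over 10 days measured from the newest source, 0.05 per corroborating fact), so treat it as an illustration of the capped-sum structure rather than the package's actual computation. SourceSignal and sketchConfidence are hypothetical names.

```typescript
// Hypothetical per-source input for the sketch.
interface SourceSignal {
  credibility: number; // 0-1
  ageDays: number;     // days since the source was extracted
}

const clamp01 = (value: number): number => Math.min(1, Math.max(0, value));

function sketchConfidence(sources: SourceSignal[], corroboratingFacts = 0): number {
  const base = 0.5;
  if (sources.length === 0) return base; // assumption: no sources means base only

  // Source volume, capped at 0.3 (assumed: 0.1 per source).
  const volume = Math.min(0.3, 0.1 * sources.length);

  // Average credibility scaled into 0-0.2.
  const avgCredibility = sources.reduce((sum, s) => sum + s.credibility, 0) / sources.length;
  const credibility = 0.2 * clamp01(avgCredibility);

  // Recency, capped at 0.1 (assumed: linear decay to zero over 10 days, newest source wins).
  const newestAge = sources.reduce((min, s) => Math.min(min, s.ageDays), Infinity);
  const recency = 0.1 * clamp01(1 - newestAge / 10);

  // Corroboration, capped at 0.1 (assumed: 0.05 per corroborating fact).
  const corroboration = Math.min(0.1, 0.05 * corroboratingFacts);

  // The caps sum to 1.2, so the final clamp matters.
  return clamp01(base + volume + credibility + recency + corroboration);
}

const freshSingle = sketchConfidence([{ credibility: 0.9, ageDays: 0 }]);
const staleSingle = sketchConfidence([{ credibility: 0.9, ageDays: 20 }]);
console.log(freshSingle.toFixed(2), staleSingle.toFixed(2));
```

Note that the maximum contributions add up to more than 1 on top of the base, which is why the result is clamped.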

Evolution Chains

All evolution traversals use a visited Set for cycle detection, so circular chains won't blow the stack.

typescript
await facts.getEvolutionChain(factId);  // root → current (linear ancestry)
await facts.getLatestInChain(rootId);   // highest-confidence leaf
await facts.getEvolutionTree(rootId);   // BFS all descendants
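
A minimal in-memory sketch of cycle-safe traversal, assuming facts are held in a Map keyed by id. FactNode, evolutionChain, and descendantsBfs are illustrative names, not the package's implementation, and since this document does not say whether getEvolutionTree includes the root, the sketch returns descendants only.

```typescript
// Hypothetical minimal node: only the link field matters for traversal.
interface FactNode {
  id: string;
  previousFactId: string | null;
}

// Walk previousFactId links back to the root, returning root-first order.
// The visited set stops the walk if a link points back into the chain.
function evolutionChain(byId: Map<string, FactNode>, factId: string): FactNode[] {
  const visited = new Set<string>();
  const chain: FactNode[] = [];
  let current = byId.get(factId);
  while (current !== undefined && !visited.has(current.id)) {
    visited.add(current.id);
    chain.unshift(current);
    current = current.previousFactId === null ? undefined : byId.get(current.previousFactId);
  }
  return chain;
}

// Breadth-first walk over successors (nodes whose previousFactId points at a visited node).
function descendantsBfs(byId: Map<string, FactNode>, rootId: string): FactNode[] {
  const children = new Map<string, FactNode[]>();
  byId.forEach((node) => {
    if (node.previousFactId === null) return;
    const siblings = children.get(node.previousFactId) ?? [];
    siblings.push(node);
    children.set(node.previousFactId, siblings);
  });

  const visited = new Set<string>([rootId]);
  const queue: string[] = [rootId];
  const found: FactNode[] = [];
  while (queue.length > 0) {
    const id = queue.shift();
    if (id === undefined) break;
    for (const child of children.get(id) ?? []) {
      if (visited.has(child.id)) continue; // already seen: skip instead of looping
      visited.add(child.id);
      found.push(child);
      queue.push(child.id);
    }
  }
  return found;
}

const byId = new Map<string, FactNode>();
const sampleNodes: FactNode[] = [
  { id: 'root', previousFactId: null },
  { id: 'fix-1', previousFactId: 'root' },
  { id: 'fix-2', previousFactId: 'fix-1' },
  { id: 'alt-1', previousFactId: 'root' },
];
for (const node of sampleNodes) byId.set(node.id, node);

const chainIds = evolutionChain(byId, 'fix-2').map((n) => n.id); // root, fix-1, fix-2
const treeIds = descendantsBfs(byId, 'root').map((n) => n.id);   // fix-1, alt-1, fix-2
console.log(chainIds, treeIds);
```

Both walks are iterative, so even a long or circular chain terminates without deep recursion.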

Gotchas

  • Embedding failures are non-fatal: they are caught and silently ignored, so they don't block fact creation
  • Metadata auto-stringify: the constructor serializes objects with JSON.stringify; getters return parsed objects
  • AI disambiguation fallback: if the model fails, the resolver defaults to branch (safer than merge)
  • Optional tenancy with nullable tenantId: use findWithGlobals(tenantId) to include global facts
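
The disambiguation fallback can be sketched as a small routing function over the similarity zones. This is an illustration under assumptions: routeBySimilarity and the Disambiguator callback are hypothetical, and the callback is synchronous here for brevity even though a real model call would be asynchronous.

```typescript
type ReconcileAction = 'created' | 'merged' | 'branched';

// Stand-in for the model call that decides merge / branch / create.
type Disambiguator = () => ReconcileAction;

const MERGE_THRESHOLD = 0.85;
const MATCH_THRESHOLD = 0.6;

function routeBySimilarity(similarity: number, disambiguate: Disambiguator): ReconcileAction {
  if (similarity >= MERGE_THRESHOLD) return 'merged';  // top zone: same fact
  if (similarity < MATCH_THRESHOLD) return 'created';  // bottom zone: no match
  try {
    return disambiguate();                             // middle zone: ask the model
  } catch {
    // A failed model call must not merge by accident: branching keeps both versions.
    return 'branched';
  }
}

const failingModel: Disambiguator = () => {
  throw new Error('model unavailable');
};

console.log(routeBySimilarity(0.9, failingModel));   // "merged" (model never called)
console.log(routeBySimilarity(0.7, failingModel));   // "branched" (fallback)
console.log(routeBySimilarity(0.7, () => 'merged')); // "merged" (model's decision)
console.log(routeBySimilarity(0.3, failingModel));   // "created" (model never called)
```

Branching on failure is the conservative default because a wrong branch can be merged later, while a wrong merge loses the distinction between two facts.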

Best Practices

DOs

  • Use reconcile() to prevent duplicate facts on ingest
  • Attach sources with credibility scores so confidence stays meaningful
  • Use evolution chains for corrections and refinements (never silently overwrite)
  • Call recalculateConfidence() after adding new sources
  • Use findWithGlobals(tenantId) to include global facts

DON'Ts

  • Don't skip reconciliation when ingesting facts: it creates duplicates
  • Don't manually set confidence: use recalculateConfidence()
  • Don't modify metadata fields directly: use the getter/setter helpers
  • Don't try to model circular evolution chains intentionally: traversals will short-circuit

Related Modules