@happyvertical/smrt-facts

Distributed knowledge base with 3-zone semantic deduplication, evolution chains, provenance tracking, and confidence scoring.

v0.29.34 · Semantic Dedup · Evolution Chains · Confidence

Overview

smrt-facts provides a distributed knowledge base where facts are atomic units of knowledge with provenance tracking. Facts evolve through predecessor-successor chains (linked via previousFactId), undergo 3-zone semantic reconciliation (similarity thresholds at 0.85 and 0.60) to prevent duplicates, and carry confidence scores computed from source credibility, recency, and corroboration.

Tenancy

Fact uses @TenantScoped({ mode: 'optional' }) with a nullable tenantId. Global facts (tenantId = null) are visible to every tenant — useful for shared reference knowledge. Use findWithGlobals(tenantId) to retrieve tenant + global facts in one query.
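
To make the visibility rule concrete, here is a minimal in-memory sketch of the filter that findWithGlobals(tenantId) is described as applying. The FactRow type and the visibleTo helper are hypothetical names introduced only for illustration; the real method runs as a database query and its internals are not shown on this page.

```typescript
// Hypothetical row shape: only the fields needed to illustrate tenancy.
interface FactRow {
  id: string;
  tenantId: string | null; // null marks a global fact
  textRefined: string;
}

// Assumed semantics: a tenant sees its own facts plus every global fact.
function visibleTo(rows: FactRow[], tenantId: string): FactRow[] {
  return rows.filter((row) => row.tenantId === tenantId || row.tenantId === null);
}

const rows: FactRow[] = [
  { id: 'f1', tenantId: null, textRefined: 'Shared reference fact' },
  { id: 'f2', tenantId: 'tenant-a', textRefined: 'Fact owned by tenant A' },
  { id: 'f3', tenantId: 'tenant-b', textRefined: 'Fact owned by tenant B' },
];

// Tenant A gets its own fact and the global one; tenant B's fact is excluded.
const visibleIds = visibleTo(rows, 'tenant-a').map((row) => row.id);
console.log(visibleIds);
```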

Installation

bash

npm install @happyvertical/smrt-facts

Quick Start

typescript

import {
  Fact, FactCollection,
  FactSource, FactSourceCollection,
  FactSubject, FactSubjectCollection,
} from '@happyvertical/smrt-facts';

// Create a fact with provenance.
// `db` is assumed to be the database configuration your application already uses.
const facts = await FactCollection.create({ db });
const fact = await facts.create({
  textRefined: 'The Eiffel Tower is 330 meters tall',
  type: 'measurement',
  domain: 'landmarks',
  status: 'active',
});

// Attach a source with credibility score
const sources = await FactSourceCollection.create({ db });
await sources.create({
  factId: fact.id,
  sourceUrl: 'https://example.com/eiffel-tower',
  sourceTitle: 'Tourism Board',
  credibility: 0.9,
});

// Recalculate confidence from all sources
await facts.recalculateConfidence(fact.id);

// 3-zone semantic reconciliation
const result = await facts.reconcile({
  rawInput: 'The Eiffel Tower stands 330m tall',
  type: 'measurement',
  domain: 'landmarks',
  source: { sourceUrl: 'https://another-source.com', credibility: 0.8 },
});
// result.action: 'created' | 'merged' | 'branched'

// Evolution: branch creates a successor linked via previousFactId
const child = await facts.branch(fact.id, {
  textRefined: 'The Eiffel Tower is 330 meters tall including the antenna',
}, 'correction');

// Walk evolution
const chain = await facts.getEvolutionChain(child.id); // root → current
const latest = await facts.getLatestInChain(fact.id);  // highest-confidence leaf
const tree = await facts.getEvolutionTree(fact.id);    // BFS all descendants

// Entity briefing: all facts for a given entity
// (`placeId` is assumed to be the id of an existing Place record)
const briefing = await facts.getEntityBriefing('Place', placeId);

Core Models

Fact

typescript

class Fact extends SmrtObject {
  textRefined: string         // Cleaned knowledge statement
  textRaw: string             // Original, unprocessed input
  type: string                // assertion / observation / measurement / definition / relationship / event / opinion / prediction
  domain: string
  status: string              // pending / active / disputed / superseded / archived / retracted
  confidence: number          // 0-1, computed from sources
  sourceCount: number         // number of attached sources
  previousFactId: string      // @foreignKey('Fact') — evolution chain link (NOT a structural parent)
  evolutionType: string       // original / correction / refinement / contradiction / extension / merge

  // Auto-generated embeddings (@smrt embeddings config):
  //   fields: ['textRefined'], provider: 'auto', autoGenerate: true,
  //   combinedField: { name: 'full_context', template: '{textRefined}\n\nType: {type}\nDomain: {domain}' }
}

FactSource (Provenance)

typescript

class FactSource extends SmrtObject {
  factId: string
  sourceUrl: string
  sourceTitle?: string
  sourceType?: string
  credibility: number         // 0-1
  extractedAt?: Date
}

FactSubject (Polymorphic Entity Link)

typescript

class FactSubject extends SmrtObject {
  factId: string              // @foreignKey('Fact')
  entityType: string          // e.g. 'Place', 'Person'
  entityId: string            // Plain string ID -- NO FK (cross-package)
  role: string                // SubjectRole: subject / object / source / location / participant / related (default 'subject')

  // conflictColumns: ['fact_id', 'entity_type', 'entity_id']
}

FactContent (Content Junction)

typescript

class FactContent extends SmrtObject {
  factId: string              // @foreignKey('Fact')
  contentId: string           // Plain string ID to smrt-content -- NO FK (cross-package)
  relationship: string        // FactContentRelationship: extracted_from / referenced_in / supports / contradicts / related (default 'extracted_from')

  // conflictColumns: ['fact_id', 'content_id', 'relationship']
}

FactEvidence (provenance spans)

FactSource is the coarse source summary; FactEvidence records the concrete excerpt/span/artifact that supports (or contradicts) a fact. Each evidence row carries a verdict status and the quote + locator it was drawn from, so a claim's support can be traced back to specific text.

typescript

// @TenantScoped({ mode: 'optional' })
// conflictColumns: ['fact_id', 'evidence_key']
class FactEvidence extends SmrtObject {
  factId: string              // @foreignKey('Fact', { required: true })
  evidenceKey: string         // stable per-fact dedup key (required)
  status: FactEvidenceStatus  // 'supports' | 'contradicts' | 'unclear' | 'irrelevant' | 'invalid'
  sourceKind: string          // e.g. 'article' | 'transcript' | 'dataset'
  sourceId: string            // plain string ref (cross-package)
  sourceUrl: string
  sourceTitle: string
  quote: string               // the supporting excerpt
  locator: string             // page / timestamp / selector
  extractionMethod: string    // how the span was obtained
  confidence: number          // 0-1
}
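
As a rough illustration of how evidence rows might be consumed, the sketch below collapses rows that share the (factId, evidenceKey) pair named in conflictColumns and then tallies verdicts per status. The EvidenceRow type, both helper functions, and the last-write-wins behaviour are assumptions made for this example, not the package's documented upsert semantics.

```typescript
type FactEvidenceStatus = 'supports' | 'contradicts' | 'unclear' | 'irrelevant' | 'invalid';

// Hypothetical subset of the FactEvidence fields.
interface EvidenceRow {
  factId: string;
  evidenceKey: string;
  status: FactEvidenceStatus;
  quote: string;
  locator: string;
}

// Assumption: a later row with the same (factId, evidenceKey) replaces the earlier one.
function dedupeEvidence(evidence: EvidenceRow[]): EvidenceRow[] {
  const byKey = new Map<string, EvidenceRow>();
  for (const row of evidence) {
    byKey.set(`${row.factId}\u0000${row.evidenceKey}`, row);
  }
  return Array.from(byKey.values());
}

// Count how many evidence rows carry each verdict.
function tallyByStatus(evidence: EvidenceRow[]): Record<FactEvidenceStatus, number> {
  const tally: Record<FactEvidenceStatus, number> = {
    supports: 0, contradicts: 0, unclear: 0, irrelevant: 0, invalid: 0,
  };
  for (const row of evidence) tally[row.status] += 1;
  return tally;
}

const evidenceRows: EvidenceRow[] = [
  { factId: 'f1', evidenceKey: 'minutes#p3', status: 'unclear', quote: 'Motion discussed', locator: 'p. 3' },
  { factId: 'f1', evidenceKey: 'minutes#p3', status: 'supports', quote: 'Motion carried 5-2', locator: 'p. 3' },
  { factId: 'f1', evidenceKey: 'video#t1820', status: 'contradicts', quote: 'Vote was deferred', locator: '00:30:20' },
];

const evidenceTally = tallyByStatus(dedupeEvidence(evidenceRows));
console.log(evidenceTally);
```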

Claim extraction pipeline

FactCollection exposes an AI-assisted pipeline for turning unstructured text into reviewable fact candidates and for checking whether a claim is actually supported. These methods call the configured AI client's message() with a fact-extraction prompt (resolved through smrt-prompts, so tenants can override it). They do not persist anything: callers review, reconcile, link, or discard the returned candidates according to their own workflow.

Two extractors

The two extractors are intentionally separate: extractCandidatesFromText finds evidence-backed facts in a source, while extractArticleClaims finds the material claims an article draft itself needs to justify.

typescript

// 1) Extract atomic facts from source material (agenda, minutes, transcript…)
const candidates = await facts.extractCandidatesFromText(sourceText, {
  domain: 'civic',
  sourceType: 'minutes',
  maxFacts: 12,                 // default 12
  // allowedTypes: ['assertion', 'measurement', ...]
});

// 2) Extract the claims an article makes (what the draft must justify)
const claims = await facts.extractArticleClaims(articleBody, {
  domain: 'civic',
  maxFacts: 24,                 // default 24
});

// FactExtractionCandidate:
// { statement, type?, sourceExcerpt?, confidence?, metadata? }
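
Because the extractors return candidates rather than saved facts, callers need some review step of their own. The sketch below shows one possible triage pass. The triageCandidates helper and its 0.7 confidence floor are hypothetical, and the metadata type is assumed, since the page only lists the field name.

```typescript
// Shape copied from the FactExtractionCandidate comment above;
// the metadata type is an assumption.
interface FactExtractionCandidate {
  statement: string;
  type?: string;
  sourceExcerpt?: string;
  confidence?: number;
  metadata?: Record<string, unknown>;
}

interface TriageResult {
  accepted: FactExtractionCandidate[];
  needsReview: FactExtractionCandidate[];
}

// Hypothetical policy: accept candidates that quote their source and meet a
// caller-chosen confidence floor; send everything else to manual review.
function triageCandidates(
  candidates: FactExtractionCandidate[],
  minConfidence = 0.7, // assumed threshold, not a library default
): TriageResult {
  const result: TriageResult = { accepted: [], needsReview: [] };
  for (const candidate of candidates) {
    const hasExcerpt = (candidate.sourceExcerpt ?? '').trim().length > 0;
    const confident = (candidate.confidence ?? 0) >= minConfidence;
    if (hasExcerpt && confident) {
      result.accepted.push(candidate);
    } else {
      result.needsReview.push(candidate);
    }
  }
  return result;
}

const triaged = triageCandidates([
  { statement: 'Council approved the budget', sourceExcerpt: 'Motion carried 5-2', confidence: 0.9 },
  { statement: 'The vote was unanimous', confidence: 0.95 }, // no excerpt
  { statement: 'The meeting ran late', sourceExcerpt: 'Adjourned 11:40pm', confidence: 0.4 },
]);
console.log(triaged.accepted.length, triaged.needsReview.length);
```

Accepted candidates would then typically be passed to reconcile() so that near-duplicates are merged or branched rather than inserted blindly.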

Assessing claim support

Given a claim and candidate facts (each optionally carrying FactEvidence), assessClaimSupport classifies whether the claim holds. It returns a status plus the fact/evidence ids that matched and a rationale — the audit trail for a fact-check.

typescript

const assessment = await facts.assessClaimSupport(claim, candidateFacts);

// FactClaimSupportAssessment:
// {
//   status: 'supported' | 'unsupported' | 'contradicted' | 'needs_review',
//   matchedFactIds: string[],
//   matchedEvidenceIds: string[],
//   rationale: string,
//   confidence?: number,
// }
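
One way a caller might act on the assessment is to map it to an editorial decision. The following is a sketch only: the Decision type, the decide helper, and the 0.75 confidence floor are hypothetical and represent one possible policy, not behaviour built into the package.

```typescript
// Shape copied from the FactClaimSupportAssessment comment above.
interface FactClaimSupportAssessment {
  status: 'supported' | 'unsupported' | 'contradicted' | 'needs_review';
  matchedFactIds: string[];
  matchedEvidenceIds: string[];
  rationale: string;
  confidence?: number;
}

type Decision = 'publish' | 'hold' | 'reject';

// Hypothetical policy: publish only supported claims that cite evidence and
// meet the confidence floor; reject contradicted claims; hold everything else.
function decide(assessment: FactClaimSupportAssessment, minConfidence = 0.75): Decision {
  if (assessment.status === 'contradicted') return 'reject';
  if (assessment.status !== 'supported') return 'hold';
  const auditable = assessment.matchedEvidenceIds.length > 0;
  const confident = (assessment.confidence ?? 0) >= minConfidence;
  return auditable && confident ? 'publish' : 'hold';
}

const supportedClaim: FactClaimSupportAssessment = {
  status: 'supported',
  matchedFactIds: ['f1'],
  matchedEvidenceIds: ['e1'],
  rationale: 'Minutes quote the vote tally directly',
  confidence: 0.9,
};
console.log(decide(supportedClaim));
```

Requiring at least one matched evidence id keeps every published claim traceable to a specific excerpt, which is the audit trail the assessment is meant to provide.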

3-Zone Semantic Reconciliation

typescript

// Similarity zones:
//
//   >= 0.85   Auto-merge (same fact, update metadata)
//   0.60-0.85 AI disambiguation (model decides merge / branch / create)
//   <  0.60   New fact (no match)
//
// If AI disambiguation fails, defaults to BRANCH (safer than merge).

const result = await facts.reconcile({
  rawInput: 'New fact text to reconcile',
  type: 'assertion',
  domain: 'science',
  source: { sourceUrl: 'https://source.com', credibility: 0.85 },
});

// Confidence formula (clamped 0-1):
//   base 0.5
//   + source volume (max 0.3)
//   + avg credibility (max 0.2)
//   + recency (max 0.1, decays over 10 days)
//   + corroboration (max 0.1)
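
The zone boundaries and the confidence formula above can be sketched in a few lines. The thresholds and maximum weights come from the comments above; everything else is an assumption for illustration: the function names, the idea that each component arrives as a normalised 0-1 score, and the linear shape of the 10-day recency decay. Note that the listed maxima add up to 1.2, which is why the result is clamped.

```typescript
type Zone = 'auto-merge' | 'ai-disambiguation' | 'new-fact';

const MERGE_THRESHOLD = 0.85;
const DISAMBIGUATE_THRESHOLD = 0.6;

// Route a similarity score (0-1) to one of the three zones.
function classifySimilarity(similarity: number): Zone {
  if (similarity >= MERGE_THRESHOLD) return 'auto-merge';
  if (similarity >= DISAMBIGUATE_THRESHOLD) return 'ai-disambiguation';
  return 'new-fact';
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Hypothetical inputs: each score is assumed to be pre-normalised to 0-1.
interface ConfidenceInputs {
  volume: number;              // source volume score
  avgCredibility: number;      // mean credibility of attached sources
  newestSourceAgeDays: number; // age of the most recent source
  corroboration: number;       // corroboration score
}

function sketchConfidence(inputs: ConfidenceInputs): number {
  // Assumed linear decay: full recency bonus at day 0, none from day 10 on.
  const recency = clamp01(1 - inputs.newestSourceAgeDays / 10);
  const raw =
    0.5 +
    0.3 * clamp01(inputs.volume) +
    0.2 * clamp01(inputs.avgCredibility) +
    0.1 * recency +
    0.1 * clamp01(inputs.corroboration);
  return clamp01(raw); // maxima sum to 1.2, so clamping matters
}

console.log(classifySimilarity(0.72));
console.log(sketchConfidence({ volume: 0.5, avgCredibility: 0.9, newestSourceAgeDays: 5, corroboration: 0 }));
```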

Evolution Chains

All evolution traversals track visited ids in a Set for cycle detection, so a circular chain cannot cause an unbounded traversal.

typescript

await facts.getEvolutionChain(factId);  // root → current (linear ancestry)
await facts.getLatestInChain(rootId);   // highest-confidence leaf
await facts.getEvolutionTree(rootId);   // BFS all descendants
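
The visited-Set technique can be shown with a small in-memory sketch. FactNode, evolutionChain, and evolutionTree are hypothetical stand-ins that operate on plain objects; they illustrate cycle-safe traversal over previousFactId links and are not the package's actual implementation.

```typescript
// Hypothetical node: just the id and the evolution link.
interface FactNode {
  id: string;
  previousFactId: string | null;
}

// Follow previousFactId back to the root, then reverse to get root → current.
function evolutionChain(byId: Map<string, FactNode>, startId: string): FactNode[] {
  const visited = new Set<string>();
  const chain: FactNode[] = [];
  let current = byId.get(startId);
  while (current && !visited.has(current.id)) {
    visited.add(current.id);
    chain.push(current);
    current = current.previousFactId ? byId.get(current.previousFactId) : undefined;
  }
  return chain.reverse();
}

// Breadth-first walk over all successors of rootId (root itself excluded).
function evolutionTree(nodes: FactNode[], rootId: string): string[] {
  const successors = new Map<string, string[]>();
  for (const node of nodes) {
    if (node.previousFactId === null) continue;
    const list = successors.get(node.previousFactId) ?? [];
    list.push(node.id);
    successors.set(node.previousFactId, list);
  }
  const visited = new Set<string>([rootId]);
  const queue: string[] = [rootId];
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    for (const childId of successors.get(id) ?? []) {
      if (visited.has(childId)) continue; // already seen: skip, which breaks cycles
      visited.add(childId);
      order.push(childId);
      queue.push(childId);
    }
  }
  return order;
}

const nodes: FactNode[] = [
  { id: 'a', previousFactId: null },
  { id: 'b', previousFactId: 'a' },
  { id: 'c', previousFactId: 'b' },
  { id: 'd', previousFactId: 'a' },
  { id: 'x', previousFactId: 'y' }, // x and y point at each other (a cycle)
  { id: 'y', previousFactId: 'x' },
];
const byId = new Map<string, FactNode>();
for (const node of nodes) byId.set(node.id, node);

console.log(evolutionChain(byId, 'c').map((n) => n.id));
console.log(evolutionTree(nodes, 'a'));
console.log(evolutionChain(byId, 'x').map((n) => n.id)); // terminates despite the cycle
```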

Gotchas

Embedding failures are non-fatal: generation is wrapped in try/catch and fails silently, so it never blocks fact creation
Metadata auto-stringify: the constructor calls JSON.stringify on object values; getters return parsed objects
AI disambiguation fallback: if the model fails, the resolver defaults to branch (safer than merge)
Optional tenancy: tenantId is nullable; use findWithGlobals(tenantId) to include global facts
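
The metadata gotcha is easier to see with a toy model. MetadataHolder below is a hypothetical class, not part of the package. It assumes metadata is stored as a JSON string and parsed on every read, which is one plausible reading of the behaviour described above. Under that assumption, mutating the object returned by the getter changes only a temporary copy.

```typescript
// Hypothetical stand-in for a model that stores metadata as a JSON string.
class MetadataHolder {
  private raw: string; // the value that would be persisted

  constructor(metadata: Record<string, unknown> | string = {}) {
    // Objects are stringified on construction; strings are stored as given.
    this.raw = typeof metadata === 'string' ? metadata : JSON.stringify(metadata);
  }

  // Each read parses the stored string into a fresh object.
  get metadata(): Record<string, unknown> {
    return JSON.parse(this.raw) as Record<string, unknown>;
  }

  // Assigning through the setter re-serialises the whole object.
  set metadata(value: Record<string, unknown>) {
    this.raw = JSON.stringify(value);
  }
}

const holder = new MetadataHolder({ reviewer: 'alice' });

// Mutating the parsed copy is lost, because the stored string is unchanged.
holder.metadata['reviewer'] = 'bob';
const afterMutation = holder.metadata['reviewer'];

// Assigning a new object through the setter is kept.
holder.metadata = { ...holder.metadata, reviewer: 'bob' };
const afterAssignment = holder.metadata['reviewer'];

console.log(afterMutation, afterAssignment);
```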

Best Practices

DOs

Use reconcile() to prevent duplicate facts on ingest
Attach sources with credibility scores so confidence stays meaningful
Use evolution chains for corrections and refinements (never silently overwrite)
Call recalculateConfidence() after adding new sources
Use findWithGlobals(tenantId) to include global facts

DON'Ts

Don't skip reconciliation when ingesting facts — creates duplicates
Don't manually set confidence — use recalculateConfidence()
Don't modify metadata fields directly — use the getter/setter helpers
Don't try to model circular evolution chains intentionally — traversals will short-circuit
