@happyvertical/smrt-video

AI video production: Character + Performer + Scene, ComfyUI workflow integration, frame-based durations, and dedicated noun joins for owned assets.

v0.29.34 · Video Pipeline · ComfyUI · Frame-Based

Overview

smrt-video models the AI video production pipeline. Characters define virtual personas with voice and branding, Performers carry physical likeness via IP-Adapter FaceID, Scenes provide virtual backgrounds, and the Composition → Sequence → Shot hierarchy organises generated content. ComfyUI workflows are stored as templates with semantic parameter injection.

Installation

bash
npm install @happyvertical/smrt-video

Quick Start

typescript
import {
  Character, Performer, Scene,
  VideoShot, VideoSequence, VideoComposition,
  VideoShotCharacter, VideoWorkflow,
} from '@happyvertical/smrt-video';

// Character = virtual persona (outfit, voice, branding)
const anchor = new Character({
  name: 'Bentley News Anchor',
  imageAssetId: 'seed-img-001',     // seeds via character_assets noun join
  voiceProfileId: 'voice-123',      // FK to smrt-voice
  brandingKit: {
    logoAssetId: 'logo-asset',
    primaryColor: '#1a73e8',
    lowerThirdTemplate: 'news-standard',
    tickerEnabled: true,
  },
});
await anchor.save();

// Performer = physical likeness for IP-Adapter face consistency
const performer = new Performer({
  name: 'Alex',
  dna: { gender: 'neutral', ageRange: 'adult', ipAdapterWeight: 0.85 },
});
await performer.save();

// Scene = virtual background
const studio = new Scene({
  name: 'News Studio',
  sourceType: 'image',
  projection: 'flat',
});
await studio.save();

// Hierarchy: Composition -> Sequence -> Shot (extend Content)
const composition = new VideoComposition({
  title: 'Evening News - March 2, 2026',
  fps: 30,
  width: 1920,
  height: 1080,
});
await composition.save();

const shot = new VideoShot({
  scriptText: 'Welcome to the evening news broadcast.',
  durationInFrames: 900, // 30 sec at 30fps
});
await shot.save();
// estimatedDuration = scriptWordCount / 2.7 words/sec (+/-15%)

// ComfyUI workflow with parameter injection
const workflow = new VideoWorkflow({
  name: 'Wan 2.6 + EchoMimic',
  workflowType: 'broadcast',
  workflowJson: comfyuiApiJson, // parsed ComfyUI API JSON (an object, not a string), defined elsewhere
  nodeMapping: { seedImage: '1', audioFile: '5', outputVideo: '12' },
  requiredModels: ['wan_2.6_t2v_14b_fp8', 'echomimic_v2'],
});
await workflow.save();

// Deep-clones workflow and overwrites node.inputs
const injected = workflow.injectParameters({
  seedImage: '/path/to/anchor.png',
  audioFile: '/path/to/tts.wav',
});

Core Models

Character (renamed from PersonalityProfile)

The old PersonalityProfile export is preserved for backward compatibility. Scene placement uses scene-specific position and scale configs.

typescript
class Character extends SmrtObject {
  name: string
  imageAssetId?: string       // Seed image FK (via character_assets noun join)
  voiceProfileId?: string     // FK to smrt-voice
  brandingKit?: BrandingConfig // Logo, colors, fonts, lower-thirds
  status: 'pending' | 'ready'
}

Performer (IP-Adapter face consistency)

typescript
class Performer extends SmrtObject {
  name: string
  dna: PerformerDNA           // holds faceEmbedding (512-dim FaceID),
                              // ipAdapterWeight (0.5-1.0), ageRange, etc.
  referenceAssetIds: string[] // Linked via performer_assets noun join
  seedImageAssetId?: string   // Generated seed image asset FK
  voiceProfileId?: string     // FK to smrt-voice
  status: 'pending' | 'ready'
}

Scene

typescript
class Scene extends SmrtObject {
  name: string
  sourceType: 'image' | 'video' | 'panorama_360' | 'panorama_180'
  projection: string
  viewpoints: Viewpoint[]     // pan / tilt / fov
  lightingProfile?: object
  anchorPoints?: object
}

VideoShot (extends Content)

typescript
class VideoShot extends Content {
  scriptText?: string
  scriptWordCount: number
  durationInFrames: number
  videoMetadata?: VideoMetadata  // includes wordTimings for lip-sync
  status: 'draft' | 'queued' | 'processing' | 'ready' | 'failed' | 'published'

  // Estimated speech duration in seconds: scriptWordCount / 2.7 (words per second)
  get estimatedDuration(): number
}
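
The estimate can be reproduced outside the model to sanity-check a script against a planned shot length. The sketch below is a standalone approximation: `estimateSpeechDuration` and `DurationEstimate` are hypothetical names, the word count is a plain whitespace split (the package may count words differently), and the ±15% band is assumed to apply to the resulting duration.

```typescript
// Hypothetical helper for illustration; not part of the smrt-video API.
const WORDS_PER_SECOND = 2.7; // speaking rate quoted in this README
const TOLERANCE = 0.15;       // assumed to apply to the duration, not the rate

interface DurationEstimate {
  wordCount: number;
  seconds: number;
  minSeconds: number;
  maxSeconds: number;
  frames: number; // rounded up so a shot is never shorter than its speech
}

function estimateSpeechDuration(scriptText: string, fps: number): DurationEstimate {
  const trimmed = scriptText.trim();
  // Assumption: words are whitespace-separated tokens.
  const wordCount = trimmed === '' ? 0 : trimmed.split(/\s+/).length;
  const seconds = wordCount / WORDS_PER_SECOND;
  return {
    wordCount,
    seconds,
    minSeconds: seconds * (1 - TOLERANCE),
    maxSeconds: seconds * (1 + TOLERANCE),
    frames: Math.ceil(seconds * fps),
  };
}

const estimate = estimateSpeechDuration('Welcome to the evening news broadcast.', 30);
console.log(estimate.wordCount, estimate.frames);
```

Rounding the frame count up is a design choice in this sketch; rounding to nearest would also be defensible if shots are padded elsewhere.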

VideoSequence + VideoComposition

typescript
class VideoSequence extends Content {
  transitionType: 'none' | 'fade' | 'slide' | 'wipe'
  // position ordering within composition
}

class VideoComposition extends Content {
  fps: number
  width: number
  height: number
  durationInFrames: number
  renderStatus: 'draft' | 'rendering' | 'ready' | 'failed'
  renderProgress: number
}

VideoWorkflow (ComfyUI)

typescript
class VideoWorkflow extends SmrtObject {
  name: string
  workflowType: 'prebake' | 'broadcast' | 'lipsync' | 'postprod' | 'custom'
  workflowJson: object | null // Full ComfyUI API JSON (object, not string)
  nodeMapping: NodeMapping    // Maps semantic names -> node IDs
  requiredModels?: string[]

  // Deep-clones workflow and overwrites node.inputs (warns in dev if inputs missing)
  injectParameters(params: Record<string, any>): object
}
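
To make the clone-then-overwrite behaviour concrete, here is a minimal standalone sketch of the idea. It is not the package's implementation: `injectSketch`, `InputTarget`, and the `{ nodeId, input }` mapping shape are assumptions made for illustration (the README only shows nodeMapping resolving semantic names to node IDs), and the node class names in the sample template are placeholders.

```typescript
// Illustrative only: all names below are hypothetical, not the smrt-video API.
type ComfyNode = { class_type: string; inputs?: Record<string, unknown> };
type ComfyWorkflow = Record<string, ComfyNode>;
// Assumption: a semantic name resolves to a node ID plus the input field to set.
type InputTarget = { nodeId: string; input: string };

function injectSketch(
  workflow: ComfyWorkflow,
  mapping: Record<string, InputTarget>,
  params: Record<string, unknown>,
): ComfyWorkflow {
  // Deep clone via a JSON round trip; safe because API-format workflows are plain JSON.
  const copy: ComfyWorkflow = JSON.parse(JSON.stringify(workflow));
  for (const [name, value] of Object.entries(params)) {
    const target = mapping[name];
    const node = target ? copy[target.nodeId] : undefined;
    if (!target || !node || !node.inputs) {
      console.warn(`No injectable inputs for parameter "${name}"`);
      continue;
    }
    node.inputs[target.input] = value; // overwrite on the copy only
  }
  return copy;
}

const template: ComfyWorkflow = {
  '1': { class_type: 'LoadImage', inputs: { image: 'placeholder.png' } },
  '5': { class_type: 'LoadAudio', inputs: { audio: 'placeholder.wav' } },
};
const injectedCopy = injectSketch(
  template,
  { seedImage: { nodeId: '1', input: 'image' }, audioFile: { nodeId: '5', input: 'audio' } },
  { seedImage: '/path/to/anchor.png', audioFile: '/path/to/tts.wav' },
);
```

Because the function returns a modified copy, the stored template can be reused for every render without one job's parameters leaking into the next.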

Owned-asset normalisation

Each video noun has its own join table — character_assets, performer_assets, scene_assets — for owner-side asset relationships. VideoShot, VideoSequence, and VideoComposition inherit the content_assets join from their Content base. Generic provenance / "came from" links still use AssetAssociation with role derivation_source.

Run smrt db:migrate before relying on the new noun joins for writes. During the migration window, reads still tolerate the tables being absent and union in legacy STI asset rows for backward compatibility.
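
The ownership rule above amounts to a small lookup from noun to join table. The sketch below restates it as code; only the table names come from this README, while `VideoNoun` and `ownedAssetJoinTable` are hypothetical names and not something the package is known to export.

```typescript
// Hypothetical lookup for illustration; only the table names come from this README.
type VideoNoun =
  | 'Character' | 'Performer' | 'Scene'
  | 'VideoShot' | 'VideoSequence' | 'VideoComposition';

const OWNED_ASSET_JOIN: Record<VideoNoun, string> = {
  Character: 'character_assets',
  Performer: 'performer_assets',
  Scene: 'scene_assets',
  // Content subclasses share the join inherited from their Content base.
  VideoShot: 'content_assets',
  VideoSequence: 'content_assets',
  VideoComposition: 'content_assets',
};

function ownedAssetJoinTable(noun: VideoNoun): string {
  return OWNED_ASSET_JOIN[noun];
}

console.log(ownedAssetJoinTable('Performer'));
```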

Design Principles

  • Store frames, compute seconds: every duration is durationInFrames; seconds = frames / fps
  • Content inheritance: Shot, Sequence, Composition all extend Content so they get governance, transparency, and chat for free
  • Hierarchy: VideoComposition → VideoSequence → VideoShot → VideoShotCharacter
  • Owned asset normalisation: noun joins for Character / Performer / Scene; content_assets via STI parent
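
The first principle is plain arithmetic, shown here as a standalone sketch. `framesToSeconds` and `secondsToFrames` are hypothetical helper names, not package exports, and the shot lengths are made-up sample values.

```typescript
// Hypothetical helpers for illustration; not part of the smrt-video API.
function framesToSeconds(durationInFrames: number, fps: number): number {
  if (fps <= 0) throw new RangeError('fps must be positive');
  return durationInFrames / fps;
}

function secondsToFrames(seconds: number, fps: number): number {
  if (fps <= 0) throw new RangeError('fps must be positive');
  // Round once, at the boundary, so stored values are always whole frames.
  return Math.round(seconds * fps);
}

// Sample shot lengths in frames; a composition total is an integer sum.
const shotFrames = [900, 450, 150];
const totalFrames = shotFrames.reduce((sum, frames) => sum + frames, 0);
const totalSeconds = framesToSeconds(totalFrames, 30);
console.log(totalFrames, totalSeconds);
```

Summing integers avoids the drift that accumulates when fractional seconds are added across many shots, which is the practical reason to keep frames as the stored unit.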

Best Practices

DOs

  • Store durations as frames; compute seconds as durationInFrames / fps
  • Use nodeMapping to map semantic names to ComfyUI node IDs
  • Use injectParameters() for safe workflow parameter injection (deep-clones)
  • Estimate speech duration at 2.7 words/second (±15%)
  • Link Character.voiceProfileId to smrt-voice profiles
  • Use the new noun joins (character_assets, …) rather than ad-hoc STI rows

DON'Ts

  • Don't store durations as seconds (use durationInFrames everywhere)
  • Don't assume wordTimings is auto-generated (requires an external TTS provider — see smrt-voice)
  • Don't mutate workflow JSON directly — go through injectParameters()
  • Don't forget trimBeforeFrames / trimAfterFrames in effective frame calculations
  • Don't upload face embeddings through the framework (ipAdapterWeight is metadata-only)
  • Don't rely on the legacy STI asset rows after running smrt db:migrate
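
For the trim reminder above, one plausible reading is sketched below. It assumes that trimBeforeFrames and trimAfterFrames are both counts of frames removed, from the start and the end respectively; the README does not define their semantics, so check the package before relying on this. `TrimmedShot` and `effectiveFrames` are hypothetical names.

```typescript
// Hypothetical shape and helper; the trim semantics are an assumption, not documented behaviour.
interface TrimmedShot {
  durationInFrames: number;
  trimBeforeFrames?: number; // assumed: frames removed from the start
  trimAfterFrames?: number;  // assumed: frames removed from the end
}

function effectiveFrames(shot: TrimmedShot): number {
  const before = shot.trimBeforeFrames ?? 0;
  const after = shot.trimAfterFrames ?? 0;
  // Clamp at zero so over-trimming never yields a negative length.
  return Math.max(0, shot.durationInFrames - before - after);
}

const trimmed = effectiveFrames({ durationInFrames: 900, trimBeforeFrames: 15, trimAfterFrames: 30 });
console.log(trimmed, trimmed / 30);
```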

Related Modules