# V3 Facial Expression System — Implementation Plan

> This document is the living plan for V3 development and is updated continuously throughout the pipeline.

## Overview

V3 is a **JavaScript control layer** on top of the existing V2 ONNX model. No retraining needed. The V2 model stays frozen.

```
User/LLM Emotion Input (e.g., "gratitude")
     |
     v
[1. Emotion Lexicon]        emotion name → VAD coordinate
     |
     v
[3. Emotion Timeline]       smooth cross-fade on conditioning signal (VAD level)
     |
     v
[4. Expression Strength]    scale deviation from neutral
     |
     v
[2. VAD → MEAD Mapper]      VAD → 5-dim MEAD vector (RBF proximity)
     |
     v
[5. Feed to V2 Model]       lipsync.setEmotion(smoothedVec)
     |
     v
[V2 ONNX — unchanged]       produces 52 blendshapes (lip sync + base expression)
     |
     v
[6. VAD Blendshape Overlay]  additive upper-face modifiers for non-MEAD emotions
     |
     v
[Final 52-dim frame → avatar]
```

---

## MEAD Anchor Points in VAD Space

| Emotion | V | A | D |
|---------|------|------|------|
| neutral | 0.50 | 0.30 | 0.50 |
| joy | 0.87 | 0.72 | 0.72 |
| anger | 0.17 | 0.82 | 0.80 |
| sadness | 0.17 | 0.30 | 0.28 |
| surprise | 0.55 | 0.82 | 0.42 |

Note: neutral arousal = 0.30 (not 0.50). A resting face is slightly deactivated.

---

## Implementation Steps

### Phase 1: Foundation (parallel, no dependencies)

#### Step 1 — Emotion-VAD Lexicon
- **File**: `src/v3/emotion-vad-lexicon.js`
- **What**: Static lookup: emotion name → `{v, a, d}`
- **Data**: 27 emotions + alias map (happy→joy, scared→fear, etc.)
- **API**: `lookupEmotion(name)`, `resolveAlias(name)`, `listEmotions()`
- **Complexity**: Simple
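
A minimal sketch of the three lookup functions named above. Only the five MEAD anchors and the gratitude coordinate appear in this plan, so the table here is deliberately partial. Lowercasing input and returning `null` for unknown names are assumptions, not decisions recorded in the plan.

```js
// Partial table: the five MEAD anchors plus "gratitude" (values taken from this plan).
// The full 27-emotion lexicon is not reproduced here.
const LEXICON = new Map([
  ["neutral",   { v: 0.50, a: 0.30, d: 0.50 }],
  ["joy",       { v: 0.87, a: 0.72, d: 0.72 }],
  ["anger",     { v: 0.17, a: 0.82, d: 0.80 }],
  ["sadness",   { v: 0.17, a: 0.30, d: 0.28 }],
  ["surprise",  { v: 0.55, a: 0.82, d: 0.42 }],
  ["gratitude", { v: 0.85, a: 0.45, d: 0.52 }],
]);

// The two aliases given as examples in this plan.
const ALIASES = new Map([
  ["happy", "joy"],
  ["scared", "fear"],
]);

// Map an alias to its canonical name; unknown names pass through unchanged.
function resolveAlias(name) {
  const key = String(name).trim().toLowerCase(); // assumption: case-insensitive lookup
  return ALIASES.get(key) ?? key;
}

// Return a copy of the {v, a, d} entry, or null if the emotion is not in the table.
function lookupEmotion(name) {
  const entry = LEXICON.get(resolveAlias(name));
  return entry ? { ...entry } : null; // assumption: null signals "unknown"
}

function listEmotions() {
  return [...LEXICON.keys()];
}
```

Returning a copy keeps callers from mutating the shared table. In this partial table `lookupEmotion("scared")` resolves to `fear` and then returns `null`, because no fear coordinate is listed in this plan.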

#### Step 7 — Data Files (parallel with Step 1)
- `src/v3/data/emotion-archetypes.js` — 22 emotion blendshape target vectors (52-dim each)
- `src/v3/data/vad-parametric-config.js` — per-blendshape VAD response curves
- `src/v3/data/arkit-regions.js` — blendshape index classification (MOUTH vs UPPER_FACE vs BLINK)
- `src/v3/v3-config.js` — default config values
- **Complexity**: Medium (tedious transcription from research docs)
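
One possible shape for the region classification in `arkit-regions.js`, sketched under assumptions: it classifies by ARKit blendshape name prefix and builds index lists from whatever output order the model uses. The function names are hypothetical, and placing `cheekPuff` in the mouth region is a guess that the plan does not confirm.

```js
// Classify one ARKit blendshape name into MOUTH, BLINK, or UPPER_FACE.
function classifyBlendshape(name) {
  if (name.startsWith("eyeBlink")) return "BLINK";
  // Assumption: cheekPuff is lip-sync owned; the plan does not say either way.
  if (/^(jaw|mouth|tongue)/.test(name) || name === "cheekPuff") return "MOUTH";
  return "UPPER_FACE"; // brow*, non-blink eye*, cheekSquint*, noseSneer*
}

// names[i] is the blendshape name at model output index i.
function buildRegions(names) {
  const regions = { MOUTH: [], UPPER_FACE: [], BLINK: [] };
  names.forEach((name, i) => regions[classifyBlendshape(name)].push(i));
  return regions;
}
```

Deriving indices from names avoids hard-coding an index order that may differ between exporters.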

### Phase 2: Core Pipeline (sequential)

#### Step 2 — VAD-to-MEAD Mapper
- **File**: `src/v3/vad-mead-mapper.js`
- **What**: `vadToMead({v, a, d}) → Float32Array(5)`
- **Algorithm**: Gaussian RBF proximity weighting with dominance shaping gates
  - `weight_i = exp(-dist_i^2 / (2 * sigma^2))`, sigma=0.30
  - Dominance gates: anger gated by D>0.3, sadness gated by D<0.7
  - Neutral weight falls as total emotional activation rises (floored at 0.15)
- **Depends on**: Step 1
- **Complexity**: Simple-Medium
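
The weighting above can be sketched roughly as follows. The RBF formula, sigma, gate thresholds, and the 0.15 floor come from this plan. Everything else is an assumption: the gates are hard on/off switches, neutral is taken as the complement of the summed emotion weights, the emotion weights are rescaled so the vector sums to 1, and the output order `[neutral, joy, anger, sadness, surprise]` is inferred from the end-to-end example.

```js
// Non-neutral MEAD anchors in VAD space (values from the anchor table in this plan).
const ANCHORS = [
  { name: "joy",      v: 0.87, a: 0.72, d: 0.72 },
  { name: "anger",    v: 0.17, a: 0.82, d: 0.80 },
  { name: "sadness",  v: 0.17, a: 0.30, d: 0.28 },
  { name: "surprise", v: 0.55, a: 0.82, d: 0.42 },
];
const SIGMA = 0.30;
const NEUTRAL_FLOOR = 0.15;

// weight = exp(-dist^2 / (2 * sigma^2))
function rbfWeight(p, anchor, sigma = SIGMA) {
  const distSq = (p.v - anchor.v) ** 2 + (p.a - anchor.a) ** 2 + (p.d - anchor.d) ** 2;
  return Math.exp(-distSq / (2 * sigma * sigma));
}

// Assumption: hard gates at the stated thresholds (a smooth ramp may work better).
function dominanceGate(name, d) {
  if (name === "anger") return d > 0.3 ? 1 : 0;
  if (name === "sadness") return d < 0.7 ? 1 : 0;
  return 1;
}

function vadToMead(vad) {
  const w = ANCHORS.map((a) => rbfWeight(vad, a) * dominanceGate(a.name, vad.d));
  const total = w.reduce((sum, x) => sum + x, 0);
  // Assumption: neutral = 1 - total activation, floored; emotions fill the remainder.
  const neutral = Math.max(NEUTRAL_FLOOR, 1 - total);
  const scale = total > 0 ? (1 - neutral) / total : 0;
  return Float32Array.of(neutral, ...w.map((x) => x * scale));
}
```

This sketch is not tuned to reproduce the exact numbers in the end-to-end example; the neutral term in particular will likely need adjusting once the real mapping is specified.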

#### Step 4 — Expression Strength
- **File**: `src/v3/expression-strength.js`
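- **What**: Scale VAD deviation from the neutral anchor: `scaled = neutral + (vad - neutral) * strength`, with `neutral = {v: 0.50, a: 0.30, d: 0.50}` (arousal centers on 0.30, not 0.50)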
- **Range**: 0 (deadpan) → 1 (normal) → 2 (exaggerated)
- **Complexity**: Simple
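
A small sketch of the scaling step. The center point is a parameter here; defaulting it to the neutral anchor from the MEAD table is an assumption about what "neutral" means on the arousal axis. Clamping the strength to the 0 to 2 range and the result to [0, 1] are also assumptions.

```js
const NEUTRAL_VAD = { v: 0.50, a: 0.30, d: 0.50 };
const clamp01 = (x) => Math.min(1, Math.max(0, x));

// Scale each VAD axis away from (or toward) the center by `strength`.
function applyStrength(vad, strength, center = NEUTRAL_VAD) {
  const s = Math.min(2, Math.max(0, strength)); // assumption: clamp to the documented range
  return {
    v: clamp01(center.v + (vad.v - center.v) * s),
    a: clamp01(center.a + (vad.a - center.a) * s),
    d: clamp01(center.d + (vad.d - center.d) * s), // assumption: keep results inside [0, 1]
  };
}
```

At strength 0 every emotion collapses to the center, at 1 the input passes through unchanged, and at 2 strong emotions such as joy saturate at the clamp.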

#### Step 3 — Emotion Timeline (Cross-Fade)
- **File**: `src/v3/emotion-timeline.js`
- **What**: Stateful per-frame smoother. Cross-fades conditioning signal at VAD level.
- **Algorithm**: Asymmetric exponential smoothing
  - Onset tau: 0.15s (fast emotional onset)
  - Offset tau: 0.40s (slow emotional decay)
  - Arousal-dependent speed: high arousal = faster transitions
- **Key decision**: Cross-fade at **VAD level**, not MEAD level. MEAD-level interpolation between joy and sadness produces a grimace. VAD-level interpolation passes through the neutral region naturally.
- **Default transition**: 15 frames (500ms at 30fps) with smoothstep
- **Depends on**: Steps 1, 2, 4
- **Complexity**: Medium
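
The exponential smoothing part of this step might look like the sketch below. The two tau values come from the plan. The rest is assumed: "onset" is detected as the target lying farther from neutral than the current state, the arousal speed-up divides tau by `0.5 + target.a`, and the class and method names are hypothetical. The smoothstep default transition is not modeled here.

```js
const NEUTRAL = { v: 0.50, a: 0.30, d: 0.50 };

function distFromNeutral(p) {
  return Math.hypot(p.v - NEUTRAL.v, p.a - NEUTRAL.a, p.d - NEUTRAL.d);
}

class EmotionTimeline {
  constructor({ onsetTau = 0.15, offsetTau = 0.40 } = {}) {
    this.onsetTau = onsetTau;
    this.offsetTau = offsetTau;
    this.current = { ...NEUTRAL };
    this.target = { ...NEUTRAL };
  }

  setTarget(vad) {
    this.target = { ...vad };
  }

  // Advance by dt seconds and return the smoothed VAD point.
  update(dt) {
    // Assumption: moving away from neutral counts as onset, toward it as offset.
    const onset = distFromNeutral(this.target) >= distFromNeutral(this.current);
    let tau = onset ? this.onsetTau : this.offsetTau;
    // Assumption: higher target arousal shortens the time constant.
    tau /= 0.5 + this.target.a;
    // Frame-rate independent smoothing factor.
    const alpha = 1 - Math.exp(-dt / tau);
    for (const k of ["v", "a", "d"]) {
      this.current[k] += (this.target[k] - this.current[k]) * alpha;
    }
    return { ...this.current };
  }
}
```

Using `1 - exp(-dt / tau)` rather than a fixed per-frame factor keeps the transition time the same at 30 fps and 60 fps.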

#### Step 6 — VAD Blendshape Overlay
- **File**: `src/v3/vad-blendshape-overlay.js`
- **What**: Additive upper-face modifiers for emotions the V2 model wasn't trained on
- **Model handles**: lip sync, joy, anger, sadness, surprise, neutral
- **Overlay patches**: brow, eye squint/wide, cheek squint, nose sneer (for fear, disgust, gratitude, etc.)
- **Overlay does NOT touch**: jaw, mouth, tongue (lip sync owned)
- **Architecture**: 60% parametric (VAD→blendshape curves) + 40% archetype (per-emotion targets)
- **Smart scaling**: overlay strength inversely proportional to MEAD anchor proximity
  - Joy → ~0% overlay (model handles natively)
  - Fear → ~80% overlay (model needs help)
  - Gratitude → ~60% overlay
- **Depends on**: Step 7 data files
- **Complexity**: Complex (most labor-intensive step)
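
A rough sketch of the smart scaling and the additive blend. The 60/40 split and the rule that only upper-face indices are touched come from this plan. The linear distance ramp, the `fullOverlayDist` parameter (picked only so that gratitude lands near the ~60% figure above), and the function signatures are hypothetical.

```js
// All five MEAD anchors in VAD space (values from the anchor table in this plan).
const MEAD_ANCHORS = [
  { v: 0.50, a: 0.30, d: 0.50 }, // neutral
  { v: 0.87, a: 0.72, d: 0.72 }, // joy
  { v: 0.17, a: 0.82, d: 0.80 }, // anger
  { v: 0.17, a: 0.30, d: 0.28 }, // sadness
  { v: 0.55, a: 0.82, d: 0.42 }, // surprise
];

// Assumption: intensity ramps linearly with distance to the nearest anchor
// and saturates at 1 once that distance reaches fullOverlayDist.
function overlayIntensity(vad, fullOverlayDist = 0.56) {
  const nearest = Math.min(
    ...MEAD_ANCHORS.map((a) => Math.hypot(vad.v - a.v, vad.a - a.a, vad.d - a.d))
  );
  return Math.min(1, nearest / fullOverlayDist);
}

// parametric and archetype are 52-dim delta vectors; only upper-face indices are modified.
function applyOverlay(modelFrame, parametric, archetype, intensity, upperFaceIndices) {
  const out = Float32Array.from(modelFrame); // never mutate the model output
  for (const i of upperFaceIndices) {
    const delta = 0.6 * parametric[i] + 0.4 * archetype[i];
    out[i] = Math.min(1, Math.max(0, out[i] + intensity * delta));
  }
  return out;
}
```

Restricting the loop to `upperFaceIndices` is what guarantees that jaw, mouth, and tongue values from the model pass through untouched.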

### Phase 3: Integration

#### Step 5 — V3 Controller
- **File**: `src/v3/v3-emotion-controller.js`
- **What**: Orchestrator class that ties Steps 1-4, 6 together
- **API**:
  ```js
  v3.setEmotion("gratitude")     // set target (smoothly transitions)
  v3.setStrength(1.5)            // set expressiveness
  v3.update(dt)                  // call every frame BEFORE audio processing
  v3.applyOverlay(modelFrame)    // call AFTER getting model output
  ```
- **Render loop integration**:
  ```js
  // Before: static emotion from sliders
  // After:
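  const smoothedVec = v3Controller.update(dt);  // assumed to return the smoothed 5-dim MEAD vector
  lipsync.setEmotion(smoothedVec);              // feed the conditioning signal to the V2 model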
  // ... processAudioChunk / frame application ...
  const enhanced = v3Controller.applyOverlay(arkitFrame);
  applyArkitBlendshapes(enhanced);
  ```
- **Backward compatible**: V2 works exactly as before if V3 controller is not instantiated
- **Depends on**: Steps 1-4, 6
- **Complexity**: Medium
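
One way the orchestrator could be wired, shown as a sketch with every collaborator injected. The four public methods match the API above. The injected function names, their signatures, and the choice to have `update` return the MEAD vector are assumptions. The order inside `update` follows the end-to-end example: timeline, then strength, then mapper.

```js
class V3EmotionController {
  // All dependencies are hypothetical stand-ins for the Step 1-4 and Step 6 modules.
  constructor({ lookupEmotion, timeline, applyStrength, vadToMead, overlay }) {
    this.deps = { lookupEmotion, timeline, applyStrength, vadToMead, overlay };
    this.strength = 1.0;
    this.currentVad = null;
  }

  // Set the target emotion; returns false if the name is unknown.
  setEmotion(name) {
    const vad = this.deps.lookupEmotion(name);
    if (!vad) return false; // assumption: unknown names are ignored
    this.deps.timeline.setTarget(vad);
    return true;
  }

  setStrength(strength) {
    this.strength = strength;
  }

  // Call once per frame before audio processing; returns the MEAD conditioning vector.
  update(dt) {
    const smoothed = this.deps.timeline.update(dt);
    this.currentVad = this.deps.applyStrength(smoothed, this.strength);
    return this.deps.vadToMead(this.currentVad);
  }

  // Call after the model has produced a frame; a no-op until update() has run.
  applyOverlay(modelFrame) {
    if (!this.currentVad) return modelFrame;
    return this.deps.overlay(modelFrame, this.currentVad);
  }
}
```

Injecting the pieces keeps the controller testable with stubs before the real modules exist.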

#### Step 8 — UI Integration
- **File**: modify `examples/guide/index.html`
- **What**: Emotion picker (22+ presets), strength slider, transition speed control
- **Keep**: existing 5-dim MEAD sliders as "Advanced/Manual" mode
- **Depends on**: Step 5
- **Complexity**: Medium

#### Step 9 — Emotion Scheduler (LLM/TTS hook)
- **File**: `src/v3/emotion-scheduler.js`
- **What**: Schedule emotion changes at specific timestamps for TTS playback
- **Usage**: `scheduler.schedule(2.1, "embarrassment"); scheduler.start(audioCtx.currentTime);`
- **Depends on**: Step 5
- **Complexity**: Simple-Medium
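
A minimal sketch of the scheduler, assuming it is polled once per frame. `schedule` and `start` follow the usage line above; the `tick(now)` method and the `onEmotion` callback are hypothetical additions needed to make the example complete.

```js
class EmotionScheduler {
  // onEmotion(name) is called when a scheduled event becomes due,
  // for example (name) => v3.setEmotion(name).
  constructor(onEmotion) {
    this.onEmotion = onEmotion;
    this.events = [];
    this.startTime = null;
    this.next = 0;
  }

  // offsetSec is measured from the time passed to start().
  schedule(offsetSec, emotion) {
    this.events.push({ offsetSec, emotion });
    this.events.sort((a, b) => a.offsetSec - b.offsetSec);
  }

  // now is a clock reading in seconds, such as audioCtx.currentTime.
  start(now) {
    this.startTime = now;
    this.next = 0;
  }

  // Fire every event whose offset has elapsed; call with the same clock as start().
  tick(now) {
    if (this.startTime === null) return;
    const elapsed = now - this.startTime;
    while (this.next < this.events.length && this.events[this.next].offsetSec <= elapsed) {
      this.onEmotion(this.events[this.next].emotion);
      this.next += 1;
    }
  }
}
```

Driving `tick` from the audio clock rather than wall time keeps emotion changes aligned with TTS playback even if frames are dropped. This sketch expects all events to be scheduled before `start` is called.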

---

## File Structure

```
src/v3/
  emotion-vad-lexicon.js        [Step 1]
  vad-mead-mapper.js            [Step 2]
  emotion-timeline.js           [Step 3]
  expression-strength.js        [Step 4]
  v3-emotion-controller.js      [Step 5]
  vad-blendshape-overlay.js     [Step 6]
  v3-config.js                  [Step 7D]
  emotion-scheduler.js          [Step 9]
  data/
    emotion-archetypes.js       [Step 7A]
    vad-parametric-config.js    [Step 7B]
    arkit-regions.js            [Step 7C]
```

---

## End-to-End Example: "gratitude"

1. `v3.setEmotion("gratitude")`
2. **Lexicon**: → `{v:0.85, a:0.45, d:0.52}`
3. **Timeline**: smoothly interpolates from current VAD to target over ~500ms
4. **Strength**: at 1.0, no change
5. **MEAD mapper**: → `[0.15, 0.70, 0.01, 0.01, 0.13]` in the order neutral, joy, anger, sadness, surprise (mostly joy, plus some neutral and surprise)
6. **V2 model**: produces smile shapes + lip sync from audio
7. **Overlay**: gratitude is dist ~0.34 from the joy anchor (Euclidean distance in VAD space) → ~60% overlay intensity. Adds browInnerUp ~0.30, eyeSquint ~0.40, cheekSquint ~0.35 (the "warm eyes" that distinguish gratitude from plain joy)
8. **Result**: warm smile (model) + raised inner brows and soft eyes (overlay) = gratitude

---

## What Does NOT Need Retraining

Everything in this plan. The V2 ONNX model stays frozen.

## What Would Benefit from Retraining (V4+)

- Training on more emotion categories (fear, disgust, contempt) → eliminates overlay need
- VAD conditioning instead of categorical → model learns continuous emotion space
- Transition sequence training → natural micro-expressions during emotion changes
- Richer datasets beyond MEAD

---

## Status Tracker

| Step | Status | Notes |
|------|--------|-------|
| 1. Emotion Lexicon | Not started | |
| 2. VAD→MEAD Mapper | Not started | |
| 3. Emotion Timeline | Not started | |
| 4. Expression Strength | Not started | |
| 5. V3 Controller | Not started | |
| 6. VAD Overlay | Not started | |
| 7. Data Files | Not started | |
| 8. UI Integration | Not started | |
| 9. Emotion Scheduler | Not started | |

---

## Research Documents

- `docs/research/vad-to-arkit-blendshape-mapping.md` — VAD dimensions → facial regions → ARKit blendshapes
- `docs/research/emotion-blendshape-patterns.md` — Per-emotion blendshape activation patterns (20 emotions)
- `docs/research/vad-to-mead-emotion-mapping.md` — VAD↔MEAD mapping algorithm, extended emotion table, temporal crossfade design
