Lessons Learned
This page records non-obvious behaviour and follow-up work that are easy to lose across sessions. It complements Beat Service. End-user behaviour for the Audio analysis step lives in the user guide (separate section in /docs).
Lyric word timing (Whisper)
Context. Lyric transcription uses faster-whisper with word_timestamps=True on the Demucs vocal stem. Persisted words carry start_sec / end_sec in mix‑native seconds, consistent with beats and the waveform.
Lesson. Those boundaries are inherently approximate. Whisper’s word alignment is not sample-accurate. On fast or dense vocals (quick syllables, runs, rap), word ends and starts can sit noticeably early or late versus what you hear. That affects line audition, seek-to-word behaviour anywhere it exists, and any future feature that assumes frame-tight lyrics.
Mitigations in the app today. The beat service can nudge word starts on the Demucs stem to the first above-threshold energy frame after Whisper (see Beat Service); end times are unchanged unless segment bounds are recomputed from words. The Audio Analysis pane also adjusts snippet playback (latest word end, currentTime stop, tail pad). None of this replaces a dedicated phoneme aligner for karaoke-tight timing.
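The start nudge described above can be sketched as follows. This is a minimal illustration, not the beat service's actual code, and it is written in TypeScript only for consistency with the other example on this page. It reads "first above-threshold energy frame" as the first frame at or after Whisper's start time. The LyricWord shape, the per-frame energy array, and the hopSec, threshold, and maxShiftSec parameters are all assumptions. End times are left untouched, matching the note above.

```typescript
// Hypothetical word shape; field names follow the start_sec / end_sec convention.
interface LyricWord {
  text: string;
  start_sec: number;
  end_sec: number;
}

// Assumed tuning knobs; none of these names come from the beat service.
interface NudgeOptions {
  hopSec: number; // duration of one energy frame, in seconds
  threshold: number; // energy at or above this counts as voiced
  maxShiftSec: number; // how far past the Whisper start we are willing to search
}

// Move each word start forward to the first voiced frame at or after the
// Whisper start. Words with no voiced frame in the search window are returned
// unchanged. end_sec is never modified.
function nudgeWordStarts(
  words: LyricWord[],
  energy: number[],
  opts: NudgeOptions,
): LyricWord[] {
  return words.map((w) => {
    const firstFrame = Math.max(0, Math.floor(w.start_sec / opts.hopSec));
    // Never search past the word's own end or past the allowed shift.
    const limitSec = Math.min(w.end_sec, w.start_sec + opts.maxShiftSec);
    for (let i = firstFrame; i < energy.length; i++) {
      const frameStartSec = i * opts.hopSec;
      if (frameStartSec >= limitSec) break;
      const level = energy[i] ?? 0;
      if (level >= opts.threshold) {
        // Only ever nudge later, never earlier than Whisper's estimate.
        return { ...w, start_sec: Math.max(w.start_sec, frameStartSec) };
      }
    }
    return w;
  });
}
```

Capping the search at the word's own end keeps a nudged start from landing after its end time, which would otherwise produce negative-length words on silent or mis-transcribed regions.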
TODO: tighter timing when it matters
- Evaluate a post-Whisper alignment stage that runs once the transcript text is final (e.g. forced alignment, WhisperX-style phoneme alignment, or another wav2vec/CTC-based aligner) to refine word or segment boundaries when karaoke-tight timing is required.
- User-facing expectations for Whisper-only timing are documented in the user guide’s Audio analysis page (sections, compare UI, ASR % vs match cues, audition).
Pragmatic default. For many songs, current Whisper timing is acceptable. For fast vocals or shot-level lyric sync, plan on either living with drift or investing in the alignment pipeline above. Polish in React alone cannot fix the underlying model limits.
Gallery “All renders” hero strip
Context. Shot detail in Gallery shows a row of still thumbnails with tick boxes to set the hero JPG.
Lesson. The strip must stay a frozen, deduped list in which only the tick moves. Three earlier approaches each caused bugs (“extra image”, “can’t go back to a previous take”, and strip flash): rebuilding the list from a live disk index after each hero change, disabling the tick on the selected tile, and reverting the tick when project-still-lock fails.
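The frozen-list rule can be modelled as a small piece of state. This is a sketch under assumptions, not the locked spec: StillTake, StripState, openStrip, selectHero, and reportLockResult are hypothetical names, and deduping by file path is a guess at the dedupe key. The list is built once when shot detail opens, and later actions only change which take is ticked.

```typescript
// Hypothetical take shape; the real strip may carry more fields.
interface StillTake {
  id: string;
  path: string;
}

interface StripState {
  takes: readonly StillTake[]; // built once, never rebuilt on hero change
  heroId: string | null; // the only thing that moves
  lockFailed: boolean; // surfaced to the UI instead of reverting the tick
}

// Build the frozen list once. Deduping by path is an assumption.
function openStrip(diskTakes: StillTake[], heroId: string | null): StripState {
  const seen = new Set<string>();
  const takes: StillTake[] = [];
  for (const take of diskTakes) {
    if (seen.has(take.path)) continue;
    seen.add(take.path);
    takes.push(take);
  }
  return { takes: Object.freeze(takes), heroId, lockFailed: false };
}

// Move the tick. Every tile stays selectable, including earlier takes,
// and the same takes array is reused so the strip does not flash.
function selectHero(state: StripState, id: string): StripState {
  if (!state.takes.some((take) => take.id === id)) return state;
  return { ...state, heroId: id, lockFailed: false };
}

// Record the project-still-lock outcome without moving the tick back.
function reportLockResult(state: StripState, ok: boolean): StripState {
  return { ...state, lockFailed: !ok };
}
```

Because selectHero returns the same takes array reference, a component keyed on that list has nothing to re-render except the tick itself.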
Locked spec. See Gallery hero strip before changing GalleryStillTakesStrip, GalleryRenderStripTile, listGalleryStripTakes, or project-still-lock.
