Inside the product

From locked mix to export — how Veyra actually works

The app is built around two connected surfaces. Video Plan is a guided, linear stepper (shown as a sidebar in the product) that walks a song from upload through analysis, world-building, narrative structure, shot scaffolding, prompts, and generation, with state saved per project. Timeline is where those shots land on the beat grid for trimming, playback with optional lyric overlay, and final renders. Together they keep music-video and promo work honest to the track.


Video Plan

Thirteen locked steps — no wandering tabs

In the product, routes under /video-plan/… mirror the stepper: each stage unlocks the next with explicit approve / lock semantics where it matters (world, grid, filming style). The database holds a single planner session per project so refresh, collaboration, and exports always reload the same truth.
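As a rough illustration of the gating described above, here is a minimal sketch in Python. The step slugs, the PlannerSession class, and its methods are hypothetical names chosen for this example, not Veyra's actual routes or API, and the sketch simplifies by treating every step as lockable.

```python
from dataclasses import dataclass, field

# Step order follows the sidebar described on this page.
# The slugs themselves are assumptions made for illustration.
STEPS = [
    "inputs", "audio-analysis", "song-read", "environment-prompt",
    "environment-chat", "concept", "acts", "locations",
    "shot-scaffold", "per-shot-prompts", "filming-style",
    "generate", "gallery",
]


@dataclass
class PlannerSession:
    """One planner session per project (assumed shape)."""
    project_id: str
    locked: set = field(default_factory=set)

    def can_open(self, step: str) -> bool:
        # A step is reachable only once every earlier step is locked.
        index = STEPS.index(step)
        return all(s in self.locked for s in STEPS[:index])

    def lock(self, step: str) -> None:
        if not self.can_open(step):
            raise ValueError(f"{step} is still gated by an earlier step")
        self.locked.add(step)


session = PlannerSession("demo-project")
session.lock("inputs")
print(session.can_open("audio-analysis"))  # True
print(session.can_open("concept"))         # False
```

Keeping the locked set on a single session object is one way to make a refresh or a collaborator's view reload the same state.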

Timeline

Assembly room for the cut

After assets exist, the timeline composites stills and clips against the same mix and beat geometry you calibrated in analysis. That is where lip-sync takes, lyric review, and “Save MP4” exports happen — including optional burned-in subtitles when transcripts are present.

Flow at a glance: Beat grid & lyrics → World + story + locations → Shot scaffold & prompts → Generate → Gallery → Timeline → Export

Video Plan — step by step

The following mirrors the in-app sidebar order: foundations first, then narrative and locations, then shot mechanics, then generation. Copy here is written for producers and directors; engineers can still rely on the same gates in the product.

Foundations — song & grid

Every project starts from your mix. You lock what the rest of the planner is allowed to assume, then run analysis so structure, lyrics, and later steps all share one timeline.

1. Inputs
Upload the master audio and name the project. Optionally attach reference stills (an environment plate, cast / performer references), pasted lyrics, and per-actor notes (look lock, personality). Nothing is generated yet: you are anchoring the creative brief in files the later steps inherit.
2. Audio analysis
Run the analyser on the locked mix: BPM, meter hints, segments, downbeats, and calibration tools. When you lock the beat grid, downstream steps (acts, shot timing) snap to the same musical time — so the plan cannot drift from what you hear.
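To make "snap to the same musical time" concrete, here is a small sketch of a locked beat grid. It assumes a constant tempo and a single calibration offset, which is a simplification; the BeatGrid class and its field names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeatGrid:
    """Hypothetical locked grid: constant tempo plus a calibration offset."""
    bpm: float               # tempo reported by analysis
    first_beat_s: float      # time of beat 0 in seconds
    beats_per_bar: int = 4   # meter hint

    @property
    def beat_len_s(self) -> float:
        return 60.0 / self.bpm

    def beat_time(self, beat_index: int) -> float:
        """Time in seconds of a given beat index."""
        return self.first_beat_s + beat_index * self.beat_len_s

    def snap_to_beat(self, t: float) -> int:
        """Index of the beat nearest to time t (never before beat 0)."""
        return max(0, round((t - self.first_beat_s) / self.beat_len_s))

    def snap_to_downbeat(self, t: float) -> int:
        """Beat index of the downbeat nearest to time t."""
        bar_len_s = self.beat_len_s * self.beats_per_bar
        bar = max(0, round((t - self.first_beat_s) / bar_len_s))
        return bar * self.beats_per_bar


grid = BeatGrid(bpm=120.0, first_beat_s=0.25)
beat = grid.snap_to_beat(10.4)
print(beat, grid.beat_time(beat))  # 20 10.25
```

Downstream steps that store beat indices rather than seconds stay aligned to the grid by construction, which is the property the lock is meant to protect.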

World & narrative

Once the grid exists, the app helps you agree on the film before anyone talks about lenses: what the song means, where it lives visually, and how the story maps onto sections of the track.

3. Song read
An AI pass reads lyrics plus your cast references and writes a concise interpretation: themes, tone, palette hints, era tags — the shared vocabulary directors and producers use before shot language.
4. Environment prompt
The system proposes an image-generation prompt for the world; you edit it until it reads right. That prompt is the seed every environment candidate is generated from, so the world is locked before concept work hardens.
5. Environment chat
Chat with the assistant to iterate environment candidates, upload your own plates, and pick the look the video lives in. Every later frame must respect the choice you lock here.
6. Concept
With the world fixed, the model writes a proper treatment: protagonist arc, narrative beats per song section, recurring motifs. This is the document-level spine your shot list will hang from.
7. Acts
Map narrative acts onto song sections. Each act gets prose and a downbeat-aligned start/end on the timeline so structure stays musical, not arbitrary wall-clock blocks.
8. Locations
Subdivide the locked world into distinct places the video cuts between — rooftop, corridor, stage, exterior, and so on. Each location carries prose plus a reference plate so every shot set there inherits the same spatial logic.
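The Acts step above says each act gets a downbeat-aligned start and end. One way to picture that is to pull rough section boundaries to the nearest downbeat. The sketch below assumes a sorted, non-empty list of downbeat times from analysis; Act, nearest, and align_acts are hypothetical names used only for this example.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Act:
    title: str
    start_s: float
    end_s: float


def nearest(downbeats: list[float], t: float) -> float:
    """Return the downbeat closest to t (downbeats sorted, non-empty)."""
    i = bisect_left(downbeats, t)
    candidates = downbeats[max(0, i - 1): i + 1]
    return min(candidates, key=lambda d: abs(d - t))


def align_acts(sections, downbeats):
    """Snap (title, start_s, end_s) sections onto downbeats."""
    return [
        Act(title, nearest(downbeats, start), nearest(downbeats, end))
        for title, start, end in sections
    ]


downbeats = [i * 2.0 for i in range(9)]  # one bar every 2 s, 0.0 to 16.0
sections = [("Verse", 0.1, 7.8), ("Chorus", 7.8, 15.9)]
for act in align_acts(sections, downbeats):
    print(act.title, act.start_s, act.end_s)
# Verse 0.0 8.0
# Chorus 8.0 16.0
```

Because a shared boundary snaps to the same downbeat on both sides, adjacent acts stay contiguous after alignment.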

Shot authoring

Here the plan becomes concrete: a beat-aligned shot scaffold, per-shot generation briefs, and one global filming style so renders feel like one film, not thirteen unrelated prompts.

9. Shot scaffold
Generate the full shot list with timing, character focus, framing, angle, movement, lens class, rig, depth of field, and camera position in space — aligned to beats, without final prose prompts yet. Locations are drawn only from the locked location whitelist so geography stays coherent.
10. Per-shot prompts
Fill or refine prompt, action, continuity anchors, and negatives per shot. Each bundle automatically includes the locked environment and actor references so regens do not silently drift from approved look.
11. Filming style
Choose one global filming style (for example a named cinematic reference). Its text is appended to every still and video generation on top of the per-shot brief — the main lever for consistent colour, grain, and lens culture across the whole sequence.
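The layering described in steps 10 and 11 (per-shot brief, then locked environment and actor references, then the global filming style appended last) can be sketched as a plain prompt builder. The field names, labels, and output keys below are assumptions for illustration; the real bundle format is not documented on this page.

```python
from dataclasses import dataclass, field


@dataclass
class ShotBrief:
    """Hypothetical per-shot brief."""
    prompt: str
    action: str = ""
    negatives: list = field(default_factory=list)


def build_generation_prompt(shot, environment, actor_refs, filming_style):
    """Combine the per-shot brief with the locked, project-wide context."""
    parts = [shot.prompt]
    if shot.action:
        parts.append(f"Action: {shot.action}")
    # Locked context is added to every shot so regens keep the approved look.
    parts.append(f"Environment: {environment}")
    if actor_refs:
        parts.append("Cast: " + "; ".join(actor_refs))
    # The global filming style goes last, on top of the per-shot brief.
    parts.append(f"Style: {filming_style}")
    return {
        "prompt": "\n".join(parts),
        "negative_prompt": ", ".join(shot.negatives),
    }


shot = ShotBrief(
    prompt="Wide shot of the singer on the rooftop",
    action="turns to camera on the downbeat",
    negatives=["text", "watermark"],
)
bundle = build_generation_prompt(
    shot,
    environment="neon rooftop at dusk",
    actor_refs=["singer: red jacket, short hair"],
    filming_style="35mm film, soft grain",
)
print(bundle["prompt"])
```

Building every prompt through one function like this is what keeps a change of filming style a single edit rather than a per-shot rewrite.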

Produce — stills, video, iteration

Heavy batch controls, project-wide generate notes, and assistant help live next to the gallery, where you curate outputs and reopen any shot for targeted regeneration.

12. Generate
Batch missing assets, run one-off stills or clips, attach project-wide generate notes, and use render-help chat when you are stuck. This step is the control room; it is intentionally paired with Gallery, which handles deep per-shot work.
13. Gallery
Browse renders in a grid, open any shot for full-screen preview, walk prev/next, inspect prompts, and queue regens with per-shot notes. Batch queues can stay on Generate while you polish winners here.

Timeline — when shots become a video

The timeline is not a generic NLE: it is aware of your downbeats, shot boundaries, and lyric timing from the same project record Video Plan wrote. That is what makes beat-snapped MV work and lyric-forward reviews feel native instead of bolted on.

One time domain
The timeline lays out the waveform, a beat and downbeat rail, optional staging markers from the plan, and one track row per shot, all on the same musical clock as Video Plan. What you scrub is what the song is doing.
Beat-accurate editing
Drag clips to slide on the grid with snap, trim in or out to change a shot’s beat window, lock a shot when it is approved, or mute it to preview the cut without that angle. Edits persist back to the project so the database stays the single source of truth.
Preview & lyrics
The transport column plays the composite with timecode and mix level. A subtitles mode draws lyric karaoke over the preview when a transcript exists: one line at a time with per-word emphasis, resetting cleanly when you cross shots or lyric lines, so reviews read like a real lyric video pass.
Export
Render stitches the laid-out clips to the selected mix. You can save a clean master or request burned-in lyrics when timing allows — including karaoke-style ASS when possible, with sensible fallbacks. Sidecar subtitles can still land next to the file for external finishing.
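The beat-accurate editing rules above (slide with snap, trim a shot's beat window, lock to freeze it) can be sketched with shots stored as beat windows. ShotClip, slide, and trim_out are hypothetical names, and the behaviour shown, such as a locked shot ignoring edits and a one-beat minimum length, is an assumption rather than documented product behaviour.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ShotClip:
    """Hypothetical timeline clip, positioned in whole beats."""
    shot_id: str
    start_beat: int
    end_beat: int        # exclusive
    locked: bool = False
    muted: bool = False


def slide(clip: ShotClip, delta_beats: float) -> ShotClip:
    """Move the clip by a dragged amount, snapped to whole beats."""
    if clip.locked:
        return clip  # approved shots are left untouched
    shift = max(round(delta_beats), -clip.start_beat)  # never before beat 0
    return replace(clip,
                   start_beat=clip.start_beat + shift,
                   end_beat=clip.end_beat + shift)


def trim_out(clip: ShotClip, new_end_beat: float) -> ShotClip:
    """Trim the out point to the nearest beat, keeping at least one beat."""
    if clip.locked:
        return clip
    end = max(clip.start_beat + 1, round(new_end_beat))
    return replace(clip, end_beat=end)


clip = ShotClip("shot-07", start_beat=8, end_beat=12)
moved = slide(clip, 2.6)
print(moved.start_beat, moved.end_beat)  # 11 15
print(trim_out(moved, 13.2).end_beat)    # 13
```

Returning new clip values instead of mutating in place makes it straightforward to write each accepted edit back to the project record.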

Timeline — assembly to master

1. One time domain
Waveform, downbeat rail, and one lane per shot — the same musical clock you locked in Video Plan.
2. Beat-accurate editing
Slide and trim on the grid with snap; lock or mute shots. Every edit writes back to the project record.
3. Preview & lyrics
Play the composite with timecode; turn on lyric karaoke when a transcript exists so reviews match the lyric video.
4. Export
Stitch the laid-out cut to your mix — clean master, burned-in lyrics when timing allows, plus sidecar subs for finishing.
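For readers curious what karaoke-style ASS looks like at export, here is a minimal sketch that turns per-word timings into one ASS Dialogue line using \k tags, whose durations are in centiseconds. The function names and the input shape (a list of (word, start_s, end_s) tuples) are assumptions; how Veyra actually builds its subtitle files is not described on this page.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def karaoke_dialogue(words, style="Default") -> str:
    """Build one Dialogue line from (text, start_s, end_s) word timings."""
    line_start, line_end = words[0][1], words[-1][2]
    cursor_cs = round(line_start * 100)
    chunks = []
    for text, start, end in words:
        start_cs, end_cs = round(start * 100), round(end * 100)
        gap = start_cs - cursor_cs
        if gap > 0:
            chunks.append(f"{{\\k{gap}}}")  # silent pause before the word
        chunks.append(f"{{\\k{end_cs - start_cs}}}{text} ")
        cursor_cs = end_cs
    body = "".join(chunks).rstrip()
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return (f"Dialogue: 0,{ass_time(line_start)},{ass_time(line_end)},"
            f"{style},,0,0,0,,{body}")


words = [("Hold", 12.0, 12.5), ("the", 12.5, 12.7), ("line", 13.0, 13.8)]
print(karaoke_dialogue(words))
# Dialogue: 0,0:00:12.00,0:00:13.80,Default,,0,0,0,,{\k50}Hold {\k20}the {\k30}{\k80}line
```

A plain-subtitle fallback would simply drop the \k tags and keep the line's start and end times.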

Video Plan ends at Gallery. Timeline picks up the approved stills and clips for the cut, review passes, and the render that ships.

Why this shape wins for music-driven work

Beat truth first. Acts, scaffolds, and the timeline all consume the same analysis output, so you are never manually nudging twelve tools to agree on where bar one landed.
Locks, not vibes. Environment, grid, locations, and global filming style are explicit commit points — the model is not allowed to silently change the world halfway through shot prompts.
Shot cards stay thick. Scaffold captures direction grammar (lens, rig, position in space) before prose prompts, which keeps diversity high without every shot reading like the same paragraph with synonyms swapped.
Review where editors think. Gallery and timeline are where teams say yes or no on frames and on the cut; exports line up with what was played back, apart from the usual playback latency of the mix.