From locked mix to export — how Veyra actually works
The app is built around two connected surfaces. Project Plan is a guided, linear stepper, shown as a sidebar in the product, that walks a song from upload through analysis, world-building, narrative structure, shot scaffolding, prompts, and generation, with state saved per project. Timeline is where those shots land on the beat grid for trimming, playback with optional lyric overlay, and final renders. Together they keep music-video and promo work honest to the track. You can lean on AI direction drawn from your mix, lyrics, and world plate, or switch to user direction and steer every step; both land on the same timeline and exports.
Plenty of AI video apps take a line of text and vanish into a black box. Veyra is for when the film has to match the song: a deliberate Project Plan with 12 gated stages, from song read and world lock through acts, locations, scaffold, per-shot briefs, filming style, and generation, so you steer the journey. When AI drafts a stage, that output is yours to edit, regenerate, or replace before you lock it; nothing ships until you approve it.
The same control applies to camera grammar on every shot: framing, camera angle, movement, focal-length class, rig, depth of field, and where the camera sits in the space — with one global filming style so the whole sequence still reads as one film, not random prompts.
Thirteen locked steps — no wandering tabs
In the product, routes under /project-plan/… mirror the stepper: each stage unlocks the next with explicit approve / lock semantics where it matters (world, grid, filming style). The database holds a single planner session per project so refresh, collaboration, and exports always reload the same truth.
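The gating described above can be pictured as a small state machine. The following is a minimal sketch, not Veyra's actual implementation: the stage identifiers, the `PlannerSession` class, and the rule that every stage needs approval are assumptions made for illustration (the product applies explicit locks only where it matters).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stage identifiers, named after the steps listed in this document.
STAGES = [
    "inputs", "audio-analysis", "song-read", "environment-prompt",
    "environment-chat", "concept", "acts", "locations",
    "shot-scaffold", "shot-direction", "filming-style", "generate", "gallery",
]


@dataclass
class PlannerSession:
    """One planner session per project; `approved` holds the locked stages."""
    project_id: str
    approved: set[str] = field(default_factory=set)

    def is_unlocked(self, stage: str) -> bool:
        # Assumption: a stage opens only when every earlier stage is approved.
        index = STAGES.index(stage)
        return all(earlier in self.approved for earlier in STAGES[:index])

    def approve(self, stage: str) -> None:
        if not self.is_unlocked(stage):
            raise PermissionError(f"{stage} is still locked")
        self.approved.add(stage)

    def current_stage(self) -> str:
        # The first stage not yet approved, or the last stage if all are done.
        for stage in STAGES:
            if stage not in self.approved:
                return stage
        return STAGES[-1]
```

Because the session is the only place approvals are recorded, reloading it after a refresh yields the same current stage.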
Assembly room for the cut
After assets exist, the timeline composites stills and clips against the same mix and beat geometry you calibrated in analysis. That is where lip-sync takes, lyric review, and “Save MP4” exports happen — including optional burned-in subtitles when transcripts are present.
AI direction or user direction — same pipeline, your appetite for control
Bring your audio, lyrics, and a world reference — then choose how much you want to steer. Let the model propose structure, beats, and shots from what it hears and sees, or walk every planner step and rewrite each brief until the film matches your taste.
This is not prompt-and-pray: 12 gated Project Plan stages, and every AI draft is editable before you lock it — unlike apps that only take a single prompt and hope the render matches your brief.
One starting line — two modes
Ride the first pass
Lean on transcription, structure, and scaffold outputs from your mix and references — approve gates, generate, and refine in the gallery when the shape feels right. Perfect when you want momentum and a coherent plan without writing every line yourself.
- Fast coherence from audio + lyrics + world seed
- AI proposes each stage; you change any line before locking — not a single mystery output
- Easy to tighten later without redoing the whole pipeline
Dial every facet
Treat the planner as a production room: shape environment, concept, acts, locations, shot scaffold, per-shot prompts, camera grammar, continuity, and global filming style — as detailed as you want. Nothing is final until you lock it.
- Full creative control across all 12 Project Plan stages — not one prompt box pretending to read your mind
- Explicit locks so the look cannot drift mid-project
- Ideal when the vision is specific and every frame counts
Both modes feed the same beat-synced timeline and exports: pick AI-assisted drafts for speed, full manual direction for precision, or blend across phases.
Project Plan — step by step
The following mirrors the in-app sidebar order: foundations first, then narrative and locations, then shot mechanics, then generation. Copy here is written for producers and directors; engineers can still rely on the same gates in the product.
Foundations — song & grid
Every project starts from your mix. You lock what the rest of the planner is allowed to assume, then run analysis so structure, lyrics, and later steps all share one timeline.
- 1. Inputs
Upload the master audio, name the project, and optionally attach references: an environment plate, cast / performer stills, pasted lyrics, and per-actor notes (look lock, personality). Nothing is generated yet; you are anchoring the creative brief in files the later steps inherit.
- 2. Audio analysis
Run the analyser on the locked mix: BPM, meter hints, segments, downbeats, and calibration tools. When you lock the beat grid, downstream steps (acts, shot timing) snap to the same musical time — so the plan cannot drift from what you hear.
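To make the idea of a locked beat grid concrete, here is a minimal sketch assuming a constant tempo and a known first downbeat. The function names `beat_times` and `snap_to_grid` are illustrative only; the real analyser also handles meter hints, segments, and calibration, which this sketch leaves out.

```python
from __future__ import annotations


def beat_times(bpm: float, first_downbeat_s: float, duration_s: float) -> list[float]:
    """Return beat timestamps in seconds, assuming a constant tempo."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    seconds_per_beat = 60.0 / bpm
    times = []
    n = 0
    while first_downbeat_s + n * seconds_per_beat <= duration_s:
        times.append(first_downbeat_s + n * seconds_per_beat)
        n += 1
    return times


def snap_to_grid(t: float, grid: list[float]) -> float:
    """Return the grid time closest to t (the earlier one on a tie)."""
    if not grid:
        raise ValueError("grid is empty")
    return min(grid, key=lambda g: abs(g - t))
```

For example, at 120 BPM with the first downbeat at 0.5 s, a cut proposed at 1.2 s snaps to the beat at 1.0 s.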
World & narrative
Once the grid exists, the app helps you agree on the film before anyone talks about lenses: what the song means, where it lives visually, and how the story maps onto sections of the track.
- 3. Song read
An AI pass reads lyrics plus your cast references and writes a concise interpretation: themes, tone, palette hints, era tags — the shared vocabulary directors and producers use before shot language.
- 4. Environment prompt
The system proposes an image-generation prompt for the world; you edit it until it reads right. That prompt is the seed every environment candidate is generated from, so the world is locked before concept work hardens.
- 5. Environment chat
Chat with the assistant to iterate environment candidates, upload your own plates, and pick the look the video lives in. Every later frame must respect the choice you lock here.
- 6. Concept
With the world fixed, the model writes a proper treatment: protagonist arc, narrative beats per song section, recurring motifs. This is the document-level spine your shot list will hang from.
- 7. Acts
Map narrative acts onto song sections. Each act gets prose and a downbeat-aligned start/end on the timeline so structure stays musical, not arbitrary wall-clock blocks.
- 8. Locations
Subdivide the locked world into distinct places the video cuts between — rooftop, corridor, stage, exterior, and so on. Each location carries prose plus a reference plate so every shot set there inherits the same spatial logic.
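Step 7 says each act gets a downbeat-aligned start and end. One way to picture that is to snap rough section boundaries to the nearest downbeat. The sketch below assumes acts arrive as `(name, start_s, end_s)` tuples and downbeats as a sorted list of seconds; both shapes, and the helper names, are assumptions for illustration.

```python
from __future__ import annotations

from bisect import bisect_left


def nearest(sorted_times: list[float], t: float) -> float:
    """Return the value in sorted_times closest to t."""
    if not sorted_times:
        raise ValueError("no downbeats available")
    i = bisect_left(sorted_times, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_times[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda x: abs(x - t))


def align_acts(
    acts: list[tuple[str, float, float]], downbeats: list[float]
) -> list[tuple[str, float, float]]:
    """Snap each act's start and end to the nearest downbeat."""
    aligned = []
    for name, start_s, end_s in acts:
        start = nearest(downbeats, start_s)
        end = nearest(downbeats, end_s)
        if end <= start:
            raise ValueError(f"act {name!r} collapses to zero length")
        aligned.append((name, start, end))
    return aligned
```

Snapping both ends against the same downbeat list keeps adjacent acts contiguous whenever their rough boundaries coincide.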
Shot authoring
Here the plan becomes concrete: a beat-aligned shot scaffold, per-shot generation briefs, and one global filming style so renders feel like one film, not thirteen unrelated prompts.
- 9. Shot scaffold
Generate the full shot list with timing, character focus, framing, angle, movement, lens class, rig, depth of field, and camera position in space — aligned to beats, without final prose prompts yet. Locations are drawn only from the locked location whitelist so geography stays coherent.
- 10. Shot direction
Fill or refine prompt, action, continuity anchors, and negatives per shot. Each bundle automatically includes the locked environment and actor references so regens do not silently drift from approved look.
- 11. Filming style
Choose one global filming style (for example a named cinematic reference). Its text is appended to every still and video generation on top of the per-shot brief — the main lever for consistent colour, grain, and lens culture across the whole sequence.
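Steps 9 to 11 describe two mechanical rules: shot locations come only from the locked whitelist, and each generation request layers the locked references and the global filming style on top of the per-shot brief. A minimal sketch of both follows. The `Shot` fields, the function names, and the newline-joined prompt layout are assumptions, not the product's actual schema or prompt format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    shot_id: str
    location: str
    brief: str  # the per-shot prompt text


def off_whitelist(shots: list[Shot], locked_locations: set[str]) -> list[str]:
    """Return ids of shots whose location is not in the locked whitelist."""
    return [s.shot_id for s in shots if s.location not in locked_locations]


def compose_prompt(
    shot: Shot, environment_ref: str, actor_refs: list[str], filming_style: str
) -> str:
    """Build one generation prompt: brief first, locked references next, style last."""
    parts = [shot.brief, f"Environment: {environment_ref}"]
    if actor_refs:
        parts.append("Cast: " + ", ".join(actor_refs))
    # The global filming style is appended to every prompt, after the brief.
    parts.append(filming_style)
    return "\n".join(parts)
```

Keeping the style text in a single place means changing it once changes every subsequent still and video request.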
Produce — stills, video, iteration
Heavy batch controls, project-wide generate notes, and assistant help live next to the gallery, where you curate outputs and reopen any shot for targeted regeneration.
- 12. Generate
Batch missing assets, run one-off stills or clips, attach project-wide generate notes, and use render-help chat when you are stuck. This step is the control room; it is intentionally paired with Gallery, which handles deep per-shot work.
- 13. Gallery
Browse renders in a grid, open any shot for full-screen preview, walk prev/next, inspect prompts, and queue regens with per-shot notes. Batch queues can stay on Generate while you polish winners here.
Timeline — when shots become a video
The timeline is not a generic NLE: it is aware of your downbeats, shot boundaries, and lyric timing from the same project record Project Plan wrote. That is what makes beat-snapped MV work and lyric-forward reviews feel native instead of bolted on.
One time domain
The timeline lays the waveform, a beat and downbeat rail, optional staging markers from the plan, and one track row per shot — all in the same musical clock as Project Plan. What you scrub is what the song is doing.
Beat-accurate editing
Drag clips to slide on the grid with snap, trim in or out to change a shot’s beat window, lock a shot when it is approved, or mute it to preview the cut without that angle. Edits persist back to the project so the database stays the single source of truth.
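The slide, trim, lock, and mute operations can be sketched with clips measured in whole beats. This is an illustration under assumptions: the `Clip` fields are hypothetical, and the rule that a locked shot rejects edits is a guess at what "lock" implies, not confirmed behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clip:
    shot_id: str
    start_beat: int
    end_beat: int  # exclusive
    locked: bool = False
    muted: bool = False


def _require_unlocked(clip: Clip) -> None:
    # Assumption: approved (locked) shots cannot be moved or trimmed.
    if clip.locked:
        raise ValueError(f"{clip.shot_id} is locked")


def slide(clip: Clip, delta_beats: float) -> Clip:
    """Move the clip by a drag distance, snapped to whole beats."""
    _require_unlocked(clip)
    length = clip.end_beat - clip.start_beat
    new_start = max(0, clip.start_beat + round(delta_beats))
    return replace(clip, start_beat=new_start, end_beat=new_start + length)


def trim_out(clip: Clip, new_end_beat: float) -> Clip:
    """Move the out point to the nearest beat, keeping at least one beat."""
    _require_unlocked(clip)
    snapped = max(clip.start_beat + 1, round(new_end_beat))
    return replace(clip, end_beat=snapped)
```

Each function returns a new `Clip` rather than mutating the old one, which makes it straightforward to persist every edit back to the project record.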
Preview & lyrics
The transport column plays the composite with timecode and mix level. A subtitles mode draws lyric karaoke over the preview when a transcript exists: one line at a time, with per-word emphasis, resetting cleanly when you cross shots or lyric lines so reviews read like a real lyric video pass.
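Per-word emphasis comes down to finding which word, if any, is sounding at the playhead. A minimal sketch, assuming the transcript gives each word as a `(text, start_s, end_s)` tuple sorted by start time; the tuple shape and the function name `active_word` are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right


def active_word(words: list[tuple[str, float, float]], playhead_s: float) -> int | None:
    """Return the index of the word under the playhead, or None in a gap."""
    starts = [start for _, start, _ in words]
    # Index of the last word that has started at or before the playhead.
    i = bisect_right(starts, playhead_s) - 1
    if i < 0:
        return None
    _, _, end_s = words[i]
    # End times are treated as exclusive so adjacent words never overlap.
    return i if playhead_s < end_s else None
```

Returning `None` in the gaps between words is what lets the overlay reset cleanly rather than leave the previous word highlighted.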
Export
Render stitches the laid-out clips to the selected mix. You can save a clean master or request burned-in lyrics when timing allows — including karaoke-style ASS when possible, with sensible fallbacks. Sidecar subtitles can still land next to the file for external finishing.
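Karaoke-style ASS subtitles mark each syllable with a `\k` tag whose value is a duration in centiseconds. The sketch below builds a single `Dialogue` event from word timings. The word tuple shape, the `Default` style name, and the helper names are assumptions; a complete file also needs the `[Script Info]`, `[V4+ Styles]`, and `[Events]` sections, which are omitted here.

```python
from __future__ import annotations


def ass_time(seconds: float) -> str:
    """Format seconds as the H:MM:SS.cc timestamp used in ASS events."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def karaoke_line(words: list[tuple[str, float, float]]) -> str:
    """Build one ASS Dialogue event with \\k timing for each word."""
    if not words:
        raise ValueError("a lyric line needs at least one word")
    line_start, line_end = words[0][1], words[-1][2]
    chunks = []
    cursor = line_start
    for text, start_s, end_s in words:
        gap_cs = round((start_s - cursor) * 100)
        if gap_cs > 0:
            # An empty syllable keeps the highlight waiting through the pause.
            chunks.append(f"{{\\k{gap_cs}}}")
        dur_cs = round((end_s - start_s) * 100)
        chunks.append(f"{{\\k{dur_cs}}}{text} ")
        cursor = end_s
    body = "".join(chunks).rstrip()
    return f"Dialogue: 0,{ass_time(line_start)},{ass_time(line_end)},Default,,0,0,0,,{body}"
```

When word-level timing is missing, a plain line-level subtitle is one sensible fallback, in line with the fallbacks mentioned above.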
Timeline — assembly to master
- 1. One time domain
Waveform, downbeat rail, and one lane per shot — the same musical clock you locked in Project Plan.
- 2. Beat-accurate editing
Slide and trim on the grid with snap; lock or mute shots. Every edit writes back to the project record.
- 3. Preview & lyrics
Play the composite with timecode; turn on lyric karaoke when a transcript exists so reviews match the lyric video.
- 4. Export
Stitch the laid-out cut to your mix — clean master, burned-in lyrics when timing allows, plus sidecar subs for finishing.
Project Plan ends at Gallery. Timeline picks up the approved stills and clips for the cut, review passes, and the render that ships.
Why this shape wins for music-driven work
- Beat truth first. Acts, scaffolds, and the timeline all consume the same analysis output, so you are never manually nudging twelve tools to agree on where bar one landed.
- Locks, not vibes. Environment, grid, locations, and global filming style are explicit commit points — the model is not allowed to silently change the world halfway through shot prompts.
- Shot cards stay thick. Scaffold captures direction grammar (lens, rig, position in space) before prose prompts, which keeps diversity high without every shot reading like the same paragraph with synonyms swapped.
- Review where editors think. Gallery and timeline are where teams say yes or no on frames and on the cut; exports line up with what was played back, modulo normal mix latency hygiene.




