This is where you tell the app which recording, which people, which still images, and what runtime every later step should trust. Nothing fancy happens here yet: you are making choices you can lock before beats, story, acts, and renders build on top.

What this step is for

Get the right audio file, reference faces and outfits, optional style / world stills, and a timeline basis (full song vs fixed seconds vs length from narrative), then lock so timing and picture work don’t drift.

Project title

Name the project in plain language (for example a working title). The studio header and project list use this title, not the uploaded audio filename — so every row reads like a deliberate project, not “bounce_final_v3.mp3”.

How long is the video?

Under How long is the video?, choose a Timeline basis. This decides what Acts and Shot scaffold use as their clock — not how loud the mix is on the Timeline export, but how many seconds of story structure you are planning.

Music videos usually keep the default. Promo cuts, trailers, and narrative-led pieces often pin a shorter runtime here.

Timeline basis (menu)	Best for	What Acts / scaffold use
Full mix (song structure drives Acts)	Full music videos; structure should follow the uploaded song	Duration and sections from Audio analysis (verse/chorus, downbeats, etc.)
Fixed length (seconds)	A target runtime you already know (promo, social cut, trailer)	Your number between 10 and 900 seconds; the app builds a simple act grid on that clock (synthetic sections, even snap rails) instead of the full waveform
From narrative — estimate in Song read	Story-first; length should follow the treatment you pick in Song read	Seconds filled in Step 3 via Estimate from narrative, or typed here while Inputs is unlocked; same synthetic act grid as fixed length until you change basis

Full mix (default)

Leave Timeline basis on Full mix. After Audio analysis is trustworthy, act boundaries and shot timing follow real song sections and beats. This is the normal path for a full MV.

Fixed length (seconds)

Choose Fixed length (seconds) and enter how long the video should be (10–900). You must have a valid number before Acts — the UI will block structural steps if it’s missing. Audio analysis still runs on your mix for lyrics and preview, but Acts and scaffold clamp to your seconds, not the full file length.
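
The 10–900 rule and the clamping behavior described above can be sketched as follows. This is only an illustration, not the app's code: the function names, the basis strings ("full_mix", "fixed"), and the choice of whole seconds are all assumptions.

```python
from typing import Optional

MIN_SECONDS = 10   # lower bound stated in this guide
MAX_SECONDS = 900  # upper bound stated in this guide


def parse_fixed_seconds(raw: str) -> Optional[int]:
    """Return whole seconds if raw is a number in 10-900, else None.

    Assumption: the field takes whole seconds; the app may accept decimals.
    """
    try:
        value = int(raw.strip())
    except ValueError:
        return None
    return value if MIN_SECONDS <= value <= MAX_SECONDS else None


def planning_clock(basis: str, mix_seconds: float, seconds: Optional[int]) -> float:
    """Pick the duration that Acts and scaffold would plan against.

    basis is a hypothetical identifier: "full_mix" uses the analyzed mix
    length; any other basis uses the seconds entered by the user.
    """
    if basis == "full_mix":
        return mix_seconds
    if seconds is None:
        # Mirrors the guide: structural steps are blocked without a runtime.
        raise ValueError("a runtime in seconds is required before Acts")
    return float(seconds)
```

For example, a 212.5-second mix with a fixed length of 30 plans against 30 seconds, while the same mix on the full-mix basis plans against 212.5.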

From narrative — estimate in Song read

Choose From narrative — estimate in Song read when runtime should come from the story direction you lock in Song read:

Lock Inputs with this basis (seconds can stay blank).
In Song read, pick one narrative direction.
Press Estimate from narrative (or unlock Inputs and type seconds yourself).
Approve Song read only after a runtime is set — locking Song read is blocked until seconds exist.

You can override the estimate anytime by unlocking Inputs and editing Seconds.
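
The order of operations above amounts to a small gate: Song read can only be approved once Inputs is locked, a direction is picked, and seconds exist. A minimal sketch, assuming a hypothetical PlanState record and function names that are not taken from the app:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanState:
    """Hypothetical snapshot of the fields this flow depends on."""
    inputs_locked: bool = False
    direction: Optional[str] = None   # narrative direction picked in Song read
    seconds: Optional[int] = None     # runtime, blank until estimated or typed


def apply_estimate(state: PlanState, estimated_seconds: int) -> None:
    """Model 'Estimate from narrative': fills seconds while Inputs stays locked."""
    if state.direction is None:
        raise ValueError("pick a narrative direction first")
    state.seconds = estimated_seconds


def type_seconds(state: PlanState, seconds: int) -> None:
    """Model typing seconds by hand, which needs Inputs unlocked."""
    if state.inputs_locked:
        raise PermissionError("unlock Inputs to edit Seconds")
    state.seconds = seconds


def can_approve_song_read(state: PlanState) -> bool:
    """Approval is blocked until a runtime has been set."""
    return (
        state.inputs_locked
        and state.direction is not None
        and state.seconds is not None
    )
```

Overriding an estimate follows the same path as typing: unlock Inputs, edit Seconds, then lock again.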

Vocal performance shots (full mix only)

When Timeline basis is Full mix, you can set Vocal performance shots — how often the planner should place lip-sync / singing-to-camera moments (None, Subtle, Balanced, Performance-heavy, or Custom %). This does not mean every line is a performance shot; the scaffold places them at hooks and key lyrics. Default is Balanced.

Audio

Add the mixed track everyone will judge timing against:

Upload file — MP3, WAV, M4A, and other common formats.
SUNO URL — paste a Suno song page URL (the app extracts the MP3) or a direct Suno MP3 link.

When you unlock Inputs later, you can keep the current mix, upload a replacement, or paste a new Suno URL.

Environment reference (optional)

Upload a still that seeds world look in Environment chat. You can also pick from the project gallery once the project exists.

After Step 1 is locked, unlocking Inputs lets you optionally run Generate 6 orbit views on the saved seed: six full-frame angles around the same environment. This run is slow, and a stronger image model can optionally be selected via env config.

Reference lyrics (optional)

Paste lyrics one sung line per row, in song order. Audio analysis still transcribes the vocals, then compares each transcribed line to the matching line here, ignoring differences in spacing and letter case. Leave blank if you don’t need compare tools or lyric-led QA.

If everything pastes as one paragraph, fix line breaks — the form warns when breaks are probably missing.
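
A line-by-line compare of this kind can be sketched as below. The normalization (collapse whitespace, lowercase) is one reading of "spacing/case-insensitive", and the missing-breaks heuristic with its 20-word threshold is purely hypothetical; the app's actual rules are not documented here.

```python
import re
from typing import List, Tuple


def normalize(line: str) -> str:
    """Collapse runs of whitespace and lowercase, so spacing and case are ignored."""
    return re.sub(r"\s+", " ", line).strip().lower()


def split_reference(text: str) -> List[str]:
    """One sung line per row; blank rows are dropped."""
    return [row for row in text.splitlines() if row.strip()]


def compare_lines(reference_text: str, transcribed: List[str]) -> List[Tuple[int, bool]]:
    """Pair each reference row with the transcribed row at the same position.

    Returns (1-based line number, matches) for every pair that exists in both.
    """
    reference = split_reference(reference_text)
    return [
        (index + 1, normalize(ref) == normalize(heard))
        for index, (ref, heard) in enumerate(zip(reference, transcribed))
    ]


def probably_missing_breaks(text: str, max_words: int = 20) -> bool:
    """Hypothetical warning rule: a single very long row suggests lost line breaks."""
    rows = split_reference(text)
    return len(rows) == 1 and len(rows[0].split()) > max_words
```

Because rows are matched by position, a single missing line break shifts every later comparison, which is why pasting one sung line per row matters.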

Actor refs

Add one row per character:

Label — the name prompts use everywhere (for example “lead singer”).
Portrait — clear face and light; upload or choose from gallery.
Look / face / wardrobe lock — hard continuity (for example “always dark sunglasses; never show bare eyes”).
Personality (fixed) — demeanor line for prompts.
Roles (full mix only) — solid green square checkboxes: Lead character, Vocalist, Dancer, Band member, Background actor. Mark Vocalist on anyone who may appear in vocal performance shots; other roles help the planner understand cast coverage.
Suggest with AI — optional starting text for lock + personality from the portrait.

After you lock Step 1, the app can auto-generate a six-view turnaround sheet per actor (when image AI is configured). The portrait and sheet are sent together as references in later Generate runs. While Inputs is unlocked, you can regenerate the turnaround for each row.

Continuity reminders also appear in Generate with a shortcut back to Inputs.

What to do

Open Project Plan → Inputs.
Set project title and timeline basis (and seconds if not using full mix).
Add audio (upload or Suno URL).
Add actor rows with portraits; fill look lock / personality if you need hard continuity.
Optionally add environment and reference lyrics.
Listen once; confirm uploads match intent.
Press Save & lock inputs (wording may vary slightly on screen).

If you change things later

Unlock Inputs only when you need a different mix, cast, runtime basis, or references. After edits, lock again before you rely on analysis or new structural work — otherwise Acts, scaffold, and older renders may no longer match.

Changing timeline basis or seconds after Acts are approved usually means revisiting Acts and scaffold.

Audio analysis — beats and lyrics on your mix (still required for full-mix structure; also powers transcription and Timeline lyrics even when you use a fixed runtime).
