
Building a UGC ad in 90 minutes with Seedance + Remotion (with the prompts)

9 May 2026 · 7 min read · TheAIgency

TL;DR. The fal.ai + Remotion pipeline produces a vertical 9:16 UGC ad in ~90 minutes including review. Stack: GPT-Image for the persona, Seedance 2.0 for the motion, fal AI voiceover, fal lip-sync, Remotion for composition + captions + B-roll. Below: the actual order of operations and the prompts we use as starting points. This is the workflow our Producer + Editor agents run inside Cockpit for the Creative product line.

Tools used

Stage | Tool | Purpose
1. Persona | GPT-Image (OpenAI via fal.ai) | Hero shot of the AI creator (multiple angles)
2. Motion | Seedance 2.0 (fal.ai) | 3-6s talking-head clips
3. Voice | fal AI TTS | Voiceover matched to script + accent
4. Lip-sync | fal lip-sync model | Mouth movement aligned to voice
5. B-roll | Seedance / Veo (fal.ai) | Product close-ups, text-only inserts
6. Composition | Remotion | Cuts, captions, music, transitions, export

The 90-minute walkthrough

  1. Min 0-10. Brief + script. Lock the structure: one hook (first 1.5s), one problem statement, one product reveal, one CTA. 30-45 seconds total. The script can be written by Cockpit's Copywriter agent or by hand; both work.
  2. Min 10-20. Persona generation. GPT-Image prompt template:
    Scene: [setting that matches the audience — kitchen / desk / car / cafe], natural light.
    Subject: a [age range] [gender] [region] person looking at camera, holding a smartphone.
    Important: photoreal, casual outfit, no makeup, no logos visible. Slight smile. Eye contact with camera.
    Use case: hero frame for vertical UGC ad.
    Constraints: no text, no overlays, single person, neutral background. 9:16 aspect.
    Generate 4 variants. Pick one.
  3. Min 20-40. Motion generation. Seedance prompt:
    The person from the reference image talks naturally to the camera, gesturing slightly with their free hand. Subtle head movement. Hold the camera angle. Length: 6 seconds.
    Run 3-4 takes. Pick the most natural.
  4. Min 40-50. Voiceover. Run the script through fal TTS with the matching accent (FR-MA, FR-FR, EN-MENA, etc.). Generate 2-3 takes.
  5. Min 50-65. Lip-sync. Feed the chosen Seedance clip + the chosen voiceover into the fal lip-sync model. Output is the talking-head clip with mouth aligned.
  6. Min 65-80. B-roll. Generate product close-ups via Seedance (3-second shots of the product from different angles). Add Remotion text inserts for the hook line and CTA.
  7. Min 80-90. Compose + export. Drop everything in Remotion: hook → talking-head → B-roll → CTA. Add captions (we use a hand-tuned word-by-word style, which performed better than auto-bouncing styles in our MENA testing). Add music from a licensed library. Export three versions: 4K, 1080p, and 720p for fast distribution.
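
If you generate personas for more than one audience, it can help to fill the step 2 template from code instead of editing it by hand. Below is a minimal sketch, not our production tooling. The wording is the template from step 2; the field names (`setting`, `age_range`, `gender`, `region`) and the example values are assumptions made up for illustration. The code only builds the prompt string and does not call any image API.

```python
from string import Template

# Wording copied from the step 2 persona template.
# The placeholder names are hypothetical, chosen for this sketch.
PERSONA_TEMPLATE = Template(
    "Scene: $setting, natural light.\n"
    "Subject: a $age_range $gender $region person looking at camera, "
    "holding a smartphone.\n"
    "Important: photoreal, casual outfit, no makeup, no logos visible. "
    "Slight smile. Eye contact with camera.\n"
    "Use case: hero frame for vertical UGC ad.\n"
    "Constraints: no text, no overlays, single person, neutral background. "
    "9:16 aspect."
)


def build_persona_prompt(setting, age_range, gender, region):
    """Return the persona prompt with every placeholder filled in."""
    # substitute() raises KeyError if any placeholder has no value,
    # so a half-filled prompt never reaches the image model.
    return PERSONA_TEMPLATE.substitute(
        setting=setting, age_range=age_range, gender=gender, region=region
    )


# Example values are illustrative only.
prompt = build_persona_prompt(
    setting="kitchen", age_range="25-34", gender="female", region="Moroccan"
)
print(prompt)
```

Send the resulting string to the image model four times (or request four variants) and pick one, as in step 2.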

What breaks

  • Lip-sync drift. If the voiceover fed into a single lip-sync pass runs longer than 6s, the model loses sync. Cut the script tighter.
  • Persona consistency across clips. Use the same reference image for every motion generation in a campaign. Don't re-roll the persona.
  • Music levels. AI-generated voice is quiet by default. Boost +6 dB before mixing or the music drowns it.
  • Caption timing. Auto-caption tools mistime first-second hooks. Hand-time the first 1.5s.
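
Two of the checks above are simple arithmetic, so they are easy to script before the mix. This is a rough sketch under stated assumptions: the function names and the sample durations are hypothetical, and the 6s limit and the +6 dB figure come from the bullets above. It uses the standard conversion from decibels to a linear amplitude multiplier, gain = 10^(dB/20).

```python
def db_to_gain(db):
    """Convert a decibel change into a linear amplitude multiplier."""
    return 10 ** (db / 20)


def segments_over_limit(durations_s, limit_s=6.0):
    """Return the indexes of voiceover segments longer than limit_s seconds."""
    return [i for i, d in enumerate(durations_s) if d > limit_s]


# +6 dB roughly doubles the amplitude of the voice track.
voice_gain = db_to_gain(6.0)

# Hypothetical segment durations in seconds; only the third exceeds 6s.
too_long = segments_over_limit([4.2, 5.8, 7.5])

print(round(voice_gain, 3), too_long)
```

Because +6 dB is close to a 2x amplitude boost, check the boosted voice track for clipping before adding music.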

Output expectations

One 30-45s ad with one persona, two B-roll inserts, captions, music, and three platform exports. The 90-minute window leaves margin for ~2 revisions. For a full campaign (8+ ads with persona + script variety), budget 1 day of focused work or, easier, book a UGC Sprint and we ship it.
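
To sanity-check that a cut fits the 30-45s target before rendering, you can lay the segments out in frames. The sketch below is illustrative only: the 30 fps frame rate, the segment names and lengths, and the pixel sizes assigned to "4K", "1080p", and "720p" are all assumptions, since none of them are fixed by the workflow above. The start frame and frame count per segment correspond to what Remotion's `<Sequence>` takes as `from` and `durationInFrames`.

```python
FPS = 30  # assumed frame rate

# Assumed vertical 9:16 sizes (width, height) for the three exports.
EXPORTS = {"4K": (2160, 3840), "1080p": (1080, 1920), "720p": (720, 1280)}

# Hypothetical segment lengths in seconds: one hook, four talking-head
# clips of 6s each, two B-roll inserts of 3s each, and one CTA.
SEGMENTS = [
    ("hook", 1.5),
    ("talk_1", 6.0),
    ("broll_1", 3.0),
    ("talk_2", 6.0),
    ("talk_3", 6.0),
    ("broll_2", 3.0),
    ("talk_4", 6.0),
    ("cta", 4.5),
]


def layout(segments, fps=FPS):
    """Return ([(name, start_frame, duration_frames)], total_frames)."""
    timeline, cursor = [], 0
    for name, seconds in segments:
        frames = round(seconds * fps)
        timeline.append((name, cursor, frames))
        cursor += frames  # segments play back to back, no overlap
    return timeline, cursor


timeline, total_frames = layout(SEGMENTS)
total_seconds = total_frames / FPS

for name, start, frames in timeline:
    print(f"{name:8s} from={start:4d} durationInFrames={frames}")
print(f"total: {total_seconds:.1f}s")
```

If the total falls outside 30-45s, trim the script rather than speeding up clips, since each talking-head clip is already tied to its lip-synced audio.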

If you want this

Sprint pack: UGC Sprint (5-10 videos in a week, €1,000–2,500). Monthly drumbeat: Series (4-8 videos/month, €2,000–4,000/month). Send a brief with your offer + audience and our proposal generator scopes it.
