Google Flow: How to create a video from the ground up – from Prompt to Export

◷ 6 min read 6/11/2026

What it takes to get started

Flow runs on three Google models: Veo 3 handles video generation, Imagen handles images, and Gemini interprets text instructions.

Access requires a Google AI subscription. There are two options:

AI Pro – $19.99/month, basic access to Flow, Veo 2
AI Ultra – $249.99/month, full access to Veo 3 and Veo 3.1 with audio generation, 25,000 AI credits per month

Credits are spent on generation: one scene costs 10 to 100 credits depending on the model chosen and the number of variants. Unused credits do not carry over to the next month.

Three generation modes

Flow offers three entry points depending on where you start:

Text to Video – a text-only prompt. Suitable for first experiments and for scenes that don't need to be tied to a specific visual reference.

Frames to Video – you set the first and/or last frame, and Flow generates the transition between them. Use it when you need to control where the movement starts and ends: for example, a close-up of a face → the camera pulls back to a wide shot of the street.

Ingredients to Video – you upload reference images or videos as the "ingredients" of the scene. Flow takes visual elements from them (character, object, style) and builds them into the generated clip. This is the main way to keep a character consistent across scenes.
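The choice between the three modes can be sketched as a small decision helper. This is only a planning aid, not a Flow API call: the function name and its parameters are hypothetical, and giving reference images priority over start/end frames is an assumption made for the sketch.

```python
from typing import Optional, Sequence


def pick_mode(
    first_frame: Optional[str] = None,
    last_frame: Optional[str] = None,
    references: Sequence[str] = (),
) -> str:
    """Suggest a Flow generation mode from the assets on hand (hypothetical helper)."""
    if references:
        # Reference images carry a character, object, or style across scenes.
        return "Ingredients to Video"
    if first_frame or last_frame:
        # A fixed start and/or end frame pins down where the motion begins and ends.
        return "Frames to Video"
    # No visual assets at all: everything has to be described in text.
    return "Text to Video"


print(pick_mode())
print(pick_mode(first_frame="closeup.png", last_frame="street_wide.png"))
print(pick_mode(references=["hero_still.png"]))
```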

How to write prompts

A prompt that is too short is the main cause of poor results. Flow takes vagueness literally: write "man walks" and you get a silhouette in a neutral space.

A prompt structure that works well:

plaintext

[subject + action] + [environment] + [camera movement] + [lighting/atmosphere] + [style/quality]

An example of a weak prompt:

plaintext

girl walking down the street

An example of a working prompt:

plaintext

a young woman in a linen coat walks down an empty cobbled street in the old town.
the camera slowly follows her, golden hour, soft long shadows,
24fps cinematic quality, atmosphere of European urban cinema

Camera motion is a separate parameter that is often overlooked. Flow understands standard cinematography terms: pan left/right (horizontal pan), tilt up/down (vertical tilt), dolly in/out (moving the camera toward or away from the subject), tracking shot (following the subject), static shot (fixed camera).

If you need sound (available on Veo 3), describe it directly in the prompt: ambient city noise, dialogue: woman says quietly "I'll be back", soft wind, distant church bells. The model generates the audio in sync with the video.
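To make sure no component of the structure above gets skipped, the prompt can be assembled from named parts before it is pasted into Flow. The sketch below is one possible way to do that; the `ScenePrompt` class and its field names are hypothetical, and the output is just a comma-separated string, not a special Flow syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScenePrompt:
    """Hypothetical container for the five prompt components plus optional sound."""
    subject_action: str
    environment: str
    camera: str
    lighting: str
    style: str
    audio: Optional[str] = None  # only useful on a model that generates sound

    def render(self) -> str:
        required = {
            "subject_action": self.subject_action,
            "environment": self.environment,
            "camera": self.camera,
            "lighting": self.lighting,
            "style": self.style,
        }
        # Refuse to build a prompt with a blank component, the usual cause of vague results.
        empty = [name for name, text in required.items() if not text.strip()]
        if empty:
            raise ValueError("missing prompt components: " + ", ".join(empty))
        parts = [text.strip() for text in required.values()]
        if self.audio and self.audio.strip():
            parts.append(self.audio.strip())
        return ", ".join(parts)


prompt = ScenePrompt(
    subject_action="a young woman in a linen coat walks down an empty cobbled street",
    environment="old town",
    camera="slow tracking shot following her",
    lighting="golden hour, soft long shadows",
    style="24fps cinematic quality, European urban cinema",
    audio="soft wind, distant church bells",
)
print(prompt.render())
```

Leaving `camera` or `style` blank raises an error instead of silently producing a prompt that Flow would interpret literally.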

Generation settings

On the Prompt page there are four key parameters:

Model: Veo 2 Fast (cheap, fast, lower quality), Veo 2 Quality (balanced), Veo 3 Highest Quality (maximum detail, with sound)
Number of variants: 1-4. For the first iteration, 2 is enough to choose a direction
Aspect ratio: 16:9 for horizontal video, 9:16 for vertical (Reels/Shorts)
Duration: a base generation is 4-8 seconds per scene

One generation on Veo 3 with two variants consumes about 60-80 credits.
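When planning a project outside Flow, it can help to write the settings down in a form that rejects values the interface does not offer. The following is a minimal sketch under that assumption: `GenerationSettings` is a hypothetical note-taking structure, not part of any Flow API, and the allowed values are simply the ones listed above.

```python
from dataclasses import dataclass

# Allowed values taken from the settings listed in this guide.
MODELS = ("Veo 2 Fast", "Veo 2 Quality", "Veo 3 Highest Quality")
ASPECT_RATIOS = ("16:9", "9:16")


@dataclass(frozen=True)
class GenerationSettings:
    model: str = "Veo 2 Fast"
    variants: int = 2
    aspect_ratio: str = "16:9"
    duration_s: int = 8

    def __post_init__(self) -> None:
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        if not 1 <= self.variants <= 4:
            raise ValueError("variants must be between 1 and 4")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio!r}")
        if not 4 <= self.duration_s <= 8:
            raise ValueError("base duration must be 4-8 seconds")


draft = GenerationSettings()  # cheap defaults for a first pass
final = GenerationSettings(model="Veo 3 Highest Quality", aspect_ratio="9:16")
print(draft)
print(final)
```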

Assembling several scenes in a video

Flow lets you combine scenes into a single video right inside the platform. The workflow:

Generate each scene separately via New Scene within one project
Use Ingredients to preserve the character: upload a still frame from the previous scene as a reference so the model keeps the same appearance and style
Arrange the scenes on the timeline in the right order
If necessary, use Frames to Video to generate a smooth transition between two scenes

To extend a finished clip: open the scene → Extend → enter a prompt for the continuation. Repeating this step lets you bring a scene to 30-60 seconds.
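The rule "each scene takes a still from the previous one as a reference" is easy to lose track of in a longer project, so it can be worth keeping a shot list. The sketch below models that chain; the `Scene` class, the `chain_references` helper, and the file names are all hypothetical, and the stills themselves still have to be exported and uploaded by hand in Flow.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scene:
    name: str
    prompt: str
    ingredients: List[str] = field(default_factory=list)


def chain_references(scenes: List[Scene]) -> List[Scene]:
    """Give every scene after the first a still frame from the scene before it."""
    for previous, current in zip(scenes, scenes[1:]):
        still = f"{previous.name}_last_frame.png"  # hypothetical file name
        if still not in current.ingredients:
            current.ingredients.append(still)
    return scenes


timeline = chain_references([
    Scene("01_street", "a woman walks down a cobbled street, tracking shot"),
    Scene("02_cafe", "the same woman enters a cafe, static shot"),
    Scene("03_window", "she sits by the window, slow dolly in"),
])
for scene in timeline:
    print(scene.name, scene.ingredients)
```

The list order doubles as the timeline order, and the first scene is the only one without a reference.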

Flow TV: How to Use It for Learning

The built-in Flow TV is more than a gallery of results. Every video there comes with the full prompt and settings that produced it. How to work with it:

Find a video with the visual language you need
Copy the structure of the prompt: not the description itself but its skeleton (how the subject is set up, which camera movement, which style)
Put your own content into the same structure
Compare the result with the original and adjust

This is faster than experimenting from scratch.
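Copying the skeleton rather than the description can be illustrated with a simple template. Both prompts below are invented for the example and are not taken from Flow TV; the slot names are likewise hypothetical. The point is that camera, light, and style carry over while subject, action, and place are swapped.

```python
from string import Template

# Skeleton extracted from a reference prompt: the frame stays, the content slots change.
skeleton = Template("$subject $action in $place, $camera, $light, $style")

# Parts worth borrowing from the reference video's prompt.
borrowed = {
    "camera": "slow dolly in",
    "light": "overcast morning light",
    "style": "handheld documentary feel",
}

reference = skeleton.substitute(
    subject="an old fisherman", action="mends a net", place="a foggy harbour", **borrowed
)
mine = skeleton.substitute(
    subject="a young baker", action="kneads dough", place="a small village bakery", **borrowed
)

print(reference)
print(mine)
```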

Typical problems

The character changes from scene to scene. Solution: always use Ingredients with a reference frame from the previous scene. Without it, the model generates the character from scratch every time.

The video looks like stock footage, not a film. Cause: the prompt has no camera movement and no stylistic markers. Add a camera motion type and at least one visual reference (A24 style, Kubrick-like symmetry, handheld documentary feel).

The sound doesn't match the picture. This is a Veo 3 limitation: lip sync between speech and mouth movement is unstable. For videos with dialogue, it is better to generate only the video and add the voiceover separately in a video editor.

Credits run out in the middle of a project. Credits do not carry over, but they are renewed at the start of each billing cycle. A rough estimate: 2 variants on Veo 3 = ~70–80 credits; with a limit of 25,000/month, that is a little over 300 scenes on Ultra.
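The credit estimate can be checked with a few lines of arithmetic. The figures are the ones quoted in this guide (25,000 credits per month, roughly 70-80 credits per two-variant Veo 3 generation); the function names and the retry count in the example are assumptions for illustration.

```python
MONTHLY_CREDITS = 25_000        # AI Ultra allowance quoted in this guide
COST_PER_GENERATION = (70, 80)  # credits for 2 variants on Veo 3, rough estimate


def scenes_per_month(credits: int, cost_range: tuple) -> tuple:
    """Return (worst case, best case) number of generations the allowance covers."""
    low_cost, high_cost = cost_range
    return credits // high_cost, credits // low_cost


def credits_needed(scenes: int, retries_per_scene: int, cost: int) -> int:
    """Credits for a project where every scene is regenerated a few times."""
    return scenes * (1 + retries_per_scene) * cost


worst, best = scenes_per_month(MONTHLY_CREDITS, COST_PER_GENERATION)
print(worst, best)  # 312 357

# A 12-scene video with 2 retries per scene at the pessimistic price.
project = credits_needed(scenes=12, retries_per_scene=2, cost=80)
print(project, project <= MONTHLY_CREDITS)  # 2880 True
```

Counting retries matters more than the per-generation price: every regenerated scene is billed again.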

Summary

plaintext

● Registered on flow.google.com with an AI Pro or AI Ultra subscription
● The prompt contains subject, environment, camera movement, and style
● A recurring character uses Ingredients with a reference
● The model matches the task: Veo 2 Fast for drafts, Veo 3 for the final
● Scenes are assembled on a timeline in one project
● Flow TV has been used at least once as a source of prompt structure