Video Model · Alibaba

Wan 2.6

Alibaba's Wan 2.6 generates audio-driven video up to 15 seconds long, with video reference support. It is one of the few models that accept both audio input and a video reference, giving you more creative control over longer clips.

Up to 15s · Audio Input · Video Reference · Multiple Aspect Ratios · Long Duration

What is Wan 2.6?

Wan 2.6 is Alibaba's latest video generation model in the Wan series. It combines two input types that are rarely available together: audio input (which drives the video generation) and video reference input (which guides the visual style and motion character). With clip durations of up to 15 seconds, that makes it a capable tool for longer, reference-driven audio-visual production.

Audio-driven generation means the video responds to the audio you provide: its rhythm, energy, and texture. This is fundamentally different from models where audio is only a toggle that adds AI-generated sound to the video. In Wan 2.6, you bring the audio and the model produces video that matches it.

The video reference input adds visual guidance on top of the audio drive. Supply a reference clip to establish the visual language the output should follow, then supply audio to define its rhythm and character. Combined with a text prompt, this gives you three layers of creative direction over the 15-second output.

Audio-driven video: Audio input shapes the output

Video reference: Guide visual style and motion

Up to 15 seconds: Extended long-duration clips

Tri-input control: Audio + video ref + text

How to generate video with Wan 2.6 on project.video

Open the composer

Go to your project.video dashboard. Wan 2.6 is available under Alibaba models in the model selector.

Select Wan 2.6

Choose Wan 2.6. The composer will show you the audio input slot, video reference slot, and text prompt field.

Upload audio and/or video reference

Upload your audio file to drive the generation. Optionally add a video reference clip to guide the visual style and motion. Both inputs are optional but improve results.

Set duration and generate

Choose duration up to 15 seconds, aspect ratio, and write your prompt for visual direction. Generate and view your output in the gallery.

Technical specs

Provider: Alibaba

Mode: Audio-driven video generation

Max duration: 15 seconds

Audio input: Supported (drives generation)

Video reference: Supported

Aspect ratios: Multiple
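
The limits in the specs above can be sketched as a small pre-flight check to run before opening the composer. This is a minimal illustration only: `Wan26Settings`, its field names, and the default values are hypothetical and do not correspond to any project.video or Alibaba API. Only the 15-second cap and the optional audio and video reference inputs come from this page.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_DURATION_SECONDS = 15  # maximum clip length listed in the specs


@dataclass
class Wan26Settings:
    """Hypothetical container for composer inputs (not a real API)."""

    prompt: str
    duration_seconds: int = MAX_DURATION_SECONDS
    aspect_ratio: str = "16:9"  # assumed default; the page only says "Multiple"
    audio_path: Optional[str] = None            # optional audio that drives the video
    video_reference_path: Optional[str] = None  # optional style and motion reference

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the settings look usable."""
        problems = []
        if not self.prompt.strip():
            problems.append("prompt is empty")
        if not 0 < self.duration_seconds <= MAX_DURATION_SECONDS:
            problems.append(
                f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
            )
        return problems

    def input_layers(self) -> List[str]:
        """List which of the three input layers are actually supplied."""
        layers = ["text"]
        if self.audio_path:
            layers.append("audio")
        if self.video_reference_path:
            layers.append("video reference")
        return layers


settings = Wan26Settings(
    prompt="Abstract fluid color forms pulse with the music",
    duration_seconds=12,
    audio_path="track.mp3",
)
print(settings.validate())
print(settings.input_layers())
```

Returning a list of problems, rather than raising on the first one, lets you report every issue in a single pass before you upload anything.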

Best use cases

Music video content

Upload a track and a reference clip that matches the artistic direction you want. Wan 2.6 generates video driven by the music's energy and styled according to the reference — an audio-first music video workflow.

Long-form branded content with audio

At 15 seconds, Wan 2.6 can produce complete brand storytelling clips where the audio drives the energy of the piece. Supply brand audio (jingle, voiceover) and let the video respond.

Replicating a visual style from reference

Provide a video reference that captures the visual language you want (cinematography, color, motion) and audio that sets the rhythm. The model generates new content in that established style, set to that audio.

Audio-visual content for streaming

Podcast clips, music previews, and audio-led social content all benefit from an audio-first generation approach. Wan 2.6's 15-second duration covers most social audio clip lengths.

Example prompts

Pair these with audio and optional video reference uploads in the project.video composer.

"Abstract fluid color forms morph and pulse in response to the uploaded electronic music track, deep navy and electric blue palette, smooth organic motion, 16:9"

16:9 · 15s · Audio + Video ref

"Brand film: a product travels from raw material to finished form, cinematic color grading matching reference video, paced to the uploaded audio track's rhythm, 16:9"

16:9 · 12s · Audio + Video ref

"Landscape panoramic sequence driven by uploaded ambient music — clouds, water, and light move responsively to the audio's ebb and flow, golden hour palette, 16:9"

16:9 · 15s · Audio-driven

Start generating with Wan 2.6

Audio-driven video up to 15 seconds with video reference support. Access every Alibaba model on project.video.