OmniHuman 1.5

Realistic avatar video from a single photo and speech. OmniHuman 1.5 by ByteDance generates lifelike talking-head video in which a person from a single photo lip-syncs to the speech audio you supply.

Single Photo Input · Speech Audio Input · Talking Avatar · Realistic Output · ByteDance Quality

What is OmniHuman 1.5?

OmniHuman 1.5 is ByteDance's avatar video generation model. It takes two inputs — a single photo of a person and a speech audio file — and generates a realistic talking head video in which the person in the photo appears to speak the audio you've provided.

The model produces highly lifelike results: natural facial expression, accurate lip movement, appropriate head motion, and realistic blinking and micro-expressions. The 1.5 generation represents a significant improvement in realism over the original OmniHuman, with more natural motion and better handling of diverse subjects.

Applications span personalized video messaging, spokesperson content, localization and dubbing, training video production, and any use case where you need a person to appear to deliver specific speech on camera without filming them. The single-photo requirement makes it highly accessible — no video of the person required.

Single photo input

No video of subject needed

Speech audio input

Any speech becomes the script

Talking avatar output

Realistic lip-synced video

Lifelike result

Natural expression and motion

How to generate avatar video with OmniHuman 1.5 on project.video

Open the composer

Go to your project.video dashboard. OmniHuman 1.5 is available under ByteDance models in the model selector.

Select OmniHuman 1.5

Choose OmniHuman 1.5. The composer will show the photo upload slot and speech audio upload slot.

Upload photo and speech

Upload a clear, front-facing photo of the person you want to animate. Upload the speech audio file you want them to appear to deliver.

Generate and review

Generate the avatar video. Review the lip sync, expression, and head motion in the output. You can adjust the photo or audio and regenerate to improve results.
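Before uploading, it can help to check locally that you have one photo and one speech audio file. The following is a minimal sketch, not part of project.video: the function name `check_inputs` and both extension lists are assumptions made for illustration, since this page does not state which file formats the composer accepts.

```python
from pathlib import PurePath

# ASSUMPTION: these extension sets are illustrative only. The formats that
# project.video accepts are not listed on this page; adjust as needed.
ASSUMED_PHOTO_EXTS = {".jpg", ".jpeg", ".png"}
ASSUMED_AUDIO_EXTS = {".wav", ".mp3", ".m4a"}


def check_inputs(photo: str, audio: str) -> list[str]:
    """Return a list of problems found; an empty list means the pair looks plausible.

    This only inspects file names. It does not open the files, so it cannot
    tell whether the photo is clear and front-facing or the audio is speech.
    """
    problems = []
    if PurePath(photo).suffix.lower() not in ASSUMED_PHOTO_EXTS:
        problems.append(f"{photo}: does not look like a photo file")
    if PurePath(audio).suffix.lower() not in ASSUMED_AUDIO_EXTS:
        problems.append(f"{audio}: does not look like an audio file")
    return problems


if __name__ == "__main__":
    for problem in check_inputs("spokesperson.jpg", "script_take1.wav"):
        print(problem)
```

A check like this catches swapped or mistyped files early. Whether the photo is suitable still has to be judged by eye and by reviewing the generated output.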

Technical specs

Provider: ByteDance

Mode: Avatar video generation

Photo input: Single photo of person

Audio input: Speech audio file

Output: Talking head video with lip sync

Best use cases

Spokesperson and presenter video

Generate a brand spokesperson delivering a script — without a film crew, studio, or filming session. Provide a photo of the spokesperson and a voiceover recording.

Personalized video messages

Create personalized video messages at scale. Supply photos and individualized speech audio to generate video that speaks directly to each recipient.

Localization and dubbing

For existing content requiring localization, OmniHuman 1.5 can generate video of a person appearing to speak in whatever language the supplied speech audio is recorded in, making multilingual versions of spokesperson content practical without reshooting.

Training and onboarding videos

Generate training video with a consistent on-screen presenter without ongoing filming commitments. Update the script at any time by changing the speech audio input.
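For the personalized-messages use case above, the bookkeeping of matching each photo to its speech audio can be scripted before anything is uploaded. The sketch below is only one possible convention, assumed here for illustration: files that share a base name (for example `alice.jpg` and `alice.wav`) belong together. The helper name `pair_by_stem` is hypothetical and is not a project.video feature.

```python
from pathlib import PurePath


def pair_by_stem(photos: list[str], audios: list[str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Pair each photo with the audio file that shares its base name.

    ASSUMPTION: matching by base name is a naming convention chosen for this
    example. Returns (pairs, unmatched_photos).
    """
    # Index audio files by base name, e.g. "alice.wav" -> key "alice".
    audio_by_stem = {PurePath(a).stem: a for a in audios}
    pairs = []
    unmatched = []
    for photo in photos:
        stem = PurePath(photo).stem
        if stem in audio_by_stem:
            pairs.append((photo, audio_by_stem[stem]))
        else:
            unmatched.append(photo)
    return pairs, unmatched


if __name__ == "__main__":
    pairs, unmatched = pair_by_stem(["alice.jpg", "bob.png"], ["alice.wav"])
    print(pairs)
    print(unmatched)
```

Each resulting pair corresponds to one generation in the composer. Photos listed as unmatched still need a recording before they can be used.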

Start generating with OmniHuman 1.5

Realistic avatar video from a single photo and speech. Access OmniHuman 1.5 on project.video.