OmniHuman 1.5
Realistic avatar video from a single photo and a speech recording. OmniHuman 1.5 by ByteDance generates lifelike talking-head video in which the person in your photo lip-syncs to the speech audio you supply.
What is OmniHuman 1.5?
OmniHuman 1.5 is ByteDance's avatar video generation model. It takes two inputs, a single photo of a person and a speech audio file, and generates a realistic talking-head video in which the person in the photo appears to speak the audio you provide.
The model produces lifelike results: natural facial expressions, accurate lip movement, appropriate head motion, and realistic blinking and micro-expressions. Version 1.5 improves on the realism of the original OmniHuman, with more natural motion and better handling of diverse subjects.
Applications include personalized video messaging, spokesperson content, localization and dubbing, training video production, and any other case where a person needs to appear on camera delivering specific speech without being filmed. Because only a single photo is required, no video footage of the person is needed.
Single photo input
No video of subject needed
Speech audio input
Any speech becomes the script
Talking avatar output
Realistic lip-synced video
Lifelike result
Natural expression and motion
How to generate avatar video with OmniHuman 1.5 on project.video
Open the composer
Go to your project.video dashboard. OmniHuman 1.5 is available under ByteDance models in the model selector.
Select OmniHuman 1.5
Choose OmniHuman 1.5. The composer then shows a photo upload slot and a speech audio upload slot.
Upload photo and speech
Upload a clear, front-facing photo of the person you want to animate. Upload the speech audio file you want them to appear to deliver.
Generate and review
Generate the avatar video. Review the lip sync, expression, and head motion in the output. You can adjust the photo or audio and regenerate to improve results.
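Before uploading, it can help to sanity-check the two input files locally. The sketch below is only an illustration: the function name check_inputs is hypothetical, and the extension lists are assumptions, since this page does not state which photo or audio formats the composer accepts.

```python
from pathlib import Path

# Assumed extension lists: this page does not say which formats are accepted.
ASSUMED_PHOTO_EXTS = {".jpg", ".jpeg", ".png"}
ASSUMED_AUDIO_EXTS = {".wav", ".mp3", ".m4a"}


def check_inputs(photo: str, audio: str) -> list[str]:
    """Return a list of problems found with the two input files (empty if none)."""
    problems = []
    for label, raw, allowed in (
        ("photo", photo, ASSUMED_PHOTO_EXTS),
        ("audio", audio, ASSUMED_AUDIO_EXTS),
    ):
        path = Path(raw)
        # Flag extensions outside the assumed lists.
        if path.suffix.lower() not in allowed:
            problems.append(f"{label}: unexpected extension '{path.suffix}'")
        # Flag missing or zero-byte files.
        if not path.is_file():
            problems.append(f"{label}: file not found: {path}")
        elif path.stat().st_size == 0:
            problems.append(f"{label}: file is empty: {path}")
    return problems
```

An empty list means both files exist, are non-empty, and match the assumed extensions. It says nothing about whether the photo is clear and front-facing, which still needs a visual check.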
Technical specs
Best use cases
Spokesperson and presenter video
Generate video of a brand spokesperson delivering a script without a film crew, studio, or filming session. Provide a photo of the spokesperson and a voiceover recording.
Personalized video messages
Create personalized video messages at scale. Supply photos and individualized speech audio to generate video that speaks directly to each recipient.
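When preparing many messages, keeping each recipient's photo and audio paired in a manifest avoids mismatched uploads. The following is a minimal local sketch, not a project.video feature: the AvatarJob class, the load_jobs function, and the CSV column names recipient, photo, and audio are all hypothetical.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AvatarJob:
    """One photo/audio pair for one recipient (hypothetical structure)."""
    recipient: str
    photo: Path
    audio: Path


def load_jobs(manifest_csv: str) -> list[AvatarJob]:
    """Read a CSV with recipient,photo,audio columns into a list of jobs."""
    with open(manifest_csv, newline="", encoding="utf-8") as handle:
        return [
            AvatarJob(row["recipient"], Path(row["photo"]), Path(row["audio"]))
            for row in csv.DictReader(handle)
        ]
```

Each job then corresponds to one pass through the composer: upload that job's photo and audio, generate, and review.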
Localization and dubbing
To localize existing content, supply speech audio in the target language. OmniHuman 1.5 generates video of the person appearing to speak it, which makes multilingual versions of spokesperson content practical without reshooting.
Training and onboarding videos
Generate training videos with a consistent on-screen presenter and no ongoing filming commitments. To update the script, replace the speech audio input and regenerate.