The design facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylised generation…reports Asian Lite News
Google has introduced a new video generation AI model called Lumiere, built on a diffusion architecture the researchers call Space-Time U-Net, or STUNet. Rather than stitching together separately generated still frames, Lumiere creates a full five-second video in a single pass.
The architecture lets the model handle both where things are in a video (space) and how they move and change across frames (time) at once.
“We introduce Lumiere — a text-to-video diffusion model designed for synthesising videos that portray realistic, diverse and coherent motion — a pivotal challenge in video synthesis,” said Google researchers in a paper.
“We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model,” they wrote.
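The paper does not include code, but the core idea behind STUNet, a U-Net that downsamples and upsamples a clip in both space and time so the entire duration is processed in one pass, can be sketched in a few lines of PyTorch. Everything below (module names, channel counts, clip size) is an illustrative assumption, not Google's implementation:

```python
# Minimal sketch of the space-time U-Net idea: 3D convolutions that
# compress a video jointly along time, height and width, then restore it.
# All names and sizes here are assumptions for illustration only.
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    """3D conv block that mixes information across space and time."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),  # kernel spans (T, H, W)
            nn.GroupNorm(8, out_ch),
            nn.SiLU(),
        )

    def forward(self, x):  # x: (batch, channels, time, height, width)
        return self.conv(x)

class TinySpaceTimeUNet(nn.Module):
    """Toy U-Net that downsamples and upsamples in time AND space."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = self_block = SpaceTimeBlock(3, ch)
        self.down = nn.Conv3d(ch, ch * 2, kernel_size=3, stride=2, padding=1)  # halves T, H, W
        self.mid = SpaceTimeBlock(ch * 2, ch * 2)
        self.up = nn.ConvTranspose3d(ch * 2, ch, kernel_size=2, stride=2)      # restores T, H, W
        self.dec1 = SpaceTimeBlock(ch * 2, ch)
        self.out = nn.Conv3d(ch, 3, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)             # full-resolution space-time features
        m = self.mid(self.down(e1))   # compressed in both time and space
        u = self.up(m)                # back to full resolution
        return self.out(self.dec1(torch.cat([u, e1], dim=1)))  # skip connection

# The whole clip passes through the network in one forward pass:
video = torch.randn(1, 3, 80, 64, 64)  # (batch, RGB, frames, H, W)
denoised = TinySpaceTimeUNet()(video)
print(denoised.shape)  # torch.Size([1, 3, 80, 64, 64])
```

The point of the sketch is the single forward pass over an 80-frame tensor: no per-keyframe generation followed by temporal stitching, which is the design choice the researchers highlight.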
Lumiere can generate videos from text, animate still images, produce videos in a given style from a reference image, apply consistent edits to existing videos from text prompts, and create cinemagraphs by animating selected regions of a still image.
The Google researchers said that the AI model outputs five-second-long 1024×1024 pixel videos, which they describe as “low-resolution.”
Lumiere also generates 80 frames per clip (16 frames per second over five seconds), compared with the 25 frames produced by Stable Video Diffusion.
“There is a risk of misuse for creating fake or harmful content with our technology, and we believe that it is crucial to develop and apply tools for detecting biases and malicious use cases to ensure a safe and fair use,” said the paper authors.