Runway’s latest AI video generator brings giant cotton candy monsters to life

A screenshot of the video Runway Gen-3 Alpha generated with the prompt “A giant humanoid, made of fluffy blue cotton candy, stomps on the ground and roars to the sky, the clear blue sky behind them.”

On Sunday, Runway announced a new AI video synthesis model called Gen-3 Alpha, which is still in development but appears to produce video of a similar quality to OpenAI’s Sora, which debuted earlier this year and has not yet been publicly released either. Gen-3 Alpha can generate new high-definition video from text prompts ranging from realistic humans to surreal monsters stomping across the countryside.

Unlike Runway’s previous best model from June 2023, which could only create two-second clips, Gen-3 Alpha is reportedly able to create 10-second video segments of people, places, and things with a consistency and coherence that easily surpasses Gen-2. If 10 seconds sounds short compared to Sora’s full minute of video, consider that the company is working with a limited compute budget compared to the more lavishly funded OpenAI, and it actually has a history of shipping video generation capabilities to commercial users.

Gen-3 Alpha does not generate audio to accompany its video clips, and it is highly likely that temporally coherent generations (those that keep a character consistent over time) depend on similarly high-quality training material. But the improvement in Runway’s visual fidelity over the past year is hard to ignore.

AI video is heating up

AI video synthesis has had a busy few weeks in the AI research community, including the launch of the Chinese model Kling, created by Beijing-based Kuaishou Technology (sometimes called “Kwai”). Kling can generate two minutes of 1080p HD video at 30 frames per second with a level of detail and coherence that is said to match Sora.

Gen-3 Alpha prompt: “Subtle reflections of a woman on the window of a train moving at high speed in a Japanese city.”

Not long after Kling debuted, people on social media started creating surreal AI videos using Luma AI’s Luma Dream Machine. These videos were novel and strange but generally lacked coherence; we tested Dream Machine, and nothing we saw impressed us.

Meanwhile, one of the original text-to-video pioneers, New York-based Runway, founded in 2018, recently found itself the target of memes showing its Gen-2 technology falling out of favor compared to newer video synthesis models. That may have prompted the announcement of Gen-3 Alpha.

Gen-3 Alpha prompt: “An astronaut running down an alley in Rio de Janeiro.”

Generating realistic humans has always been difficult for video synthesis models, so Runway specifically demonstrates Gen-3 Alpha’s ability to create what its developers call “expressive” human characters with a range of actions, gestures, and emotions. However, the examples the company provided weren’t particularly expressive (mostly people slowly staring and blinking), but they do look realistic.

Human examples provided include generated videos of a woman on a train, an astronaut running down a street, a man with his face illuminated by the glow of a television, a woman driving a car, and a woman running, among others.

Gen-3 Alpha prompt: “Close up shot of young woman driving car, looking pensive, blurred green forest visible through rainy car window.”

The demo videos also contain more surreal examples of video synthesis, including a giant creature walking through a dilapidated city, a rock man walking in a forest, and the giant cotton candy monster seen below, which is probably the best video on the entire page.

Gen-3 Alpha prompt: “A giant humanoid, made of fluffy blue cotton candy, stomps on the ground and roars to the sky, the clear blue sky behind them.”

Gen-3 Alpha will power various Runway AI editing tools (one of the company’s most notable claims to fame), including Multi Motion Brush, Advanced Camera Controls, and Director Mode. It can create videos from text or image prompts.

Runway says Gen-3 Alpha is the first in a series of models trained on a new infrastructure designed for large-scale multimodal training, a step toward developing what it calls “General World Models”: hypothetical artificial intelligence systems that build internal representations of environments and use them to simulate future events in those environments.
