Luma Labs, the AI company that previously introduced the generative 3D model Genie, has entered the world of video AI with the Dream Machine—and it’s impressive.
Demand to try Dream Machine overloaded Luma’s servers so much that they had to implement a queuing system. I’ve been waiting all night for my prompts to become videos, but the actual “dreaming” process takes about two minutes once you reach the front of the queue.
Some of the videos shared on social media by people who were granted early access seemed too good to be true – cherry-picked to show off what an AI video model does best – but having tried it myself, I can say it largely holds up.
While it doesn’t seem to be at Sora’s level, or even as good as Kling, what I’ve seen makes it one of the best AI video models yet for prompt following and motion understanding. And it has one significant advantage over Sora: anyone can use it today.
Each video generation is about five seconds long, almost twice as long as those from Runway or Pika Labs without extensions, and some videos show evidence of containing more than one shot.
What is it like to use the Dream Machine?
I made several clips during testing. One was ready in about three hours; the rest took most of the night. Some of them have questionable blending or blur, but for the most part they capture motion better than any model I’ve tried.
I had it depict walking, dancing and even running. Given prompts that require this type of movement, older models might show people walking backwards, or do a dolly zoom over a dancer who is standing still. Not Dream Machine.
Dream Machine captured the concept of an object in motion brilliantly without my having to specify the type of motion. It was especially good at running. However, you have minimal fine-grained or granular control beyond the prompt.
This may be because it’s a new model, but everything is handled through the prompt – which the AI automatically enhances using its own language model.
Ideogram and Leonardo also use this technique when generating images, and it helps to produce a more descriptive account of what you want to see.
It may also be a characteristic of video models built on a diffusion transformer architecture rather than pure diffusion. British AI startup Haiper also says its model works best when you let the prompt do the work, and Sora is said to take little more than a simple text prompt, with minimal additional controls.
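To make that two-stage idea concrete, here is a minimal sketch of what a prompt-enhancement pipeline could look like. This is purely illustrative: `enhance_prompt` and `generate_video` are hypothetical stand-ins, not Luma’s actual API, and the real service uses its own language model rather than a fixed template.

```python
# Hypothetical sketch of a two-stage prompt-enhancement pipeline.
# Neither function is Luma's real API; they only illustrate the idea
# of a language model expanding a short prompt before video generation.

def enhance_prompt(prompt: str) -> str:
    """Stand-in for the service's internal language model.

    A real system would ask an LLM to rewrite the prompt; here a fixed
    template adds the kind of detail an LLM might supply.
    """
    return (f"Cinematic shot of {prompt}. Smooth camera tracking, "
            "natural lighting, realistic subject motion.")


def generate_video(prompt: str, enhance: bool = True) -> str:
    """Placeholder generation entry point.

    With enhance=True the short user prompt is expanded first; with
    enhance=False it is passed through verbatim.
    """
    final_prompt = enhance_prompt(prompt) if enhance else prompt
    # ...the actual diffusion-transformer sampling would happen here...
    return final_prompt  # returned so the example has visible output


print(generate_video("cats dancing on the moon in spacesuits"))
```

The design point is simply that the user-facing control surface is one text box; all the descriptive detail image models like Ideogram rely on gets added behind the scenes.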
Testing the Dream Machine
I came up with a series of prompts to test Dream Machine. I also ran some of them through existing video AI models to see how they compare, and none matched its motion accuracy or realistic physics.
In some cases I gave it a simple text prompt with the enhancement feature enabled. For others I wrote a longer prompt myself, and in a few cases I gave it an image I had generated in Midjourney.
1. Running for ice cream
For this video, I wrote a longer, more descriptive prompt. I wanted to create something that looked like it was shot on a smartphone.
Prompt: “An excited child runs towards an ice cream truck parked on a sunny street. The camera is just behind the child, capturing the back of their head and shoulders as their arms swing in excitement and the brightly colored ice cream truck approaches. The video has a slight bounce to mimic the natural motion of running while holding a phone.”
It made two videos. In the first one, it looked like the ice cream truck was about to run the kid over, and the kid’s arm movements were a little weird.
The second video was much better. It wasn’t perfectly realistic, but it had impressive motion blur. The video above is the second take, as it also captured the idea of a slight bounce in the camera movement.
2. Enter the dinosaur
This time I gave Dream Machine a simple prompt and told it not to enhance the prompt but to take it as given. It actually created two videos that flow into each other, as if they were the first and second shots of a scene.
Prompt: “A man discovers a magical camera that brings any photograph to life, but chaos ensues when he accidentally takes a photo of a dinosaur.”
While there’s a bit of distortion, especially around the edges, the movement of the dinosaur charging into the room respects real-world physics in an interesting way.
3. Phone on the street
Next came another difficult prompt – specifically, one where Dream Machine has to account for lighting, shaky motion and a fairly complex scene.
Prompt: “A man walking down a busy city street at dusk, holding his smartphone vertically. The camera captures his hand swinging slightly as he walks, showing views of shop windows, people passing by, and the glow of street lamps. The video has a slight shake in his hand, to mimic the natural movement of holding the phone.”
This could have gone two ways: the AI could capture the view from the camera in the person’s hand, or show the person walking while holding the camera – first person versus third person. It opted for a third-person perspective.
It wasn’t perfect, with some warping around the edges, but it was better than I expected given the ambiguous elements in my prompt.
4. Dancing in the dark
Next I started with a silhouetted dancer image generated in Midjourney. I had tried using it with Runway, Pika Labs and Stable Video Diffusion, and in each case they produced movement within the frame but no movement of the character.
Prompt: “Create a compelling tracking shot of a woman dancing in silhouette against a contrasting, well-lit background. The camera should follow the dancer’s fluid movements and focus on her silhouette throughout the shot.”
It wasn’t perfect. There’s a weird twist to the leg as she spins, and the arms seem to fuse with the fabric, but at least the figure moves. That’s a constant with the Luma Dream Machine – it’s at its best when things are in motion.
5. Cats on the Moon
One of the first prompts I try with any new generative image or video AI model is “cats dancing on the moon in spacesuits”. It’s weird enough that there are no existing videos to draw from, and complex enough that the model struggles with the movement.
My exact prompt for the Luma Dream Machine: “A cat in a spacesuit on the moon dancing with a dog.” That was it, no refinement and no description of the type of movement – I left that to the AI.
This prompt showed that you do need to give the AI some guidance on how to interpret the movement. It didn’t do a bad job – better than the alternative models currently available – but it was far from perfect.
6. Market visit
Next up was another test that started with a Midjourney image, this one showing a busy European food market. The original Midjourney prompt was: “An ultra-realistic candid smartphone photo of a bustling open-air farmers market in a picturesque European square.”
For the Luma Labs Dream Machine, I simply added the instruction: “Walk through the bustling, busy food market.” No additional motion commands or character instructions.
I wish I had been more specific about how the characters should move. It captured the camera movement really well, but there was a lot of distortion and blending between the people in the scene. This was one of my first attempts, before I had worked out better techniques for prompting the model.
7. The end of the chess match
Finally, I decided to throw the Luma Dream Machine a complete curveball. I’ve been experimenting with another new AI model – Leonardo Phoenix – which promises impressive levels of prompt following. So I wrote an elaborate AI image prompt.
Phoenix did a good job, but it was just an image, so I decided to feed the exact same prompt into Dream Machine: “A surreal, weathered chessboard floating in a misty void, decorated with brass gears and cogs, where intricate steampunk chess pieces – including steam-powered robotic foot soldiers…”
It pretty much ignored everything but the board, creating a surreal video of the chess pieces being swept off the end of the board as if they were dissolving. Because of the surreal element, I can’t tell whether this was intentional or a failure of its understanding of movement. It looks cool, though.
Final thoughts
“I just did the following calculation: I had access to the Luma Dream Machine on Saturday night and in 2-3 days of playing with it I created 633 generations. Of those 633, I think at least 150 were just random tests for fun. So I guess it took me about 500…” – tweet, June 12, 2024 (https://t.co/TpMCdDmlxy)
The Luma Labs Dream Machine is an impressive next step in generative video AI. The company likely drew on its experience with generative 3D modeling to improve its understanding of motion in video – but there still seems to be a gap between this and truly convincing video AI.
Over the past two years, AI image generation has gone from weird, low-resolution depictions of people with too many fingers, and faces that look more like an Edvard Munch painting than a photograph, to being almost indistinguishable from reality.
AI video is much more complex. Not only does it need to replicate the realism of photography, but it needs to understand the physics of the real world and how it affects movement – across scenes, people, animals, vehicles and objects.
Luma Labs has created one of the most realistic motion tools I’ve ever seen, but it still falls short of what’s needed. I don’t think it’s at Sora’s level, but I can’t compare it to videos I’ve made with Sora myself – only to what filmmakers and OpenAI themselves have shared, which is probably cherry-picked from hundreds of failures.
Abel Art, an avid AI artist who had early access to Dream Machine, created an impressive piece of work. But he said he needed to create hundreds of generations for just one minute of coherent video, once the unusable clips were cut out.
His ratio works out to roughly 500 clips per minute of final video; with each clip lasting about five seconds, that means discarding around 98% of the footage to create the perfect scene.
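As a quick sanity check on those numbers (my arithmetic, not Abel Art’s), the discard rate follows directly from the clip length and the final running time:

```python
# Discard rate implied by ~500 five-second clips being cut down
# to one minute of final footage.
clips = 500
clip_seconds = 5
final_seconds = 60

total_seconds = clips * clip_seconds                    # 2,500 seconds generated
discard_rate = 1 - final_seconds / total_seconds
print(f"{discard_rate:.1%} of the footage discarded")   # 97.6% – roughly 98%
```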
I suspect the ratio for Pika Labs and Runway is higher, and reports suggest that Sora has a similar discard rate, at least from filmmakers who have used it.
At this point, I think even the best AI video tools are meant to be used alongside traditional filmmaking rather than replacing it – but we’re approaching what Ashton Kutcher predicts will be an era where anyone can make their own feature film.