According to Stevenson, working with text-to-video is similar to working with text-to-image in many ways. You input a text prompt and then revise it repeatedly. But there is an additional challenge. While you experiment with different prompts, Sora generates low-resolution video. Once you find a prompt you like, you can increase the resolution. Going from low to high resolution involves another round of generation, however, so what you initially liked in the low-res version can be lost.
Stevenson mentions that sometimes the camera angle changes or the objects in the shot move. Hallucination remains a feature of Sora, as with any generative model. While still images may exhibit odd visual defects, video can show these defects over time, resulting in strange transitions between frames.
Stevenson also had to learn how to communicate effectively with Sora. He notes that the tool interprets prompts very literally. In one instance, he attempted to create a shot zooming in on a helicopter, but Sora combined a helicopter with a camera’s zoom lens in the generated clip. Even so, Stevenson believes that with imaginative prompting, Sora is easier to control than previous models.
Despite the surprises, Stevenson finds the technology enjoyable to use. He appreciates the lack of control and the chaos it brings. While there are numerous video-making tools offering editing and visual effects control, Stevenson values a generative model like Sora for producing unusual and unexpected content from the start.
All the animal clips were generated using Sora. Stevenson tried various prompts until he achieved a result he liked. He describes his role as directing the tool, but more like nudging it, and then experimenting with different variations.
For instance, Stevenson envisioned his fox crow with four legs, but Sora depicted it with two, which he found even better. The creature isn’t flawless: sharp-eyed viewers may notice it switch from two legs to four and back in the video. Sora also produced versions that he deemed too unsettling to use.
Once Stevenson had a selection of animals he favored, he combined them and added captions and a voice-over. While he could have created his imaginary menagerie with existing tools, it would have taken much longer. The process was significantly faster with Sora.
Stevenson explains that he experimented with various characters to create something visually appealing. He has collected numerous clips featuring random creatures. The moment he saw the girafflamingo created by Sora, he started pondering its narrative, diet, and habitat. He plans to release a series of longer films exploring each fantasy animal in greater detail.
Stevenson hopes that his fantastical animals will convey a broader message. He anticipates a surge of new content flooding feeds and believes that using clearly fictional stories is one way to educate people about what is real.
He emphasizes that for many viewers, his film may be their first encounter with video created by a generative model. He aims to make it abundantly clear from the start that the content is not real.