Responses by Chris Mullany, creative technologist, Universal Everything.
Background: This project came about when Simon Pyke, a sound designer and musical artist also known as Freefarm, and singer James Buttery, formerly a member of the electronic music group Darkstar, collaborated on a new album. Unlike most of Simon’s previous work, these tracks take a more song-based approach, with lyrics by James. In response, we created music videos “driven” by the words. The mood and tone of each song are represented visually, and the lyrics themselves appear in a typographic treatment.
Design thinking: We wanted to build a generative system to create the music video visuals rather than animating by hand. The idea was to feed lyrics in and get video frames out. We wanted to be surprised by the outcomes, rather than each frame being predictable. Lyrics would drift in and out of legibility in time with the music, as though they were living, breathing entities and not just static written words.
A generative AI solution seemed like a perfect fit. We created a custom system that let us manually control the basic structure of how lyrics appeared and animated, with a generative AI layer to bring everything to life.
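The article doesn’t name a specific toolchain, but a minimal sketch of that two-layer idea might pair a procedurally rendered typographic frame (the manually controlled base) with an image-to-image diffusion pass (the generative layer). The model, prompt, lyric text and parameter values below are illustrative assumptions, not the project’s actual setup.

```python
# Hypothetical sketch: a manually controlled typographic base layer
# plus a generative AI pass. Assumes Stable Diffusion img2img via the
# diffusers library; the article does not name the actual toolchain.
import torch
from PIL import Image, ImageDraw, ImageFont
from diffusers import StableDiffusionImg2ImgPipeline

def render_lyric_frame(text, size=(768, 512)):
    """Base layer: a plain typographic frame, fully under manual control."""
    frame = Image.new("RGB", size, "black")
    draw = ImageDraw.Draw(frame)
    font = ImageFont.truetype("Arial.ttf", 96)  # placeholder font path
    draw.text((size[0] // 2, size[1] // 2), text, font=font,
              fill="white", anchor="mm")
    return frame

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base = render_lyric_frame("hypothetical lyric line")
frame = pipe(
    prompt="brutalist concrete architecture, London streets, overcast",
    image=base,
    strength=0.45,       # how far the AI layer may drift from the typography
    guidance_scale=7.5,
).images[0]
frame.save("frame_0001.png")
```

In a pipeline like this, the strength parameter is the single dial between the two layers: near zero the output is essentially the typography; near one the typography dissolves entirely.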
Additionally, we wanted to create a system that would let us respond creatively to the themes and feel of the music. In this instance, the song is about a cycle commute through the urban environment of London’s streets; using brutalist architecture as a visual cue felt like a natural choice.
Challenges: Striking a balance between legibility and abstraction. We didn’t want the lyrics to appear too literal or clinical; we wanted them to occupy a world that reflected the music. Pushed too far, the words would melt and morph into abstract forms and become meaningless. We tried several approaches, each with many different configurations, before arriving at a place we were happy with. Interestingly, when viewed independently, many frames of the animation are not particularly legible. But when viewed as a moving image sequence, the words spring to life and become discernible.
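To make that balance concrete: in an image-to-image setup like the sketch above, the denoising strength is the dial between literal and melted. A hypothetical way to let words drift in and out of legibility in time with the music is to drive that strength from the track’s loudness envelope, clamped to a band found to stay readable. The band limits and library choice here are assumptions:

```python
# Hypothetical sketch: modulate the img2img denoising strength per frame
# from the track's loudness, so the words drift in and out of legibility
# with the music. The band limits below are assumed, not actual values.
import librosa

MIN_STRENGTH = 0.30  # below this the frames look too literal and clinical
MAX_STRENGTH = 0.60  # above this the words melt into meaningless forms

def strength_per_frame(audio_path, fps=25):
    """One denoising-strength value per video frame, from the RMS envelope."""
    y, sr = librosa.load(audio_path, sr=None)
    hop = int(sr / fps)                    # one analysis window per frame
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    env = (rms - rms.min()) / (rms.max() - rms.min() + 1e-9)  # normalize to 0..1
    return MIN_STRENGTH + env * (MAX_STRENGTH - MIN_STRENGTH)

# strengths = strength_per_frame("track.wav")
# strengths[i] would feed the strength argument of the img2img pass for frame i.
```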
Favorite details: What many people don’t realize about generative AI imagery is that it’s often heavily curated—people share “good” results and discard all the undesirable ones. Because we weren’t concerned with one single hero image but with thousands of images per video, we had to ensure our system could produce variety while minimizing undesirable images. A lot of work went into tuning the typographic approach and the generative AI system and its parameters to create videos that were full of surprises—even to their creators—but still retained cohesion, legibility and the desired aesthetic throughout.
New lessons: We tried several approaches to the generative AI element of the project and ended up learning a lot about how to work with this new, emerging technology alongside traditional animation techniques. A big part of this was the control factor: how to retain structure—i.e., the typography—in the final image. Examples of AI text-to-image generation are common, but examples where very specific structures and patterns are retained are less common.
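The piece doesn’t say how the typography was held in place, but a common way to retain a specific structure through a text-to-image pass is to condition the diffusion model on the base frame, for example with ControlNet. A sketch under that assumption, with illustrative model IDs and an edge-map input:

```python
# Hypothetical sketch: hold the typographic structure fixed with ControlNet.
# The article does not name the technique actually used; this is one common
# way to retain a specific pattern through a text-to-image pass.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The control image is an edge map of the rendered lyric frame, so the
# letterforms survive however far the generated texture drifts.
edges = load_image("lyric_frame_edges.png")  # hypothetical file
out = pipe(
    "brutalist concrete, wet London street, overcast sky",
    image=edges,
    controlnet_conditioning_scale=1.0,  # higher values enforce structure harder
    num_inference_steps=30,
).images[0]
out.save("frame_controlled.png")
```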
Time constraints: Because these were music videos for self-released tracks, we had to be mindful of how much time was spent creating them. Generative systems, once up and running, are a lot of fun to play with, and there’s always a danger of getting lost in the process of creating and tweaking endless variations. We set constraints around this and also decided that each video should follow a simple animation formula that could be created procedurally rather than by hand. I think the latter resulted in each video having a particular rhythm that might not have emerged if they’d been hand-animated.
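As an illustration of what such a simple procedural formula can look like, here is a hypothetical one in which each word’s opacity and position are pure functions of time, so every frame is computed rather than hand-keyed. The timings and easing are invented for the example:

```python
# Hypothetical sketch of a simple procedural animation formula: each word's
# opacity and offset are pure functions of time, so every frame is computed
# rather than hand-keyed. Timings and easing are invented for the example.

def ease(t):
    """Smoothstep easing, clamped to the 0..1 range."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)

def word_state(t, start, fade=0.8, drift=40.0):
    """Opacity and vertical offset of one lyric word at time t (seconds)."""
    k = ease((t - start) / fade)
    return {"opacity": k, "dy": (1.0 - k) * drift}

# words = [("first", 1.0), ("lyric", 1.4), ("line", 2.1)]  # hypothetical timings
# frame_states = [word_state(t=3.0, start=s) for _, s in words]
```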