Insights & Opinions

Goldrush in the uncanny valley

Computer-generated art is nothing new. For as long as I have worked in design and technology, I have been drawn to and inspired by generative art.

During school art classes, while learning about color and composition (with the work of Mondriaan presented as the ultimate mastery of both), I wondered whether these were things you could teach a computer. If color and composition follow certain ‘universal’ rules, it should be possible to create ‘universally’ appealing art.

It’s obviously not that easy. The artist imposes their will and critical thinking on the canvas, exploring the edges to make something that is not only appealing but also seeks out tension in that space, a tension that is typically only understood and interpreted by humans. That ability has long been thought of as innate to humans and practically impossible to encode in a machine.

But recent history has plenty of examples where technology bedazzled people into believing that the art it created was in fact made by humans, and better than the competition, leaving people enraged once they found out the truth.

Two of my favorite examples: first, from the 90s, Bach lovers were unable to tell machine-generated compositions from genuine Bach; second, a recent art show where someone submitted a piece they made with Midjourney. The twist? They won.

But what changed? Why is computer-generated art so much more important now, and why should you take it seriously?

Journeying beyond curiosity

Until recently, generative art and many of these explorations and imitations operated mostly on the fringes: the bleeding edge of science and the arts. Machine learning has moved these experiments forward and opened up new areas for exploration.

What followed was still treated mostly as a curiosity, as the outputs of these technologies tended to be uncanny, best described as a Mandelbrot set dreaming of recursive dogs. While a fascinating prospect for those curious enough to try it out, the technology wasn't mature enough yet to make anything most people would consider meaningful.

Around two years ago I started using VQGAN and CLIP, with varying levels of success. What’s clear to me is that the leap in output quality from those tools to DALL-E and Midjourney is enormous: earlier versions gave us less realistic outputs, with a more impressionist feel and style. The latest tools give us a much more realistic vision of the world, even more so with Stable Diffusion.

Fast forward a few years, and OpenAI releases CLIP in 2021. CLIP vastly improved the way algorithms understand how text relates to what we interpret as visually accurate. The art, science, and open source communities picked it up, experimentation ensued, and the realism of generated images increased significantly through the process of diffusion.
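CLIP's core idea is a shared embedding space in which a caption and a matching image land close together, while unrelated pairs land far apart. The toy sketch below illustrates that idea with made-up vectors and a cosine-similarity score; the real model uses learned neural encoders trained on millions of image-caption pairs, so nothing here is actual CLIP code.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors: 1.0 means
    # pointing the same way, negative means pointing apart.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins for CLIP's outputs: in the real model, a text
# encoder and an image encoder both map into one shared space.
text_embedding = np.array([0.9, 0.1, 0.3])        # e.g. "a photo of a dog"
image_embedding = np.array([0.8, 0.2, 0.4])       # a matching dog photo
unrelated_embedding = np.array([-0.7, 0.9, -0.2]) # an unrelated image

match = cosine_similarity(text_embedding, image_embedding)
mismatch = cosine_similarity(text_embedding, unrelated_embedding)
print(match > mismatch)  # the caption scores higher against the matching image
```

Diffusion-based image generators lean on exactly this kind of score: it gives them a numeric signal for how well an image-in-progress matches the prompt.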

This was an awesome time, in which people painstakingly explored what it means to write prompts and how to tease images and styles out of the nebulous latent space. Then, in 2022, cue the release of DALL-E and Midjourney, and, more importantly, Stable Diffusion’s public release of its model.

“You won’t believe what happened next”

Exponential adoption

Disclaimer: the image above was released with the Stable Diffusion 2.0 model and its supporting press release, so be mindful that this data point was likely picked to present the PR in a favorable light. But if you made a similar graph showing the number of people on LinkedIn posting an image generated by one of these tools, I am pretty certain you’d get a very similar result.

This resonates strongly with what I’ve been experiencing and feeling about adoption (and ease of use). I believe this is not a fad that is about to pass. The outputs are really impressive and, more importantly, usable.

The complexity of picking up generative AI and bringing it into projects and workflows has become trivial. It took mere days after Stable Diffusion’s release for the open source community to build an open source notebook for creating animations.

Examples from our own practice come to mind. We recently worked with our client Armada Music and a number of these generative AI tools to co-create creative assets for the track ‘Computers take over the world’. We're using image generation to fill slides with abstract mood/background images in our presentation decks. And the other day one of our Art Directors tapped into ChatGPT to assist with copywriting.

During the creative workshop we hosted with Armada Music, we explored multiple iterations and versions of prompts and images referencing the themes: Cyberpunk, Solarpunk, Progressive Experimentation, and Who's taking over the world?

I believe we’ve crossed the chasm (and the uncanny valley). The convergence of developer interest, feasibility, and accessibility that has been steadily progressing under the covers is right now coalescing into an explosion of a new generation of tools, creative workflows, and techniques.

It’s about to become ubiquitous.

Do not underestimate

Currently, generative AI (looking in particular at ChatGPT) tends to be brushed aside in a knee-jerk reaction to the hype by pointing out its flaws: it stinks at arithmetic (“haha lol how is this AI supposed to replace programmers?”), it’s detectable, the text it generates is very generic, and what about those hands, though?

To summarize, here’s why I think large scale adoption is not to be underestimated - it's the coming together of these:

Low threshold: Immediacy, accessibility + ease of use
This is the killer feature. There is very little technical knowledge needed to get up and running. Practically anyone with a computer and a keyboard can pick it up. It's intuitive and results are immediate, which forges new paths for creative process and interplay.

Developer adoption / tool maturity
Similarly, and in large part thanks to the team behind Stable Diffusion, machine learning toolchains are maturing for developers, and it has become trivial to get going and fold these tools into existing workflows and processes. This in turn makes them more accessible and gives these techniques even more exposure.

Feasibility of output
The tools had already been available for some time, but more as a curiosity. The tools and datasets released in 2022 pushed them from quirky to useful.

Existential questions

Going back to where we waded into this topic: it surfaces existential questions about what it is to be a creative human. What’s the role of the artist? Who is providing the data? What about copyright? Is everyone an art director now?

It’s an overwhelming amount to cover, and these are important questions to dwell on. But as developers tirelessly work away, it's important not to bury our heads in the sand.

For now I would defer to Holly Herndon and Mat Dryhurst, who have been creating exceptional art by exploring this space of tension for many years. They also actively work on solutions: most recently collaborating with the makers of Stable Diffusion to let artists opt out of training data, and coming up with a licensing model for artists.

Over the last few years I’ve explored the automation of contemporary haikus via a bot: feeding it prompts, teaching it the ‘rules’ of a haiku, and testing continuously to tackle that all-important subtext. These images are the result of pairing the GPT haiku outputs with Stable Diffusion.
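To give a flavor of what teaching a bot the ‘rules’ of a haiku can look like, here is a hypothetical sketch of a 5-7-5 filter you could run over generated candidates. The syllable counter is a rough vowel-group heuristic of my own, not the actual bot; real syllabification would need a pronunciation dictionary.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels, with a small
    # adjustment for a trailing silent 'e'.
    word = word.lower().strip(".,;:!?'\"")
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def is_haiku(lines):
    # The classic 5-7-5 structure, used as a filter on generated candidates.
    if len(lines) != 3:
        return False
    counts = [sum(count_syllables(w) for w in line.split()) for line in lines]
    return counts == [5, 7, 5]

candidate = [
    "an old silent pond",
    "a frog jumps into the pond",
    "splash silence again",
]
print(is_haiku(candidate))  # True
```

A generation loop would simply keep sampling from the language model until a candidate passes this check; the subtext, as noted above, is the part no syllable counter can verify.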

Change is coming

What I want to leave you with for now is this: this change is coming, fast (and innovation too; image and text generation is merely scratching the surface). Industries where content creation plays a big role will be especially impacted and will likely need to adapt. I am counting down the days until a big piece of our bread and butter, creating digital brand experiences (read: websites), is in the majority generated.

And along the way, I keep asking myself: what, then, is our role as digital creators in this landscape? What does a world look like in which we co-create with machine intelligence? I have not arrived at an answer yet, but I will continue to explore.