Inflection Point for ML and Art

Generating art from text prompts is at an inflection point. A new creative field is developing in front of our eyes.

  • A community built on Colab has nailed generation that is obviously useful: still images, 2D and 3D animations, and video processing.

  • Tools like Midjourney are making it easy to use.

  • Big companies are demonstrating out-of-this-world results using larger models.

  • The FOSS community is creating datasets and training models to match the big companies. [LAION.ai]

  • It is mainstream. [A$AP Ant & A$AP Rocky music video]

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
[Facebook Meta AI Research, arXiv]

 

The core technical insight is to pair two ML models.

One paints an image, the other scores that image against the text prompt.
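
As a rough illustration of the paint-and-score loop, here is a toy sketch in Python. Everything in it is a stand-in made up for illustration: the "image" is a short list of numbers, `paint` just nudges it at random, and `score` is cosine similarity against a hypothetical `prompt_embedding`. Real tools typically use an image generation model as the painter and a model such as CLIP as the scorer, and steer with gradients rather than random trial and error, so treat this as the shape of the idea, not how any particular notebook works.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(image, prompt_embedding):
    """Toy scorer (assumption): how well does the image match the prompt?"""
    return cosine(image, prompt_embedding)


def paint(image, rng, step=0.1):
    """Toy painter (assumption): propose a slightly changed image."""
    return [px + rng.uniform(-step, step) for px in image]


def generate(prompt_embedding, iterations=500, seed=0):
    """Keep repainting, and keep a repaint only when the scorer likes it more."""
    rng = random.Random(seed)
    image = [rng.uniform(-1, 1) for _ in prompt_embedding]
    best = score(image, prompt_embedding)
    for _ in range(iterations):
        candidate = paint(image, rng)
        candidate_score = score(candidate, prompt_embedding)
        if candidate_score > best:
            image, best = candidate, candidate_score
    return image, best


if __name__ == "__main__":
    prompt = [1.0, 0.0, -1.0, 0.5]  # hypothetical embedding of a text prompt
    image, final_score = generate(prompt)
    print(f"final score: {final_score:.3f}")
```

The score can only go up from one iteration to the next, which is the whole trick: the painter does not need to understand the prompt as long as the scorer can tell better from worse.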

Jack Morris (@jxmnop), a Cornell PhD student studying natural language processing, wrote an excellent article that fully explains the technical background: The Weird and Wonderful World of AI Art.

Creating

Artists have embraced these tools.
They aren’t just entering a text prompt: they’re playing with all the parameters, trying multiple variants at the same time.
Creative work is done by exploring, and creative work is special when it’s distinctive.
Significantly, this lowers the burden on the technical side: it is neither necessary nor desirable to get picture perfect results on the first try.

Disco Diffusion and Midjourney are the two most popular tools as of April 2022. Discoveries are being made at a rapid clip.
The best way I’ve found to keep up with the field as a whole is Reddit, specifically /r/mediasynthesis and /r/discodiffusion.
Zippy’s Disco Diffusion Cheatsheet is an excellent manual, not only for Disco Diffusion itself, but also for the community and tooling.

Disco Diffusion

The latest and greatest Colab notebook is Disco Diffusion.
Colab is free to try. You can subscribe to get more features, most importantly more powerful GPUs.

It can be found on GitHub.
/r/discodiffusion and a Discord welcome you.
Zippy’s Disco Diffusion Cheatsheet is an excellent manual.

Midjourney

Midjourney is a tool in private beta. [Twitter, link to apply in bio]
Over the week it took me to write this, Midjourney became very well-known, and it’s unclear if there are any beta spots left.

Slideshow Gallery

Enjoy the slideshow; click a thumbnail to jump to that image.
To view an image in detail or download it, scroll to the Full-sized Gallery at the bottom.

Video

Disco Diffusion can create still images, 2D animations, and 3D animations, or take a video as input and repaint each frame.

Here, we take a black and white video of Monet painting in his garden, and repaint each frame in the style of Monet.

Creating this used three different models: a video colorizer, a painting model, and an upscaler.

Below, you can see the output of each step. Left to right:
- B&W video of Monet painting in his garden
- Colorized
- Paint Giverny garden like Monet
- Paint Giverny garden like Monet, in the winter, with an ice blue & white color scheme
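
The three-model pipeline above can be sketched as a chain of per-frame steps. This is a minimal Python sketch with made-up stand-ins, not the actual models: `colorize`, `paint`, and `upscale` are hypothetical functions that do trivial pixel math, and the order (colorize, then paint, then upscale) is an assumption based on the order the models are listed in.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]       # (red, green, blue), each 0-255
GrayFrame = List[List[int]]        # rows of 0-255 brightness values
Frame = List[List[Pixel]]          # rows of RGB pixels


def colorize(gray_frame: GrayFrame) -> Frame:
    """Stand-in colorizer (assumption): copy brightness into R, G, and B."""
    return [[(v, v, v) for v in row] for row in gray_frame]


def paint(frame: Frame, prompt: str) -> Frame:
    """Stand-in painter (assumption): shift toward blue for winter prompts."""
    if "winter" not in prompt:
        return frame
    return [[(r // 2, g // 2, min(255, b + 64)) for r, g, b in row]
            for row in frame]


def upscale(frame: Frame, factor: int = 2) -> Frame:
    """Stand-in upscaler: nearest-neighbor, repeats each pixel and each row."""
    return [
        [px for px in row for _ in range(factor)]
        for row in frame
        for _ in range(factor)
    ]


def repaint_video(gray_frames: List[GrayFrame], prompt: str) -> List[Frame]:
    """Run every frame through colorize -> paint -> upscale, one at a time."""
    return [upscale(paint(colorize(frame), prompt)) for frame in gray_frames]


if __name__ == "__main__":
    video = [[[0, 255]], [[10, 20]]]  # two tiny 1x2 black-and-white frames
    winter = repaint_video(video, "Giverny garden like Monet, in the winter")
    print(len(winter), "frames,", len(winter[0]), "x", len(winter[0][0]), "pixels")
```

Because each step takes a frame and returns a frame, swapping the prompt (or any one model) changes the result without touching the rest of the chain, which is how the two painted variants in the list differ.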

Full-sized Gallery

Click to view full screen; right-click to download.

* This isn’t all of the pictures in the slideshow gallery; I skipped some only because it’d be a couple more hours of work to hunt down the full-sized version of each upload.

** @jpohhhh on Twitter; for E-mail, same user name at gmail.com
