The greatest artistic instrument ever built, or a harbinger of doom for entire creative industries? OpenAI’s second-generation DALL-E 2 system is slowly opening up to the public, and its text-based image generation and editing capabilities are awe-inspiring.
The pace of progress in AI-powered text-to-image generation is positively alarming. The generative adversarial network, or GAN, first emerged in 2014 and put forward the idea of two AIs competing with each other, both “trained” on a large number of real, labeled images that help the algorithms learn what they’re looking at. A “generator” AI then starts creating images, and a “discriminator” AI tries to guess whether they are real photos or AI creations.
They are evenly matched at first: both are absolutely terrible at their jobs. But they learn; the generator is rewarded if it fools the discriminator, and the discriminator is rewarded if it correctly picks the origin of an image. Over millions and billions of iterations, each lasting a matter of seconds, both improve to the point where people begin to struggle to tell the difference.
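To make the adversarial loop concrete, here is a toy, hypothetical sketch in Python: a one-dimensional “generator” learns to imitate samples drawn from a real distribution while a logistic “discriminator” learns to tell the two apart. This is an illustration of the GAN idea described above, not how DALL-E 2 itself works; all names and hyperparameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data: samples from a normal distribution centred at 4.
def real_batch(n):
    return rng.normal(4.0, 0.5, size=n)

# Generator: maps noise z to g_a * z + g_b, and must learn to mimic the data.
g_a, g_b = 1.0, 0.0
# Discriminator: a single logistic unit, D(x) = sigmoid(d_w * x + d_c).
d_w, d_c = 0.1, 0.0

lr, batch = 0.05, 64
for step in range(2000):
    z = rng.normal(0.0, 1.0, size=batch)
    fake = g_a * z + g_b
    real = real_batch(batch)

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    p_real = sigmoid(d_w * real + d_c)
    p_fake = sigmoid(d_w * fake + d_c)
    # Gradients of the binary cross-entropy loss w.r.t. d_w and d_c.
    grad_w = np.mean((p_real - 1.0) * real) + np.mean(p_fake * fake)
    grad_c = np.mean(p_real - 1.0) + np.mean(p_fake)
    d_w -= lr * grad_w
    d_c -= lr * grad_c

    # Generator update: push D(fake) toward 1 (i.e. fool the discriminator).
    p_fake = sigmoid(d_w * fake + d_c)
    grad_a = np.mean((p_fake - 1.0) * d_w * z)
    grad_b = np.mean((p_fake - 1.0) * d_w)
    g_a -= lr * grad_a
    g_b -= lr * grad_b

# g_b is the mean of the generated samples; with luck it has drifted
# toward the real data's mean, though GAN training can oscillate.
print(round(g_b, 2))
```

Each side only ever sees the other’s score, never an explicit description of the data, which is exactly the “reward for fooling / reward for catching” dynamic the article describes.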
They learn in their own way, entirely unguided by their programmers; each AI develops its own understanding of what a horse is, completely separate from the reality we understand. All either one knows or cares about is its job: to fool, or avoid being fooled by, the other AI, using its own individual and completely mysterious methods of analyzing and creating data.
This leads to the famously strange disconnect from reality that has hitherto been the hallmark of such systems. Think of DeepDream’s strange obsession with dogs and eyes, or the unbridled, beautiful surrealism of systems like Botto, the AI/human NFT art collaboration.
Until now, these algorithms have been fascinating entertainment. DALL-E 2, on the other hand, makes crystal clear how disruptive this technology will be – not five or ten years in the future, but now, as the doors open to the public. Just watch the video below and imagine how much time and money you would have to spend to make this happen without artificial intelligence.
DALL-E 2 represents a step change in AI image generation. It understands natural-language prompts far better than anything before it, providing an unprecedented level of control over subjects, styles, techniques, angles, backgrounds, locations, actions, attributes and concepts – and generating images of extraordinary quality. For example, if you tell it you want photorealism, it will happily let you decide the lens and aperture choices.
Give it a high-quality prompt and it will generate dozens of options for you within seconds, each at a level of quality that would take a human photographer, painter, digital artist or illustrator hours. It’s an art director’s dream: a grab bag of visual ideas in an instant, without having to pay creative, model or location fees.
You can also generate variations – either of something DALL-E generated for you, or of something you uploaded. It forms its own understanding of the subject matter, composition, style, color palette and conceptual meaning of the image, then generates a series of original pieces that reflect the look, feel and content of the original, each with its own twist.
And DALL-E 2 can now edit too, in a way that makes Adobe’s insanely powerful yet notoriously unapproachable Photoshop software feel like a relic of the past. No training is required. You can paint over a chair and say “put a cat there.” You can tell DALL-E to “make it a sunset,” “place her in a neon-lit cyberpunk atrium,” or “remove the bike.” It understands things like reflections, and will update them accordingly.
You can paste an image into it and ask the AI to expand it outwards to a wider image frame. Each time it gives you a few different options, and if you don’t like them you can just run the same instruction again or get more specific in your prompt. In fact, you can keep zooming out on an image indefinitely, and people are already using this for extraordinary creative effect.
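Under the hood, these edits are exposed through OpenAI’s API as an inpainting request: a source image, a mask marking the region to change, and a text instruction. The exact endpoint and field names have changed over time, so the helper below is a hypothetical sketch that only assembles the request parameters – check the current API documentation before relying on any of these names.

```python
# Hypothetical helper: assemble the parameters for a DALL-E-style
# inpainting/edit request. Field names (prompt, n, size) mirror OpenAI's
# images API at the time of writing, but treat them as assumptions.
def build_edit_request(prompt, n=4, size="1024x1024"):
    if not prompt:
        raise ValueError("an edit needs a text instruction")
    return {
        "prompt": prompt,   # e.g. "put a cat there"
        "n": n,             # number of candidate images to return
        "size": size,       # output resolution
    }

req = build_edit_request("place her in a neon-lit cyberpunk atrium")
print(req["size"])
```

A real call would also attach the source image and a mask image (transparent wherever the edit should happen), then POST the whole thing to the images-edit endpoint; the API returns several candidates, matching the “few different options” behavior described above.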
These capabilities – which are just the surface of what it can do – make DALL-E 2 an absolutely revolutionary image editor. It feels like this technology can do just about anything.
Well, within limits. OpenAI designed DALL-E 2 to refuse to generate images of celebrities or public figures. It also doesn’t accept uploads containing realistic faces, and it does its best not to generate images of real people, instead tweaking things in an interesting way that looks somewhat like the actual person, but also very clearly not. Mind you, given the sophistication of deepfake and image-editing software, we don’t think it would take much effort to create a DALL-E image and paste the face of your choice onto it.
The system does not generate porn, gore or political content – and indeed, the data used to train it excludes these types of images. And unless you specify racial or demographic information in your prompts, “the system will generate images of people that more accurately reflect the diversity of the world’s population,” in the hope of avoiding some of the racial bias that AI systems often suffer from as a result of skewed training data.
DALL-E 2 is currently in beta, with a waiting list for those interested. One million accounts will be welcomed in the coming weeks, each with 50 free credits to use the system and another 15 credits per month. Additional credits cost $15 per 115 credits – and each credit gets you four images for a prompt or instruction. It is at once an incredible democratization of visual creativity and a knife to the heart of anyone who has spent years or decades refining their artistic techniques in hopes of making money.
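To put the quoted beta pricing in perspective, here is a quick back-of-the-envelope calculation, assuming the figures above ($15 per 115 credits, four images per credit) hold:

```python
# Beta pricing quoted above: $15 buys 115 credits, and each credit
# returns four candidate images for a single prompt or instruction.
price_usd = 15
credits_per_purchase = 115
images_per_credit = 4

cost_per_credit = price_usd / credits_per_purchase      # cost of one prompt
cost_per_image = cost_per_credit / images_per_credit    # cost of one image

print(round(cost_per_credit, 4))  # ~$0.13 per prompt
print(round(cost_per_image, 4))   # ~$0.033 per image
```

Roughly three cents per finished image – against the hours a human photographer or illustrator would bill for comparable work – is what makes the “knife to the heart” framing above feel less like hyperbole.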
OpenAI explicitly says that users are given “full rights to commercialize the images they create with DALL-E, including the right to reprint, sell and merchandise.” But there are still some fascinating legal gray areas yet to be fully explored here, as everything these AIs know about art they learned by analyzing the work of human creators.
If this latest piece of software looks impressive, it’s worth remembering that it’s still a very early version of this kind of technology. DALL-E 2, its contemporaries and its descendants will continue to evolve at a breakneck pace that is only likely to accelerate.
Where to from here? Well, why not video? As processing power and storage continue to grow, it’s easy to imagine that systems like these will eventually be able to generate moving images as well. Adobe has already built AI-enhanced video editing capabilities into its professional After Effects software, but we’re yet to see DALL-E-style creativity in video.
How long will it be before we see a short film written, directed, soundtracked and made entirely by AI systems? And how long after that before they become worth watching?
What about other forms of graphic design? Can DALL-E make logos? Website templates? Business cards? Will it evolve to automatically generate catalogs, posters, brochures, book covers and everything else a designer currently makes a living from? Probably. Indeed, if you’re young and interested in art or design, you’d probably best become an expert at making the most of these emerging tools, because in a few years, whether you like it or not, this could be what the job looks like.
Presumably, alternative AI image generators will soon appear without the ethical and moral boundaries OpenAI has drawn around DALL-E. Cans of worms are being opened. Noses are being put out of joint. DALL-E offers a glimpse of a fundamentally different future, and this kind of upheaval is never painless.
Watch a short video below.
DALL-E 2 explained