Is it art or artificial intelligence?

A picture may be worth a thousand words, but thanks to an artificial intelligence program called DALL-E 2, you can get a professional-looking image for far fewer.

DALL-E 2 is a new neural network algorithm that creates images from a short phrase or sentence you provide. The program, announced in April 2022 by the artificial intelligence research lab OpenAI, has not been released to the public. But a small and growing number of people – myself included – have been given access to experiment with it.

As a researcher studying the connection between technology and art, I was eager to see how well the program worked. After hours of experimentation, it is clear that DALL-E, while not without its shortcomings, is well ahead of existing image generation technology. It immediately raises questions about how these technologies will change the way art is made and consumed. It also raises questions about what it means to be creative when DALL-E 2 seems to automate so much of the creative process itself.

OpenAI researchers built DALL-E 2 from a huge collection of captioned images. They collected some of the images online and licensed others.

Using DALL-E 2 is a lot like searching for an image on the Internet: you type a short sentence in a text box and you get six images back.

But instead of being pulled from the web, the program creates six brand-new images, each reflecting some version of the entered phrase. (Until recently, the program produced 10 images per prompt.) For example, when some friends and I gave DALL-E 2 the text prompt “cats in devo hats,” it produced 10 images in a range of styles.

Almost all of them could pass for professional photographs or drawings. While the algorithm did not quite grasp “Devo hats” – the strange helmets worn by the New Wave band Devo – the headgear in the images it produced came close.
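For readers curious what this prompt-to-images workflow looks like in code, here is a minimal sketch. It assumes OpenAI’s later public Images API and the official openai Python package – DALL-E 2 had no public API when this article was written – so it illustrates the general workflow rather than the exact interface described here: a prompt and an image count go in, a set of image URLs comes back.

    # Minimal sketch, assuming OpenAI's later public Images API (the "openai"
    # Python package) and an OPENAI_API_KEY environment variable; DALL-E 2 had
    # no public API at the time this article was written.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.images.generate(
        model="dall-e-2",            # the DALL-E 2 model
        prompt="cats in devo hats",  # the example prompt from above
        n=6,                         # number of images per prompt
        size="512x512",              # DALL-E 2 accepts 256x256, 512x512, 1024x1024
    )

    # Each result is a short-lived URL pointing to one generated image.
    for image in response.data:
        print(image.url)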

In recent years, a small community of artists has been using neural network algorithms to produce art. Many of these artworks have distinctive qualities: regions that look almost like real images, but with odd distortions of space – a sort of cyberpunk Cubism. The most recent text-to-image systems often produce dreamy, fantastical images that can be delightful but rarely look real.

DALL-E 2 represents a significant leap in the quality and realism of the images. It can also mimic specific styles with remarkable accuracy. If you want images that look like actual photographs, it will produce six lifelike images. If you want prehistoric cave paintings of Shrek, it will generate six images of Shrek as if drawn by a prehistoric artist.

It’s amazing that an algorithm can do this. Each set of images takes less than a minute to generate. Not all images are pleasing to the eye, nor do they necessarily reflect what you had in mind. But even with the need to search through many outputs or try different text prompts, there is no other existing way to get so many great results so quickly – not even by hiring an artist. And sometimes the unexpected results are the best.

In principle, anyone with enough resources and expertise can build a system like this. Google Research recently announced an impressive, comparable text-to-image system, and a startup, HuggingFace, is publicly developing its own version that anyone can try on the web right now, although it’s not yet as good as DALL-E or Google’s system.

It’s easy to imagine these tools changing the way people create and interact with images, be it memes, greeting cards, ads — and, yes, art.

Where is the art in that?

I had a moment early on, while using DALL-E to generate various kinds of paintings in all sorts of styles – like “Odilon Redon painting of Seattle” – when it dawned on me that this was better than any painting algorithm I had ever developed. Then I realized that, in a way, it is a better painter than I am.

In fact, no human can do what DALL-E does: create such a high-quality, varied range of images in mere seconds. If someone told you that a person had created all these images, you would of course say they were creative.

But that doesn’t make DALL-E an artist. Even though it sometimes feels like magic, under the hood it is still a computer algorithm, rigidly following instructions laid out by the algorithm’s authors at OpenAI.

If these images succeed as art, they are products of how the algorithm is designed, the images it is trained on, and most importantly, how artists use it.

You might be tempted to say that there is little artistic merit in an image produced with a few keystrokes. But in my view, this way of thinking echoes the classic stance that photography can never be art because a machine did all the work. Today the human authorship and craft involved in artistic photography are recognized, and critics understand that the best photography involves much more than just pressing a button.

Yet we often discuss works of art as if they stem directly from the artist’s intent. The artist intended to show something, or to express an emotion, and so they made this image. DALL-E seems to short-circuit this process entirely: you have an idea, you type it in, and you’re done.

But when I paint the old-fashioned way, I’ve found that my paintings come from the exploratory process, not just from carrying out my original goals. And this is true for many artists.

Take Paul McCartney, who came up with the song “Get Back” during a jam session. He didn’t start with a plan for the song; he just started tinkering and experimenting, and the band developed it from there.

Picasso described his process similarly: “I don’t know in advance what I am going to put on the canvas, any more than I decide beforehand what colors I am going to use… Each time I undertake to paint a picture I have a sensation of leaping into space.”

In my own explorations with DALL-E, one idea led to another, which led to another, and eventually I would find myself in a totally unexpected, magical new territory, very far from where I had started.

I’d say that the art of using a system like DALL-E comes not just from the final text prompt, but from the entire creative process that led to that prompt. Different artists will follow different processes and end up with different results that reflect their own approaches, skills and obsessions.

I began to think of my experimentation as a set of series, each one a coherent dive into a single theme, rather than a collection of independent wacky images.

Ideas for these images and series came from all over, often connected by a chain of stepping stones. At one point, while creating images based on the work of contemporary artists, I wanted to generate an image of site-specific installation art in the style of the contemporary Japanese artist Yayoi Kusama. After trying a few unsatisfying locations, I hit on the idea of placing it in La Mezquita, a former mosque and church in Córdoba, Spain. I sent the picture to an architect colleague, Manuel Ladron de Guevara, who is from Córdoba, and we began working out other architectural ideas together.

This became a series about imaginary new construction in different architectural styles.

This is why I’ve come to think of what I do with DALL-E as both a form of exploration and a form of art, even if it’s often amateur art like the drawings I make on my iPad.

Some artists, such as Ryan Murdoch, have even advocated for prompt-based imagery to be recognized as art. He points to the experienced AI artist Helena Sarin as an example.

“If I look at most things from Midjourney” – another popular text-to-image system – “a lot of it will be interesting or fun,” Murdoch told me in an interview. “But with [Sarin’s] work, there is a through line. It’s easy to see that she has put a lot of thought into it and worked on the craft, because the output is more visually appealing, more interesting, and follows her style throughout.”

Working with DALL-E, or any of the new text-to-image systems, means learning their quirks and developing strategies for avoiding common pitfalls. It’s also important to understand their potential harms, such as their reliance on stereotypes and their potential use for disinformation. Using DALL-E 2 also means discovering surprising correlations, like the way everything becomes old-timey when you use the style of an old painter, filmmaker or photographer.

When I have something very specific I want to make, DALL-E often can’t do it; the results would require a lot of difficult manual editing afterward. It’s when my goals are vague that the process is most delightful, with surprises leading to new ideas that in turn lead to more ideas, and so on.

New realities

These text-to-image systems can also help users imagine new possibilities.

Artist-activist Danielle Baskin told me that she has always worked on “showing alternate realities through a ‘real’ example: either by setting up scenarios in the physical world or by doing painstaking work in Photoshop.” DALL-E, however, “is a great shortcut because it’s so good at realism. And that’s the key to helping others bring possible futures to life – be it satire, dreams or beauty.”

She used it to imagine an alternative transportation system and plumbing that delivers noodles instead of water, both of which reflect her sensibility as an artist and provocateur.

Similarly, artist Mario Klingemann’s architectural renderings featuring the tents of homeless people can be taken as a response to my architectural renderings of beautiful dream houses.

It is too early to judge the significance of this art form. I keep thinking of a phrase from the excellent book “Art in the After-Culture”: “The dominant AI aesthetic is novelty.”

This would certainly be true, to some extent, of any new technology used for art. The Lumière brothers’ first films in the 1890s were novelties, not cinematic masterpieces; people were amazed simply to see moving images at all.

AI art software is developing so fast that there are continual technical and artistic innovations. It seems as if every year brings an opportunity to explore an exciting new technology – each more powerful than the last, and each seemingly poised to transform art and society.

Aaron Hertzmann is an affiliate professor of computer science at the University of Washington
