OpenAI has delivered innovation again. This time, it is what can most easily be described as GPT-3 for images.
The company unveiled a neural network called DALL·E (that’s a raised period between DALL and E, btw, and it’s hard to find on your keyboard). The name is a combination of WALL-E, Pixar’s animated robot character, and Dalí, the artist.
DALL·E can take in text using natural language processing and convert it into images. The picture above shows you the possibilities of the system – you type in “avocado chairs” and it spits out those images. Pretty cool. But how is it useful?
A brief overview of what DALL·E is
- According to the OpenAI blog, it’s a “12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.”
You can actually play around with the technology on the OpenAI blog at https://openai.com/blog/dall-e/.
DALL·E has a capability called “zero-shot reasoning.” This lets the system deliver images for text prompts it was never specifically trained on, with no additional training required – it’s the same kind of capability that lets GPT-3 translate between languages from a prompt alone, for example.
Now it’s been applied to the domain of images to perform both image-to-image and text-to-image “translation.” To illustrate this concept (haha), check out the images below from the company’s blog, which the system created based on the text prompt “the exact same cat on the top as the sketch on the bottom.”
The company believes the neural network shows “that manipulating visual concepts through language is now within reach.”