There’s a new hot trend in AI: text-to-image generators. Feed these programs any text you like and they’ll generate remarkably accurate pictures that match that description. They can match a range of styles, from oil paintings to CGI renders and even photographs, and — though it sounds cliched — in many ways, the only limit is your imagination.
To date, the leader in the field has been DALL-E, a program created by commercial AI lab OpenAI (and updated just back in April). Yesterday, though, Google announced its own take on the genre, Imagen, and it has unseated DALL-E in the accuracy of its output, according to Playwright’s Paul Mozur.
Imagen works by taking a written description from whoever is using it, interpreting that description with a large natural language processing model, and then mapping it to an original photo or painting. According to Google's research paper, that last step relies on diffusion models: networks that start from random noise and refine it step by step until a believable image matching the prompt emerges. And boy does this thing work well!
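For the curious, the two-stage idea described above (first turn the text into numbers, then turn those numbers into a picture) can be sketched in a few lines of Python. This is a deliberately tiny toy, not Google's code: the function names `encode_prompt` and `generate_image`, the hashing trick that stands in for a language model, and the 8x8 grayscale grid are all assumptions made up for illustration, and no real neural network is involved.

```python
import hashlib
import random


def encode_prompt(prompt, size=8):
    """Toy stand-in for a language model: map text to size*size numbers in [0, 1]."""
    digest = hashlib.sha256(prompt.lower().encode("utf-8")).digest()
    # Reuse the 32 hash bytes cyclically to fill the whole grid.
    return [digest[i % len(digest)] / 255 for i in range(size * size)]


def generate_image(prompt, steps=50, size=8, seed=0):
    """Start from random noise and nudge every pixel toward the prompt's target."""
    rng = random.Random(seed)
    target = encode_prompt(prompt, size)
    image = [rng.random() for _ in range(size * size)]  # pure noise
    for _ in range(steps):
        # Each step closes 20% of the remaining gap between image and target.
        image = [px + 0.2 * (t - px) for px, t in zip(image, target)]
    return image


if __name__ == "__main__":
    img = generate_image("a cat sitting on a mat")
    # Print the 8x8 grid of brightness values, one row per line.
    for row in range(8):
        print(" ".join(f"{v:.1f}" for v in img[row * 8:(row + 1) * 8]))
```

The same prompt always yields the same grid, while a different prompt yields a different one. A real system replaces the hash with a trained language model and the simple nudging loop with a trained image network, which is where all the actual difficulty lies.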
Just take a look at some of the examples below (provided by Google). In each case, the text at the bottom of the image was the prompt fed into the program, and the picture above was the output. Pretty fantastic, right?
Imagen is a text-to-image generator that creates pictures from textual descriptions with striking accuracy. It is the product of research by Google Brain, the company’s in-house AI research team. And while previous attempts at similar systems have often produced results that are smeared or blurry, Imagen seems to have cracked the code, creating images that are both realistic and coherent.
To show off just how well Imagen works, Google released a selection of pictures generated by the system in response to various prompts. As you can see, whether it’s depicting something as simple as “a cat sitting on a mat” or something more complex like “a group of people playing volleyball on a beach”, Imagen produces stunning results.
Of course, it’s important to remember that these are just some of Imagen’s best examples; when research teams release new AI models, they tend to select only their most successful outputs (a practice known as ‘cherry picking’). So while every image included here is undeniably impressive in its fidelity and detail, it may not represent the system's average output.