alignDRAW: The Pioneering Text-to-Image Model that Reshaped the AI Art Landscape

A New Era of Creative Expression

Art has always been a reflection of its time, capturing the zeitgeist of the era in which it was created. The alignDRAW collection, a set of artworks from the first text-to-image model, created in 2015 (1), is a perfect example of this – a snapshot of a pivotal moment in the history of AI art, a testament to the capabilities of AI at that time, and a glimpse into the future of what it could achieve.

Each artwork in the collection is unique, generated from a text prompt and brought to life by a breakthrough in AI technology. The prompts range from the fantastical to the mundane, from “a stop sign is flying in blue skies” to “a very large commercial plane flying in rainy skies”. The resulting images, while not photorealistic, capture the essence of these prompts in a way that is both compelling and thought-provoking.

alignDRAW outputs from the prompts “a stop sign is flying in blue skies” and “a very large commercial plane flying in rainy skies”.

The alignDRAW collection represents a unique opportunity, a chance to own a piece of history, to be part of the narrative of AI’s role in art.

In this article, we invite you to join us on a journey of exploration and discovery, tracing the evolution of AI in art from the advent of neural networks to the development of text-to-image models. We examine the role of alignDRAW in this narrative, and consider the questions and possibilities it raises for the future of art and creativity.

The Dawn of AI in Art

Artificial Intelligence has been a subject of fascination and study for decades, but its application in the realm of art is a relatively recent phenomenon. The story begins with the advent of neural networks, a type of machine learning algorithm inspired by the human brain. Neural networks consist of interconnected layers of nodes, or “neurons,” that can learn to recognize patterns in data. The development of neural networks marked a significant leap forward in AI capabilities, opening up new possibilities for processing and generating complex data like images and text (2).
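The interconnected layers of “neurons” described above can be sketched in a few lines. This is a minimal, illustrative forward pass only; the layer sizes and random weights are assumptions for demonstration, not drawn from any model discussed in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # A simple nonlinearity applied between layers
    return np.maximum(0.0, x)

def forward(x, w1, b1, w2, b2):
    # Each layer is a weighted sum of its inputs followed by a nonlinearity:
    # the interconnected "neurons" that learn to recognize patterns in data.
    hidden = relu(x @ w1 + b1)
    return hidden @ w2 + b2

x = rng.normal(size=(1, 4))                  # one input with 4 features
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # layer 1: 4 -> 8 neurons
w2, b2 = rng.normal(size=(8, 2)), np.zeros(2)   # layer 2: 8 -> 2 outputs
print(forward(x, w1, b1, w2, b2).shape)      # (1, 2)
```

In a real network, the weights `w1` and `w2` are adjusted during training so that the output matches the patterns in the data.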

In the early days of AI art, neural networks were used primarily for tasks like style transfer, where the style of one image is applied to another(3). The next major milestone came with the development of Variational Autoencoders (VAEs) by Kingma and Welling in 2013(4). VAEs are a type of AI model that can learn to create new data that resembles the data it was trained on. For instance, if trained on a collection of images of cats, a VAE could generate new images that look like cats, even though they’re not exact copies of any specific cat image it was trained on. This ability to generate new, unique images was a significant advancement in the field of AI art, paving the way for more complex and creative applications.
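Once a VAE is trained, generating a new sample is conceptually simple: draw a random latent vector and pass it through the decoder. The sketch below illustrates only this sampling step; the random linear “decoder” is an assumption standing in for a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, data_dim = 2, 6

# Assumed stand-in for a trained VAE decoder (random linear map)
decoder = rng.normal(size=(latent_dim, data_dim))

z = rng.normal(size=latent_dim)   # a new point sampled from the latent space...
sample = z @ decoder              # ...decoded into a new, unique data point
print(sample.shape)               # (6,)
```

Because `z` is sampled fresh each time, every decoded output is new: it resembles the training data without copying any single example.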

A further breakthrough was achieved with the introduction of Generative Adversarial Networks (GANs) by Ian Goodfellow and his team in 2014(5). GANs are a class of machine learning systems that consist of two neural networks: a generator that creates new data instances, and a discriminator that evaluates them for authenticity. The generator and discriminator are trained together, with the generator trying to produce data that the discriminator can’t distinguish from the real thing, and the discriminator getting better and better at telling the difference. This adversarial process leads to the generation of artificial data that closely resembles the input data.
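The adversarial objective described above can be sketched on 1-D toy data. The one-parameter “networks” `D` and `G` below are illustrative assumptions, not the architectures from Goodfellow et al.; the point is only to show the two losses pulling in opposite directions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def D(x, w=1.0):       # toy discriminator: probability that a sample is real
    return sigmoid(w * x)

def G(z, theta=0.5):   # toy generator: map random noise to a sample
    return theta * z

real = rng.normal(loc=2.0, size=64)   # "real" data clustered around 2
fake = G(rng.normal(size=64))         # generated data

# The discriminator is trained to push D(real) toward 1 and D(fake) toward 0,
d_loss = -np.mean(np.log(D(real)) + np.log(1.0 - D(fake)))
# while the generator is trained to make D(fake) approach 1 instead.
g_loss = -np.mean(np.log(D(fake)))
print(d_loss > 0 and g_loss > 0)      # True: each side still has work to do
```

In training, the two losses are minimized alternately, with each network improving in response to the other: the adversarial process described above.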

Visualization of samples from Goodfellow et al.’s model. The rightmost column shows the nearest training example to the neighboring sample, demonstrating that the model has not memorized the training set.

In the context of art, GANs opened up the possibility of creating new, unique images that mirrored the style and content of a given dataset. This was a significant leap forward, but the task of generating images from text descriptions remained a significant challenge.

The Emergence of Text Prompts in AI

The use of text prompts in AI is a relatively recent development, but it has quickly become a cornerstone of modern AI art. Its roots lie in recurrent neural networks (RNNs), a type of neural network designed to handle sequential data. RNNs can process sequences of varying lengths, making them well suited to tasks involving text or time-series data (6).

Text prompts offer a way to guide the AI’s creative process, providing a source of inspiration and direction. They allow for a level of control and intentionality that was previously unattainable, opening up new possibilities for human-machine collaboration in the field of creative expression.


The potential of RNNs for text generation was first demonstrated by Sutskever, Martens, and Hinton in 2011 (7). They trained an RNN on a large corpus of text and then used it to generate new text, one character at a time. The resulting output was surprisingly coherent and grammatically correct.
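The character-at-a-time sampling loop can be sketched as follows. The tiny vocabulary and the untrained, random weights are assumptions for illustration; with trained weights, the same loop produces the coherent text described above.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = list("ab ")                 # assumed toy vocabulary
V, H = len(vocab), 8                # vocabulary size, hidden-state size

# Assumed random (untrained) RNN weights
Wxh = rng.normal(scale=0.1, size=(V, H))
Whh = rng.normal(scale=0.1, size=(H, H))
Why = rng.normal(scale=0.1, size=(H, V))

def step(x_onehot, h):
    # One RNN step: the hidden state h carries context from earlier characters
    h = np.tanh(x_onehot @ Wxh + h @ Whh)
    logits = h @ Why
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax over characters
    return probs, h

h = np.zeros(H)
idx = 0                             # start from the first character
out = [vocab[idx]]
for _ in range(10):                 # generate ten characters, one at a time
    probs, h = step(np.eye(V)[idx], h)
    idx = int(rng.choice(V, p=probs))
    out.append(vocab[idx])
print("".join(out))
```

Each generated character is fed back in as the next input, which is what lets the network produce sequences of arbitrary length.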

However, while RNNs were capable of generating text, they were not designed to generate images from text descriptions. This was a significant challenge, as it required the model to understand the semantic content of the text and translate it into a visual representation.

A pivotal moment arrived with the development of the DRAW (Deep Recurrent Attentive Writer) model by Gregor et al. in 2015(8). DRAW is a recurrent neural network for image generation that introduced a unique attention mechanism. This mechanism allows the model to focus on different parts of an image as it’s being generated, similar to how a human artist might focus on different parts of a canvas while painting.

This was a departure from existing models because it allowed the AI to generate more detailed and coherent images: it could “concentrate” on one area at a time instead of trying to render the whole image at once. However, DRAW was not designed to generate images from text descriptions. Instead, it was guided by a series of numbers, often referred to as a “latent vector” or “latent code”.
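DRAW’s core loop of building an image step by step can be sketched as below. The random linear “decoder” is an assumption standing in for the trained network, and the sketch omits DRAW’s attention window; it shows only the idea of a canvas accumulating patches driven by latent vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
T, latent_dim, side = 8, 10, 32       # 8 drawing steps, a 32x32 canvas

# Assumed stand-in for a trained decoder (random linear map)
decode = rng.normal(scale=0.01, size=(latent_dim, side * side))

canvas = np.zeros(side * side)
for t in range(T):
    z = rng.normal(size=latent_dim)   # latent code sampled at each step
    canvas += z @ decode              # add this step's patch to the canvas
image = 1.0 / (1.0 + np.exp(-canvas)) # squash to pixel intensities in (0, 1)
print(image.reshape(side, side).shape)  # (32, 32)
```

The 32×32 canvas size here mirrors the resolution of the alignDRAW outputs discussed below.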

alignDRAW: A Significant Moment in AI Art

Later in 2015, another leap forward was made with the advent of automated image captioning. Machine learning algorithms mastered the art of labeling objects within images and evolved further to formulate these labels into coherent, natural language descriptions.

This breakthrough spurred a team of researchers at the University of Toronto, led by 19-year-old ‘wunderkind’ Elman Mansimov (9), to explore the inverse problem: instead of converting images to text, turning text into images, a much more complex challenge. Their ambition wasn’t simply to retrieve existing images from databases, as a search engine would, but to generate entirely new scenarios with no parallel in reality.

In the seminal paper, ‘Generating Images from Captions with Attention’ (10), Mansimov et al. extended the DRAW architecture by conditioning it on text sequences, allowing images to be generated from natural language descriptions using a soft attention mechanism. The model was trained on a dataset of 82,783 images, each paired with a corresponding text description. It could then iteratively draw patches on a canvas while attending to the relevant words in the description, each image being a unique interpretation of its prompt.
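The soft attention step that alignDRAW adds to DRAW can be sketched as a weighted average over the caption’s words, so that different words can dominate different drawing steps. The word embeddings and the query vector below are random assumptions for illustration, not the model’s trained representations.

```python
import numpy as np

rng = np.random.default_rng(0)
words = "a stop sign is flying in blue skies".split()
E = rng.normal(size=(len(words), 16))   # assumed: one vector per caption word

def attend(query, E):
    scores = E @ query                  # relevance of each word to this step
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over words
    return weights, weights @ E         # weighted summary of the caption

query = rng.normal(size=16)             # stands in for the decoder's state
weights, context = attend(query, E)
print(words[int(weights.argmax())])     # the word this step attends to most
```

Because the weights are a softmax rather than a hard choice, the attention is “soft”: every word contributes, but some much more than others.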

They tested the model by asking it to produce visuals of unheard-of scenarios, such as “a herd of elephants flying in the blue skies” or “a toilet seat sits open in the grass field.” While the output was limited to tiny 32-by-32-pixel images, it nonetheless revealed promising prospects for the future capabilities of this technology. The generated images are not a mirror of reality, but they encapsulate the spirit of the prompts in a manner that is intriguing and invites contemplation.

alignDRAW outputs from the prompts “a herd of elephants flying in the blue skies” and “a toilet seat sits open in the grass field”.

This was a revolutionary development, as it marked the first time that a model was capable of generating coherent and contextually appropriate images from text descriptions.

The alignDRAW model marked the genesis of a whole new world of AI art generated by text-to-image models, paving the way for more recent tools such as DALL-E, Stable Diffusion and Midjourney. The excellent Vox video, ‘The text-to-image revolution, explained’, underscores the crucial role alignDRAW played and also highlights the cultural and societal impacts of these developments.

The historic alignDRAW Collection

The alignDRAW collection offers a unique opportunity – an invitation to possess a piece of history, to own the ‘patient zero’ of text-to-image generated art.

The Grail Sets

In the official published paper, Generating Images from Captions with Attention, Mansimov included sets of 8 artworks generated by 21 different text prompts such as “a stop sign is flying in blue skies” and “a very large commercial plane flying in rainy skies”.

These 168 unique artworks will be minted individually but sold in sets of 8 to private collectors and institutions.

The 21 sets of 8 artworks from the official paper, Generating Images from Captions with Attention.

The Process Works

In addition to the published artworks, Mansimov also saved a further 2,057 artworks created by the alignDRAW model in 2015. These are the only outputs in existence and came from sets of 121 images generated from 17 different text prompts such as “a group of happy elephants in the dry grass field” and “a picture of a dark sky.”

These 2,057 unique artworks will be minted individually and auctioned to the public along with some donations to artists and the Web3 community.

Launching soon! Join on Discord to stay updated.

Art, AI, and the Question of Humanity

In concluding our journey to trace the evolution of AI in art, it’s important to reflect on some profound philosophical questions brought about by these technological developments. The emergence of AI art has ignited a philosophical debate about the nature of art and the role of the artist. Traditionally, art has been seen as a uniquely human endeavor, a reflection of the artist’s thoughts, emotions, and experiences. But with AI now capable of creating art, this traditional view is being challenged. If a machine can create art, what does this say about us as humans? Are we defined by our ability to create, or is there something more that makes us human? 

In this new era, AI is not just a tool, but a partner in the creative process, offering new ways of seeing and interpreting the world around us.

It is undeniable that AI has already had a significant impact on the art world, and that the development of text-to-image models, pioneered by alignDRAW, has opened up new possibilities for art and creativity. alignDRAW marked a moment that demonstrated the potential of AI not just to mimic human creativity, but to contribute to it in meaningful ways.


  1. Text-to-image model, Wikipedia
  2. Deep learning, LeCun et al., 2015
  3. Image Style Transfer Using Convolutional Neural Networks, Gatys et al., 2016
  4. Auto-Encoding Variational Bayes, Kingma et al., 2013
  5. Generative Adversarial Networks, Goodfellow et al., 2014
  6. Recurrent neural network based language model, Mikolov et al., 2010
  7. Generating Text with Recurrent Neural Networks, Sutskever et al., 2011
  8. DRAW: A Recurrent Neural Network For Image Generation, Gregor et al., 2015
  9. Computer, Draw an Open Toilet Sitting In a Grassy Field, Vice, 2015
  10. Generating Images from Captions with Attention, Mansimov et al., 2015
