Using Data Science to: Generate Art
Long Philosophical Introduction
AI is rapidly improving. At some point, we will have to start asking, and attempting to answer, hard questions about the philosophical implications of these improvements. One such question has to do with the nature of “creativity”.
AI is primarily driven by algorithms and heuristics. These are complex, but generally consistent and bound by mathematical operations. Given this fact, will it ever be possible for an AI to exhibit creativity? Does creativity require a degree of unpredictability and “chaos”, or can it still arise from a causal chain of events?
It’s easy (and fun!) to form opinions on these matters, but part of what makes this such a difficult question to answer well is the fact that we still understand so little about human thought. It is very possible that our decisions and actions are just as causal as an AI’s, and that the “novelty” is just a byproduct of complexity we don’t understand. If that were the case, would we even exhibit creativity, and more importantly, what would that imply about free will, ethics, and morality?
To avoid going down a rabbit hole, it’s sometimes easier to focus on the application of some of these concepts, rather than the concepts themselves. So let’s do that instead, by talking about art.
It’s commonly accepted that “art” requires at least some degree of creativity or novelty. If you disagree, think about your own feelings toward an artist copying another’s work. Do you place a higher value on the artist who created the piece, or on the one who executed the piece with higher technical skill? For most of us, we care about the “creator” and see the generation of the ideas as the “hard part”, and the execution of the idea as necessary, but not sufficient on its own.
Given these feelings and the causal nature of AI, the natural question that arises is: Is it possible for an AI to generate “art”, and if so, does that mean that our AI is an artist? So that is exactly what I set out to explore this week.
What’s a GAN?
Before we can get into the question, we need to take a detour to talk about “GANs”.
If you have ever heard of “deep-fakes”, you are aware that there are already some algorithms out there for generating images and video. What you may not know is that there’s a whole host of these algorithms designed to generate media, and they are all built on “GANs”.
“GAN” stands for generative adversarial network. It consists of two separate neural networks, one of which is the generator, and the other is the discriminator.
The generator’s job is to take in “noise”, which is literally just randomly generated values, and create an image from it. (This can also be done with audio, video, etc.).
The discriminator’s job is to look at the images and determine the probability that they are showing what they should be showing. If we assume we are looking for cats in the images, some examples may look like this.
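The two roles can be sketched as plain functions. This is a toy numpy illustration of the interfaces only (the image size, latent size, and random untrained weights are all placeholder assumptions, not anything from StyleGAN):

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 64          # size of the input noise vector
IMG_SHAPE = (8, 8, 3)    # tiny stand-in for a real image size
N_PIXELS = int(np.prod(IMG_SHAPE))

# Untrained placeholder weights; a real GAN learns these during training.
G_WEIGHTS = rng.normal(size=(LATENT_DIM, N_PIXELS))
D_WEIGHTS = rng.normal(size=N_PIXELS)

def generator(noise):
    """Map a noise vector to an image-shaped array with values in [0, 1]."""
    pixels = 1 / (1 + np.exp(-noise @ G_WEIGHTS))  # squash to [0, 1]
    return pixels.reshape(IMG_SHAPE)

def discriminator(image):
    """Score an image: the probability that it is 'real'."""
    logit = image.reshape(-1) @ D_WEIGHTS
    return 1 / (1 + np.exp(-logit))

noise = rng.normal(size=LATENT_DIM)   # literally just randomly generated values
fake = generator(noise)
score = discriminator(fake)
print(fake.shape, float(score))
```

The point is just the shapes of the contract: noise in, image out for the generator; image in, probability out for the discriminator.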
When we train the network, we use the generator to create images, and then feed those images, along with real images, to the discriminator. It evaluates all the images, sees how close it came to classifying each one correctly, and then updates its weights to improve.
Once the discriminator has been updated, we give the generator a chance to train by trying to trick the discriminator with its generated images. The new image is passed to the discriminator, where it scores the probability, calculates the loss, and then uses this to update the weights of the generator.
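The alternating update described above can be shown end to end on a toy problem: a 1-D “dataset” drawn from a normal distribution around 4, a linear generator, a logistic discriminator, and the gradients worked out by hand. Every choice here (distributions, learning rate, step count) is illustrative, not the StyleGAN procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Real "data": scalars near 4. Generator: g(z) = a*z + b. Discriminator: D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0            # generator parameters
w, c = 0.0, 0.0            # discriminator parameters
lr, batch = 0.05, 64

start_gap = abs(b - 4.0)   # how far the generator's mean starts from the data mean

for step in range(2000):
    z = rng.normal(size=batch)
    real = rng.normal(4.0, 0.5, size=batch)
    fake = a * z + b

    # --- Discriminator update: push D(real) toward 1 and D(fake) toward 0 ---
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # --- Generator update: try to trick the discriminator, i.e. push D(fake) toward 1 ---
    d_fake = sigmoid(w * fake + c)
    grad_out = (1 - d_fake) * w    # gradient of log D(fake) with respect to fake
    a += lr * np.mean(grad_out * z)
    b += lr * np.mean(grad_out)

end_gap = abs(b - 4.0)
print(f"generator offset b moved from 0.0 toward the data mean 4.0: now {b:.2f}")
```

Even in this tiny setting you can watch the dynamic from the article play out: the discriminator learns to separate real from fake, and the generator's output drifts toward the real data in response.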
This process is repeated thousands of times, with both the generator and discriminator continuing to improve. As a result, the generated images look closer and closer to the real “class” they are trying to simulate, in some cases looking real enough to fool humans.
The Approach
As discussed above, I wanted to explore what would happen if I trained a GAN to generate art. While generated faces are easy to “evaluate” for accuracy, art is much more subjective. My hypothesis was that the GAN would generate something that looked “art-like”, but that it would stop short of developing a style or theme across the images it generated.
Because I wanted to keep this as open-ended as possible, I decided to train the GAN on abstract art. As always, the first step was to track down a dataset. Luckily, I didn’t have to build one this time, as I was able to find this dataset of 2,782 abstract art images on Kaggle.
With the images in hand, I had to decide which type of GAN to implement. To be clear, there are many to choose from, but because of the way a GAN is trained, I couldn’t use my favorite approach of transfer learning; instead, I was going to have to train this one from scratch. Ultimately, I landed on using StyleGAN because it was easy to implement and well documented.
So with my hypothesis, the data, and the model, I was ready to begin training.
Training the Model
As I started playing with the model, I realized that training was going to take a very, very long time. I initially planned to scale my input images to 1 megapixel (1024px x 1024px). But after a few hours of running, it was projecting a total training time of 4 days, 16 hours, and 24 minutes. Because I was training on a public GPU, I had to get that number down to about 8 hours or less to make sure it would be able to finish. So I resized the images down to 256px x 256px (0.0625 megapixels), which put the estimated training time just over 7 hours.
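The back-of-the-envelope math here checks out: if training time scales roughly with pixel count (an assumption, but a reasonable first-order one), dropping from 1024×1024 to 256×256 cuts the work by a factor of 16:

```python
# Projected training time at full resolution: 4 days, 16 hours, 24 minutes.
full_res_hours = 4 * 24 + 16 + 24 / 60      # 112.4 hours

pixels_full = 1024 * 1024                   # 1,048,576 px (~1 megapixel)
pixels_small = 256 * 256                    # 65,536 px (0.0625 megapixels)
speedup = pixels_full / pixels_small        # 16x fewer pixels per image

est_hours = full_res_hours / speedup
print(f"{speedup:.0f}x fewer pixels -> ~{est_hours:.2f} hours")  # ~7 hours
```

112.4 hours divided by 16 is just over 7 hours, which matches the estimate the training run reported.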
Because of this, the generated images I show below are too low resolution to display full size. So I combined 120 samples from each phase of training into one image, so we could get a snapshot across many examples at once without feeling like we’re just staring at pixels.
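Stitching the samples into a single snapshot is a simple reshape-and-transpose job. A sketch with numpy, assuming 120 images of 256×256×3 laid out in a 10×12 grid (the grid dimensions are my guess at the layout, not taken from the original script):

```python
import numpy as np

def make_grid(images, rows, cols):
    """Stitch an (N, H, W, C) batch of samples into one (rows*H, cols*W, C) image."""
    n, h, w, ch = images.shape
    assert n == rows * cols, "grid must hold exactly all the samples"
    grid = images.reshape(rows, cols, h, w, ch)   # group samples into grid cells
    grid = grid.transpose(0, 2, 1, 3, 4)          # bring pixel rows next to grid rows
    return grid.reshape(rows * h, cols * w, ch)   # collapse into one big image

# Stand-in samples; in practice these would come from the generator.
samples = np.random.default_rng(2).random((120, 256, 256, 3))
snapshot = make_grid(samples, rows=10, cols=12)
print(snapshot.shape)  # (2560, 3072, 3)
```

The transpose is the only subtle step: it interleaves each sample's pixel rows with the grid rows so the cells tile correctly instead of smearing across the canvas.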
The training was set to go through 30 cycles, generating the 120 sample images after each one. It would use the same input noise each time, so we could see how the model was progressing. So after about 15 minutes of training, I got my first snapshot of where the model stood.
As you can see, this was really bad. Honestly, I was a little concerned at this point, as it still looked like pure noise. By round 6, the colors had calmed down significantly, but the images themselves still lacked any form.
At the halfway mark, we can clearly see shapes that stand out and contrast against the background colors, but everything still looks soft, as if all of the paint had bled onto the canvas.
By the 80% mark the images were clearly resembling abstract art, and were looking significantly more crisp.
By the final round, we do not see significantly more change, but it is fascinating to look at some of the specific shapes and see how the network continued to experiment with slightly different forms and colors.
To see this all at once, I combined all 30 images into a gif. I would recommend watching it a few times, focusing on a different image each time, to watch how they evolve.
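Assembling the 30 snapshots into a gif is a few lines with Pillow. This sketch uses random stand-in frames (in practice you would load the 30 saved grid images instead); the frame size, duration, and filename are all illustrative choices:

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Stand-in frames: in practice these would be the 30 per-cycle snapshot grids.
rng = np.random.default_rng(3)
frames = [
    Image.fromarray((rng.random((64, 64, 3)) * 255).astype("uint8"))
    for _ in range(30)
]

out = Path("training_progress.gif")
frames[0].save(
    out,
    save_all=True,            # write every frame, not just the first
    append_images=frames[1:],
    duration=300,             # milliseconds per frame
    loop=0,                   # loop forever
)
print(out, out.stat().st_size > 0)
```

The `duration` parameter is worth tuning by hand: too fast and the early noisy rounds blur together; too slow and the gif loses the sense of evolution.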
So… is it Art?
Looking carefully at the sample images generated, it appears that about 25% of them still end up looking like noise. Maybe if the resolution were much higher, these images would reveal the sophistication and nuance of Jackson Pollock’s works, but at 256×256 they just appear to be nonsense.
With the remaining 75%, I think that to the “unsophisticated” eye they would be passable as modern art. An art scholar would probably have a different opinion from the rest of us, but their appraisal might even be more positive (as long as they don’t know the work is computer generated).
If we review the images looking for a theme, this becomes a bit more of a stretch. The eye can pick out pictures that look like they would be done by the same artist, but overall, the samples do seem to differ significantly from one to the next.
But we still haven’t answered the question. Is it art? Unfortunately, like many things in philosophy, this ultimately reduces to a semantic argument about the definition of art itself. I could leave it at this, but I don’t get many chances to put my Philosophy degree to use, so I’ll take a shot at delivering a conclusion.
In my mind, the classification of the “artfulness” of a piece is done by the observer. This is evidenced by examples like differing opinions of the “artfulness” of progressive modern pieces. We do not have an objective ground truth of what “art” is, so ultimately it is just a concept which resides in the mind of the viewer, and is subjectively applied to the “pieces” they are exposed to. That being said, I think that many of the images from our GAN could be considered art, but ultimately that would be decided by each observer that encountered them.
But does this mean that our GAN is an artist? I wouldn’t go that far. I believe that self-expression is required from the artist. I would try to convince you, but I don’t think you want to read another 2,000 words of me talking in circles, so just take my word for it.
So even though our GAN generates art, there is still no underlying creativity, expression, or ideas that it is attempting to convey. It’s just noise in; math math math; picture out.
Does this mean AI can never be artistic? No, but before an AI could express itself, it would have to have a “self”, and unfortunately we are still a long way off from that.
Conclusion
The question I set out to answer this week was: Is it possible for an AI to generate “art”, and if so, does that mean that our AI is an artist? The TLDR answers: Can it generate art? I think so. Is it an artist? Not yet.
But even if it’s not ready to stand on its own, I am confident that over the coming decades, AI-assisted artists will become progressively more popular and mainstream. If you want to explore more, this website has links to over 30 different artistic AI applications.
In terms of next steps, I think I am good on generating art for now. But GANs have a ton of other potential applications, so I will definitely be revisiting them in the future.