Using Data Science to: Blend the styles of Shakespeare and Eminem.
Introduction
Growing up, I listened to Eminem A LOT, and I was constantly in awe of his ability to tell vivid stories while still managing to rhyme. Songs like “Stan” were unlike anything else I had ever heard, and I remember wondering if one day people would study his songs the way we study Shakespeare.
That idea stuck with me over the years, and I was often curious about what it would look like if you could combine the styles of Eminem and Shakespeare into one.
Then, a few weeks ago, I worked on a project using “style transfer”, which is a specific type of “transfer learning”. (I wrote a 2-minute explanation of transfer learning here).
That project let me combine elements of two pictures into one by unfreezing specific layers of a model.
I wanted to find out if I could use a similar approach to combine the styles of two writers, Shakespeare and Eminem, for the purpose of text generation. So this is what I set out to accomplish.
My plan was to take a language generation model and train it on one of the writers first, and then “fine-tune” it on the other. The hope was that I could find a way to get the style and content of Eminem, with the structure and language of Shakespeare.
The Data
The first step was to gather the data. Luckily, Shakespeare is a pretty popular subject in natural language processing, so this dataset was easy to find.
Eminem, however, proved to be a bit harder to track down. After some searching, I found that what I needed wasn’t out there, so I had to scrape it myself. I decided to use all of his songs, but to exclude the skits and introductions from his albums. Altogether I gathered 224 songs and combined them into one text file. I’ve made the data available here.
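For anyone curious, the stitching step itself is simple. Here is a rough sketch of the kind of script that combines per-song text files into a single corpus (the folder and file names are placeholders, not the actual paths from my project):

```python
from pathlib import Path

# Hypothetical layout: one .txt file per scraped song inside a "lyrics/" folder
song_files = sorted(Path("lyrics").glob("*.txt"))

with open("eminem_corpus.txt", "w", encoding="utf-8") as corpus:
    for song in song_files:
        text = song.read_text(encoding="utf-8").strip()
        corpus.write(text + "\n\n")  # blank line between songs

print(f"Combined {len(song_files)} songs into eminem_corpus.txt")
```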
With the data in hand, it was time to move on to building the model.
Selecting a Model
For text generation, I really had two model architecture options, “Recurrent Neural Networks” (RNNs) and “Transformers”.
I will write posts explaining these in more detail later, but in the simplest terms possible, an RNN is a special type of neural network that is typically used to deal with sequential data. This could be something like “time series” data such as stock prices, or a “natural language” problem, where small chunks of text are fed into the model at each time step.
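To make that concrete, here is a minimal sketch of a character-level RNN in Keras; the layer sizes are purely illustrative and are not necessarily the ones I used:

```python
import tensorflow as tf

VOCAB_SIZE = 100  # number of distinct characters (illustrative)
EMBED_DIM = 64
RNN_UNITS = 512

# At each time step the model reads one character and predicts the next one
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.GRU(RNN_UNITS, return_sequences=True),
    tf.keras.layers.Dense(VOCAB_SIZE),  # logits over the next character
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```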
A transformer is a bit harder to explain simply, but it is built around the idea of an “attention mechanism”. This allows the model to learn which other words in the sentence to focus on, and it has other computational benefits, such as enabling parallel processing.
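The heart of that attention mechanism fits in a few lines. Here is a bare-bones sketch of scaled dot-product attention in NumPy, stripped of the multi-head machinery a real transformer uses:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to every key; the weights decide which words to focus on."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted mix of the values

# Toy example: 4 "words", each represented by an 8-dimensional vector
x = np.random.randn(4, 8)
print(scaled_dot_product_attention(x, x, x).shape)    # (4, 8)
```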
I decided to start with an RNN, as it would give me total control over the structure of the model, but if it did not work out, I would use a transformer as my fallback plan.
RNN Approach
With the data and a plan, I set out to build an RNN-based model for text generation. Basically, this would work by learning to predict the next letter in any given sequence. During the prediction phase, I would give it a word or phrase to start with, and it would keep predicting letters until it ultimately predicted an “end of sequence” token.
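In code, that prediction loop looks roughly like the sketch below. It assumes the character-level Keras model from earlier, plus char-to-index lookup tables and an end-of-sequence character, all of which are stand-ins rather than my exact setup:

```python
import numpy as np

def generate(model, prompt, char_to_idx, idx_to_char, eos="\x00", max_len=500):
    """Feed the prompt in, then keep sampling one character at a time."""
    generated = list(prompt)
    for _ in range(max_len):
        # Encode everything generated so far and predict the next character
        x = np.array([[char_to_idx[c] for c in generated]])
        logits = model.predict(x, verbose=0)[0, -1]   # logits for the final position
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                          # softmax over the vocabulary
        next_char = idx_to_char[np.random.choice(len(probs), p=probs)]
        if next_char == eos:                          # stop at the end-of-sequence marker
            break
        generated.append(next_char)
    return "".join(generated)

# e.g. generate(model, "Romeo", char_to_idx, idx_to_char)
```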
I wasn’t entirely sure what would happen, so I started by making a model for each writer, just to make sure the text generation would work. Here is a sample from each, using the prompt “Romeo”.
I was happy to see that each worked independently. The Shakespeare model mimicked his form, language, and content, while the Eminem model did the same, and even captured some rhyming structure.
The next step was to find a way to combine them, and this is where the issues began. I tried training first on Shakespeare and then on Eminem, which did not work. So I swapped the order, and it did not work again. Next, I shuffled the two datasets together and tried that, but it still did not work.
I lowered the learning rate, then raised it. I froze and unfroze layers. But no matter what I tried, I could not blend the two styles. Instead, every time, the model would go down one path or the other, sounding clearly Shakespearean or Eminemian. (Two examples from the same model can be seen below.)
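For reference, “freezing” a layer just means marking its weights as untrainable so that fine-tuning leaves them alone. In Keras terms, those experiments looked something like the sketch below, which assumes the character-level model from earlier plus a hypothetical `loss_fn` and `eminem_dataset`:

```python
import tensorflow as tf

# Freeze everything except the output layer, so only it adapts to the new writer
for layer in model.layers[:-1]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss=loss_fn)
model.fit(eminem_dataset, epochs=5)

# ...or unfreeze the whole model and fine-tune it at a smaller learning rate
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss=loss_fn)
model.fit(eminem_dataset, epochs=5)
```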
After some reflection, I figured this was a result of the way the character-level predictions took place, so I decided to move on to the transformer and see if I would have better results.
Transformer Approach
For the transformer approach, I decided to use a model called “GPT-2” by OpenAI. This model uses a pretty straightforward transformer architecture, but it has a ton of parameters and was pre-trained on a huge dataset, so in practice it actually works really well.
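There are a few ways to get your hands on GPT-2; one common route is the Hugging Face `transformers` library, shown below purely to illustrate what the model looks like in code (it is not necessarily the exact tooling I used):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Generate a short continuation from a prompt
inputs = tokenizer("Romeo", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```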
The first step was to create baselines for each writer, as I did with the RNN. This worked well, but it revealed some differences. Specifically, GPT-2 seemed to capture “ideas” better and maintain more continuity from line to line. This was clearest with the Eminem model, as GPT-2 generated lyrics that tended to make more sense and even contained some wordplay.
But when it came time to blend the two, I ran into the same problems all over again. The closest I came was by pre-training on Eminem and then fine-tuning on Shakespeare at a learning rate 100x lower. But at best, I could only see glimmers that the model had retained some of both writers.
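The “100x lower” part just means dropping the learning rate for the second stage of training. As a sketch, again assuming the Hugging Face GPT-2 model above and hypothetical batches of tokenized lyrics and plays:

```python
from torch.optim import AdamW

BASE_LR = 5e-5

# Stage 1: train on the Eminem corpus at the normal learning rate
optimizer = AdamW(model.parameters(), lr=BASE_LR)
# ... standard language-modeling training loop over eminem_batches ...

# Stage 2: fine-tune on Shakespeare at a rate 100x lower
optimizer = AdamW(model.parameters(), lr=BASE_LR / 100)
for batch in shakespeare_batches:          # hypothetical iterable of token id tensors
    outputs = model(batch, labels=batch)   # GPT-2 computes the language-modeling loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```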
Conclusion
The question I set out to answer with this project was: Can I blend the styles of Eminem and Shakespeare? Unfortunately, the answer to this one is No, at least for now.
The truth is that this is probably possible given the current models that are available, but getting it to work would be very tedious. If I had the time and enough computing power, the answer would most likely come from using GPT-2 Extra Large, and experimenting with freezing and unfreezing different layers throughout the model.
Presumably, through enough experimentation, I could find which layers worked best to control structure, and which were best for managing content so that I could use them accordingly to blend the styles.
Unfortunately, GPT-2 Extra Large is so big that I cannot even run it on the cloud without paying for it. So for now, I’ll take my loss and revisit this later as new models and solutions become available. Plus, in the meantime, there are always memes to satisfy my curiosity.