Data Science: the “simple” art of arranging cool lists
20 January 2021
Every year, millions of people around the world share the artists they love, the music genres they discovered, the podcasts they actually listen to versus the ones they claim to, and the song they played 42 times in a single day (Run Boy Run, after rewatching that episode of The Umbrella Academy).
Of course, we are talking about Spotify Wrapped, an elegant summary of our listening habits and guilty pleasures. But above everything else, the most beautiful part of Spotify Wrapped is how “simple” it is.
At Good Rebels, we have recently proven this to be true… ish. Presenting #MisPalabrasTienenColor! (#MyWordsHaveColour).
Let’s be clear
OK, we won’t lie: the above was just one of over 150 lines of code used for this project. But in a sense, it is the base of the whole development, a “simple” bot that generated close to 100 million impressions. The best part? We did not use any ads or influencers. We reached millions of people and even made it to the #1 spot in Mexico’s Trending Topics the good ol’ fashioned organic way: going viral.
Our version of last year’s summary was also built with user data, but from a public platform: Twitter. The concept was straightforward: find the colour names each user had mentioned in their tweets and RTs during 2020, match every keyword with its respective colour and return a personalised graph.
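The matching step can be sketched in a few lines. The bot itself was written in R with tidyverse; the Python below is only an illustration of the same idea, and the dictionary entries, hex codes and the function name `count_colour_words` are hypothetical placeholders, not the client's real palette. It assumes single-word colour names; multi-word names would need phrase matching.

```python
import re
from collections import Counter

# Hypothetical colour dictionary: keyword -> hex colour.
# The real palette had more than 2,000 names; these hex codes are made up.
COLOUR_NAMES = {
    "madrid": "#C8102E",
    "book": "#8B5A2B",
    "idea": "#F2C14E",
    "life": "#4CAF50",
}

def count_colour_words(tweets, colour_names=COLOUR_NAMES):
    """Count how often each colour name appears across a user's tweets.

    Text is lower-cased and split into word tokens (\\w+ also matches
    accented letters), so "Book," and "book" count as the same keyword.
    """
    counts = Counter()
    for text in tweets:
        for token in re.findall(r"\w+", text.lower()):
            if token in colour_names:
                counts[token] += 1
    return counts

tweets = [
    "Reading a new book in Madrid",
    "What an idea! Life is good, and this book is great",
]
counts = count_colour_words(tweets)

# Pair each counted keyword with its colour, ready to feed a treemap.
treemap_data = [(word, n, COLOUR_NAMES[word]) for word, n in counts.most_common()]
print(treemap_data)
```

Each tuple holds a keyword, how many times it was mentioned and the colour to paint its rectangle, which is all a treemap needs: area from the count, fill from the dictionary.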
It is important to mention that our client’s colour names are not as simple as “blue” or “yellow”. They are a bit more creative than that: “Madrid”, “book”, “idea”, “life” and more than 2,000 other names, each with a different colour. The result is a graph that looks like this:
A (brief) technical explanation
And how did we get the bot to work?
All the code was written in R, mainly using the rtweet and tidyverse packages. You need a Twitter Developer account and a willingness to play a bit loose with Twitter’s developer terms and conditions. Then, once everything is ready, you can set it up on a remote machine; we used Azure’s Data Science Virtual Machine.
The script was scheduled with cronR to run every minute of the day, searching for new mentions of the hashtag #MisPalabrasTienenColor. When it found a new mention, the script retrieved all the other tweets the author had posted during 2020, found the colour names, created a treemap graph and posted it, mentioning the author, from the client’s official, verified Twitter account. The script also included a small piece of code that kept a record of every user who had already received a graph, so if a user tweeted the hashtag again, the new mention was ignored.
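The flow of a single scheduled run (a cron schedule of `* * * * *` fires once a minute) can be sketched as below. This is a Python illustration, not the original R script: `fetch_timeline`, `make_graph` and `post_reply` are hypothetical callables standing in for the Twitter API calls and the treemap rendering, and storing the record of served users as a JSON file is an assumption, since the post does not say how it was kept.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

@dataclass
class Tweet:
    """Minimal stand-in for a tweet (the real bot worked with rtweet data)."""
    author: str
    text: str
    created_at: datetime

YEAR_START = datetime(2020, 1, 1, tzinfo=timezone.utc)
YEAR_END = datetime(2021, 1, 1, tzinfo=timezone.utc)

def tweets_from_2020(timeline):
    """Keep only tweets posted during calendar year 2020 (UTC)."""
    return [t for t in timeline if YEAR_START <= t.created_at < YEAR_END]

def load_served(path):
    """Read the record of users who already got a graph (empty if no file yet)."""
    p = Path(path)
    return set(json.loads(p.read_text())) if p.exists() else set()

def save_served(path, served):
    """Persist the record so the next scheduled run can see it."""
    Path(path).write_text(json.dumps(sorted(served)))

def process_mentions(mentions, fetch_timeline, make_graph, post_reply, served):
    """One scheduled run: reply once per new author, ignoring repeat mentions."""
    replied = []
    for mention in mentions:
        if mention.author in served:
            continue  # user already has a graph, so this mention is ignored
        timeline = tweets_from_2020(fetch_timeline(mention.author))
        post_reply(mention.author, make_graph(timeline))
        served.add(mention.author)
        replied.append(mention.author)
    return replied

# Usage with fake data instead of live API calls.
def fake_timeline(author):
    return [
        Tweet(author, "old tweet", datetime(2019, 12, 31, 23, 0, tzinfo=timezone.utc)),
        Tweet(author, "a new book", datetime(2020, 6, 1, tzinfo=timezone.utc)),
    ]

posts = []
served = {"maria"}  # maria was served in an earlier run
when = datetime(2021, 1, 5, tzinfo=timezone.utc)
mentions = [
    Tweet("ana", "#MisPalabrasTienenColor", when),
    Tweet("maria", "#MisPalabrasTienenColor", when),
    Tweet("ana", "#MisPalabrasTienenColor again", when),
]
replied = process_mentions(
    mentions,
    fetch_timeline=fake_timeline,
    make_graph=lambda timeline: f"{len(timeline)} tweet(s) from 2020",
    post_reply=lambda author, graph: posts.append((author, graph)),
    served=served,
)
print(replied, posts)
```

Adding the author to the record right after posting means a user who mentions the hashtag twice within the same run still gets only one graph; in a real deployment you would call `save_served` at the end of each run so the next minute's run starts from the updated record.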
No, honey. It’s not witchcraft, it’s literally data.
One of the cool things about the project is that the bot went live “secretly”, so the client could test it, but things escalated very quickly. Without an official announcement, instructions or anything else, people discovered it within a few minutes and pushed the hashtag onto Mexico’s list of trending topics in a couple of hours.
It took the bot approximately 17 seconds to create each graph, and at one point the queue was over 4 hours long. And every time we tried a creative solution to speed the process up, our developer app got suspended within minutes.
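A quick back-of-the-envelope check puts those numbers in perspective. It uses only figures from this post (about 17 seconds per graph, a queue of over 4 hours, close to 10,000 graphs) and assumes graphs were built strictly one at a time, which is an assumption rather than something the post states.

```python
# Figures quoted in the post; everything below is arithmetic on them.
SECONDS_PER_GRAPH = 17      # approximate time to build one graph
PEAK_QUEUE_HOURS = 4        # the queue peaked at "over 4 hours"
TOTAL_GRAPHS = 10_000       # "close to 10,000 graphs" in total

def users_waiting(queue_hours, seconds_per_graph):
    """Requests ahead of a newcomer, assuming graphs are built one at a time."""
    return (queue_hours * 3600) // seconds_per_graph

def processing_hours(graphs, seconds_per_graph):
    """Total serial processing time, in hours."""
    return graphs * seconds_per_graph / 3600

waiting = users_waiting(PEAK_QUEUE_HOURS, SECONDS_PER_GRAPH)
hours = processing_hours(TOTAL_GRAPHS, SECONDS_PER_GRAPH)
print(f"~{waiting} users in a 4-hour queue")
print(f"~{hours:.1f} hours to produce all graphs")
```

Under that assumption, a 4-hour queue means roughly 850 people waiting (a lower bound, since the queue was over 4 hours), and the whole campaign amounts to about 47 hours of nonstop graph building.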
The key lies in simplicity
In the end, the bot tweeted close to 10,000 graphs based on a cool, yet “simple” idea that came up in a Slack channel and, quite frankly, we weren’t even sure if it was going to work.
A “simple” deployment that, even if only for a brief moment, broke through some of the “Screw 2020” sentiment on Twitter by giving people a beautiful graph of their most mentioned words. And, of course, it helped people discover our client’s colours while increasing brand awareness.
Simplicity wasn’t only at the core of our project; it is also the point of this article. When reading and learning about data science, we are all used to coming across complex innovations and developments that use powerful algorithms to create solutions and bring surreal ideas to life. To be honest, it’s quite overwhelming and humbling.
But we are here to tell you that, sometimes, “simple” is more than good enough. Sometimes we don’t need algorithms or NLP, as some people guessed we had used. Sometimes all we need is a whole lot of tidying up of publicly available data.