#SaveUkraine: Exploratory Data Analysis
On February 24th, 2022, at 4 am European time, Russia invaded Ukraine. Needless to say, it has been a shock to many, myself included, and I’ve been “glued” to the news ever since. The pictures of atrocities and the looming humanitarian crisis haunted my daily life enough to make me consider an emergency fundraiser. On top of that, part of my family has historical ties to ex-USSR territory, which made it imperative for me to do whatever I can to help ordinary Ukrainian families. When you see pictures of ordinary Soviet apartment blocks and private houses being destroyed in front of your eyes, it’s easy to project these images onto your childhood memories of the places you visited as a kid, and you can no longer remain silent.
I pitched the idea and framework of a fundraising compilation for the Red Cross to the French netlabel Kalamine Records on February 28th. I’ve been involved with this label since 2019, when I pitched the idea of the annual INTENT compilation series. I have produced three iterations of the INTENT series so far, one every December, and I don’t intend to stop any time soon. Moreover, Kalamine Records has always been a supporter of humanitarian causes, so it was the perfect fit for this project. This time around, it was the owner of Kalamine Records, Zumaia, who collected and organized the tracks. The fundraising goal of €600 was met in less than three weeks. From all corners of the world, people who love and support electronic and experimental music came together to help families in Ukraine. All proceeds, minus PayPal and Bandcamp fees, were donated to the Red Cross in Ukraine. The compilation, called simply #SaveUkraine, can be listened to and downloaded here.
As a budding statistician, I’m always keen to explore the metrics of success in greater detail. How exactly did this music compilation fare better than other compilations on the label? Are there any clues that historical data may provide? All the data that I gathered and processed with the help of R came from the Bandcamp page of Kalamine Records. This required a lot of manual input, as well as in-depth research in databases like Discogs and MusicBrainz to find the countries of origin for many artists, since this information is often not specified on Bandcamp. The data below, however, was provided by Zumaia as a starting point for the investigation: totals for purchases, plays, downloads, and the amount of money that was raised.
As we can see, the fundraising goal was met, and an extra €100 was even collected. More people have purchased the album than downloaded it, which suggests that it is perceived more as a fundraiser than as an object of art. For this compilation, I dropped the requirements for mastering and other “finesse” and focused on just collecting tracks with artist and country information, since the fundraiser was meant to be quick and effective rather than slow-going and produced with the same care as INTENT. I guess this might explain the numbers.
It would be very difficult to get data on downloads for every Kalamine Records compilation. However, the mean number of tracks on past compilations, along with the median and the mode (the value that occurs most often), suggests that #SaveUkraine is a mighty outlier, which indicates the enormous success of the project. When the mean and the median are similar, the distribution of the data is almost symmetrical; including an extreme value like this one pulls the mean away from the median and skews the distribution. And even if the volumes of the INTENT iterations are lumped together as one compilation, the total number of tracks on a single compilation has never surpassed 65 so far.
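To make the effect of an outlier on these summary statistics concrete, here is a minimal sketch. The original analysis was done in R; this illustration uses Python's standard library, and every number in it is a made-up placeholder rather than the label's real track counts.

```python
from statistics import mean, median, multimode

# Hypothetical track counts for past compilations (placeholders, not the label's real data).
past_track_counts = [12, 15, 15, 18, 20, 22, 15, 30, 25, 65]
# Hypothetical track count for a much larger new compilation.
new_compilation_tracks = 90


def summarize(counts):
    """Return the mean, median, and mode(s) of a list of track counts."""
    return {
        "mean": mean(counts),
        "median": median(counts),
        "modes": multimode(counts),
    }


before = summarize(past_track_counts)
after = summarize(past_track_counts + [new_compilation_tracks])

# The extreme value moves the mean far more than the median.
mean_shift = after["mean"] - before["mean"]
median_shift = after["median"] - before["median"]

print("before:", before)
print("after: ", after)
print("mean shift:", round(mean_shift, 2), "median shift:", median_shift)
```

With these placeholder numbers the median and the mode barely move, while the mean is pulled noticeably upward, which is the pattern one would expect when a single compilation is far larger than the rest.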
So, has the compilation project been successful? There are indications that it might be a resounding yes.
What was the scope of the outreach based on the countries represented by the artists on the compilation? The maps below, as well as the word cloud, can give us a better idea.
Germany takes the lead on the tracklist, contributing the absolute majority of tracks, followed by the United States, France, the UK, Belgium, Italy, the Netherlands, and other countries from around the globe, reaching even South America and Oceania. We were unable to reach any artists from Russia, even though we explicitly stated in Russian that artists from Russia were welcome to contribute, even anonymously. I guess this is understandable, given the current political climate in Russia.
Similar projects, and more on artist outreach
Two questions popped up in my mind when I was looking at the statistics: how many new artists did the compilation attract, and how does it compare with similar compilation projects? The first question can be answered by finding the complement, that is, by performing a so-called anti-join between two data sets: the artists on the compilation, and the artists who have appeared on the label since 2019 (not counting the compilation itself). The artists left over after the anti-join are the ones who are new to the label. But what about the second question?
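As a rough illustration of the anti-join idea, here is a minimal Python sketch (the actual analysis used R and SQL). The artist names are placeholders, and the helper names `normalize`, `anti_join`, and `semi_join` are hypothetical, introduced only for this example.

```python
def normalize(name):
    """Normalize an artist name so that case and stray spaces don't block a match."""
    return name.strip().casefold()


def anti_join(left, right, key=normalize):
    """Keep the items of `left` that have NO match in `right`."""
    right_keys = {key(item) for item in right}
    return [item for item in left if key(item) not in right_keys]


def semi_join(left, right, key=normalize):
    """Keep the items of `left` that DO have a match in `right`."""
    right_keys = {key(item) for item in right}
    return [item for item in left if key(item) in right_keys]


# Placeholder artist lists (not the real data).
compilation_artists = ["Artist A", "Artist B", "artist c", "Artist D"]
label_history = ["Artist B", "Artist C ", "Artist E"]

new_artists = anti_join(compilation_artists, label_history)
returning_artists = semi_join(compilation_artists, label_history)

print("new to the label:", new_artists)
print("already on the label:", returning_artists)
```

The same two helpers also cover the overlap question between two compilations: a semi-join of one tracklist against the other lists the artists who appear on both. Name matching is the fragile part in practice, since aliases and spelling variants need manual checking.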
Many people within the electronic and experimental music community were thinking along the same lines about the war, which is great because it brought more awareness. I noticed at least three similar projects that were started at the same time as #SaveUkraine. One of the compilation projects that I contributed to personally is “Stop All Wars” (Стоп!) by the German label Attenuation Circuit. The label information for this compilation indicates that it is ever-growing and will continue until the last war on the planet ends. So far, Attenuation Circuit has managed to collect 84 tracks, which makes it a perfect candidate for comparison. The compilation can be downloaded and listened to in its entirety here.
Since many of the artists I work with appeared on this compilation, I was wondering: how many of the artists who appear on one compilation also appear on the other? Given that both projects have a similar number of tracks, how similar are they? And, most importantly, do both compilations statistically draw on the same pool of experimental artists? Let’s try to answer these questions by looking at the data. Fortunately, Attenuation Circuit gathered all the links in one place, which made inputting the data fast and easy.
Germany is overwhelmingly represented on the #SaveUkraine compilation, with the majority of the 43 new artists on Kalamine Records coming from Germany. The data, however, suggests that the pool of artists overlaps less than I initially thought, with the majority of artists appearing on both albums hailing from the United States. Let’s take a look at the maps and word cloud for “Stop All Wars.”
Still, based on the data, the overall distribution of countries looks deceptively similar to the distribution of artists on #SaveUkraine, even though Attenuation Circuit managed to attract many artists from Russia and Eastern Europe. To determine whether the artist pools for the two labels differ significantly, I have to perform a hypothesis test.
Null hypothesis: all proportions of artists from the respective countries on Attenuation Circuit are the same as proportions of artists from the respective countries on Kalamine Records.
Alternative hypothesis: at least one of these proportions differs.
I will conduct a chi-square test of homogeneity in order to draw a conclusion from statistical evidence. The null hypothesis will be rejected if the p-value is less than the 5% significance level. The procedure is described in detail here. I would just like to note that the usual assumption that all expected values are greater than 5 is too stringent in our case, so I also computed the p-value with a Monte Carlo simulation. At our significance level, it doesn’t differ much from the p-value computed without the simulation, so both lead to the same decision. It would make a difference at the 1% significance level.
Both of these values are below 0.05, so we can reject the null hypothesis and conclude, at the 5% significance level, that the artist pools for Kalamine Records and Attenuation Circuit differ from one another at the country level.
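For readers who want to see the mechanics of such a test, here is a minimal sketch in Python using only the standard library. It is not the code behind the results above (that was written in R). The contingency table is a made-up placeholder, the function names are hypothetical, and the simulation shuffles country labels between the two labels, which is one simple way to obtain a simulated p-value with fixed row and column totals. It assumes every country column contains at least one artist.

```python
import random
from collections import Counter


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p_value(table, n_sim=2000, seed=0):
    """Simulated p-value: shuffle country labels across rows, keeping all totals fixed."""
    rng = random.Random(seed)
    observed_stat = chi_square_statistic(table)
    n_cols = len(table[0])
    # One country index per artist, with the rows laid end to end.
    labels = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    row_sizes = [sum(row) for row in table]
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(labels)
        simulated, start = [], 0
        for size in row_sizes:
            counts = Counter(labels[start:start + size])
            simulated.append([counts.get(j, 0) for j in range(n_cols)])
            start += size
        # Small tolerance so that ties with the observed statistic are counted.
        if chi_square_statistic(simulated) >= observed_stat - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


# Placeholder counts (NOT the real data): rows are the two labels,
# columns are hypothetical country groups, e.g. Germany, US, France, other.
table = [
    [30, 12, 10, 28],
    [20, 10, 4, 50],
]

statistic = chi_square_statistic(table)
p_value = monte_carlo_p_value(table)
print("chi-square statistic:", round(statistic, 3))
print("simulated p-value:", round(p_value, 4))
```

The simulated p-value does not rely on the chi-square approximation, which is why it is a useful cross-check when some expected counts are small. The p-value without simulation would additionally need the chi-square distribution function, which is not part of Python's standard library.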
So, to sum it up, what did I find out after analyzing the compilation data?
- The #SaveUkraine compilation was a massive success for Kalamine Records. The scope of the outreach was wide enough, with the majority of artists coming from Germany;
- The compilation attracted 43 new artists, which is roughly 16% of the old artist pool, with the majority of them, once again, coming from Germany. Still, there is very little artist overlap with similar compilation projects from Germany, like “Stop All Wars” by Attenuation Circuit;
- The artist pools for Kalamine Records and the German label Attenuation Circuit differ significantly, so we can’t draw any meaningful association between them.
The code chunks I wrote for processing and analyzing the data, a mix of R code and SQL queries, are available on my GitHub. Feedback on this report is very welcome! I apologize that the site otherwise doesn’t contain much yet. I was eager to launch the blog as soon as possible, given the tight schedule at the university. The site is going to grow, and new reports, as well as other meaningful information, will be available soon enough.
By the way, this blog is running on a tiny Raspberry Pi that I connected to a solar battery. Green and sustainable energy!