library(spotifyr)
library(tidyverse)
# Values from the Spotify developer dashboard, hidden before posting this blog
client_id <- '394b4aaf093c485db4a8d856dc3da40f'
client_secret <- '715d59ecfe7d4e95874a512cd6ecdff2'
# set secrets as system values so I can use this app again without having to use my secret info again
Sys.setenv(SPOTIFY_CLIENT_ID = client_id)
Sys.setenv(SPOTIFY_CLIENT_SECRET = client_secret)
For the last 14 years I’ve volunteered as a radio DJ for KSUA and Overkill Radio. One of the most fun (and most work-intensive) things I do for my show every year is go through the new releases I’ve listened to that year and pick my top 20 or so albums. This takes a lot of time, as I generally listen to over 100 new albums a year to keep my show fresh and keep general interest in metal up.
This year I decided to speed up the process by automating part of it. I’ve spent the last year making sure that any new album I listen to gets added to a master Spotify playlist of 2023 releases, and then giving each song a binary rating of like or null. This process is nice because it’s fast, and at the end of the day rating a song or album 3.5/5 versus 3/5 always feels unsatisfying and arbitrary. I will talk about the problems I had with this system after we do some API calls and data cleaning.
Starting with some library calls and setting up my secret values.
Next, getting the playlist data that I will clean up and filter down to my top 23 albums.
# authenticate my code and set the scope of what my app will access from a user library, users may have to log in to use this app
access_token <- get_spotify_access_token()
access_code <- get_spotify_authorization_code(scope = c('playlist-read-private', 'user-library-read'))
# get my playlist that I use to track 2023 albums that I listened to for the show
thh_2023_playlist <- get_playlist('3RYDZXbSqyoI82UTlUeefM')
# made a copy of my liked tracks since Spotify's API doesn't let you access liked tracks directly anymore
my_saved_tracks <- get_playlist('4H3STvFlP1JtgSxFH6Y4H5')
So currently Spotify’s API doesn’t work with saved tracks. I tried a few scopes to resolve this, but none worked, so I resorted to copying all the tracks from my saved list to a new public playlist so I could add a saved flag to the tracks from my 2023 albums list. Now I need to pull only the artist, track name, track number, and total album tracks out of my 2023 playlist and my saved tracks playlist so I can add the liked flag.
# get tracks and turn them into a list
tracks <- thh_2023_playlist$tracks$items
thh_2023_playlist_cleaned <- tracks %>%
  select(track.artists,
         track.album.name,
         track.track_number,
         track.name,
         track.album.total_tracks,
         track.duration_ms)
nrow(thh_2023_playlist_cleaned)
[1] 871
# repeat the process for liked tracks playlist
saved_tracks <- my_saved_tracks$tracks$items
saved_tracks_cleaned <- saved_tracks %>%
  select(track.artists,
         track.album.name,
         track.track_number,
         track.name,
         track.album.total_tracks,
         track.duration_ms)
nrow(saved_tracks_cleaned)
[1] 1597
This almost gets me what I want, but I still need to get the artist out. track.artists doesn’t give me the band name directly: it is a list column, and the name sits inside each element as track.artists$name. I need to unnest the data to get at it.
thh_2023_playlist_cleaned_unnested <- thh_2023_playlist_cleaned %>%
  unnest(track.artists)
thh_2023_playlist_cleaned_artist <- thh_2023_playlist_cleaned_unnested %>%
  select(name,
         track.album.name,
         track.track_number,
         track.name,
         track.album.total_tracks,
         track.duration_ms)
nrow(thh_2023_playlist_cleaned_artist)
[1] 923
saved_tracks_unnested <- saved_tracks %>%
  unnest(track.artists)
saved_tracks_cleaned_artist <- saved_tracks_unnested %>%
  select(name,
         track.album.name,
         track.track_number,
         track.name,
         track.album.total_tracks,
         track.duration_ms)
nrow(saved_tracks_cleaned_artist)
[1] 1680
Well, it looks like we got a few extra rows. I’m willing to bet that tracks with features list every credited artist, so unnesting gives those tracks one row per artist and the album ends up with duplicate entries. Let’s check by looking at an album that I know has a lot of features. City Morgue is a rap duo of ZillaKami and SosMula, who are both also independent artists. If artist features are inflating our row count, this album will show it.
example_album <- thh_2023_playlist_cleaned_artist %>%
  filter(track.album.name == 'My Bloody America')
Yes, that seems to be it. This album alone adds 30 rows to our original playlist length of 871 songs, so I will need to remove duplicated tracks like this. Spotify’s artist list for a track is not limited to the album artist, but the ordering seems consistent: the album artist is always listed first. So I can take the lowest-numbered track on each album, get the first artist name from that, then keep only the rows that match that name.
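To make the duplication and the fix concrete, here is a minimal sketch on made-up data. The album, track, and artist names are invented for illustration, and it assumes the album artist really is listed first on the lowest-numbered track, which may not hold for splits or compilations.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up data: one album, two tracks, the second with a featured artist.
toy_tracks <- tibble(
  track.album.name   = c("Toy Album", "Toy Album"),
  track.track_number = c(1L, 2L),
  track.name         = c("Intro", "Collab"),
  track.artists      = list(
    tibble(name = "Main Band"),
    tibble(name = c("Main Band", "Guest Vocalist"))
  )
)

# unnest() emits one row per artist, so the 2 tracks become 3 rows.
toy_unnested <- toy_tracks %>%
  unnest(track.artists)

# Take the first artist on the lowest-numbered track as the album artist,
# then keep only the rows credited to that artist.
toy_main <- toy_unnested %>%
  group_by(track.album.name) %>%
  mutate(main_artist = name[which.min(track.track_number)]) %>%
  filter(name == main_artist) %>%
  ungroup()

nrow(toy_unnested)  # 3
nrow(toy_main)      # 2
```

which.min() returns the first position of the minimum, so when track 1 itself has several artists, the first one listed is the one picked.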
thh_2023_playlist_tracks <- thh_2023_playlist_cleaned_artist %>%
  group_by(track.album.name) %>%
  mutate(main_artist = name[which.min(track.track_number)]) %>%
  filter(name == main_artist) %>%
  ungroup()
saved_playlist_tracks <- saved_tracks_cleaned_artist %>%
  group_by(track.album.name) %>%
  mutate(main_artist = name[which.min(track.track_number)]) %>%
  filter(name == main_artist) %>%
  ungroup()
thh_2023_playlist_tracks <- thh_2023_playlist_tracks %>%
  mutate(liked = ifelse(track.name %in% saved_playlist_tracks$track.name, 1, 0)) %>%
  group_by(track.album.name) %>%
  mutate(score = (sum(liked) / first(track.album.total_tracks)) * 100) %>%
  mutate(total_duration_album = sum(track.duration_ms)) %>%
  mutate(total_duration_liked = sum(ifelse(liked == 1, track.duration_ms, 0))) %>%
  mutate(percent_duration_liked = (total_duration_liked / total_duration_album) * 100) %>%
  arrange(desc(score)) %>%
  ungroup()
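The scoring step above boils down to two ratios per album: liked tracks over total album tracks, and liked duration over total duration. Here is a minimal sketch on a made-up four-track album. The data is invented, and it uses summarise() to collapse to one row per album, where the code above uses mutate() to keep one row per track.

```r
library(dplyr)
library(tibble)

# Made-up album: 4 tracks, 3 of them liked.
toy_album <- tibble(
  track.album.name         = "Toy Album",
  track.name               = c("A", "B", "C", "D"),
  track.album.total_tracks = 4L,
  track.duration_ms        = c(200000, 300000, 100000, 400000),
  liked                    = c(1, 1, 0, 1)
)

toy_scored <- toy_album %>%
  group_by(track.album.name) %>%
  summarise(
    # share of the album's tracks that were liked
    score = sum(liked) / first(track.album.total_tracks) * 100,
    # share of the album's running time that was liked
    percent_duration_liked =
      sum(track.duration_ms[liked == 1]) / sum(track.duration_ms) * 100
  )

toy_scored$score                   # 75
toy_scored$percent_duration_liked  # 90
```

The two numbers differ because the one unliked track is also the shortest, which is why tracking duration alongside the track count can be informative.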
albums_in_score_order <- thh_2023_playlist_tracks %>%
  select(track.album.name, score) %>%
  distinct(track.album.name, .keep_all = TRUE) %>%
  arrange(desc(score))
top_23_albums <- thh_2023_playlist_tracks %>%
  distinct(name, track.album.name) %>%
  slice_head(n = 23)
top_23 <- thh_2023_playlist_tracks %>%
  filter(name %in% top_23_albums$name)
albums_2023 <- thh_2023_playlist_tracks %>%
  distinct(track.album.name) %>%
  nrow()
top_23_albums
# A tibble: 23 × 2
name track.album.name
<chr> <chr>
1 Megaton Sword Might & Power
2 Hanging Garden The Garden
3 Raider Trial By Chaos
4 Fires in the Distance Air Not Meant for Us
5 Ascension Under the Veil of Madness
6 Burial Hordes Ruins
7 1476 In Exile
8 Rise to the Sky Two Years of Grief
9 Valdrin Throne of the Lunar Soul
10 Sulphur Aeon Seven Crowns And Seven Seals
# ℹ 13 more rows
Well, here are my top 23 albums of 2023. This saved me a lot of time in figuring out which albums I should give another listen and which albums I should consider. There are a few albums that will move lower in the ranking, and one album that I missed in setting this all up that I’ll manually put in.
This was pretty fun to set up, and I didn’t realize the drawbacks of doing it this way until it was already too late. Giving tracks a binary score to generate an overall score is fast and convenient, BUT it does require that I actively listen to albums. I can’t have music on in the background while I work and only click like when I notice something good. Yes, actively listening to music is part of being a critic, but it is still something to keep in mind when trying to make one of these lists. The other problem is that if you listen to music on something else (I use foobar2000 at home), then you have to make sure that those liked tracks and albums get copied over to Spotify, so maybe I’ll try doing something similar with Last.fm for 2024. I did miss one album with this system: I listened to it, liked a bunch of tracks from it, and never added it to my playlist of 2023 albums.
Statistics-wise, there isn’t much interesting going on with this data set.
mean(top_23$score)
[1] 85.05155
mean(top_23$total_duration_liked)
[1] 2585681
mean(top_23$total_duration_album)
[1] 2957495
So my average score for the top 23 albums was about 85%, and the mean liked duration was about 43.1 minutes against an average album duration of about 49.3 minutes. I didn’t look around too much to see what other lists lined up with mine, but the metal people I shared this data with said it was a pretty solid list of albums for 2023.
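The duration means printed above are in milliseconds, so they need converting to be read as minutes. A minimal sketch of that arithmetic, where the helper name ms_to_minutes is hypothetical and not part of spotifyr:

```r
# Hypothetical helper: convert milliseconds to minutes (60,000 ms per minute).
ms_to_minutes <- function(ms) {
  ms / 60000
}

liked_minutes <- round(ms_to_minutes(2585681), 1)  # mean liked duration
album_minutes <- round(ms_to_minutes(2957495), 1)  # mean album duration

liked_minutes  # 43.1
album_minutes  # 49.3
```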