Spotify API

analysis
statistics
R Shiny
dashboard
webscraping
hobby
Author

Kevin Swenson

Published

February 2, 2024

For the last 14 years I’ve volunteered as a radio DJ for KSUA and Overkill Radio. One of the most fun (and most work-intensive) things I do for my show every year is go through the new releases I’ve listened to that year and pick my top 20 or so albums. This takes a lot of time, as I generally listen to over 100 new albums a year to keep my show fresh and general interest in metal up.

This year I decided to speed up the process by automating part of it. I’ve spent the last year making sure that any new album I listen to gets added to a master Spotify playlist of 2023 releases, and then giving each song a binary rating: liked or nothing. This process is nice because it’s fast, and at the end of the day rating a song or album 3.5/5 vs 3/5 always feels unsatisfactory and arbitrary. I will talk about the problems I had with this system after we do some API calls and data cleaning.

I start with some library calls and set up my secret values.

library(spotifyr)
library(tidyverse)

# Values from the Spotify developer dashboard, hidden before posting this blog
client_id <- '394b4aaf093c485db4a8d856dc3da40f'
client_secret <- '715d59ecfe7d4e95874a512cd6ecdff2'

# set secrets as system values so I can use this app again without having to use my secret info again
Sys.setenv(SPOTIFY_CLIENT_ID = client_id)
Sys.setenv(SPOTIFY_CLIENT_SECRET = client_secret)
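
An alternative to assigning the values inside the script is to define the two variables outside it (for example in ~/.Renviron) and only read them at run time. The sketch below assumes that setup; the helper name read_spotify_credentials is made up for illustration and is not part of spotifyr.

```r
# Hypothetical helper (not part of spotifyr): read credentials that were set
# outside the script, for example in ~/.Renviron, and fail early if one is missing.
read_spotify_credentials <- function() {
  vars <- c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
  values <- Sys.getenv(vars)          # unset variables come back as ""
  missing <- vars[!nzchar(values)]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  values
}
```

Calling read_spotify_credentials() at the top of the script gives a clear error before any API call is attempted, and the keys never appear in the posted code.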

Next I get the playlist data that I will clean up and filter down to my top 23 albums.

# authenticate my code and set the scope of what my app will access from a user library, users may have to log in to use this app
access_token <- get_spotify_access_token()
access_code <- get_spotify_authorization_code(scope = c('playlist-read-private', 'user-library-read'))

# get my playlist that I use to track 2023 albums that I listened to for the show
thh_2023_playlist <- get_playlist('3RYDZXbSqyoI82UTlUeefM')

# made a copy of my liked tracks since Spotify's API doesn't let you access liked tracks directly anymore 
my_saved_tracks <- get_playlist('4H3STvFlP1JtgSxFH6Y4H5')

So currently Spotify’s API doesn’t work with saved tracks. I tried a few scopes to resolve this, but none worked, so I resorted to copying all the tracks from my saved list to a new public playlist so I could add a saved flag to the tracks from my 2023 albums list. Now I need to pull only the artist, album, track name, track number, total album tracks, and duration out of my 2023 playlist and my saved tracks playlist so I can add the liked flag.

# pull the track items (one row per track) out of the playlist object
tracks <- thh_2023_playlist$tracks$items
thh_2023_playlist_cleaned <- tracks %>%
    select(track.artists,
           track.album.name,
           track.track_number,
           track.name,
           track.album.total_tracks,
           track.duration_ms)

nrow(thh_2023_playlist_cleaned)
[1] 871
# repeat the process for liked tracks playlist
saved_tracks <- my_saved_tracks$tracks$items
saved_tracks_cleaned <- saved_tracks %>%
    select(track.artists,
           track.album.name,
           track.track_number,
           track.name,
           track.album.total_tracks,
           track.duration_ms)

nrow(saved_tracks_cleaned)
[1] 1597

This almost gets me what I want, but I still need to get the artist out. track.artists holds a nested list of artists for each track, so selecting it doesn’t give me the band name directly. I need to unnest the data to reach the artist name, which currently sits under track.artists$name.

thh_2023_playlist_cleaned_unnested <- thh_2023_playlist_cleaned %>%
    unnest(track.artists)

thh_2023_playlist_cleaned_artist <- thh_2023_playlist_cleaned_unnested %>%
    select(name,
           track.album.name,
           track.track_number,
           track.name,
           track.album.total_tracks,
           track.duration_ms)
nrow(thh_2023_playlist_cleaned_artist)
[1] 923
saved_tracks_unnested <- saved_tracks_cleaned %>%
    unnest(track.artists)

saved_tracks_cleaned_artist <- saved_tracks_unnested %>%
    select(name,
           track.album.name,
           track.track_number,
           track.name,
           track.album.total_tracks,
           track.duration_ms)
nrow(saved_tracks_cleaned_artist)
[1] 1680

Well, it looks like we got a few extra rows. I’m willing to bet that tracks with features list every featured artist, so unnesting produces one row per artist and those tracks show up more than once. Let’s check by looking at an album that I know will have a lot of features. City Morgue is a rap duo of Zilla Kami and Sos Mula, who are both also listed as artists in their own right. If artist features are inflating our row count, this album will show it.

example_album <- thh_2023_playlist_cleaned_artist %>% 
    filter(track.album.name == 'My Bloody America')
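
Instead of eyeballing one album, the same hunch can be checked by counting rows per track after unnesting. This is a minimal sketch on made-up data that only mimics the shape of the playlist items (a list column of artists per track); the names toy_tracks, toy_unnested, and rows_per_track are placeholders.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the playlist items: track.artists is a list column holding
# one small table of artists per track (made-up values, not real API output).
toy_tracks <- tibble(
  track.name = c("Song A", "Song B"),
  track.album.name = c("Album X", "Album X"),
  track.artists = list(
    tibble(name = c("Main Act", "Guest One", "Guest Two")),
    tibble(name = "Main Act")
  )
)

# Unnesting gives one row per (track, artist) pair, so features add rows
toy_unnested <- toy_tracks %>%
  unnest(track.artists)

# Tracks that appear more than once after unnesting are the ones with features
rows_per_track <- toy_unnested %>%
  count(track.album.name, track.name, name = "n_artists") %>%
  filter(n_artists > 1)
```

Here two tracks become four rows, and rows_per_track singles out "Song A" with three artists.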

Yes, that seems to be it. This album alone adds 30 rows to our original playlist length of 871 songs, so I will need to remove duplicated tracks like this. Spotify’s artist list for a track is not limited to the album artist, but the ordering seems consistent: the album artist is always listed first. So I can take the lowest-numbered track for each album, get the first artist name from it, and then keep only the rows where the artist matches that name.

thh_2023_playlist_tracks <- thh_2023_playlist_cleaned_artist %>%
    group_by(track.album.name) %>%
    mutate(main_artist = name[which.min(track.track_number)]) %>%
    filter(name == main_artist) %>%
    ungroup()

saved_playlist_tracks <- saved_tracks_cleaned_artist %>%
    group_by(track.album.name) %>%
    mutate(main_artist = name[which.min(track.track_number)]) %>%
    filter(name == main_artist) %>%
    ungroup()
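
A different way to avoid the duplicate rows, not the one used in this post, is to skip unnest() and pull the first listed artist out of each track's artist list directly. The sketch below uses made-up data and assumes each element of track.artists has a name column; note that it picks the first artist per track, which is not quite the same rule as the album-level main artist above.

```r
library(dplyr)
library(purrr)

# Made-up data in the same shape as the playlist items
toy_tracks <- tibble(
  track.name = c("Song A", "Song B"),
  track.album.name = c("Album X", "Album X"),
  track.artists = list(
    tibble(name = c("Main Act", "Guest One", "Guest Two")),
    tibble(name = "Main Act")
  )
)

# Keep one row per track by taking only the first artist name from each list element
first_artist_only <- toy_tracks %>%
  mutate(name = map_chr(track.artists, ~ .x$name[1])) %>%
  select(-track.artists)
```

The row count stays at one per track, so no filtering step is needed afterwards.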

thh_2023_playlist_tracks <- thh_2023_playlist_tracks %>%
    # flag a track as liked if its title appears in the saved tracks playlist
    mutate(liked = ifelse(track.name %in% saved_playlist_tracks$track.name, 1, 0)) %>%
    group_by(track.album.name) %>%
    # score = percent of the album's tracks that I liked
    mutate(score = (sum(liked) / first(track.album.total_tracks)) * 100) %>%
    # album-level durations in milliseconds
    mutate(total_duration_album = sum(track.duration_ms)) %>%
    mutate(total_duration_liked = sum(ifelse(liked == 1, track.duration_ms, 0))) %>%
    mutate(percent_duration_liked = (total_duration_liked / total_duration_album) * 100) %>%
    arrange(desc(score)) %>%
    ungroup()
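
One thing to keep in mind with the liked flag: it matches on track title only, so two different songs with the same title (say, two albums that each open with "Intro") would both be flagged. Whether that happens in this playlist is not something checked here. A minimal sketch of a stricter match on artist, album, and title, using made-up data:

```r
library(dplyr)

# Made-up playlist and saved tracks; only Band B's "Intro" is actually saved
playlist <- tibble(
  name = c("Band A", "Band A", "Band B"),
  track.album.name = c("Album 1", "Album 1", "Album 2"),
  track.name = c("Intro", "Song", "Intro")
)
saved <- tibble(
  name = "Band B",
  track.album.name = "Album 2",
  track.name = "Intro"
)

# Title-only matching also flags Band A's "Intro"
name_only <- as.integer(playlist$track.name %in% saved$track.name)

# Matching on all three columns flags only the track that was really saved
keys <- c("name", "track.album.name", "track.name")
saved_keys <- saved %>%
  select(all_of(keys)) %>%
  distinct() %>%
  mutate(liked = 1)

playlist_flagged <- playlist %>%
  left_join(saved_keys, by = keys) %>%
  mutate(liked = coalesce(liked, 0))
```

With this toy data name_only is 1, 0, 1 while playlist_flagged$liked is 0, 0, 1.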

albums_in_score_order <- thh_2023_playlist_tracks %>%
  select(track.album.name, score) %>%
  distinct(track.album.name, .keep_all = TRUE) %>%
  arrange(desc(score))

top_23_albums <- thh_2023_playlist_tracks %>% 
  distinct(name, track.album.name) %>%
  slice_head(n = 23)

top_23 <- thh_2023_playlist_tracks %>%
    filter(name %in% top_23_albums$name)

albums_2023 <- thh_2023_playlist_tracks %>%
    distinct(track.album.name) %>%
    nrow()
top_23_albums
# A tibble: 23 × 2
   name                  track.album.name            
   <chr>                 <chr>                       
 1 Megaton Sword         Might & Power               
 2 Hanging Garden        The Garden                  
 3 Raider                Trial By Chaos              
 4 Fires in the Distance Air Not Meant for Us        
 5 Ascension             Under the Veil of Madness   
 6 Burial Hordes         Ruins                       
 7 1476                  In Exile                    
 8 Rise to the Sky       Two Years of Grief          
 9 Valdrin               Throne of the Lunar Soul    
10 Sulphur Aeon          Seven Crowns And Seven Seals
# ℹ 13 more rows

Well, here are my top 23 albums of 2023. This saved me a lot of time in going through and figuring out which albums I should give another listen and which albums I should consider. There are a few albums that will be moved lower in the ranking, and one album that I missed in setting this all up that I’ll manually put in.

This was pretty fun to set up, and I didn’t realize the drawbacks of doing it this way until it was already too late. Giving tracks a binary score to generate an overall score is fast and convenient, BUT it does require that I actively listen to albums. I can’t have music on in the background while I work and only click like when I notice something good. Yes, actively listening to music is part of being a critic, but it is still something to keep in mind when trying to make one of these lists. The other problem is that if you listen to music on something else (I use foobar2000 at home), then you have to make sure that those liked tracks and albums get copied over to Spotify, so maybe I’ll try doing something similar with Last.fm for 2024. I did miss one album with this system: I listened to it, liked a bunch of tracks from it, and never added it to my playlist of 2023 albums.

Statistics-wise, there isn’t much interesting going on with this data set.

mean(top_23$score)
[1] 85.05155
mean(top_23$total_duration_liked)
[1] 2585681
mean(top_23$total_duration_album)
[1] 2957495

So my average score for the top 23 albums was about 85%, and the mean duration liked was 43.1 minutes against an average album duration of 49.3 minutes. I didn’t look around too much to see what other lists lined up with mine, but the metal people I shared this data with said it was a pretty solid list of albums for 2023.
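
For reference, the minutes come from dividing the printed millisecond means by 60,000. Also, top_23 has one row per track, so mean() on it weights each album by how many of its tracks are in the table; averaging once per album can give a different number. The sketch below shows both points, using the printed means and a made-up two-album table (toy_top is a placeholder, not the real data).

```r
library(dplyr)

# Convert the printed means from milliseconds to minutes
ms_to_minutes <- function(ms) ms / 60000
liked_minutes <- round(ms_to_minutes(2585681), 1)  # 43.1
album_minutes <- round(ms_to_minutes(2957495), 1)  # 49.3

# Made-up track-level table: the album score repeats on every track row
toy_top <- tibble(
  track.album.name = c("Long Album", "Long Album", "Long Album", "Short Album"),
  score = c(100, 100, 100, 50)
)

# Mean over track rows: the three-track album counts three times
track_weighted_mean <- mean(toy_top$score)  # 87.5

# Mean over albums: collapse to one row per album first
album_level_mean <- toy_top %>%
  distinct(track.album.name, score) %>%
  summarise(mean_score = mean(score)) %>%
  pull(mean_score)  # 75
```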