I listen to a lot of music, and I listen to a lot of types of music. About once per year I go on a mission to find new music to add to my regular rotation. I don’t want to listen to the same albums forever, or the same genres; there’s a vast ocean of music out there and it would be criminal to just get comfortable and never explore its depths.
There are usually two ways to go about this: active search and passive listening.
You can find music blogs, critics, etc., and listen to what’s popular. The problem here, I’ve found, is that no matter the genre, music critics tend to gravitate toward “music critic music”. This is music that tends to be less listenable and more interesting. It’s stuff you can appreciate, like a good scotch or wine, but not something you’d want every day.
Then there’s the Algorithm, which is invariably awful. Spotify’s Discover Weekly long ago gave up on trying to be useful and instead regularly recommends artists whose songs are already in my playlists and regular rotation, or genre classics whose names are so well-known that if you were to try recommending them to a person in real life, that person would no longer take you seriously. For example, last week it pushed Fleetwood Mac, and this week it’s pushing Nine Inch Nails.1
My tastes are not particularly esoteric — and I have the data to prove it. Since 2006, I’ve scrobbled over 330,000 tracks from over 16,000 artists. And an embarrassing percentage of those are just Pink Floyd, Vangelis, and Nobuo Uematsu.
The Problem is… It’s Hard
Someone once said, “Writing about music is like dancing about architecture.” And it turns out, almost every other way to describe or quantify music is also ridiculously hard. This is another thing that makes music publications difficult to use as sources for new music. It’s already difficult to put into words why you like a thing, and when it comes to music, it’s easier to describe how it makes you feel than what it sounds like. So you’re left hunting down critics whose taste is identifiably similar to yours, or magazines that cover genres you like. That’s something I have neither the time nor the energy to do.
I thought software could have been better at this. Even Spotify’s alleged approach to recommendations, which is to find adjacent users and playlists, sounds like it should work well! Even without or before their perverse incentives reared their ugly heads2, this never worked as well as it could have. It’s hard for an algorithm to reliably learn that “I like this artist’s late work more than their early work,” for example.3 The “dream” of some omniscient Service adjusting to your habits and feeding you new things to love has rotted away thanks to adtech infesting every corner of our modern world.
The only software-focused approach I’ve ever seen that worked even remotely well was the Music Genome Project. Pandora knocked this so far out of the park no one ever hoped to come close, and they continue to rule when it comes to putting together radio stations that adapt to your listening habits over time. It does seem like it’s gotten a little worse over the years, but I don’t have any data to back that up, just vibes. In any case, its maybe-deteriorated quality is still leagues ahead of any other streaming service.
Enter ChatGPT
Let me get this out of the way: LLMs are not for the production of anything of value. Even as dumb tools they tend to be about as useful as autocomplete. If I’m on a company website and see that they use AI “art” even for splash images, I make a note not to interact with or give money in any way to those companies. What’s more, the techbro inclination to disregard or belittle the arts so much as to try and automate them away with fancy Markov chains shows only how bereft these people are in imagination, in spirit, and in character.
That said, LLMs do have a use. And what better use, as a souped-up autocomplete, than a recommendation engine?
As it turns out… not very useful.
The Process
Using ChatGPT 4o, I uploaded a CSV containing all 330,000+ scrobbles from my Last.fm history along with a prompt to give me some recommendations based on that history.
Naturally, it also took many steps to refine the output.
The Bad
We all know LLMs are prone to hallucinations. Even after repeatedly correcting them, they will confidently proclaim some bullshit as truth. This is no less true here. Despite clear instructions in the prompt, the very first round of recommendations consisted only of music from the input data. The second round of recommendations was 75% artists that appeared in the input data.
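That kind of overlap is easy to measure outside the chat window. Here is a minimal sketch, assuming the listening history has already been reduced to a plain list of artist names. The function names and the sample data are hypothetical, made up for illustration rather than pulled from my real export.

```python
def split_recommendations(recommended, known_artists):
    """Split recommended artists into (already_known, new).

    Names are compared case-insensitively after trimming whitespace.
    """
    known = {name.strip().casefold() for name in known_artists}
    already_known, new = [], []
    for artist in recommended:
        if artist.strip().casefold() in known:
            already_known.append(artist)
        else:
            new.append(artist)
    return already_known, new


def overlap_ratio(recommended, known_artists):
    """Fraction of recommendations that already appear in the history."""
    if not recommended:
        return 0.0
    already_known, _ = split_recommendations(recommended, known_artists)
    return len(already_known) / len(recommended)


# Hypothetical sample data, for illustration only.
history = ["Pink Floyd", "Vangelis", "Nobuo Uematsu"]
suggestions = ["vangelis", "Pink Floyd", "Tangerine Dream", "Nobuo Uematsu"]

print(overlap_ratio(suggestions, history))  # 0.75
```

Exact name matching like this will miss alternate spellings and collaborations, so treat the ratio as a lower bound on how much of a list is repeats.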
I thought perhaps my approach was too wide: asking it to categorize 18 years of listening history might have been too much. So I tried to home in on specific genres instead. Of course, ChatGPT failed this, too. It continued insisting that Vangelis was a Synthwave artist, for example.
In fact, it repeatedly failed basic statistical analysis. When asked about my most recent 12 months of listening history, it made up numbers and statistics, which I could easily disprove by just looking at my Last.fm account.
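For comparison, the kind of count it fumbled takes only a few lines of ordinary code. This is a minimal sketch, assuming a CSV with `artist` and `timestamp` columns where timestamps are ISO 8601 strings. Those column names and the sample rows are assumptions for illustration; a real Last.fm export may be laid out differently.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timedelta


def plays_per_artist(csv_file, now, days=365):
    """Count scrobbles per artist in the `days` leading up to `now`."""
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Assumed format: ISO 8601, e.g. 2024-03-01T21:15:00
        played_at = datetime.fromisoformat(row["timestamp"])
        if cutoff <= played_at <= now:
            counts[row["artist"]] += 1
    return counts


# Hypothetical sample rows standing in for a real export.
sample = io.StringIO(
    "artist,track,timestamp\n"
    "Vangelis,Tears in Rain,2024-03-01T21:15:00\n"
    "Vangelis,Blade Runner Blues,2024-05-10T08:00:00\n"
    "Pink Floyd,Echoes,2024-04-02T19:30:00\n"
    "Pink Floyd,Time,2021-01-01T12:00:00\n"
)

counts = plays_per_artist(sample, now=datetime(2024, 6, 1))
print(counts.most_common())  # [('Vangelis', 2), ('Pink Floyd', 1)]
```

Passing `now` explicitly keeps the result reproducible. For a live check you would pass `datetime.now()` and an open file handle instead of the in-memory sample.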
Despite continuous fiddling, multiple passes at trying to nail down a good prompt, and troves of data, ChatGPT could do no better than tell me things I already knew.
The Good
I decided to get more specific. Instead of trying to base recommendations on hard numbers, I went back to the vibes approach.
Thinking back on how Spotify described its recommendation engine, I decided to tell ChatGPT about groups of artists and specific playlists, describing what I like about a particular grouping. Then it would respond with a handful of new recommendations based on that input, which I put into a table where I could mark down my progress working through the list, along with my thoughts.
Out of ~40 albums recommended from these responses, I’ve now listened to about 20. Of those 20, I ended up rating 6 at 7/10 or above on first pass, a hit rate of about 30%. That’s really not bad at all! A much higher percentage of hits compared to, say, perusing Pitchfork or Sputnikmusic’s recent top lists.
Of course, for a handful of recommendations I could not tell if the albums suggested actually exist. If they do, they’re not on the streaming services I can use.
It sounds nice and easy when I describe it like this. But for the amount of time it took to get these responses, I could probably have gotten the same results myself by looking at the Similar Artists tab in Last.fm or any other streaming service. The output, while generally decent, is not novel, and in fact produced quite a few names I’ve already seen but just never got around to listening to.
Final Thoughts
What a wild and wonderful time to be alive. Out of the ether I pulled over 40 album recommendations and they are all immediately available at my fingertips for next to nothing. There is more to see and hear and experience than can ever be seen or heard or done in a hundred lifetimes. And it’s all so good. There are so many specific subgenres of music that you can sink into any one and only come up for air a year later. You like pirate metal? You like new retro wave that pretends to be what 80s pop would be if it had kept on going for 30 years? You like broken hard drive techno? It’s out there!
At the same time, the flagship product of the company behind the biggest bubble in the history of any economy fails such basic tasks as “count how many times I listened to a specific artist in the past year, based on the data in this CSV.” This is supposed to be the thing that drives decision-making, summarizing, and producing “art”? This is the thing that movie studios are using to screen and write screenplays? This is the thing that our tech industry is scrabbling over itself to inject into every open orifice of their already over-bloated product offerings? Absolutely embarrassing, for everyone involved.
At the end of the day nothing beats a good recommendation from someone you know. Whether it’s a critic you follow and trust, a friend, or a barista at your local coffee shop, these people understand, or at least grasp at, the intangible qualities of music and what makes it grab hold of you. There are ways in which two artists, albums, or songs are similar that no computer could hope to quantify.