Using Cursor for Code Reviews

I’ve written before about my less-than-stellar experience using LLMs. It’s established that LLMs are terrible at any task that requires creativity or any sort of big-picture thinking. However, for some highly specific, purely technical tasks, I’ve come to appreciate the capability and speed of LLMs.

One specific use case I’ve gotten great mileage out of lately is using Cursor for code reviews.

There are already plenty of specialized AI-powered code review tools out there. But why pay for yet another subscription? Cursor and your LLM of choice will achieve everything you need out of the box. For something like this I favor control and customization over yet another tool to drain my laptop battery.

Cursor Custom Modes

An almost-hidden feature, still very rough around the edges, is called Custom Modes. A Custom Mode bundles a specific prompt that runs alongside each chat, along with its own set of permissions and capabilities.

You’ll want to go to File -> Preferences -> Cursor Settings, then find the Chat settings to enable Custom Modes.

Once enabled, you’ll need to start a new chat and then open the Mode Switcher, which defaults to Agent mode. At the bottom of the dropdown will be an option to add a new custom mode:

I truly hope the folks at Anysphere recognize the power they hold in their hands with Custom Modes and dedicate resources to refining this UX. As you see in the screenshot below, the configuration is almost too small to be legible, and the custom instructions text box is only big enough for one, maybe two sentences.

A screenshot of a configuration modal that is almost too small to read or type into.

What is this, configuration for ants?

I recommend typing up and saving your instructions elsewhere. This view doesn’t expand and it’s impossible to read or edit long text in here, even though you can add as much as you want. You should also consider putting them somewhere useful and shared, like in a git repo. Further down I’ll add an example of the instructions I use.

For my own PR Review mode I disable everything but the Search tools; I believe a review tool should always be effectively read-only and never able to change anything, even supervised.

You can also configure your Custom Mode to use only a specific LLM model or subset of models for its auto-switcher. I will occasionally switch between models, but tend to stick to Claude 4.5 Sonnet because its output is consistently structured and detailed. GPT-5 tends to produce more variable results, sometimes better, often worse.

A Quick Note on Use

For my code reviews I prefer to generate a raw diff file using git diff, then feed only that file to Cursor. This keeps the initial context window as short as possible. The agents are smart enough to grep for more context when needed (grep loops can generate some seriously impressive insight into a codebase, much faster and more accurately than trying to stuff a bunch of full files into the context window).
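In practice, generating the patch is a single command. Here's a self-contained sketch in a throwaway repo; the file names and branch layout are invented for the demo:

```shell
# Demo: diff a feature branch against its merge base with main and dump
# the result to a patch file you can hand to Cursor.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main
git config user.email demo@example.com
git config user.name demo
echo "hello" > app.txt
git add app.txt && git commit -q -m "add app"
git switch -q -c feature
echo "world" >> app.txt
git commit -q -am "extend app"
# Three dots = diff from the merge base, so only the branch's changes appear
git diff main...HEAD > branch-diff.patch
cat branch-diff.patch
```

The three-dot form matters: a plain `git diff main` would also pick up anything that landed on main since you branched.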

I prefer the output to be terse, maybe a single sentence describing a potential issue, and always with a direct reference to a line number. This way the agent does nothing more than say “did you check this out?” and then I can go and do the actual review myself to make sure the highlighted area meets expectations.

Custom Instructions

These are the baseline custom instructions I use. They are detailed but basic. They are essentially the mental checklist I already use without any assistive tooling. The idea is to supplement the process, not to take away my need to think.

I want to focus on a few key areas with these instructions. For the output, I want to keep things simple but legible. Problems are highlighted and put up front, while the details are pushed down and out of the way.

The steps go:

  1. Evaluate the implementation against the requirements and make sure they are all fulfilled.
  2. Go through other criteria one at a time, assessing changes against the rubric provided.
  3. If there are issues, point me directly to the line number and tell me what’s wrong.
  4. Show me a neat table with letter grades and high-level summaries.

I go back and review/update the specific evaluation criteria as needed. Originally I included sections for things like code style, but that's a job for a linter, not a code reviewer.

You are an experienced **software engineer and code reviewer**. The user will ask you to review a pull request (PR). They may provide a PR description, ticket description, and a `branch-diff.patch` file.

Your task is to **analyze the patch in context**, applying the evaluation criteria below, then produce **structured, concise, actionable feedback**.

---

## 1. Purpose

Perform a comprehensive review of the provided code changes, assessing correctness, design quality, performance, maintainability, and compliance. Your goal is not to rewrite the code, but to identify strengths, weaknesses, and risks, and suggest targeted improvements.

---

## 2. Contextual Analysis

For each diffed hunk, reason about the code in its surrounding context:

* How do the modified functions or modules interact with related components?
* Do input variables, control flow, or data dependencies remain valid?
* Could the change introduce unintended side effects or break existing assumptions?
* Are all relevant files modified (or untouched) as expected for this feature/fix?

Before issuing feedback, mentally trace the change through the system boundary:

* How is data entering and exiting this component?
* What tests or safeguards exist?
* How will downstream consumers be affected?

---

## 3. Evaluation Criteria

### Implementation vs. Requirements

* Does the implementation fulfill the requirements described in the ticket or PR description?
* Are acceptance criteria, edge cases, and error conditions covered?

### Change Scope and Risk

* Are there unexpected or missing file modifications?
* What are the potential risks (e.g., regression, data corruption, API contract changes)?

### Design & Architecture

* Does the change conform to the system’s architecture patterns? Are module boundaries respected?
* Is the separation of concerns respected? Any new coupling or leaky abstractions?

### Complexity & Maintainability

* Is control flow unnecessarily deep or complex?
* Is there unnecessarily high cyclomatic complexity?
* Are there signs of duplication, dead code, or insufficient test coverage?

### Functionality & Correctness

* Does the new behavior align with intended functionality?
* Are tests present and meaningful for changed logic or new paths?
* Does new code introduce breaking changes for downstream consumers?

### Documentation & Comments

* Are complex or non-obvious sections commented clearly?
* Do new APIs, schemas, or configs include descriptive docstrings or READMEs?

### Security & Compliance

* Are there any obvious vulnerabilities as outlined by the OWASP Top 10?
* Is input validated and sanitized correctly?
* Are secrets handled securely?
* Are authorization and authentication checks in place where applicable?
* Any implications for (relevant regulatory guidance)?

### Performance & Scalability

* Identify inefficient patterns (e.g., N+1 queries, redundant computation, non-batched I/O).
* Suggest optimizations (e.g., caching, memoization, async I/O) only where justified by evidence.

### Observability & Logging

* Are new code paths observable (logs, metrics, or tracing)?
* Are logs structured, appropriately leveled, and free of sensitive data?

---

## 4. Reporting

After the review, produce three outputs:

### (a) High-Level Summary

Summarize the PR’s purpose, scope, and overall impact on the system. Note whether it improves, maintains, or degrades design health, maintainability, or performance.

### (b) Issue Table

List specific issues or observations in a table with **no code snippets**, using concise, diagnostic language:

| File (path:line-range) | Priority (Critical / Major / Minor / Enhancement) | Issue | Fix |
| ---------------------- | ------------------------------------------------- | ----- | --- |

“Fix” should be a one-line suggestion focused on intent (“add null check,” “consolidate repeated logic,” “apply existing `ILogger` wrapper”).

### (c) Criteria Assessment Table

Summarize how the PR performed against each evaluation axis:

| Category                     | Assessment (Pass / Needs Attention / Fail) | Notes |
| ---------------------------- | ------------------------------------------ | ----- |
| Design & Architecture        |                                            |       |
| Complexity & Maintainability |                                            |       |
| Functionality & Correctness  |                                            |       |
| Documentation & Comments     |                                            |       |
| Security & Compliance        |                                            |       |
| Performance & Scalability    |                                            |       |
| Observability & Logging      |                                            |       |

---

## 5. Tone and Output Rules

* Be objective, specific, and concise.
* Prioritize clarity over completeness; avoid generic praise or filler.
* Do **not** generate code examples. Describe intent instead.
* Always provide actionable next steps.

I know that you typically want prompts to be shorter rather than longer, but I haven’t found any detrimental effects of providing all this detail. Context windows are more than long enough to take this in, a diff with ~4,000 LOC changed, and several follow-ups, and still have plenty left over.

A Quick Demo

To test the above I just created a fake diff that introduced a few issues and ran it against the Custom Mode.

A screenshot of the PR review bot output, with structured tables outlining issues.

This is obviously a contrived scenario, but useful for demonstrating output. We caught the bugs and stopped the evil from making it to production.

You might notice that the output points to the line number of the diff and not the line number of a source file. Obviously, there’s no real source file to find here. In the real world, Cursor is (usually) capable of finding the original file and generating a link which can be clicked to jump to the right location.

Caveats

These tools are only useful if they have sufficient data in their training sets. This means for workloads using Python, TypeScript, C#, or other top languages, LLMs can produce decent enough starter code1, and can at least spot inconsistencies or potential bugs in codebases using these languages.

If you’re using anything even slightly less common, you might find too many false positives (or false negatives) to be useful. Don’t waste your time or CPU cycles.

And in either case, these tools shouldn’t be the first resort. PR review exists for more than one reason, and one of these is knowledge sharing. If you blindly resort to LLMs to review PRs then you’re losing context in your own codebase, and that’s one of the things that preserves your value for your team.

Closing Thoughts

A powerful and defining feature here is how output can be customized for your PR Review Mode. You don’t need to read the docs on some PR review tool and hope it does what you want. You can specify your output directly and get it how you want it. If you don’t like tables, you can ask for raw JSON output. If you don’t like Pass/Fail/Needs Attention, you can have letter grades assigned. If you don’t like the criteria, change them.

As noted, I don’t use Cursor for code reviews on all PRs that cross my path, or even a majority of them. It is quite useful for large PRs (thousands of lines changed), or PRs against projects where I have only passing familiarity. With the prompt above I can get pointed direction into a specific area of a PR or codebase for one of these larger reviews, which does occasionally save a lot of scanning.

It can also be a great tool for self-assessment. There are times the size of your branch balloons with changes and you get lost in the weeds. Were all the right tests written? Did this bit get refactored properly? What about the flow for that data? I’ve used this PR Review Mode on my own changes before drafting a PR and have caught things that would have embarrassed me had someone else seen them (one time, something as simple as forgetting to wire up a service in the dependency injection container). In this way, the tool acts something like a slightly more sophisticated compiler/analyzer, helping catch potential runtime issues at build time.

Of course, as with any AI tool, I would never let this do my work for me, or blindly accept changes from it. But it can be a force multiplier in terms of productivity and accuracy, and has at times saved me from real bugs of my own writing.

You Can Take the Em Dash From My Cold, Dead Hands

I love the em dash. I love semicolons, too. I love all the dark and dusty corners of the language, all the grammatical doodads, the quirks and inconsistencies. The way you can noun verbs and verb nouns.

Imagine my horror when I learned that some people see an em dash and immediately attribute it to ChatGPT.

The em dash is so functional. It can take a semicolon’s place, or a comma’s. It can be used for parentheticals, or just to help a thought take a sharp left turn. These things happen a lot around here; you can probably find an em dash in every single one of my blog posts (although I won’t go fact-check this).

I can’t just swap out the em dash for an en dash and call it good. The en dash is for sequences, to connect numbers or dates or other notations. Maybe people won’t notice. But I would.

Before I even knew what the em dash was called I used it in the form of two hyphens: --. I suppose I could do that again, if I wanted to really make it clear that this text is the result of human cogitation and not, instead, the regurgitation by a billion-parameter prediction machine.

But here’s the problem: hyphens are connective tissue. They imply a direct relationship, either by connecting words or by separating parts of words (“pre-Industrial” or “st-st-stutter”). Sticking two together to imitate an em dash is what you do when you’re lazy, writing a plain text doc, or don’t know how to get an em dash to appear. None of which is going on here.

Why should I settle for something not-quite-right when the right thing is sitting right in front of me? No, the em dash is here in this blog to stay.

Nothing in this blog is or ever will be generated by anything other than my own mind and fingers. And every em dash is lovingly placed.

The Tragedy of Lost Momentum: A Postmortem

This is a postmortem for a game I began to make… checking the calendar… several years ago. Despite maintaining what I thought was incredible velocity and what I thought was ironclad game design, I ultimately failed to complete the game. It’s languished for years now, although I have occasionally come back to my notebooks with no small amount of guilt for all I didn’t do.

Recently The Itch has come back, that familiar urge to create. I even went so far as to download GameMaker Studio again to try and load up my old project. But a lot has changed since I last touched the game; it won’t even compile now.

I hate the idea of throwing away all that effort. But is it worth even more effort just to revive it only to let it die again? Probably not. Instead, I wanted to spend some time reflecting on the project itself: what worked, what didn’t, and what’s next.

Continue reading

Can ChatGPT Give Good Recommendations?

I listen to a lot of music, and I listen to a lot of types of music. About once per year I go on a mission to find new music to add to my regular rotation. I don’t want to listen to the same albums forever, or the same genres; there’s a vast ocean of music out there and it would be criminal to just get comfortable and never explore its depths.

There’s usually two ways to go about this: active search and passive listening.

You can find music blogs, critics, etc., and listen to what’s popular. The problem here, I’ve found, is that no matter the genre music critics tend to gravitate toward “music critic music”. This is music that tends to be less listenable and more interesting. It’s stuff you can appreciate, like a good scotch or wine, but not something you’d like every day.

Then there’s the Algorithm, which is invariably awful. Spotify’s Discover Weekly long ago gave up on trying to be useful and instead regularly recommends artists whose songs are already in my playlists and regular rotation, or genre classics so well-known that if you tried recommending them to a person in real life, they’d stop taking you seriously. For example, last week it pushed Fleetwood Mac, and this week it’s pushing Nine Inch Nails.2

My tastes are not particularly esoteric — and I have the data to prove it. Since 2006, I’ve scrobbled over 330,000 tracks from over 16,000 artists. And an embarrassing percentage of those are just Pink Floyd, Vangelis, and Nobuo Uematsu.

A screenshot of my top artists at Last.fm, including Pink Floyd, Vangelis, Nobuo Uematsu, Sting, and Queensryche.

I mean, does this look hard to quantify?

The Problem is… It Is Hard

Someone once said, “Writing about music is like dancing about architecture.” And it turns out, almost every other way to describe or quantify music is also ridiculously hard. This is another thing that makes music publications difficult to use as sources for new music. It’s already difficult to put into words why you like a thing, and when it comes to music, it’s easier to just describe how it makes you feel more than what it sounds like. So you’re left hunting down critics whose taste is identifiably similar to yours, or magazines that cover genres you like. Something I have neither the time nor energy to do anymore.

I thought software could have been better at this. Even Spotify’s alleged approach to recommendations, which is to find adjacent users and playlists, sounds like it should work well! But long before their perverse incentives reared their ugly heads3, this never worked as well as it could have. It’s hard for an algorithm to reliably learn that “I like this artist’s late work more than their early work,” for example.4 The “dream” of some omniscient Service adjusting to your habits and feeding you new things to love has rotted away thanks to adtech infesting every corner of our modern world.

The only software-focused approach I’ve ever seen that worked even remotely well was the Music Genome Project. Pandora knocked this so far out of the park no one ever hoped to come close, and they continually rule when it comes to putting together radio stations that adapt to your listening habits over time. It does seem like it’s gotten a little worse over the years, but I don’t have any data to back that up, just vibes. In any case, its maybe-deteriorated quality is still leagues ahead of any other streaming service.

Enter ChatGPT

Let me get this out of the way: LLMs are, currently and on their own, not for the production of anything of value. Even as dumb tools they tend to be about as useful as autocomplete. If I’m on a company website and see that they use AI “art” even for splash images, I make a note not to interact with or give money in any way to those companies. What’s more, the techbro inclination to disregard or belittle the arts so much as to try and automate them away with fancy Markov chains shows only how bereft these people are in imagination, in spirit, and in character.

That said, LLMs do have a use. And what better use, as a souped-up autocomplete, than a recommendation engine?

As it turns out… not a very good one.

The Process

Using ChatGPT 4o, I uploaded a CSV containing all 330,000+ scrobbles from my Last.fm history along with a prompt to give me some recommendations based on that history.

Naturally, it also took many steps to refine the output.

The Bad

We all know LLMs are prone to hallucinations. Even after repeatedly correcting them, they will confidently proclaim some bullshit as truth. This is no less true here. Despite clear instructions in the prompt, the very first round of recommendations consisted only of music from the input data. The second round of recommendations was 75% artists that appeared in the input data.

I thought perhaps my approach was too wide — asking it to categorize 18 years of listening history might have been too much. So I tried to home in on specific genres instead. Of course, ChatGPT failed this, too. It continued insisting that Vangelis was a Synthwave artist, for example.

In fact, it repeatedly failed basic statistical analysis. When asked about my most recent 12 months of listening history, it made up numbers and statistics, errors easily caught by just looking at the very data it had just ingested.
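For perspective, the counting ChatGPT fumbled is a few lines of stdlib Python. This is a hedged sketch: the column names ("artist", "date") and the sample rows are assumptions, and a real Last.fm export may name its columns differently.

```python
import csv
import io
from collections import Counter

# A stand-in for the real scrobble CSV; columns are assumed, not Last.fm's exact export.
sample = """artist,album,track,date
Pink Floyd,Animals,Dogs,2024-03-01 10:00
Vangelis,Blade Runner,Main Titles,2024-03-02 11:00
Pink Floyd,Animals,Sheep,2024-03-02 12:00
"""

def plays_per_artist(csv_text, since=None):
    """Count scrobbles per artist, optionally keeping only rows on/after `since`."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # ISO-style timestamps sort lexicographically, so string comparison works
        if since is None or row["date"] >= since:
            counts[row["artist"]] += 1
    return counts

print(plays_per_artist(sample)["Pink Floyd"])  # → 2
```

Filtering to "the most recent 12 months" is just a matter of passing a cutoff date as `since`.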

Despite continuous fiddling, multiple passes at trying to iron down a good prompt, and troves of data, ChatGPT could do no better than tell me things I already knew. I threw away the entire chat, but would not be dissuaded yet.

The Good

I decided to get more specific. Instead of trying to base recommendations on hard numbers, I went back to the vibes approach.

Thinking back on how Spotify described its recommendation engine, I decided to tell ChatGPT about groups of artists and specific playlists, describing what I like about a particular grouping. Then it would respond with a handful of new recommendations based on that input, which I put into a table where I could mark down my progress working through the list, along with my thoughts.

I did this a few times, with different “vibes” and descriptions, with clear and specific instructions on exactly what to leave out.

Out of ~40 albums recommended from these responses, I’ve listened now to about 20. Of those 20, I ended up rating 6 at 7/10 or above on first pass. That’s really not bad at all!5 A much higher percentage of hits compared to, say, perusing Pitchfork or Sputnik Music’s recent top lists.

Of course, for a handful of recommendations I could not tell if the albums suggested actually exist. If they do, they’re not on the streaming services I can use, and Google / YouTube are producing no results.

It sounds nice and easy when I describe it like this. But this process took several hours to refine. For the amount of time it took to get these responses, I could probably have gotten the same results myself by looking at the Similar Artists tab in Last.fm or any other streaming service. The output, while generally decent, is not novel, and in fact still produced quite a few names I’ve already seen but just never got around to listening to.

Final Thoughts

What a wild and wonderful time to be alive. Out of the ether I pulled over 40 album recommendations and they are all immediately available at my fingertips for next to nothing. There is more to see and hear and experience than can ever be seen or heard or done in a hundred lifetimes. And it’s all so good. There’s so many specific subgenres of music that you can sink into any one and only come up for air a year later. You like pirate metal? You like new retro wave that pretends to be what 80s pop would be if it had kept on going for 30 years? You like broken hard drive techno? It’s out there!

At the same time, the flagship product of the company behind the biggest bubble in the history of any economy fails such basic tasks as “count how many times I listened to a specific artist in the past year, based on the data in this CSV.” This is supposed to be the thing that drives decision-making, summarizing, and producing “art”? This is the thing that movie studios are using to screen and write screenplays? This is the thing that our tech industry is scrabbling over itself to inject into every open orifice of their already over-bloated product offerings? Absolutely embarrassing, for everyone involved.

At the end of the day nothing beats a good recommendation from someone you know. Whether it’s a critic you follow and trust, a friend, or a barista at your local coffee shop, these people understand and at least grasp at the intangible qualities of music and what makes it grab a hold of you. There are ways in which two artists, albums, or songs are similar that no computer could hope to quantify.

Winamp Source Code is Now “Open”

Source is available on GitHub, with some atrocious terms and conditions:

You waive any rights to claim authorship of the contributions or to object to any distortion, mutilation, or other modifications of the contributions.

You may not create, maintain, or distribute a forked version of the software.

But they “encourage” contributions!

When I heard about this I was, momentarily, excited. But of course the company that got its hands on the product either doesn’t understand what it has or is incapable of properly handling it.

I’ve been using Winamp for decades now, which is a weird thing to say because I don’t consider myself old (yet). I still have Winamp 2.95 installed on my current PC and use it pretty much daily because nothing else, even today, comes close to the user experience.6 That’s not to say it’s perfect, but it fits my needs.

For reasons, today’s news inspired me to finally try Wacup, a fan project inspired by Winamp, built around a plugin system and designed to take the best of Winamp and bring it fully into the 21st century. With proper scaling and high-DPI support, built-in FLAC support, and more, it brings in lots of quality-of-life features without sacrificing any of the things that made Winamp special.

They can even look the same! Top: Wacup. Bottom: Winamp.

Wacup even comes with a faithfully-recreated Classic skin for codgers like me who like that aesthetic. I’m still tinkering with some of the font settings and might go for a different skin in the end, but I think as long as it stays stable and light on the memory footprint, it’s going to be here to stay. 10/10, highly recommend.

WebP Rules

This is a PNG.

This is a WebP.

One of these images is a PNG, and one is a WebP. The PNG is 12KB, and the WebP is 5KB. Do you see a difference between the two?

Me, neither.

I’ve been a steadfast PNG supporter since time immemorial, cringing at any sight of JPEG compression. Way back in the day, when forums were still big and users spent inordinate amounts of time creating cool-looking signature images, I was big into PNGs. I prided myself on every aspect of my little 300×100 pixel PNG box. But mostly the crisp lines and vibrant colors.

When creating images for a site, I’d typically use (old-school) Photoshop’s ‘optimize for web’ feature for PNGs. This did a pretty decent job at compressing my PNGs to reasonable sizes.

But, nothing ever gave me such a boost as a WebP. Even for large images full of color, I can still drop a picture from 1.2MB to 680KB, while maintaining the same visible quality! That’s insane!

One of these days when I have more time and energy I’m going to read some more specs on how this works, but until then, I’m just going to go on living like it’s pure magic and my days will be a little brighter for it.

Mang JS

A long time ago I built a tool I called mang: Markov Name Generator. It went through a number of iterations, from a .Net class library, to a WPF app, to a Vue app, even as part of a gamedev toolkit for a long-abandoned roguelike. Most recently, it’s been a desktop app built with Avalonia so I could use it on Linux. Over the years it’s become my “default” project when learning a new language or framework. Well, now I have one of those new-fangled MacBook Pros. Anyone familiar with MacOS development knows how much of a hassle it is to build anything for it, even if you just want to run locally, and I do not want to pull over all my code and an IDE just to run in debug mode to use the app. What I did, instead, was port the whole thing over to a single static webpage.

I put off doing this for a long time because in my spare time I’m an expert procrastinator. Also just as important to note is my tendency to over-complicate, and I kept finding myself wanting to build a full API and web app, which is just completely unnecessary for a tool like this. But I do use mang quite a bit, and there’s nothing so complicated that it can’t be done in vanilla JS. So I bit the bullet and ported the name generation to start.

It’s more of a transliteration than a “true” port or rewrite. The code is almost exactly the same, line by line and function by function, as it is in C#. But the end result is pretty compact: the CSS itself, which is just a very-slightly-modified fork of simple.css, is nearly as large as the entire index.html file. While there is plenty to whine about when it comes to JavaScript (a fault of the ecosystem more than the language), it is nice to have everything in such a plain and accessible format.
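To show how little vanilla JS this kind of tool actually needs, here’s a sketch of a letter-chain name generator in the same spirit. The chain order, seed names, and function names are illustrative, not mang’s actual code or data:

```javascript
// Build an order-N letter chain from a list of training names.
// "^" pads the start of each name; "$" marks the end.
function buildChain(names, order = 2) {
  const chain = {};
  for (const name of names) {
    const padded = "^".repeat(order) + name.toLowerCase() + "$";
    for (let i = 0; i + order < padded.length; i++) {
      const key = padded.slice(i, i + order);
      if (!chain[key]) chain[key] = [];
      chain[key].push(padded[i + order]);
    }
  }
  return chain;
}

// Walk the chain from the start pad, sampling one letter at a time.
function generate(chain, order = 2, maxLen = 12) {
  let key = "^".repeat(order);
  let out = "";
  while (out.length < maxLen) {
    const options = chain[key];
    if (!options) break;
    const next = options[Math.floor(Math.random() * options.length)];
    if (next === "$") break;
    out += next;
    key = key.slice(1) + next;
  }
  return out.charAt(0).toUpperCase() + out.slice(1);
}

const chain = buildChain(["aldric", "alina", "berta", "boric", "cedra"]);
console.log(generate(chain));
```

The output blends the letter patterns of the training list, which is the whole trick: bigger and more thematic training lists give more convincing names.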

The entire tool is unminified and all of the assets are free to browse.

And all in all, this whole thing went much smoother than I expected for less than an hour of work.

What Changed

As part of this process I removed some of the name types that can be generated. Most of the types available in mang are historical or fictional, and it felt odd to have some name types with contemporary sources. As such, all of the East Asian, Pacific Island, and Middle Eastern sources have been removed.

What’s Coming

I have not ported the admittedly barebones character generation stuff yet. I have some better plans for that and will be spending some time fleshing that feature out.

The character generation so far has been “trait” flags, randomly selecting between things like “carefree” and “worried”, or picking a random MBTI type. It’s generally enough for a rough sketch of an on-the-fly NPC or something, but could use some more work to be truly useful as a starting point for anyone requiring more detail.


Helion, a Theme

A Brief Rant About Color

I have a lot of opinions on colors and color schemes. For example, if you are implementing a dark mode on your app / site / utility and decide to go with white text on a black background (or close-to-white on close-to-black), you should be charged with a criminal offense.

High contrast is a good thing! Don’t get me wrong! But that type of dark mode is offensive to my eyes and if I find myself using something with that color scheme I will switch to light mode, if possible, or simply leave and never come back. It’s physically painful for me, leaving haloes in my vision and causing pretty harsh discomfort if I try to read for more than a few seconds. And though this may be contentious, I find it a mark of laziness: you’re telling me you couldn’t be bothered to put in even a little effort in finding a better way to do dark mode?

So it may come as no surprise that I am a long-time Solarized user. From my IDEs, to my Scrivener theme, to my Firefox theme, to anything I can theme — if it’s got a Solarized option, I’m using it. For a long time, even this blog used some Solarized colors. (Dracula is a close second in popularity for me.)

Helion: A VS Code Theme

I’ve long experimented with my own color schemes. It’s a perfect channel for procrastination, and a procrastinator’s work is never done. Today, I think I’ve settled on a good custom dark theme, which I want to release and iterate on as I continue to use and tweak it.

Helion, inspired by Solarized and colors that faded away, is my contribution to the dark theme world. It’s not perfect — few things are — but my hope is that it becomes a worthy entry.

Right now it is only a Visual Studio Code theme. As I continue to use and tweak it, I plan to standardize the template into something I can export and use for other tools.

Here is a screenshot:

A screenshot of the Helion theme in use in Visual Studio Code, viewing two JSON files side by side.

Just comparing some json files

Now, I am not a usability expert. The colors here are in no way based on any scientific study and I do not assert that they are empirically perfect or better than any other dark mode theme. This is simply a theme which I’ve customized for my own tastes, according to colors and contrasts that are appealing to my own eyes.

That said, any feedback is greatly appreciated. If anyone ever does choose to use the theme, I would be delighted to hear from you, whether it’s good or bad (or anywhere in between).

Enjoy!

The Querynomicon

I always felt that every well-rounded developer needs to build a strong working knowledge of SQL.

You don’t need to know the internals of the query engine of your chosen DBMS.7 You don’t need to master the arcane depths of proprietary SQL syntax. However, you should know the types of joins, how NULLs are dealt with, what a SARGable8 condition is and how to write it, and most importantly, how to think about your data in sets.

A long time ago I wrote about some of the tools and resources I used to learn SQL. I was also blessed to work at a company which put data first, where one developer had forgotten more about SQL and Windows internals than most people will ever learn. So I had access to a lot of tools and immersion in an environment that would be difficult for some to find. My unstructured approach to learning was not so different from total immersion plus comprehensive input, in the world of second language acquisition; that is, I would have had to try in order to not learn SQL.

Today I came across a new resource for learning SQL that would have been incredible back when I was still learning the ropes: The Querynomicon.9

This site is not only a great resource, it is built wonderfully. Starting with a quick overview, then onto scope (intended audience, prerequisites, and learning outcomes), then straight onto learning peppered with practice exercises, then wrapping up with a concise yet comprehensive glossary, I’m just impressed by the level of quality here. From the structure to the presentation, it’s impeccably laid out, almost like a condensed textbook turned hypertext.10 You could turn this site into a class! Even the flowcharts, explaining high-level concepts, are masterfully done and read like natural language. I love it.

If you’re slightly familiar with SQL, this really is a great site to check out. If you’re brand-new and want to begin learning, maybe start with SQL Server Central’s Stairway to Data (or DML) series, and then the Querynomicon to reinforce what you’ve learned.

Shuffling

A while back I went through my iTunes library and took the songs/albums/artists not already living in their own playlist and put them into a mega-playlist I called all of it.11 This way I could just hit “shuffle my library” when in my car and emulate what I used to do with my iPod. I don’t want to fuss with any UI, or play some algorithmically-generated playlist full of suggested music I might like. I just want to listen to my library, on shuffle.

But I’ve been feeling lately like shuffling just… isn’t good. Maybe the same artist plays twice in a row, or within just a couple of minutes of my previous listen. Or each shuffle still front-loads the same small selection of songs, so I don’t really get to explore the depths of my library. It’s not a large library (10,000 songs), but not small either, so why am I hearing the same old stuff? This is the 21st Century, is it really so hard to shuffle music?

The short answer is: no, it’s not. We can pretty much shuffle any list of things in an almost truly random fashion. But there are plenty of reasons shuffling doesn’t seem random. Humans are inherently pattern-seeking animals12, and in a truly random sequence of songs, a given artist is no more or less likely to play next based on the previous artist. So you could have two artists play in a row—or if it’s truly random, the same song!
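To put a number on that: in a 100-song playlist “shuffled” by picking songs independently at random, a back-to-back repeat isn’t a fluke, it’s the likely outcome. The theoretical chance is 1 - (99/100)^99, roughly 63%. Here’s a small simulation of my own (not from the post) that bears this out:

```csharp
using System;

static class RepeatOdds
{
    // Estimate how often a pick-with-replacement "shuffle" of a
    // playlist plays the same song twice in a row
    public static double Estimate(int playlistSize, int trials, int seed)
    {
        var rng = new Random(seed);
        var trialsWithRepeat = 0;
        for (var t = 0; t < trials; t++)
        {
            var previous = -1;
            for (var i = 0; i < playlistSize; i++)
            {
                var pick = rng.Next(playlistSize);
                if (pick == previous) { trialsWithRepeat++; break; }
                previous = pick;
            }
        }
        return (double)trialsWithRepeat / trials;
    }

    static void Main() =>
        // theory says about 0.63 for a 100-song playlist
        Console.WriteLine(Estimate(100, 10_000, 12345));
}
```

The seed is fixed only so repeated runs give the same estimate; any seed lands in the same neighborhood.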

Another problem is that once software gets Good Enough™ it doesn’t usually get touched again until there are actual problems with it—or there is a strong monetary incentive to do so. So a developer with the task to write a shuffle feature might do what’s Good Enough™ to close the ticket according to the requirements and test plan13, then move onto the next ticket.

So what does it really take to do a good job shuffling a playlist? I wanted to do a little experimenting so I thought I would start from the ground up and walk through a few different methods. First, I need…

The Music

I took my playlist and ran it through TuneMyMusic to get a CSV. Then I wrote a little bit of code to parse that CSV into a list of Song objects, which would be my test subject for all the shuffling to come.

True Random

First I wanted to see how poorly, or how well, a “true” random shuffle worked.14 This is easy enough.

We’ll just do a simple loop. For the length of the song list, grab a random song from the list and return it:

for (var i = 0; i < songList.Count; i++)
{
    // Next's upper bound is exclusive, so use Count here
    // (not Count - 1), or the last song could never be picked
    var index = RandomNumber.Next(0, songList.Count);
    yield return songList[index];
}

And right away I can see that the results are not good enough. Lots of song clustering, even repeats: one song right after the other!

[72]: Goldfinger - Superman 
[73]: Metallica - Of Wolf And Man (Live with the SFSO) 
[74]: Orville Peck - Bronco 
[75]: Def Leppard - Hysteria 
[76]: Kacey Musgraves - Follow Your Arrow 
[77]: Kacey Musgraves - Follow Your Arrow 
[78]: The Thermals - My Heart Went Cold 
[79]: Miriam Vaga - Letting Go
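Repeats are only half the problem: sampling with replacement also skips songs entirely. Over 100 draws from 100 songs, each song has roughly a 1/e chance of never coming up, so about a third of the playlist never plays at all. A quick self-contained check (my own sketch, not the post’s code):

```csharp
using System;
using System.Linq;

static class WithReplacement
{
    // "Shuffle" by sampling with replacement, as in the naive loop above;
    // song indices stand in for actual songs
    public static int[] Shuffle(int songCount, Random rng) =>
        Enumerable.Range(0, songCount)
            .Select(_ => rng.Next(0, songCount))
            .ToArray();

    static void Main()
    {
        var queue = Shuffle(100, new Random(42));
        // how many of the 100 songs actually made it into the queue?
        // expectation is around 100 * (1 - 1/e), i.e. about 63
        Console.WriteLine(queue.Distinct().Count());
    }
}
```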

A Better Random

After looking at the results from the first random shuffle, two requirements have become clear:

  1. The input list must have its items rearranged in a random sequence.
  2. Each item from the input list can only appear once in the output list.

Kind of like shuffling a deck of cards. I’d be surprised to see a Deuce of Spades twice after shuffling a deck. I’d also be surprised to see the same Kacey Musgraves song in my queue twice after hitting the shuffle button.

Luckily this is a problem that has been quite handily solved for quite a long time. In fact, it’s probably used as the “default” shuffling algorithm for most of the big streaming players. It’s called the Fisher-Yates shuffle and can be accomplished in just a couple lines of code.

for (var i = shuffledItemList.Length - 1; i > 0; i--)
{
    // Next's upper bound is exclusive, so i + 1 lets an element
    // stay in place -- required for an unbiased Fisher-Yates
    var j = RandomNumber.Next(0, i + 1);
    (shuffledItemList[i], shuffledItemList[j]) = (shuffledItemList[j], shuffledItemList[i]);
}

Starting from the end of the list, you swap each element with a randomly chosen element at or before it. Here I’m using a tuple to do that “in place” without the use of a temporary variable.

The results are much better, and at first glance almost perfect! But scanning down the list of the first 100 items, I do see one problem:

[87]: CeeLo Green - It's OK 
[88]: Metallica - Hero Of The Day (Live with the SFSO) 
[89]: Metallica - Enter Sandman (Remastered) 
[90]: Guns N' Roses - Yesterdays

It’s not the same song, but it is the same artist, and in a large playlist with lots of variety, I don’t really like this.
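For reference, here’s a self-contained, generic Fisher-Yates helper (my own sketch, not the post’s code), with the exclusive-upper-bound pitfall of Random.Next called out:

```csharp
using System;
using System.Linq;

static class Shuffler
{
    // Fisher-Yates shuffle. The j upper bound is i + 1 because
    // Random.Next's upper bound is exclusive; Next(0, i) would never
    // let an element stay in place, which biases the shuffle.
    public static T[] FisherYates<T>(T[] items, Random rng)
    {
        var result = (T[])items.Clone(); // leave the input untouched
        for (var i = result.Length - 1; i > 0; i--)
        {
            var j = rng.Next(0, i + 1);
            (result[i], result[j]) = (result[j], result[i]);
        }
        return result;
    }
}
```

Usage is just `Shuffler.FisherYates(songs, new Random())`; every permutation of the input is equally likely.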

Radio Rules

Now I know I don’t want the same song to repeat, or the same artist, either. While we’re at it, let’s say no repeating albums, too. So that’s two new requirements:

  1. No more than x different songs from the same album in a row.15
  2. No more than y different songs from the same artist in a row.

This is pretty similar to the DMCA restrictions set forth for livestreaming / internet radio. It will also help guarantee a better spread of music in the shuffled output.

Here’s some code to do just that:

var shuffledItemList = ItemList.ToArray();
var lastPickedItems = new Queue<T>();
var invalidCount = 0;       // how many times we've given up on a pick
var checkValidPick = true;  // flips off if we give up too often
for (var i = shuffledItemList.Length - 1; i > 0; i--)
{
    var j = RandomNumber.Next(0, i + 1);

    var retryCount = 0;
    while (checkValidPick &&
           !IsValidPick(shuffledItemList[j], lastPickedItems) &&
           retryCount < MaxRetryCount)
    {
        retryCount++;
        j = RandomNumber.Next(0, i + 1);
    }

    if (retryCount >= MaxRetryCount)
    {
        // short-circuiting; we maxed out our attempts
        // so increment the counter and move on with life
        invalidCount++;

        if (invalidCount >= MaxInvalidCount)
        {
            checkValidPick = false;
        }
    }
    
    // a winner has been chosen!
    // trim the queue so it doesn't get too long
    while (lastPickedItems.Count >= Math.Max(ConsecutiveAlbumMatchCount, ConsecutiveArtistMatchCount))
    {
        _ = lastPickedItems.TryDequeue(out _);
    }
    
    // then enqueue our choice
    lastPickedItems.Enqueue(shuffledItemList[j]);
    
    (shuffledItemList[i], shuffledItemList[j]) = (shuffledItemList[j], shuffledItemList[i]);
}
return shuffledItemList;

This, at its core, is the same shuffling algorithm, with a few extra steps.

First, we introduce a Queue, which is a First-In-First-Out collection, to hold the x most recently chosen songs.

Then when it’s time to choose a song, we look in our queue to determine if any of the recent songs match our criteria. If they do, then we skip this song and choose another random song. We attempt this only so many times. While the chance is low, there’s still a small chance that we could get stuck in a loop. So there’s a short-circuit built in that will tell the loop it’s done enough work and it’s time to move on.

In addition to that, there’s a flag with a wider scope: if we’ve short-circuited too frequently, then the function that checks for duplicates will stop checking.

This is an extra “just in case”, because if I hand over a playlist that’s just a single album or artist, I don’t want to do this check every single time I pick a new song. At one point it will become clear that it isn’t that kind of playlist.
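The IsValidPick function itself isn’t shown above, so here’s a minimal sketch of what it might look like. The Song record and its property names are my assumptions (the real type is parsed from the CSV), and this simplified version just forbids any artist or album match within the recent queue:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical song type; the real one comes from the TuneMyMusic CSV.
public record Song(string Title, string ArtistName, string AlbumName);

public static class PickRules
{
    // A candidate is valid when none of the recently played songs
    // share its artist or album (a simplified version of the
    // "radio rules" above).
    public static bool IsValidPick(Song candidate, IEnumerable<Song> lastPicked) =>
        lastPicked.All(s =>
            s.ArtistName != candidate.ArtistName &&
            s.AlbumName != candidate.AlbumName);
}
```

The real version would compare album and artist against separate window sizes, but the shape of the check is the same.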

Once a song has been chosen, the lastPickedItems Queue gets its oldest item dequeued and thrown to the wind16, and the newest item is enqueued.

How does this do? Pretty well, I think.

[89]: Metallica - One (Remastered) 
[90]: Stone Temple Pilots - Plush 
[91]: Megadeth - Shadow of Deth 
[92]: System of a Down - Sad Statue 
[93]: Jewel - Don't 
[94]: Def Leppard - Love Bites 
[95]: Elton John - 16th Century Man (From "The Road To El Dorado" Soundtrack) 
[96]: Daft Punk - Aerodynamic 
[97]: Kacey Musgraves - High Horse 
[98]: Above & Beyond - You Got To Go 
[99]: Gnarls Barkley - Just a Thought

But not all playlists are a wide distribution of artists and genres. Sometimes you have a playlist that is, for example, a collection of 80s rock that’s just a bunch of Best Of compilations thrown together. How does this algorithm fare against a collection like that?

Answer: not well.

[0]: CAKE - Meanwhile, Rick James... 
[1]: CAKE - Tougher Than It Is (Album Version) 
[2]: Breaking Benjamin - I Will Not Bow 
[3]: Enigma - Gravity Of Love 
[4]: CAKE - Thrills
[5]: Clutch - Our Lady of Electric Light

It immediately short-circuits, and we see lots of clustering. Maybe not a deal-breaker for a smaller, more focused playlist, but I can’t help but feel there’s a better way to handle this.

Merge Shuffle

Going back to the deck of cards analogy: there are only 4 “albums” in a deck, but shuffling still produces good enough results for countless gamblers and gamers. So why not try the same approach here?

We want to split our list into n elements, then merge them back together, with a bit of randomness. Like cutting and riffling a deck of cards.

First, we’ll do the easy part: split up the list into a list of lists – like a bunch of hands of cards.

private List<List<T>> SplitList(IEnumerable<T> itemList, int splitCount)
{
    var items = itemList.ToArray();
    return items.Chunk(items.Length / splitCount).Select(songs => songs.ToList()).ToList();
}
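A quick aside on Chunk’s behavior, since the merge step depends on it: when the list doesn’t divide evenly, every chunk is full-size except the last, which holds the remainder. That’s the “one chunk smaller than the rest” that later gets padded with dummy songs. A tiny sketch (mine, not from the post):

```csharp
using System;
using System.Linq;

class ChunkDemo
{
    static void Main()
    {
        // 10 items split "into 3": chunk size 10 / 3 = 3,
        // so Chunk yields lengths [3, 3, 3, 1] -- four chunks, last one short
        var items = Enumerable.Range(1, 10).ToArray();
        var chunks = items.Chunk(items.Length / 3).ToArray();
        Console.WriteLine(string.Join(", ", chunks.Select(c => c.Length)));
        // prints: 3, 3, 3, 1
    }
}
```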

Then, we pass this list to a function that will do the real work of shuffling and merging.

private IEnumerable<T> MergeLists(List<List<T>> lists)
{
    var enumerable = lists.ToList();
    var chunkSize = enumerable.First().Count; // every chunk but the last is this long
    var lastList = enumerable.Last();

    // pad the short final chunk with dummy songs
    // so all chunks are the same length
    var difference = chunkSize - lastList.Count;
    lastList.AddRange(Enumerable.Repeat((T)dummySong, difference));
    
    var resultList = new List<T>();
    var slice = new T[enumerable.Count];

    for (var i = 0; i < chunkSize; i++)
    {
        // take one song from each chunk to form a slice
        for (var l = 0; l < enumerable.Count; l++)
        {
            slice[l] = enumerable[l][i];
        }

        // Fisher-Yates the slice (note the inclusive j + 1 bound)
        for (var j = slice.Length - 1; j > 0; j--)
        {
            var x = RandomNumber.Next(0, j + 1);

            (slice[x], slice[j]) = (slice[j], slice[x]);
        }

        // relaxed radio rules: no artist twice in a row within the slice;
        // offenders get swapped to the end and dummies are ignored
        for (var j = 1; j < slice.Length; j++)
        {
            if (slice[j - 1] == dummySong || slice[j] == dummySong)
            {
                continue;
            }
            
            if (slice[j].ArtistName == slice[j - 1].ArtistName)
            {
                (slice[j - 1], slice[slice.Length - 1]) = (slice[slice.Length - 1], slice[j - 1]);
            }
        }

        // make sure the slice's first song doesn't clash with
        // the last song already in the result
        if (i > 0)
        {
            var retryCount = 0;
            while (!IsValidPick(slice[0], resultList.TakeLast(1)) &&
                   retryCount < MaxRetryCount)
            {
                (slice[0], slice[slice.Length - 1]) = (slice[slice.Length - 1], slice[0]);
                retryCount++;
            }
        }
        
        // drop the dummy songs before appending
        resultList.AddRange(slice.Where(s => s != dummySong));
    }
    
    return resultList;
}

This is kind of a big boy. Let’s go through it step by step.

First, we copy our input list into a local variable. I am allergic to side-effects, so I want any changes (destructive or otherwise) confined to a local scope inside this function to keep it as pure as possible.

We’ll take the local list, and then find the length of the biggest chunk, and the length of the smallest chunk. There will only be one chunk smaller than the rest. We’ll fill it up with dummy songs so it’s the same length as the other chunks, and then disregard the dummy songs later.17

Once our lists are in order, we slice through them one section at a time. The slice gets shuffled18, then checked for our earlier-defined rules, but a little more relaxed: no artist or album twice in a row. If a song breaks a rule, we just move it to the end of the array and try again, always with a short-circuit so we don’t get caught in an endless loop.

And of course, we will always allow / ignore the dummy songs, so they don’t interfere with any real choice.

But, there’s a problem! As with a real deck of cards, shuffling once just isn’t enough. Seven riffles is the classic rule of thumb for randomizing a 52-card deck, so let’s go through this process at least seven times.

for (var i = 0; i <= ShuffleCount - 1; i++)
{
    var splitLists = SplitList(list.ToList(), SplitCount);
    list = MergeLists(splitLists);
}

And… the output looks really good, in my opinion!

[0]: Daft Punk - Digital Love 
[1]: America - Sister Golden Hair 
[2]: CAKE - Walk On By 
[3]: Guns N' Roses - Yesterdays 
[4]: Hey Ocean! - Be My Baby (Bonus Track) 
[5]: CAKE - Meanwhile, Rick James...
[6]: Fitz and The Tantrums - Breakin' the Chains of Love 
[7]: Digitalism - Battlecry 
[8]: Harvey Danger - Flagpole Sitta 
[9]: Guttermouth - I'm Destroying The World

However, this only really works well in the areas where the plain-old Fisher-Yates shuffle doesn’t. When used on smaller or more homogeneous sets, the results still leave something to be desired. These two shuffle methods complement each other, but cannot replace each other.

Shuffle Factory

So what happens now?

I thought about checking the entire playlist beforehand to see which algorithm should be used. But there’s no one-size-fits-all solution for this. Because, like my iTunes library, a playlist could contain a huge number of albums alongside a huge number of completely unrelated singles.

So let’s get crazy and use both.

First we need to determine the boundary between the Fisher-Yates shuffle and the “Merge” Shuffle (for lack of a better term). I’m going to just use my instincts here instead of any hard analysis and say: if it’s a really small playlist, or if more than x percent of the playlist is one artist, then we’ll use the Merge Shuffle.

private ISortableList<T> GetSortType(IEnumerable<T> itemList)
{
    var items = itemList.ToList();
    if (HasLargeGroupings(items))
    {
        return new MergedShuffleList<T>(items);
    }
    return new FisherYatesList<T>(items);
}

public bool HasLargeGroupings(IEnumerable<T> itemList)
{
    var items = itemList.ToList();
    if (items.Count <= 10)
    {
        // the list is essentially a single group (or album);
        // no point in calculating.
        return true;
    }

    var groups = items.GroupBy(s => s.ArtistName)
        .Select(s => s.ToList())
        .ToList();

    var biggestGroupItemCount = groups.Max(s => s.Count);

    var percentage = (double)biggestGroupItemCount / items.Count * 100;

    return percentage >= 15;
}

Pretty straightforward! Above it sits the function that calls this check and returns the matching shuffler.
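To sanity-check that threshold, here’s a simplified, non-generic version of the same check (my own sketch; the 15% cutoff and the small-list short-circuit mirror the code above):

```csharp
using System;
using System.Linq;

class GroupingCheck
{
    // Simplified version of the check above: does the most common
    // artist make up at least 15% of the list?
    public static bool HasLargeGroupings(string[] artists)
    {
        if (artists.Length <= 10) return true; // tiny list: treat as one group

        var biggest = artists.GroupBy(a => a).Max(g => g.Count());
        return (double)biggest / artists.Length * 100 >= 15;
    }

    static void Main()
    {
        // 20 songs, 4 by the same artist -> 20% -> Merge Shuffle territory
        var topHeavy = Enumerable.Repeat("CAKE", 4)
            .Concat(Enumerable.Range(1, 16).Select(i => $"Artist {i}"))
            .ToArray();
        Console.WriteLine(HasLargeGroupings(topHeavy)); // prints: True
    }
}
```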

Now let’s shuffle.

public void ShuffleLongList(IEnumerable<T> itemList,
    int itemChunkSize = 100)
{
    var items = itemList.ToList();
    if (items.Count <= itemChunkSize)
    {
        chunkedShuffledLists.Add(GetSortType(items).Sort().ToList());
        return;
    }

    items = new FisherYatesList<T>(items).Sort().ToList();
    
    // split into chunks
    var chunks = items.Chunk(itemChunkSize).ToArray();

    // shuffle the chunks
    var shuffledChunks = new FisherYatesList<T[]>(chunks).Sort();

    foreach (var chunk in shuffledChunks)
    {
        chunkedShuffledLists.Add(GetSortType(chunk).Sort().ToList());
    }
}

Again, pretty simple.

Split our input into x lists of chunk size y (here, defaulting to 100). Again we’ll do a little short-circuiting and say that if the input is smaller than the chunk size, we’ll just figure out the shuffle type right away and exit immediately.

Otherwise, we do a simple shuffle of the input list and then split it into chunks of the desired size. I chose to do this preliminary shuffle as an extra degree of randomness. I hate hitting shuffle on a playlist, playing it, then coming back and shuffling again and getting the same songs at the start.19 So this will be an extra measure to guarantee the start sequence is different every time.

Next we shuffle the chunk ordering. Again, using Fisher-Yates, and again, for improved starting randomness.

After that we just iterate through the chunks and shuffle them according to whichever algorithm performs better for that particular chunk.

The output here is, again, really nice in my testing. I ran through and checked multiple chunks and felt overall very pleased with myself, if I’m being honest.

[0]: Rina Sawayama - Chosen Family 
[1]: Clutch - Our Lady of Electric Light 
[2]: Matchbox Twenty - Cold 
[3]: Rocco DeLuca and The Burden - Bus Ride 
[4]: Rob Thomas - Ever the Same 
[5]: MIKA - Love Today 
[6]: CeeLo Green - Satisfied 
[7]: Metallica - For Whom The Bell Tolls (Remastered) 
[8]: Elton John - I'm Still Standing 
[9]: Rush - Closer To The Heart 
... 
[0]: Wax Fang - Avant Guardian Angel Dust 
[1]: Journey - I'll Be Alright Without You
[2]: Linkin Park - High Voltage 
[3]: TOOL - Schism 
[4]: Daft Punk - Giorgio by Moroder 
[5]: Fitz and The Tantrums - L.O.V. 
[6]: Stone Temple Pilots - Vasoline (2019 Remaster) 
[7]: Jewel - You Were Meant For Me 
[8]: Butthole Surfers - Pepper 
[9]: Collective Soul - No More No Less

Outro

I don’t think there’s any further I can take this. I know if I looked closer at the end results, I could find something else to change. There’s a whole world of shuffling algorithms out there, and plenty to learn from. If I felt so inclined I could write something to shuffle my playlists for me, but this exercise was really to learn, first, why shuffling never seemed good enough, and second, whether I could do better.

(The answers, as usual, were “it’s complicated” and “maybe”.)

Further Reading

  • The source code for my work is over at my github.
  • Spotify, once upon a time, did some work on their shuffling and wrote about it here
  • Live365 has a brief blog post on shuffling and DMCA requirements here