I should have been more careful.

Last week, I put up a video about solving the game Wordle using information theory, and I wanted to add a quick addendum. It turns out there was a very slight bug in the code I was running, which had to do with how you assign colors to a guess that contains repeated letters. The bug's effect was very slight, which made it easy to miss.

The bug affected the conclusion about what the theoretically optimal first guess is for the official Wordle answer list. In the video, I said the best performance I could find came from opening with the word “crane”, but after fixing the bug and rerunning everything, there is a different answer.
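For concreteness, here is a sketch of the standard two-pass coloring rule that handles repeated letters correctly. This is my own minimal illustration, not the code from the video; colors are encoded as 0 (gray), 1 (yellow), 2 (green).

```python
from collections import Counter

GRAY, YELLOW, GREEN = 0, 1, 2

def pattern(guess: str, answer: str) -> tuple:
    """Color `guess` against `answer`, handling repeated letters.

    Two passes: greens are marked first, then yellows are drawn from
    the pool of answer letters not already consumed by a green. A
    naive letter-by-letter check can wrongly color a repeated letter
    yellow even when every copy of it in the answer is accounted for.
    """
    colors = [GRAY] * len(guess)
    # Pool of answer letters that were not exact (green) matches.
    pool = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colors[i] = GREEN
    for i, g in enumerate(guess):
        if colors[i] != GREEN and pool[g] > 0:
            colors[i] = YELLOW
            pool[g] -= 1
    return tuple(colors)
```

For example, guessing “speed” against “abide” colors only the first “e” yellow; the second “e” stays gray, because the answer contains just one “e”.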

I understand that the point of the video is not to find the technically optimal answer to some random online game, but to sneak-attack people with an information theory lesson. Still, I walked into this one by putting the claim in the thumbnail, so you can forgive me if I want to make a small correction here. Additionally, I never really discussed what went into the final analysis, which is an interesting sub-lesson in its own right.

To recap, we spent most of the last video trying to write an algorithm to solve Wordle without using the official answer list. Using that list felt like overfitting to a test set, so instead we looked at relative word frequencies in the English language to come up with a notion of how likely each word would be to appear as a final answer.

This time, however, we incorporate the official list and shamelessly overfit to the test set: we know with certainty whether each word is included or not, and we assign a uniform probability to each word that is.

The first step was to determine how likely each possible color pattern is for a particular opening guess. To quantify the information gained from that guess, we go through each bucket of answers and ask how many times it cuts the space of possibilities in half: a bucket holding a fraction p of the answers contributes log2(1/p) bits, and the expected information is the probability-weighted average of those values.
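In code, this expected-information calculation can be sketched as follows; `pattern_fn` stands in for whatever coloring function is used, and the names are my own, not the video's:

```python
import math
from collections import Counter

def expected_information(guess: str, possible_answers: list, pattern_fn) -> float:
    """Entropy (in bits) of the pattern distribution for a guess.

    Each distinct pattern defines a bucket of answers. A bucket holding
    a fraction p of the answers cuts the space in half log2(1/p) times,
    and the expected information is the probability-weighted average of
    those values over all buckets.
    """
    buckets = Counter(pattern_fn(guess, answer) for answer in possible_answers)
    n = len(possible_answers)
    return sum((c / n) * math.log2(n / c) for c in buckets.values())
```

A guess that splits the answers into four equally likely buckets scores exactly 2 bits, while one that leaves every answer in a single bucket scores 0.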

We then searched through all 13,000 words to find the one with the highest expected information, which turned out to be “soare”, and recorded the top 15 openers by this metric.

Next, we did an exhaustive search two steps in. For example, if we opened with “soare” and the pattern we saw was all grays, we ran an identical analysis for a proposed second guess, such as “kitty”: restrict the possible answers to those consistent with the pattern, then measure the flatness of the resulting distribution using the expected information formula. We did this for all 13,000 possible words.

Doing this, we found the optimal second guess in that scenario and the amount of information we expected to get from it. We repeated this process for all the different possible patterns, giving us a full map of the best possible second guesses and the expected information of each.
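The restriction step, keeping only the answers consistent with what the opener revealed, can be sketched like this (the names are hypothetical):

```python
def remaining_answers(opener: str, observed, possible_answers: list, pattern_fn) -> list:
    """Answers still consistent with the pattern observed for the opener.

    The second-guess analysis is then the same entropy calculation as
    for the opener, just run over this smaller list.
    """
    return [a for a in possible_answers
            if pattern_fn(opener, a) == observed]
```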

Taking an average of all the second-step values, weighted according to how likely we are to fall into each bucket, gives a measure of how much information we are likely to gain from the guess “soare” after the second step. When we use this two-step metric as our new means of ranking, the list gets shaken up a bit: “soare” falls back to 14th place, and “slane” rises to the top. This doesn’t feel very real, as “slane” appears to be a British term for a spade used for cutting turf. After these two steps, “salet” is the one that ends up with the best score, although it too is obscure, an alternate spelling for a light medieval helmet. “Trace” and “crate” give almost identical performance and have the advantage of being actual Wordle answers.
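Putting it together, the weighted average just described can be sketched like so. This is a self-contained illustration, not the video's code: `expected_information_fn` is assumed to be a one-step entropy function taking `(guess, answers, pattern_fn)`, passed in explicitly.

```python
def two_step_information(opener: str, possible_answers: list, all_guesses: list,
                         pattern_fn, expected_information_fn) -> float:
    """Expected information after two guesses, starting with `opener`.

    The opener's own entropy is added to a weighted average over the
    patterns it could produce: for each pattern, restrict the answers
    to that bucket, find the best second guess by one-step entropy,
    and weight its value by the probability of landing in that bucket.
    """
    n = len(possible_answers)
    buckets = {}
    for answer in possible_answers:
        buckets.setdefault(pattern_fn(opener, answer), []).append(answer)
    total = expected_information_fn(opener, possible_answers, pattern_fn)
    for bucket in buckets.values():
        weight = len(bucket) / n
        best_second = max(
            expected_information_fn(g, bucket, pattern_fn) for g in all_guesses
        )
        total += weight * best_second
    return total
```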

The move from sorting by the best two-step entropy to sorting by the lowest average score also shakes up the list, but not nearly as much. For example, “salet” was in third place before it bubbles to the top, and “crate” and “trace” were fourth and fifth.

Stepping back from all this, it may seem like overanalyzing the game to find an optimal opening guess ruins it. However, that “optimal” opener is not necessarily the best one for a human playing the game: we don’t have the word list memorized, and our intuition comes from things like which vowels a word has and how they’re placed. The point of writing algorithms for this is not to change the way we play the game, but to hone our muscles for writing algorithms in more meaningful contexts elsewhere.