April 29, 2022

How to Play Wordle like a Data Scientist

Recently, we’ve noticed more and more Wordle games facing fates like these:

One Monday, our Founding Partner Derek let slip that, over the weekend, he’d programmed a Wordle solver which would avoid such a fate. Over subsequent weeks, our team ended up running some games against it. Here’s what resulted…

What is Wordle?

For those who’ve managed to sidestep the viral Internet phenomenon: the goal is to guess a secret word. The player gets six guesses—the fewer, the better. Once the player submits a guess, the letters change color to provide feedback: green (the letter is in the word, and in the correct spot), yellow (the letter is in the word, but in another spot), or gray (the letter is not in the word at all).
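The color rules above can be sketched in a few lines of Python. This is a minimal illustration, not the game's actual code: the function name `feedback` and the `G`/`Y`/`-` encoding are our own choices, and the handling of repeated letters (each letter in the secret word can color at most one letter of the guess) is an assumption about how the game treats duplicates.

```python
from collections import Counter

def feedback(guess: str, secret: str) -> str:
    """Return a color pattern: G = green, Y = yellow, - = gray.

    Hypothetical helper for illustration; the encoding is our own.
    """
    guess, secret = guess.upper(), secret.upper()
    pattern = ["-"] * len(guess)

    # Letters of the secret that were NOT guessed in the right spot.
    # Only these can still earn a yellow.
    unmatched = Counter(s for g, s in zip(guess, secret) if g != s)

    for i, (g, s) in enumerate(zip(guess, secret)):
        if g == s:
            pattern[i] = "G"            # right letter, right spot
        elif unmatched[g] > 0:
            pattern[i] = "Y"            # right letter, wrong spot
            unmatched[g] -= 1           # each secret letter is used once
    return "".join(pattern)

print(feedback("SHARE", "SNARE"))  # G-GGG
print(feedback("FLIER", "SNARE"))  # ---YY
```

In the second call, E and R both appear in SNARE but in different positions, so they come back yellow while F, L, and I come back gray.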

Gameplay: human vs. bot

On average, our team wins in ~4.5 guesses, while Derek’s bot wins in ~3.5 guesses. Let’s see how our strategies differ.

Starting off, our team guesses FLIER and SHARE. For comparison’s sake, we force the bot to match our moves.

We decide that SCARE makes a sensible third guess. Derek runs his bot, and ~200 milliseconds later, the code spits out…SCENT. Why SCENT, of all words? After all, it’s guaranteed to be wrong.

To understand the bot’s reasoning, take a look at the possible outcomes of the two guesses:

At this point, the only unknown is the second letter (S_ARE), which must be C, P, T, or N (SCARE, SPARE, STARE, or SNARE). The bot chooses a word that tests as many of those letters as possible: SCENT checks C, N, and T in a single guess, and if none of them lights up, the missing letter must be P. Meanwhile, our team picks SCARE arbitrarily. Yes, we win instantly if C is the correct second letter, but if the four candidates are equally likely, there’s a 75% chance it isn’t, in which case we learn only that C is out and still have P, T, and N to choose from.
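To make the comparison concrete, here is a small Python sketch that groups the four remaining candidates by the color pattern each guess would produce. It is an illustration under our own assumptions, not Derek's code: `feedback` and `partition` are hypothetical helper names, and the `G`/`Y`/`-` encoding stands in for green, yellow, and gray.

```python
from collections import Counter, defaultdict

def feedback(guess, secret):
    """Color pattern for a guess: G = green, Y = yellow, - = gray."""
    # Secret letters not guessed in place; only these can earn a yellow.
    unmatched = Counter(s for g, s in zip(guess, secret) if g != s)
    pattern = []
    for g, s in zip(guess, secret):
        if g == s:
            pattern.append("G")
        elif unmatched[g] > 0:
            pattern.append("Y")
            unmatched[g] -= 1
        else:
            pattern.append("-")
    return "".join(pattern)

def partition(guess, candidates):
    """Group candidate answers by the pattern the guess would show."""
    groups = defaultdict(list)
    for secret in candidates:
        groups[feedback(guess, secret)].append(secret)
    return dict(groups)

candidates = ["SCARE", "SPARE", "STARE", "SNARE"]
scare_groups = partition("SCARE", candidates)
scent_groups = partition("SCENT", candidates)

for guess, groups in (("SCARE", scare_groups), ("SCENT", scent_groups)):
    print(guess, "->", len(groups), "groups")
    for pattern, words in groups.items():
        print("  ", pattern, words)
```

Under these assumptions, SCARE produces two patterns (all green for SCARE itself, and one shared pattern for SPARE, STARE, and SNARE), while SCENT produces a different pattern for each of the four words. After SCENT, the colors alone identify the answer.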

Sure enough, here’s what ensues:

Guessing SCARE turned Wordle into a game of Russian roulette: we played at the whim of luck. Meanwhile, the bot guessed SCENT and guaranteed itself a win on the very next guess, whichever of the four words was the answer. That is clearly the better strategy.

How does the bot “think”?

The bot plays using a “divide and conquer” strategy. In technical terms, it “scores” all of its possible guesses with a metric called information entropy, which measures how a guess divides the set of remaining candidate answers into groups, one group per possible color pattern. The goal is to maximize information entropy: the more groups a guess creates, and the closer those groups are in size, the more candidates the guess can be expected to eliminate.
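For readers who want the formula, the standard (Shannon) definition of information entropy is sketched below. The notation is ours, not the bot's: we assume a guess g splits the N remaining candidates into k groups G_1, ..., G_k, and that every candidate is equally likely to be the answer. The article does not say whether the bot weights candidates differently.

```latex
H(g) = -\sum_{i=1}^{k} p_i \log_2 p_i ,
\qquad
p_i = \frac{|G_i|}{N}
```

When all k groups are the same size, every p_i equals 1/k and the sum reduces to log2(k), which is the largest value entropy can take for k groups. That is why more groups of similar size score higher.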

SCARE divided our “solution space” into only two uneven (“asymmetric”) groups: SCARE by itself, and the other three words together. SCENT divided the bot’s solution space into four equally sized (“uniform”) groups of one word each. SCENT has the larger information entropy by far, making it the better guess.
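The gap between the two guesses can be put into numbers. The sketch below computes Shannon entropy in bits from group sizes alone, assuming the four candidates are equally likely; `entropy_bits` is a hypothetical helper name, and the group sizes (1 and 3 for SCARE, four groups of 1 for SCENT) come from the split described above.

```python
import math

def entropy_bits(group_sizes):
    """Shannon entropy, in bits, of a split into groups of these sizes.

    Assumes every remaining candidate is equally likely, so a group's
    probability is its share of the total.
    """
    total = sum(group_sizes)
    return -sum((n / total) * math.log2(n / total) for n in group_sizes if n > 0)

scare = entropy_bits([1, 3])        # SCARE alone vs. the other three words
scent = entropy_bits([1, 1, 1, 1])  # one word per group

print(f"SCARE: {scare:.3f} bits")  # SCARE: 0.811 bits
print(f"SCENT: {scent:.3f} bits")  # SCENT: 2.000 bits
```

Two bits is exactly what it takes to single out one word among four, which is why SCENT leaves nothing to chance here.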

What are real-world applications?

Today, search features in many computer and web applications rely on “divide and conquer” strategies to deliver fast results. Consider Google Search. Without such strategies, imagine waiting minutes, hours… days… every time you click “Submit”…on every website you use.

An algorithm like our original Wordle strategy (blue line) gets slower in proportion to the number of items it has to sift through. Meanwhile, “divide and conquer” algorithms (pink line) stay efficient, even as the search space expands by orders of magnitude.
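One way to see the difference is to count steps. The sketch below compares a one-at-a-time scan with binary search, a textbook “divide and conquer” algorithm that halves a sorted list on every step. Binary search is our choice of illustration; the article does not say which algorithms any particular search product uses, and both function names are hypothetical.

```python
def linear_search_steps(items, target):
    """Check items one by one; return how many were examined."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps
    return len(items)

def binary_search_steps(sorted_items, target):
    """Halve the search range each step; return how many steps it took."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return steps
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return steps

for n in (1_000, 1_000_000):
    items = range(n)   # already sorted: 0, 1, ..., n - 1
    target = n - 1     # worst case for the one-by-one scan
    print(n, linear_search_steps(items, target), binary_search_steps(items, target))
```

Growing the list a thousandfold makes the scan examine a thousand times as many items, while binary search needs only about ten extra steps (at most 10 for a thousand items, at most 20 for a million).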

In fact, we use “divide and conquer” algorithms in our day-to-day all the time. Let’s say you misplace your keys and are searching for them in your house. Instead of painstakingly investigating the rooms in order, from the front door to the back, you might first brainstorm all the likely places your keys could be and narrow the search. You’d then ask yourself, “Of those places, where have I been most recently?” and begin there.

All this to say: try the “divide and conquer” strategy on your next Wordle game. We’ve been using it, and we maintain high hopes of never facing a 6-guess demise again.