I’ve been playing too much Scrabble Go recently, the new version of Scrabulous. The app is full ads & annoyances, but I like playing with my friends, so I decided to build my own personal version to play with friends.

I also wanted to build a training mode into the app, and also later on try my hand at a Scrabble AI. Apparently, Scrabble is unsolved. Quackle, the best open source Scrabble AI is “competitive” with tournament players, but hasn’t dominated like other AIs have in Chess and Go, presumably because the randomness and imperfect information make it more difficult to reason about the game. So I thought it might be fun to at least try to compete with Quackle.

I’m far from done, but I’m trying to write more “bite-sized”, iterative blog posts, so here I’ll go into

  • The stack (Rust+WASM)
  • The types
  • The data structures

The Stack

The plan is to write all the scrabble logic in Rust, so it can be compiled to WASM and used on the frontend to play the game, and also on the backend to run validation & scoring, plus later an AI.

Frontend

I re-used the structure of paint-by-wasm, which was a wasm-pack application which used a slightly modified version of this template to run the WASM in a Web Worker.

On the frontend I got started quickly with some React (but not attached to that) and used CSS Grid to draw the board.

I used the pseudo-element padding trick, which is adding an :after pseudo element with padding-bottom: 100% to the grid squares, and then was able to have a 15x15 grid of responsive squares.

I then used vanilla HTML&JS to make drag&droppable tiles by setting the required onDrag, onDrop, etc events.

It looks like this:

Backend

I decided to just get started and make some types, and then come back later to fix issues as they come up. In strongly-typed languages it’s often seductive to get the types perfect up-front, but I like being pragmatic and exploring the solution as I go. It’s pretty easy to go back and change things later, especially in a program with few types like this one.

type Letter = char;

struct BoardIdx(usize, usize);

struct TilePlacement {
    letter: Letter,
    idx: BoardIdx,
}

enum Direction {
    Up,
    Down,
    Left,
    Right,
}

enum BonusTile {
    Normal,
    DoubleWord,
    DoubleLetter,
    TripleLetter,
    TripleWord,
}

struct SinglePlay {
    placements: Vec<TilePlacement>,
}

struct BoardState {
    tiles: HashMap<BoardIdx, Letter>,
}

The Letter type

There are 2 notable things about using char for the letter type:

  1. Tiles should be Unicode

Scrabble is a multilingual game, and I’d like to leave the possibility of having different languages open. Not all languages use ASCII tiles (recall ASCII standards for American Standard Code for Information Interchange), some of them have accented characters like Ä, Æ, or have their own alphabet like Arabic. Rust chars are a good fit here, as they can be multi-byte Unicode values.

There is one caveat that there are some Scrabble variants, like Spanish, have two-letter tiles like LL or QU. I’ve decided to punt on this one, I don’t think it’ll be too hard to come back to fix later.

  1. Blank tiles

Scrabble has the concept of “blank tiles”, which you can use as any letter, but which count for zero points. The tile is blank in the player’s rack, then played as a particular letter in a word, but its letter score is zero. One way to do this would be to have a flag on Letter to record if it was a blank tile, and then have logic in different places look at that flag, but there is a nice solution that already exists in Scrabble notation:

In written down Scrabble notation, normal letters are written uppercase, and letters that came from blank tiles are lower case , like VALIaNT, or kETCHUP. This is neat because you don’t need an extra flag, you can simply give a 0 score to lowercase letters, and use toUpper before checking the dictionary.

Board Indices

I use two zero-indexed integers in the traditional 2d array [row][column] orientation to denote Board indices. While Scrabble indices are usually written like chess (eg ‘C5’), apparently some languages/variants of scrabble swap those so that C5 would be E3 in German. I decided that using pairs of usizes then would be fine, and easier to not have to translate all the time.

Defining the Board & Dictionary

I got started here hard-coding the english alphabet and standard board in, with lazy_static:

lazy_static! {
    static ref BOARD: HashMap<BoardIdx, BonusTile> = [
        (BoardIdx(0, 0), BonusTile::TripleWord),
        (BoardIdx(0, 3), BonusTile::DoubleLetter),
        (BoardIdx(0, 7), BonusTile::TripleWord),
        (BoardIdx(0, 11), BonusTile::DoubleLetter),
        (BoardIdx(0, 14), BonusTile::TripleWord),
    ]
    .iter()
    .copied()
    .collect();
    
    static ref LETTER_SCORES: HashMap<char, u32> = [
        ('A', 1),
        ('B', 3),
        ('C', 3),
        ('D', 2),
        ....

The scoring function is as far as I’ll get into this blog post. Scoring a single word looks like:

fn score_word(word: &[TilePlacement]) -> u32 {
    let mut score = 0;
    let mut total_word_multiplier = 1;

    for placement in word {
        let TilePlacement { letter, idx } = placement;
        let letter_score = LETTER_SCORES.get(letter).unwrap_or(&0);

        let (word_multiplier, letter_multiplier) = BOARD.get(idx).unwrap_or(&BonusTile::Normal).get_mul();
        total_word_multiplier *= word_multiplier;
        score += letter_score * letter_multiplier;
    }
    score *= total_word_multiplier;
    return score;
}

I smoke tested it with some sample words, including the longest possible known word OXYPHENBUTAZONE, (1458 points! wowza!).

I haven’t yet written code to score a 2-dimensional, multi-word play but that’ll be next.

Other fun parts: Accepting a word

Determining if a word is valid dictionary word is a fun Computer Science 101 data structure question, which most people remember the solution as the Prefix Trie:

A prefix trie that accepts several different words

The “late stage undergrad” solution is to build a Deterministic Finite Automaton, which is a kind of state machine. Instead of just keeping prefixes, suffixes are also shared between words. For example, all words ending in ing in english would end up pointing at the same nodes.

A DFA that accepts the same words as the prefix trie, but using fewer nodes

The reason it’s “later undergrad” is that it’s a bit more work to construct the minimal state machine, but here we have prior work in a cool Rust crate that does this called fst. I recommend reading the blog post that explains how it works. It seems extremely well optimized..

I found a text version of the Collins dictionary, with ~280k words, one word per line, which weighed 2.8MB. fst compressed it to 800k with this script:

fst set --sorted collins.txt

The goal later will be to embed the fst state machine into the Rust & wasm binaries for easy distribution. 2.8MB-> 0.8MB isn’t that big of a deal for a toy project like this, and putting 280k words in a big Set would probably be fine performance (until I got to the AI part), but I was looking for a reason to try fst out and that was fun.

Other fun parts: Scrabble notation

Scrabble has an already accepted format for digital scrabble games, called GCG. Essentially all of this has already been done, but I’m really enjoying toying with it myself. I’ll probably try read GCG files in teh future.

I’ll leave this as a teaser for the next post, but Scrabble AIs have an even better data structure (a twist on the DFA), which I’ll explain in the next post.

I’ve also found this well written Rust Scrabble program, which I’ll be looking into.

My code is currently split into two repos, and not public yet, because I’ve had very little time to work on this. I’m trying to enjoy every off-work minute I can and go outdoors before the pandemic & the darkness force me back inside!


comments powered by Disqus