A database of all chess positions in your own computer

2021-08-18 13:00

Note: This is based on an old (2018) Quora answer.

If I asked you "How much hard drive space would be required for a Database representing every possible position in chess?" what would you answer?

Well, most answers go like this:

"Claude Shannon estimated there are 10⁴³ positions in chess, which can be stored in ~32 bytes each, so it's something like 10⁴⁴ bytes, and the observable universe has only 10⁸⁰ atoms so it's something pretty large!"

To that I say phooey Shannon! It takes around 700 bytes if you want the fancy version.

Let's start by defining what the requested artifact is.

"A database representing every possible position in chess"

So, what is such a database's behaviour? I propose:

It can, if given some sort of index value, return a specific position (which will always be the same)
It can iterate over all chess positions.
It should be able to return all positions matching a specific criteria like 'has a white pawn in D5'

My proposed implementation will do the first 2 which I consider actual requirements, the 3rd being just a "nice to have". While it's not impossible to add, even my willingness to do stupid things has a limit.

So, limiting myself to the first two requirements, if I fulfill those then it's done, right? Because if the deliverable is correct, the rest is implementation details?

Let's implement it!

Step 1: notation

Welcome to Forsyth-Edwards notation (FEN for short). It's a wonderful thing that provides all the necessary information to restart a game.

I will use a hacked subset since all I want is the position (not things like "has white castled?" or "who is moving next?") but it's a trivial exercise to expand this database to do just that.

So, how does this database describe a position? A position is a list of pieces and their positions in a 8x8 board.

In the original FEN white pawn is identified as P and black pawn as p. I consider that an insult to the IETF who has adopted UTF-8 in RFC2777 (and also slightly racist), therefore I will use the proper glyphs: ♙ and ♟.

In fact, I will use ♙♘♗♖♕♔♟♞♝♜♛♚.

As for positions, since my database is small enough that data compression is pointless, let's just use a 64-character fixed-size string where each position is either a piece of a space meaning the square is empty.

So, a position is a string of length 64 where these are the only valid values: "♙♘♗♖♕♔♟♞♝♜♛♚ "

Step 2: implementation

It's obvious that each position is equivalent to a 64-digit number in base 13, which means there are 196 053 476 430 761 073 330 659 760 423 566 015 424 403 280 004 115 787 589 590 963 842 248 960 possible positions.

That 64-digit number is the index key for each position (it's an optimal key, it can't be made any smaller!)

So here's the database core code, offered with no comments since it's barely 12 lines of code doing nothing weird:

def get_position(index):
    def digit_to_char(digit):
        return "♙♘♗♖♕♔♟♞♝♜♛♚ "[digit]

    def str_base(number, base=13):
        (d, m) = divmod(number, base)
        if d:
            result = str_base(d, base) + digit_to_char(m)
        else:
            result = digit_to_char(m)
        return result

    position = str_base(index).rjust(64)
    return position

For convenience, here is a pretty printer for your boards:

def print_board(position):
    print(" ABCDEFGH ")
    for i in range(8):
        print("%d%s%d" % (i, position[8 * (i) : 8 * (i + 1)], i))
    print(" ABCDEFGH ")

Here you can see it in action (sadly asciinema butchers the alignment, it works properly in real life):

And here is the full source code:

def get_position(index):
    def digit_to_char(digit):
        return "♙♘♗♖♕♔♟♞♝♜♛♚ "[digit]

    def str_base(number, base=13):
        (d, m) = divmod(number, base)
        if d:
            result = str_base(d, base) + digit_to_char(m)
        else:
            result = digit_to_char(m)
        return result

    position = str_base(index).rjust(64)
    return position


def print_board(position):
    print(" ABCDEFGH ")
    for i in range(8):
        print("%d%s%d" % (i, position[8 * (i) : 8 * (i + 1)], i))
    print(" ABCDEFGH ")


if __name__ == "__main__":
    import sys

    print_board(get_position(int(sys.argv[1])))

Please notice that this database also covers all alternative chess variants where extra pieces are given or removed as handicap.

Step 3

There is no step 3.

Ralsina.Me — Roberto Alsina's website

Step 1: notation

Step 2: implementation

Step 3