The tree of useless optimization yields questionable fruit.
For the last two years, I’ve been building a chess engine called Fiddler. It’s not very good by chess engine standards, but that’s not the point - I mostly work on it for funsies.
Right now, I’m reworking my move generator to more heavily use static, precomputed data rather than runtime generation. Most recently, I’m moving from dynamically generating and heap-allocating the magic move generation table to making it a statically allocated constant. Along the way, I’ll take this opportunity to explain how magic move generation works, and hopefully have a little fun.
Update 2023-06-22: I’ve added a few updates to handle questions that people in the comments on the Orange Website found confusing.
A crash course in magic moves
Chess engines are usually extremely high-performance pieces of software, and their move generators are some of the most high-performance parts of them. Most move generators can generate hundreds of millions of moves, per core, per second. Mine included!
The majority of moves are made by sliding pieces - the bishop, rook, and queen - so we’ll dedicate most of our effort toward making that fast. The most naive approach would be to represent a board as an array of pieces, and then, for each piece, iterate outward along a ray, adding moves until we encounter a blocker. The problem with that approach is simply that it’s far too slow. In the worst case, finding the set of legal moves for a queen on E4 would require 27 different array accesses - and that’s just the moves for one piece! We’re going to need a different approach.
Bitboards
We can start by representing the occupancy of a board using a bitboard: a set of squares represented by a unique 64-bit integer. Convention dictates that the least-significant bit should be set if the square A1 is occupied, the second-least-significant to represent B1, and so on until the most significant bit which represents H8.
Here’s a quick map displaying which bit in a bitboard corresponds to each square, indexed from the LSB:
8 | 56 57 58 59 60 61 62 63
7 | 48 49 50 51 52 53 54 55
6 | 40 41 42 43 44 45 46 47
5 | 32 33 34 35 36 37 38 39
4 | 24 25 26 27 28 29 30 31
3 | 16 17 18 19 20 21 22 23
2 | 8 9 10 11 12 13 14 15
1 | 0 1 2 3 4 5 6 7
--+------------------------
| A B C D E F G H
For example, consider a board with pieces on B1, G3, and C7. B1 has index 1, G3 has index 22, and C7 has index 50, so the bitboard uniquely representing {B1, G3, C7} is 2^1 + 2^22 + 2^50, which is equal to 1125899911036930.
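To make the encoding concrete, here’s a small standalone sketch (not Fiddler’s actual code) that builds that bitboard from the square indices:

fn main() {
    // Square indices, counted from the LSB: A1 = 0, B1 = 1, ..., H8 = 63.
    let squares = [1u32, 22, 50]; // B1, G3, C7
    let mut bitboard: u64 = 0;
    for sq in squares {
        bitboard |= 1u64 << sq; // set the bit corresponding to each occupied square
    }
    assert_eq!(bitboard, 1_125_899_911_036_930);
}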
We can use our bitboards to compute setwise operations in O(1) time.
Here’s a short (but not exhaustive) list of operations we can do:
Mathematical representation     Bitboard operation
Union, A ∪ B                    a | b
Intersection, A ∩ B             a & b
Symmetric difference, A Δ B     a ^ b
Complement, ¬A                  !a
Update 2023-06-22: I use Rust’s notation for !a. In C and C++, one would use ~a for the bitwise not operation.
Building a lookup table
In order to get an O(1) computation of sliding moves, we can use a lookup table to find precomputed values of the set of legal moves given the starting square and occupancy bitboard of a board. There’s just one problem: there are an enormous number of possible occupancy maps - roughly 7 × 10^14 possible occupancies containing at least two kings and a sliding piece, or about 700 trillion total boards. To store an 8-byte bitboard for every square and occupancy would require roughly 365 PB of data!
We can use a nice trick to reduce our data consumption, though. In our naive algorithm for sliding move generation, we only needed to examine whether the squares that we wanted our piece to move to were occupied - other squares not on the same ray were irrelevant. We can do the same here: somehow extract only the relevant occupancies for any square’s move generation.
For example, these are the only relevant squares for the set of legal moves for a rook on E4:
8 | . . . . . . . .
7 | . . . . x . . .
6 | . . . . x . . .
5 | . . . . x . . .
4 | . x x x . x x .
3 | . . . . x . . .
2 | . . . . x . . .
1 | . . . . . . . .
--+------------------------
| A B C D E F G H
Update 2023-06-22: We can safely ignore those squares on the edge of the board because they actually don’t affect what our rook can “see.” For instance, the rook will be able to see H4 if F4 and G4 are empty, but whether H4 itself is occupied does not affect whether the rook can see H4. Once we have the set of squares the rook can see, we can convert that to the set of squares the rook can move to by masking out all pieces of the same color as the rook.
For each square, then, we only need to consider a small handful of occupied spots. All we need to do is store a lookup for every (square, relevant_occupancy) pair, which is far smaller than the previous requirement.
There are at most 9 relevant occupancy bits for a bishop and 12 bits for a rook, so the total memory requirement for this lookup table is less than 2.4 MB. If we break out the single master lookup table into a unique lookup table for each attacker type and starting square, we can shrink the total memory consumption to only 861 kB.
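As a quick sanity check on those numbers, here’s a back-of-the-envelope calculation using the standard per-square entry counts from the Chess Programming Wiki (not Fiddler’s own tables):

fn main() {
    // Worst case for every square: 2^12 rook entries and 2^9 bishop entries, 8 bytes each.
    let fixed_size = 64 * ((1u64 << 12) + (1 << 9)) * 8;
    assert_eq!(fixed_size, 2_359_296); // a little under 2.4 MB

    // Ragged layout: each square only gets as many entries as it actually needs.
    // 102,400 rook entries and 5,248 bishop entries are the standard totals.
    let ragged_size = (102_400u64 + 5_248) * 8;
    assert_eq!(ragged_size, 861_184); // about 861 kB
}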
Where the magic happens
However, we still have to be able to convert an occupancy bitboard into an index. What we really want to do is to take our masked-out bits and extract them into an index. On x86 architectures, the pext instruction is designed to do exactly this, but if we want it to work on any architecture, we’ll have to be a little more clever than that.
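For intuition (this isn’t Fiddler’s code), a portable software equivalent of pext looks roughly like this: gather the bits of the value selected by the mask into the low bits of the result.

const fn software_pext(value: u64, mut mask: u64) -> u64 {
    let mut result = 0;
    let mut out_bit = 0;
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg(); // isolate the lowest set bit of the mask
        if (value & lowest) != 0 {
            result |= 1 << out_bit; // copy that bit into the next output position
        }
        out_bit += 1;
        mask ^= lowest; // clear the bit we just handled
    }
    result
}

The magic-multiply trick described next achieves the same packing effect with a single multiplication and a shift.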
I’ll start with a magic example, and we’ll see if that can enlighten us. Suppose we want to extract the relevant occupancy for a bishop on B2.
Take the original masked occupancy bitboard, where b1 through b5 denote the (possibly zero) occupancy bits on the relevant squares:
8 | . . . . . . . .
7 | . . . . . . . .
6 | . . . . . . b5 .
5 | . . . . . b4 . .
4 | . . . . b3 . . .
3 | . . . b2 . . . .
2 | . . b1 . . . . .
1 | . . . . . . . .
--+------------------------
| A B C D E F G H
Observe that the value of this bitboard is b1 * 2^10 + b2 * 2^19 + b3 * 2^28 + b4 * 2^37 + b5 * 2^46, since C2, D3, E4, F5, and G6 have indices 10, 19, 28, 37, and 46. Now, multiply by a magic number for this square; one that works is 2^17 + 2^25 + 2^33 + 2^41 + 2^49. Keeping only the lowest 64 bits of the product, no two partial products land on the same bit, and the relevant bits end up packed together at the top: the coefficient of 2^59 is b1, of 2^60 is b2, of 2^61 is b3, of 2^62 is b4, and of 2^63 is b5, while everything else falls harmlessly into the lower bits.
Using simple bitwise masking, we can then manually retrieve those packed bits near the MSB to use as our index. We can generate a set of 128 magic numbers (64 for bishops, 64 for rooks), each of which extracts the relevant occupancy bits for its own square and piece type, and then use that index to retrieve a pre-computed set of moves in constant time.
Side note: most of the time, there are actually collisions when using magic numbers. Collisions occur when our magic multiply doesn’t perfectly extract the relevant bits, but instead returns some other, bizarre combination of our relevant bits as the index. However, so long as the colliding occupancies map to the same final moveset, that’s OK, so finding magics is actually pretty easy - brute-force search works quite well.
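Here’s a sketch of the acceptance test at the heart of that brute-force search (illustrative only, not Fiddler’s code): given a candidate magic, the number of index bits, and the precomputed (occupancy, attacks) pairs for a square, we just check that any two occupancies landing in the same slot want the same moveset.

fn magic_works(magic: u64, bits: u32, pairs: &[(u64, u64)]) -> bool {
    let mut table: Vec<Option<u64>> = vec![None; 1 << bits];
    for &(occupancy, attacks) in pairs {
        // Hash the occupancy into a `bits`-bit index using the candidate magic.
        let key = (occupancy.wrapping_mul(magic) >> (64 - bits)) as usize;
        let slot = table[key];
        match slot {
            None => table[key] = Some(attacks),
            // A collision is harmless if both occupancies produce the same moves.
            Some(existing) if existing == attacks => {}
            Some(_) => return false,
        }
    }
    true
}

Candidate magics are typically random 64-bit numbers with few set bits; one simply keeps drawing candidates until this check passes.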
Getting down to business
Most engines compute all of their magic numbers as part of their startup routines. Before this week, I included the magic numbers as static constants and computed the entire table of movesets at startup. However, due to the limitations of Rust, I ended up using once_cell’s Lazy for initializing the move lookup table. This had the benefit of being really easy to implement, but it also meant that every time move generation was required (i.e. millions of times per second) the engine had to check whether the moves lookup table had already been loaded.
Now, I have to rewrite it from scratch, but in “hard” mode - to make all the constants known at compile time, everything has to be written in a const function. This means (a sketch of the resulting style follows this list):
- No allocations
- No printing
- No panicking
- No for loops
- No trait methods
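For a feel of what that style looks like, here’s a small illustrative const fn (not code from Fiddler). Since for desugars to a call to Iterator::next, a trait method, we loop with while and index manually instead:

/// Sum the elements of an array at compile time.
const fn sum_array(values: &[u32; 8]) -> u32 {
    let mut total = 0;
    let mut i = 0;
    // A `for` loop is not allowed in a const fn, so we use `while` with manual indexing.
    while i < values.len() {
        total += values[i];
        i += 1;
    }
    total
}

/// The whole computation happens during compilation.
const TOTAL: u32 = sum_array(&[1, 2, 3, 4, 5, 6, 7, 8]);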
Let’s get right to it by describing our data layout. We begin by implementing a Bitboard type by wrapping a u64:
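Something like this minimal sketch (the exact derives and helper methods are my guesses; the important part is that a Bitboard is just a u64 of occupancy bits):

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Bitboard(pub u64);

impl Bitboard {
    /// A bitboard containing no squares.
    pub const EMPTY: Bitboard = Bitboard(0);

    /// Check whether the square with the given index (A1 = 0, ..., H8 = 63) is in this set.
    pub const fn contains(self, square_index: u8) -> bool {
        (self.0 & (1 << square_index)) != 0
    }
}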
For every square and sliding piece type, we’ll create a structure called an AttacksLookup: a lookup table for generating attacks via magic bitboard for one piece type and square.
In order for us to have a “ragged” moves-lookup table, we don’t have each AttacksLookup own its attack set, since that would result in extremely inefficient space usage and an enormous increase in what will already be a very large binary. Instead, we store all of the moves-lookup tables in one giant array, and have each AttacksLookup hold a reference to its section of that array.
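Concretely, each per-square lookup needs its occupancy mask, its magic number, the shift used to extract the index, and a view into the shared master table. Here is a sketch with my assumed field names (not necessarily Fiddler’s exact layout):

struct AttacksLookup {
    /// The mask for extracting the relevant occupancy bits for this square.
    mask: Bitboard,
    /// The magic multiplier for this square.
    magic: Bitboard,
    /// The amount to shift the magic product right by to produce a table index.
    shift: u8,
    /// This square's slice of the master attacks table.
    table: &'static [Bitboard],
}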
Let’s try to populate that array, staying mindful of our lack of for-loops. First, we’ll store our magic numbers:
/// A saved list of magic multiply numbers for rooks, indexed by the square they're used for.
const SAVED_ROOK_MAGICS: [Bitboard; 64] = [ /* 64 magic numbers, one per square; values elided */ ];
These magic numbers can be found offline via trial and error, but there are also public records of magic numbers. For example, the Chess Programming wiki keeps a record of magic numbers.
Next, we need to know how much storage to allocate for all the rook moves. We do this by first storing a table of how many entries are needed for each square to calculate our moves.
/// Log-base-2 of the number of entries required in the moves-lookup table for each square.
const ROOK_BITS: [u8; 64] = [ /* one entry per square; values elided */ ];
You may notice something odd happening on ranks 7 and 8: the number of bits required for some of the squares is 1 lower than the number of bits in their mask. This is intentional! As it turns out, certain magic numbers yield just the right set of hash collisions so that you can use half as many entries as there are occupancies. This saves about 18 kB of data.
The size of our table is just the sum of 2 to the power of the number of bits required for each square:
/// Compute the number of entries in a magic-movegen table required to store every element, given
/// the number of bits required for each square.
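A sketch of such a const fn (the name table_size and the [u8; 64] signature are my assumptions, not necessarily Fiddler’s):

const fn table_size(bits: &[u8; 64]) -> usize {
    let mut total = 0;
    let mut i = 0;
    while i < bits.len() {
        total += 1 << bits[i]; // each square needs 2^bits[i] entries
        i += 1;
    }
    total
}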
We also need to construct the masks for each square. get_rook_masks takes in a Square (hence the call to transmute) and builds the relevant occupancy mask for the square.
/// The bitwise masks for extracting the relevant pieces for a rook's attacks in a board, indexed
/// by the square occupied by the rook.
const ROOK_MASKS: [Bitboard; 64] = /* one mask per square, built with get_rook_masks */;
/// Create the mask for the relevant bits in magic of a rook.
/// `sq` is the index identifying the square that we want to generate the mask for.
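Here’s a sketch of how that mask construction might look, using a raw square index in place of Fiddler’s Square type (so no transmute is needed in this version):

const fn get_rook_masks(sq: u8) -> Bitboard {
    let rank = (sq / 8) as u64;
    let file = (sq % 8) as u64;
    let mut mask = 0u64;

    // Walk along the rank, skipping the A and H files and the rook's own square.
    let mut f = 1;
    while f < 7 {
        if f != file {
            mask |= 1 << (rank * 8 + f);
        }
        f += 1;
    }
    // Walk along the file, skipping ranks 1 and 8 and the rook's own square.
    let mut r = 1;
    while r < 7 {
        if r != rank {
            mask |= 1 << (r * 8 + file);
        }
        r += 1;
    }
    Bitboard(mask)
}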
We now have enough to actually build the full set of attacks. We do this via brute force: for every possible occupancy, we calculate the set of legal moves, and then save the legal moves at the correct index.
/// The master table containing every attack that the rook can perform from every square under
/// every occupancy.
/// Borrowed by the individual [`AttacksLookup`]s in [`ROOK_LOOKUPS`].
const ROOK_ATTACKS_TABLE: [Bitboard; /* total table size */] = construct_magic_table(/* bits, magics, masks, directions */);
/// Construct the master magic table for a rook or bishop based on all the requisite information.
///
/// # Inputs
///
/// - `bits`: For each square, the number of other squares which are involved in the calculation of
/// attacks from that square.
/// - `magics`: The magic numbers for each square.
/// - `masks`: The masks used for extracting the relevant squares for an attack on each starting
/// square.
/// - `dirs`: The directions in which the piece can move.
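Here’s a rough sketch of how such a function could be written, under my assumptions (a const generic N for the total table size, directions as (file offset, rank offset) pairs, and the helper functions sketched below):

const fn construct_magic_table<const N: usize>(
    bits: &[u8; 64],
    magics: &[Bitboard; 64],
    masks: &[Bitboard; 64],
    dirs: &[(i8, i8)],
) -> [Bitboard; N] {
    let mut table = [Bitboard(0); N];
    let mut offset = 0; // start of the current square's block in the master table
    let mut sq = 0;
    while sq < 64 {
        let n_entries: usize = 1 << bits[sq];
        // Enumerate every possible occupancy of the masked squares for this square.
        let mut i = 0;
        while i < n_entries {
            let occupancy = index_to_occupancy(i, masks[sq]);
            let attacks = directional_attacks(sq as u8, dirs, occupancy);
            let key = compute_magic_key(occupancy, magics[sq], 64 - bits[sq]);
            // With a valid magic, any two occupancies that collide here share the same attacks.
            table[offset + key] = attacks;
            i += 1;
        }
        offset += n_entries;
        sq += 1;
    }
    table
}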
compute_magic_key and the other helper functions called above can be implemented with some basic drudgery.
/// Given some mask, create the occupancy [`Bitboard`] according to this index.
///
/// `index` must be less than 2 ^ (number of ones in `mask`).
/// This is equivalent to the parallel-bits-deposit (PDEP) instruction on x86 architectures.
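A sketch of that deposit operation (again under my assumed types):

const fn index_to_occupancy(index: usize, mask: Bitboard) -> Bitboard {
    // Deposit the bits of `index` onto the set bits of `mask`, from least significant upward.
    let mut occupancy = 0u64;
    let mut remaining_mask = mask.0;
    let mut bit = 0;
    while remaining_mask != 0 {
        let lowest = remaining_mask & remaining_mask.wrapping_neg(); // lowest set bit of the mask
        if (index & (1 << bit)) != 0 {
            occupancy |= lowest;
        }
        bit += 1;
        remaining_mask ^= lowest;
    }
    Bitboard(occupancy)
}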
/// Construct the squares attacked by the piece at `sq` if it could move along the directions in
/// `dirs` when the board is occupied by the pieces in `occupancy`.
///
/// This is slow and should only be used for generating magic bitboards (instead of for move
/// generation).
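For illustration, here’s a naive ray walker along those lines, with directions represented as (file offset, rank offset) pairs and squares as raw indices (both of which are my simplifications):

const fn directional_attacks(sq: u8, dirs: &[(i8, i8)], occupancy: Bitboard) -> Bitboard {
    let mut attacks = 0u64;
    let mut d = 0;
    while d < dirs.len() {
        let df = dirs[d].0;
        let dr = dirs[d].1;
        let mut file = (sq % 8) as i8 + df;
        let mut rank = (sq / 8) as i8 + dr;
        // Step along the ray until we fall off the board or run into a blocker.
        while 0 <= file && file < 8 && 0 <= rank && rank < 8 {
            let target = 1u64 << (rank * 8 + file);
            attacks |= target;
            if (occupancy.0 & target) != 0 {
                break; // the blocker's square is included, but we cannot move past it
            }
            file += df;
            rank += dr;
        }
        d += 1;
    }
    Bitboard(attacks)
}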
/// Use magic hashing to get the index to look up attacks in a bitboard.
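This one is tiny under the assumed layout: the caller masks the occupancy, then we multiply and keep only the top bits.

const fn compute_magic_key(occupancy: Bitboard, magic: Bitboard, shift: u8) -> usize {
    // Let the multiplication wrap; only the uppermost bits of the product matter.
    (occupancy.0.wrapping_mul(magic.0) >> shift) as usize
}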
Lastly, we build each individual AttacksLookup with its reference to ROOK_ATTACKS_TABLE.
/// The necessary information for generating attacks for a rook, indexed by the square occupied by
/// said rook.
const ROOK_LOOKUPS: [AttacksLookup; 64] = construct_lookups(/* ... */);
/// Construct the lookup tables for magic move generation by referencing an already-generated
/// attacks table.
In order to compute the moves for the rook, we work backwards, going from the AttacksLookup into the table. For performance, we perform unchecked array accesses. We can prove they’re always safe because Squares can only have values from 0 through 63, and we know our key generation code produces a low enough index to not go out of bounds.
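Putting it all together, the lookup itself might look like this sketch (using the AttacksLookup fields I assumed above, and checked indexing where the real code would use unsafe unchecked access):

fn get_attacks(occupancy: Bitboard, sq: usize, lookups: &[AttacksLookup; 64]) -> Bitboard {
    let lookup = &lookups[sq];
    // Mask out the irrelevant squares, hash with the magic, and index into this square's block.
    let masked = Bitboard(occupancy.0 & lookup.mask.0);
    let key = compute_magic_key(masked, lookup.magic, lookup.shift);
    lookup.table[key]
}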
While you weren’t looking, I wrote some very similar code for writing the bishop moves-lookup table. In all, it’s about 300 lines of code.
<insert Rust compiler speed joke here>
It’s now time to compile our code. Running cargo build, we get to wait about 30 seconds until we encounter this beauty of an output:
~/C/fiddler (static_magic|✚2) $ cargo build
Compiling fiddler v0.1.0 (/home/clayton/Chess/fiddler)
note: erroneous constant used
--> src/base/movegen/magic.rs:328:6
|
328 | &ROOK_ATTACKS_TABLE,
| ^^^^^^^^^^^^^^^^^^
note: erroneous constant used
--> src/base/movegen/magic.rs:87:33
|
87 | get_attacks(occupancy, sq, &ROOK_LOOKUPS)
| ^^^^^^^^^^^^
note: erroneous constant used
--> src/base/movegen/magic.rs:87:32
|
87 | get_attacks(occupancy, sq, &ROOK_LOOKUPS)
| ^^^^^^^^^^^^^
error: internal compiler error: no errors encountered even though `delay_span_bug` issued
error: internal compiler error: The deny lint should have already errored
--> /home/clayton/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/mod.rs:297:5
|
297 | / int_impl! {
298 | | Self = i8,
299 | | ActualT = i8,
300 | | UnsignedT = u8,
... |
315 | | bound_condition = "",
316 | | }
| |_____^
|
= note: delayed at compiler/rustc_const_eval/src/const_eval/machine.rs:634:26
0: <rustc_errors::HandlerInner>::emit_diagnostic
1: <rustc_errors::Handler>::delay_span_bug::<rustc_span::span_encoding::Span, &str>
2: <rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::statement
3: rustc_const_eval::const_eval::eval_queries::eval_to_allocation_raw_provider
4: rustc_query_impl::plumbing::__rust_begin_short_backtrace::<rustc_query_impl::query_impl::eval_to_allocation_raw::dynamic_query::{closure#2}::{closure#0}, rustc_middle::query::erase::Erased<[u8; 16]>>
5: <rustc_query_impl::query_impl::eval_to_allocation_raw::dynamic_query::{closure#2} as core::ops::function::FnOnce<(rustc_middle::ty::context::TyCtxt, rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>)>>::call_once
6: <rustc_query_system::query::plumbing::execute_job_incr<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 32]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt>::{closure#2}::{closure#2} as core::ops::function::FnOnce<((rustc_query_impl::plumbing::QueryCtxt, rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 32]>>, false, false, false>), rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>)>>::call_once
7: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 16]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt, true>
8: rustc_query_impl::query_impl::eval_to_allocation_raw::get_query_incr::__rust_end_short_backtrace
9: <rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::eval_mir_constant
10: <rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::push_stack_frame
11: rustc_const_eval::const_eval::eval_queries::eval_to_allocation_raw_provider
12: rustc_query_impl::plumbing::__rust_begin_short_backtrace::<rustc_query_impl::query_impl::eval_to_allocation_raw::dynamic_query::{closure#2}::{closure#0}, rustc_middle::query::erase::Erased<[u8; 16]>>
13: <rustc_query_impl::query_impl::eval_to_allocation_raw::dynamic_query::{closure#2} as core::ops::function::FnOnce<(rustc_middle::ty::context::TyCtxt, rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>)>>::call_once
14: <rustc_query_system::query::plumbing::execute_job_incr<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 32]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt>::{closure#2}::{closure#2} as core::ops::function::FnOnce<((rustc_query_impl::plumbing::QueryCtxt, rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 32]>>, false, false, false>), rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>)>>::call_once
15: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 16]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt, true>
16: rustc_query_impl::query_impl::eval_to_allocation_raw::get_query_incr::__rust_end_short_backtrace
17: <rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::statement
18: rustc_const_eval::const_eval::eval_queries::eval_to_allocation_raw_provider
19: rustc_query_impl::plumbing::__rust_begin_short_backtrace::<rustc_query_impl::query_impl::eval_to_allocation_raw::dynamic_query::{closure#2}::{closure#0}, rustc_middle::query::erase::Erased<[u8; 16]>>
20: <rustc_query_impl::query_impl::eval_to_allocation_raw::dynamic_query::{closure#2} as core::ops::function::FnOnce<(rustc_middle::ty::context::TyCtxt, rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>)>>::call_once
21: <rustc_query_system::query::plumbing::execute_job_incr<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 32]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt>::{closure#2}::{closure#2} as core::ops::function::FnOnce<((rustc_query_impl::plumbing::QueryCtxt, rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 32]>>, false, false, false>), rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>)>>::call_once
22: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 16]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt, true>
23: rustc_query_impl::query_impl::eval_to_allocation_raw::get_query_incr::__rust_end_short_backtrace
24: rustc_const_eval::const_eval::eval_queries::eval_to_allocation_raw_provider
25: rustc_query_impl::plumbing::__rust_begin_short_backtrace::<rustc_query_impl::query_impl::eval_to_allocation_raw::dynamic_query::{closure#2}::{closure#0}, rustc_middle::query::erase::Erased<[u8; 16]>>
26: <rustc_query_impl::query_impl::eval_to_allocation_raw::dynamic_query::{closure#2} as core::ops::function::FnOnce<(rustc_middle::ty::context::TyCtxt, rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>)>>::call_once
27: <rustc_query_system::query::plumbing::execute_job_incr<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 32]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt>::{closure#2}::{closure#2} as core::ops::function::FnOnce<((rustc_query_impl::plumbing::QueryCtxt, rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 32]>>, false, false, false>), rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>)>>::call_once
28: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::DefaultCache<rustc_middle::ty::ParamEnvAnd<rustc_middle::mir::interpret::GlobalId>, rustc_middle::query::erase::Erased<[u8; 16]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt, true>
29: rustc_query_impl::query_impl::eval_to_allocation_raw::get_query_incr::__rust_end_short_backtrace
30: <rustc_const_eval::interpret::eval_context::InterpCx<rustc_mir_transform::const_prop::ConstPropMachine>>::eval_mir_constant
31: <rustc_mir_transform::const_prop_lint::ConstPropagator as rustc_middle::mir::visit::Visitor>::visit_basic_block_data
32: <rustc_mir_transform::const_prop_lint::ConstProp as rustc_mir_transform::pass_manager::MirLint>::run_lint
33: rustc_mir_transform::mir_drops_elaborated_and_const_checked
34: rustc_query_impl::plumbing::__rust_begin_short_backtrace::<rustc_query_impl::query_impl::mir_drops_elaborated_and_const_checked::dynamic_query::{closure#2}::{closure#0}, rustc_middle::query::erase::Erased<[u8; 8]>>
35: <rustc_query_impl::query_impl::mir_drops_elaborated_and_const_checked::dynamic_query::{closure#2} as core::ops::function::FnOnce<(rustc_middle::ty::context::TyCtxt, rustc_span::def_id::LocalDefId)>>::call_once
36: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::VecCache<rustc_span::def_id::LocalDefId, rustc_middle::query::erase::Erased<[u8; 8]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt, true>
37: rustc_query_impl::query_impl::mir_drops_elaborated_and_const_checked::get_query_incr::__rust_end_short_backtrace
38: <rustc_session::session::Session>::time::<(), rustc_interface::passes::analysis::{closure#2}>
39: rustc_interface::passes::analysis
40: rustc_query_impl::plumbing::__rust_begin_short_backtrace::<rustc_query_impl::query_impl::analysis::dynamic_query::{closure#2}::{closure#0}, rustc_middle::query::erase::Erased<[u8; 1]>>
41: <rustc_query_impl::query_impl::analysis::dynamic_query::{closure#2} as core::ops::function::FnOnce<(rustc_middle::ty::context::TyCtxt, ())>>::call_once
42: rustc_query_system::query::plumbing::try_execute_query::<rustc_query_impl::DynamicConfig<rustc_query_system::query::caches::SingleCache<rustc_middle::query::erase::Erased<[u8; 1]>>, false, false, false>, rustc_query_impl::plumbing::QueryCtxt, true>
43: rustc_query_impl::query_impl::analysis::get_query_incr::__rust_end_short_backtrace
44: <rustc_middle::ty::context::GlobalCtxt>::enter::<rustc_driver_impl::run_compiler::{closure#1}::{closure#2}::{closure#4}, core::result::Result<(), rustc_span::ErrorGuaranteed>>
45: <rustc_interface::interface::Compiler>::enter::<rustc_driver_impl::run_compiler::{closure#1}::{closure#2}, core::result::Result<core::option::Option<rustc_interface::queries::Linker>, rustc_span::ErrorGuaranteed>>
46: std::sys_common::backtrace::__rust_begin_short_backtrace::<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<core::result::Result<(), rustc_span::ErrorGuaranteed>, rustc_driver_impl::run_compiler::{closure#1}>::{closure#0}, core::result::Result<(), rustc_span::ErrorGuaranteed>>::{closure#0}::{closure#0}, core::result::Result<(), rustc_span::ErrorGuaranteed>>
47: <<std::thread::Builder>::spawn_unchecked_<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<core::result::Result<(), rustc_span::ErrorGuaranteed>, rustc_driver_impl::run_compiler::{closure#1}>::{closure#0}, core::result::Result<(), rustc_span::ErrorGuaranteed>>::{closure#0}::{closure#0}, core::result::Result<(), rustc_span::ErrorGuaranteed>>::{closure#1} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0}
48: call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>
at /rustc/b2b34bd83192c3d16c88655158f7d8d612513e88/library/alloc/src/boxed.rs:1985:9
49: call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global>
at /rustc/b2b34bd83192c3d16c88655158f7d8d612513e88/library/alloc/src/boxed.rs:1985:9
50: thread_start
at /rustc/b2b34bd83192c3d16c88655158f7d8d612513e88/library/std/src/sys/unix/thread.rs:108:17
51: start_thread
at ./nptl/pthread_create.c:444:8
52: __GI___clone3
at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
= note: this error: internal compiler error originates in the macro `int_impl` (in Nightly builds, run with -Z macro-backtrace for more info)
note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md
note: rustc 1.72.0-nightly (b2b34bd83 2023-06-06) running on x86_64-unknown-linux-gnu
note: compiler flags: --crate-type lib -C embed-bitcode=no -C debuginfo=2 -C incremental=[REDACTED] -C target_cpu=native
note: some of the compiler flags provided by cargo are hidden
query stack during panic:
end of query stack
error: could not compile `fiddler` (lib)
I accidentally found a compiler bug! I opened the issue for it here. I don’t know the exact cause for it yet, but I think it’s somehow related to constant evaluation. In any event, annotating ROOK_ATTACKS_TABLE with #[allow(long_running_const_eval)] seems to fix it.
Running the compiler again, we find that our compile time has ballooned from about 3 seconds to 48 seconds. Almost all of that time is spent on generating ROOK_ATTACKS_TABLE.
Interestingly enough, I wrote an exhaustive test of the magic table generation code, which does essentially the same thing as generating the table, except at runtime, and I found that it ran in 0.00 seconds. I suspect that the constant evaluator in Rust is just plain slow.
Denouement
Now that we’ve gotten the whole thing to compile, we can actually run a benchmark. In chess engines, the main measure of engine quality is Elo, a relative measure of engine performance.
I threw the new version of Fiddler, with its compile-time generated lookup tables, against the older dynamically-generated ones, and here are my results:
Score of fiddler_const_magic vs fiddler_dynamic_magic: 5278 - 5003 - 4271 [0.509]
... fiddler_const_magic playing White: 2766 - 2374 - 2136 [0.527] 7276
... fiddler_const_magic playing Black: 2512 - 2629 - 2135 [0.492] 7276
... White vs Black: 5395 - 4886 - 4271 [0.517] 14552
Elo difference: 6.6 +/- 4.7, LOS: 99.7 %, DrawRatio: 29.3 %
14566 of 30000 games finished.
We get a whole 6 Elo points - equivalent to a 1% improvement in the engine. At least it’s not a regression!