I’ve been developing a policy-only, searchless NN chess engine that simulates how humans play chess, using a transformer architecture trained on 500M positions (for reference, Maia-2 used 9B positions). This is slightly different from Maia, which includes a value head in its model – although it’s not clear to me how much the value head drives human-move prediction accuracy, so I wanted to build a model without one.
I’ve put full model documentation, validation results, and model weights on GitHub and Hugging Face, linked at the bottom – so you could test for yourself, or build your own fine-tuned variant (using your own games, for example, although it would require a large sample size).
At a high level, the model, which I call “Nova,” clearly beats Maia-2 and basically matches the Maia-3 model in human-move prediction. Note that I did validation with the Maia-3 model available at http://maiachess.com, which may be a compacted version, but it’s the only source I could find for now. I didn’t compare against ALLIE, which is a non-Markovian model (it requires prior game history for move prediction, not just a standalone position; Maia and Nova are Markovian).
I ran validation on 6 rating cohorts of 100k positions each (out of sample, from the Lichess March 2026 database). The key results:
- Hit-rate (top model move = move played by human): Maia-3: 54.8% / Nova: 54.6% / Maia-2: 50.3%
- Average probability mass placed on move played: Nova: 42.5% / Maia-3: 42.1% / Maia-2: 38.4%
- Maia-3 performs relatively better in late-opening through middlegame; Nova performs better in early opening and late-middlegame through endgame
- Nova performs relatively better for under-1700, Maia-3 for above-1700 ELO
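For anyone unfamiliar with the two metrics above, here is a minimal sketch of how they could be computed from per-position policy outputs. The function name and data shapes are illustrative, not taken from the Nova or Maia codebases:

```python
# Hypothetical sketch of the two validation metrics: hit-rate (top
# predicted move equals the human move) and average probability mass
# the policy places on the move actually played.

def evaluate(predictions, played_moves):
    """predictions: list of dicts mapping move (UCI string) -> probability.
    played_moves: list of moves actually played by the human."""
    hits = 0
    mass = 0.0
    for policy, played in zip(predictions, played_moves):
        top_move = max(policy, key=policy.get)   # model's most likely move
        hits += (top_move == played)             # hit-rate numerator
        mass += policy.get(played, 0.0)          # probability on the human move
    n = len(played_moves)
    return hits / n, mass / n

# toy example with two positions
preds = [{"e2e4": 0.6, "d2d4": 0.3, "g1f3": 0.1},
         {"g8f6": 0.55, "e7e5": 0.45}]
played = ["e2e4", "e7e5"]
hit_rate, avg_mass = evaluate(preds, played)
```

Note that hit-rate only credits the single top move, while probability mass rewards any weight on the played move, which is one reason the two metrics can rank models differently.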
While the differences between Maia-3 and Nova are small – and both significantly outperform Maia-2 – I found it interesting that Maia-3 wins on the hit-rate metric while Nova wins on the probability-mass metric, and that they show different strengths in the game-phase and rating-cohort breakdowns (maybe someone with a strong ML background could speculate why).
Neither Maia nor Nova (nor any other searchless chess policy model I’m aware of) can play at higher strengths without some concept of valuation. I describe the process more in the documentation, but I added a filtering layer that preserves the organic Nova move policy while, at each target rating, selectively (probabilistically) filtering out some low-quality moves – unless Nova is highly confident in them, in which case they can’t be filtered. I ran thousands of self-matches between Nova models of different strengths to determine their relative Elo differences, and calibrated their assigned ratings (for play purposes) to match very closely to Chess.com blitz equivalents. For example, Nova-1500 will make a similar ratio of 1.0- to 2.0-pawn-level mistakes in each game phase as a Chess.com 1500-rated blitz player would, on average. It is also largely non-deterministic, meaning it will frequently make different moves in the same position in different games.
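To make the filtering idea concrete, here is a minimal sketch of how a confidence-gated probabilistic filter over a raw policy could work. Everything here is an assumption for illustration – the thresholds, the “low-quality” criterion (eval loss in pawns against some oracle), and the filter probability are hypothetical parameters, not Nova’s actual calibration:

```python
import random

# Illustrative sketch: drop low-quality moves with some probability,
# unless the raw policy is highly confident in them, then sample from
# the renormalized remainder. All parameter values are hypothetical.

def filtered_sample(policy, eval_loss, *, confidence=0.40,
                    quality_cutoff=1.0, filter_prob=0.7, rng=random):
    """policy: dict move -> probability from the raw model.
    eval_loss: dict move -> pawns lost vs. the best move (oracle eval)."""
    kept = {}
    for move, p in policy.items():
        low_quality = eval_loss.get(move, 0.0) >= quality_cutoff
        protected = p >= confidence        # highly confident moves can't be filtered
        if low_quality and not protected and rng.random() < filter_prob:
            continue                       # probabilistically drop this move
        kept[move] = p
    if not kept:                           # safety net: never filter everything away
        kept = dict(policy)
    total = sum(kept.values())
    moves = list(kept)
    weights = [kept[m] / total for m in moves]
    return rng.choices(moves, weights=weights)[0]  # non-deterministic play
```

Sampling from the surviving distribution (rather than taking the argmax) is what keeps play non-deterministic, and lowering `quality_cutoff` or raising `filter_prob` would push the effective playing strength up at a given target rating.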
Here are the GH/HF links and an article writeup:
- https://github.com/novachessai/novachess-engine
- https://huggingface.co/novachess/novachess-engine
- https://novachess.ai/articles/nova_engine_release.html
If you’re interested in playing against Nova, the policy-only bots are on Lichess (Nova_800, Nova_1100, Nova_1400, Nova_1700, Nova_2000, Nova_2300).
The rating-calibrated versions are available to play, completely free and unlimited, at http://novachess.ai. The platform also lets you play Nova from custom positions or selected opening lines, and has an adjustable “aggression” level. There’s an optional eval bar, plus options to see threats or get a hint in the current position. There is also a Training mode where you can play out common theoretical endgames, curated Master games from all 28 of Rios’ defined pawn structures, and selected positions from your own games where you could have played a better move (auto-generated from your Lichess/Chess.com games).