References

[AMSB19]

Forest Agostinelli, Stephen McAleer, Alexander Shmakov, and Pierre Baldi. Solving the Rubik’s cube with deep reinforcement learning and search. Nature Machine Intelligence, 1(8):356–363, 2019.

[Bel57]

Richard Bellman. Dynamic Programming. Princeton University Press, 1957.

[BT96]

Dimitri P Bertsekas and John N Tsitsiklis. Neuro-dynamic programming. Athena Scientific, 1996. ISBN 1-886529-10-8.

[HAS26]

Gal Hadar, Forest Agostinelli, and Shahaf S Shperberg. Beyond single-step updates: reinforcement learning of heuristics with limited-horizon search. In AAAI. 2026.

[HZRS16]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778. 2016.

[IS15]

Sergey Ioffe and Christian Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.

[LCM+22]

Tianhua Li, Ruimin Chen, Borislav Mavrin, Nathan R Sturtevant, Doron Nadav, and Ariel Felner. Optimal search with neural networks: challenges and approaches. In Proceedings of the International Symposium on Combinatorial Search, volume 15, 109–117. 2022.

[Poh70]

Ira Pohl. Heuristic search viewed as path finding in a graph. Artificial Intelligence, 1(3–4):193–204, 1970.

[Rok14]

Tomas Rokicki. God's number is 26 in the quarter-turn metric. http://www.cube20.org/qtm/, Aug 2014.