My Eval for Reasoning LLMs
(last updated: 11th Aug 2025)

There are already plenty of benchmarks for reasoning LLMs, but I use a very simple one of my own (just a single prompt for now) every time a new reasoning model is released. It’s a logical pattern-discovery question based on Pokemon and the starters of the various generations. The prompt goes as follows:

This is a puzzle for you to solve the missing Pokemon based on a pattern. Figure out the pattern in the examples I provide by thinking carefully and hard. Here are the examples to begin with: ...