DeepSeek

I often test new LLMs with the following question: given the following conditions, how many ways can Professor Y assign six different books to four different students? The most expensive book must be assigned to student X, and each student must receive at least one book. The first model that could solve this problem was OpenAI’s o1. The second was Google’s Gemini 2.0 Flash Thinking. More thinking models followed, such as DeepSeek’s R1 and Anthropic’s Claude 3.7 (with extended thinking mode), all of which can solve this problem without any mistakes. (Update on 2025-07-13: the models that can solve the problem now include Grok, Mistral, and Qwen; I have lost count of how many models can solve it.) ...
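For readers who want to check a model's answer themselves, here is a minimal sketch in Python. It is my own illustration, not something any of the models above produced. It assumes book 0 stands for the most expensive book and student 0 stands for student X; the function names are made up for this example. The first function brute-forces all 4^6 assignments, and the second cross-checks the count with inclusion-exclusion over the three students other than X.

```python
from itertools import product
from math import comb


def count_by_brute_force(n_books=6, n_students=4):
    """Enumerate every assignment of books to students and count the valid ones.

    Assumption for this sketch: book 0 is the most expensive book and
    student 0 is student X.
    """
    total = 0
    for assignment in product(range(n_students), repeat=n_books):
        # assignment[i] is the student who receives book i.
        if assignment[0] != 0:
            continue  # the most expensive book must go to student X
        if len(set(assignment)) == n_students:
            total += 1  # every student received at least one book
    return total


def count_by_inclusion_exclusion(n_books=6, n_students=4):
    """Same count without enumeration.

    Student X already holds the most expensive book, so only the other
    students can end up empty-handed. Subtract the assignments of the
    remaining books that leave k of those students out, with alternating signs.
    """
    others = n_students - 1
    free_books = n_books - 1
    return sum(
        (-1) ** k * comb(others, k) * (n_students - k) ** free_books
        for k in range(others + 1)
    )


if __name__ == "__main__":
    print(count_by_brute_force())          # 390
    print(count_by_inclusion_exclusion())  # 390
```

Both approaches agree on 390, which is 4^5 - 3·3^5 + 3·2^5 - 1 = 1024 - 729 + 96 - 1.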