Seeking Deep Thoughts
Papers
LLM
NLP
RL
Thoughts
Thoughts that came to mind while reading the DeepSeekMath paper
I started reading the DeepSeekMath paper after recently finishing the R1 paper. The following thoughts came to mind:
- R1 is successful simply because of the training process
- DeepSeekMath works really well because of data quality
- GRPO is used only to reduce memory usage (still reading the DeepSeekMath paper; I could be wrong)
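To make the GRPO point a bit more concrete, here is a minimal sketch of the idea as I currently understand it: instead of training a separate value (critic) model as PPO does, GRPO samples a group of answers for the same prompt and scores each one relative to the group's mean and standard deviation. Not having to keep a critic in memory is where the savings would come from. The function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the toy rewards are all my own assumptions for illustration, not code from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of sampled answers to a single prompt.

    Each answer's advantage is its reward minus the group mean, divided by the
    group standard deviation. No learned value model is involved.
    """
    mu = mean(rewards)
    # Assumption: population standard deviation; eps guards against a group
    # where every answer received the same reward (sigma == 0).
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy example: four sampled answers to one prompt, reward 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers end up with a positive advantage and incorrect ones with a negative advantage, using only the rewards within the group as the baseline.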
The takeaway is that a lot can be done simply by flipping and rearranging the levers and switches we already have.