# Notes

## 2026.01.15

1. RL has limited effectiveness when applied to extremely underfit or overfit initial checkpoints [1].
2. Despite RL's superior generalization, SFT is still helpful for effective RL training: SFT stabilizes the model's output format, enabling subsequent RL to achieve its performance gains.

[1] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training.

## 2026.01.05

- Top-down learning strategy: to solve a given problem, learn whatever knowledge that problem involves.