al-folio

a simple whitespace theme for academics

a distill-style blog post

an example of a distill-style blog post and main elements

23 min read · 2021

a post with code

an example of a blog post with some code

4 min read · 2015

[3min-Paper] The_Illusion_of_Thinking

Can AI really reason? Or is the problem with how we test it?

5 min read · June 22, 2025

2025 · reasoning llm evaluation · paper chinese 3min-paper
The Illusion of the Illusion of Thinking: When AI Evaluation Methods Become Traps for Capability Assessment

This commentary paper makes a striking argument: we often mistake the limitations of AI evaluation methods for the limitations of the AI systems themselves. It shows that many cases considered AI reasoning failures are actually misjudgments caused by poorly designed evaluation frameworks.

8 min read · June 16, 2025

2025 · ai-evaluation reasoning model-limitations methodology · paper english
[Chinese Version] The Illusion of the Illusion of Thinking: When AI Evaluation Methods Become Traps for Capability Assessment

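This commentary article makes a striking argument: we often mistake the limitations of AI evaluation methods for limitations in the capabilities of AI systems. The research finds that many cases regarded as AI reasoning failures are actually misjudgments caused by poorly designed evaluation frameworks.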

4 min read · June 16, 2025

2025 · ai-evaluation reasoning model-limitations methodology · paper chinese
Persona Features Control Emergent Misalignment

The OpenAI research team explores how language models generalize behaviors from the training distribution to broader deployment distributions, focusing on emergent misalignment. The study finds that controlling persona features can effectively manage misaligned model behavior, offering important insights for AI safety.

7 min read · June 16, 2025

2025 · ai-safety alignment persona-features misalignment · paper english
[Chinese Version] Persona Features Control Emergent Misalignment

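The OpenAI research team examines how language model behavior changes when generalizing from the training distribution to a broader deployment distribution, with particular attention to emergent misalignment. The study finds that controlling persona features can effectively manage misaligned model behavior, providing important insights for AI safety.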

3 min read · June 16, 2025

2025 · ai-safety alignment persona-features misalignment · paper chinese