2026-02-27, Li Tielin (李铁林): "Sparking Innovation with a 'There Is a Solution' Mindset" (以“有解思维”激发创新活力, Commentator's Observation)
http://paper.people.com.cn/rmrb/pc/content/202602/27/content_30142518.html
http://paper.people.com.cn/rmrb/pad/content/202602/27/content_30142518.html
"Before that, in much of Europe, you could love as many people as you like, and love was fluid, and it was often not about sex."
Testing LLM reasoning with SAT problems is not an original idea. A recent study tested models such as GPT-4o thoroughly and found that, once problems are hard enough, every model degrades to random guessing. However, I couldn't find any research that used newer models like the ones I tested. It would be good to see that kind of thorough testing repeated with newer models.
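One way such a test could be set up is sketched below. This is a minimal sketch under stated assumptions, not the method of the study mentioned above: it generates random 3-SAT formulas at a clause-to-variable ratio of roughly 4.26 (where random 3-SAT instances are empirically hardest) and grades a model's answer mechanically. The function names (`random_3sat`, `satisfies`, `brute_force_sat`, `grade_answer`) are hypothetical, and the brute-force check limits it to small formulas.

```python
import random
from itertools import product


def random_3sat(num_vars, ratio=4.26, seed=None):
    """Generate a random 3-SAT formula in CNF (hypothetical helper).

    Literals are DIMACS-style ints: +v means x_v, -v means NOT x_v.
    Each clause uses 3 distinct variables, each negated with probability 1/2.
    """
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    clauses = []
    for _ in range(num_clauses):
        chosen = rng.sample(range(1, num_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses


def satisfies(clauses, assignment):
    """True if the assignment (dict var -> bool) makes every clause true.

    Variables missing from the assignment count as not satisfying a literal.
    """
    return all(
        any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)
        for clause in clauses
    )


def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment, or None if the formula is UNSAT.

    Tries all 2**num_vars assignments, so only practical for small formulas.
    """
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if satisfies(clauses, assignment):
            return assignment
    return None


def grade_answer(clauses, num_vars, claimed):
    """Grade a model's claim: a dict assignment, or None meaning "UNSAT"."""
    if claimed is not None:
        # A claimed assignment is checked directly against every clause.
        return satisfies(clauses, claimed)
    # An UNSAT claim is correct only if no assignment exists.
    return brute_force_sat(clauses, num_vars) is None
```

Checking a claimed satisfying assignment is cheap, while confirming an "unsatisfiable" claim requires an exhaustive search (or a real SAT solver), which is the asymmetry that makes UNSAT answers the harder ones to verify.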
"I want to interact with my community, and know that whatever platform they're talking on, they're going to be safe."
Additional reporting by Jack Gray