Quick Take: Are open-weight AI models really getting a fair shake in capabilities evals?

Thoughts on Anthropic's postmortem.

Trials & Errors