Scale AI Research
Scale AI’s mission is to accelerate the development of AI applications. Through our research, we aim to build AI systems capable of solving complex, human-level problems.
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
September 19, 2025
Agents
Safety, Evaluation and Alignment

TutorBench: A Benchmark To Assess Tutoring Capabilities Of Large Language Models
September 11, 2025
Safety, Evaluation and Alignment

Reliable Weak-to-Strong Monitoring of LLM Agents
August 26, 2025
Safety, Evaluation and Alignment
Oversight

Search-Time Data Contamination
August 13, 2025
Safety, Evaluation and Alignment
Oversight

MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs
July 23, 2025
Reasoning
Safety, Evaluation and Alignment

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
July 23, 2025
Science of Data
Post-Training

WebGuard: Building a Generalizable Guardrail for Web Agents
July 21, 2025
Agents
Safety, Evaluation and Alignment

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
July 15, 2025
Reasoning
Oversight
Safety, Evaluation and Alignment

Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning
June 28, 2025
Post-Training
Reasoning