Research

Latest

Recchia, G., Mangat, C. S., Li, I., & Krishnakumar, G. (2025). FindTheFlaws: Annotated errors for use in scalable oversight research. arXiv:2503.22989. Link

Recchia, G., Mangat, C., Nyachhyon, J., Sharma, M., Canavan, C., Epstein-Gross, D., & Abdulbari, M. (2025). Confirmation bias: A challenge for scalable oversight. Presents results of two sandwiching-like experiments intended to establish baselines for simple approaches to scalable oversight. Link

Contributed to

Schoenegger, P., Salvi, F., Liu, J., Nan, X., Debnath, R., Fasolo, B., Leivada, E., Recchia, G., Günther, F., et al. (2025). Large language models are more persuasive than incentivized human persuaders. Analysis team lead. arXiv:2505.09662. Link

Anwar, U., Saparov, A., Rando, J., Paleka, D., Turpin, M., Hase, P., … & Krueger, D. (2024). Foundational challenges in assuring alignment and safety of large language models. Transactions on Machine Learning Research, 2835-8856. Link

Phan, L., Gatti, A., Han, Z., Li, N., Hu, J., Zhang, H., … & Verbeken, B. (2025). Humanity’s Last Exam. Link. Co-author on account of contributing question(s) that were selected for the dataset.

McKenzie, I. R., Lyzhov, A., Pieler, M., Parrish, A., Mueller, A., Prabhu, A., … & Perez, E. (2023). Inverse scaling: When bigger isn’t better. Transactions on Machine Learning Research. Link. Co-author on account of submitting a winning task (e.g., identifying a task on which language model performance decreases with scale).