Publications

RLP: Reinforcement as a Pretraining Objective

Ali Hatamizadeh*, Syeda Nahida Akter*, Shrimai Prabhumoye*, Jan Kautz, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi.
Published at International Conference on Learning Representations, 2026.

Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data

Syeda Nahida Akter, Shrimai Prabhumoye, Eric Nyberg, Mostofa Patwary, Mohammad Shoeybi, Yejin Choi, Bryan Catanzaro.
Published at International Conference on Learning Representations, 2026.

Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset

Rabeeh Karimi Mahabadi, Sanjeev Satheesh, Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro.
Published at International Conference on Learning Representations, 2026.

NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning

Syeda Nahida Akter, Shrimai Prabhumoye, Matvei Novikov, Seungju Han, Ying Lin, Evelina Bakhturina, Eric Nyberg, Yejin Choi, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro.
Published at The 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2026.

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

NVIDIA
Published on arxiv, 2025.

Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning

Jaehun Jung, Seungju Han, Ximing Lu, Skyler Hallinan, David Acuna, Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi.
Published at Neural Information Processing Systems, 2025.

NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

NVIDIA
Published on arxiv, 2025.

Llama-Nemotron: Efficient Reasoning Models

NVIDIA
Published on arxiv, 2025.

Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning

Ximing Lu, Seungju Han, David Acuna, Hyunwoo Kim, Jaehun Jung, Shrimai Prabhumoye, Niklas Muennighoff, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi.
Published on arxiv, 2025.

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

NVIDIA
Published on arxiv, 2025.

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs

Syeda Nahida Akter, Shrimai Prabhumoye, John Kamalu, Sanjeev Satheesh, Eric Nyberg, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro.
In The Thirteenth International Conference on Learning Representations (ICLR), 2025.

Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining

Steven Feng*, Shrimai Prabhumoye*, Kezhi Kong, Dan Su, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro.
Published on arxiv, 2024.

LLM-Evolve: Evaluation for LLM’s Evolving Capability on Benchmarks

Jiaxuan You, Mingjie Liu, Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro.
In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

Data, Data Everywhere: A Guide for Pretraining Dataset Construction

Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Bo Liu, Aastha Jhunjhunwala, Zhilin Wang, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro.
In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

Nemotron-4 340B Technical Report

NVIDIA
Published on arxiv, 2024.

AgentKit: Flow Engineering with Graphs, not Coding

Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell.
In First Conference on Language Modeling (CoLM), 2024.

Nemotron-4 15B Technical Report

Jupinder Parmar*, Shrimai Prabhumoye*, Joseph Jennings*, Mostofa Patwary*, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick LeGresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, Jonathan Cohen, Bryan Catanzaro.
Published on arxiv, 2024.

Spring: Studying papers and reasoning to play games

Yue Wu, Shrimai Prabhumoye, So Yeon Min, Yonatan Bisk, Russ R Salakhutdinov, Amos Azaria, Tom M Mitchell, Yuanzhi Li.
In Advances in Neural Information Processing Systems (NeurIPS), 2024.

Self-Refine: Iterative Refinement with Self-Feedback

Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark.
In Advances in Neural Information Processing Systems (NeurIPS), 2024.

Plan, Eliminate, and Track--Language Models are Good Teachers for Embodied Agents

Yue Wu, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Yuanzhi Li, Tom Mitchell, Shrimai Prabhumoye.
Published on arxiv, 2023.

AutoBiasTest: Controllable Sentence Generation for Automated and Open-Ended Social Bias Testing in Language Models

Rafal Kocielnik, Shrimai Prabhumoye, Vivian Zhang, R Michael Alvarez, Anima Anandkumar.
Published on arxiv, 2023.

Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro.
In the proceedings of the European Chapter of the Association for Computational Linguistics (EACL) 2023.

Context Generation Improves Open Domain Question Answering

Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro.
In Findings of the European Chapter of the Association for Computational Linguistics (EACL) 2023.

Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions

Rafal Kocielnik, Sara Kangaslahti, Shrimai Prabhumoye, Meena Hari, Michael Alvarez, Anima Anandkumar.
In Transfer Learning for Natural Language Processing Workshop at NeurIPS 2022.

Evaluating Parameter Efficient Learning for Generation

Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022.

Multi-Stage Prompting for Knowledgeable Dialogue Generation

Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro.
In Findings of the Association for Computational Linguistics (ACL) 2022.

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Shaden Smith*, Mostofa Patwary*, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro.
Published on arxiv, 2022.

Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

Shrimai Prabhumoye*, Rafal Kocielnik*, Mohammad Shoeybi, Anima Anandkumar, Bryan Catanzaro.
Published on arxiv, 2023.

Five sources of bias in natural language processing

Dirk Hovy, Shrimai Prabhumoye.
Language and Linguistics Compass, 2021.

Focused Attention Improves Document-Grounded Generation

Shrimai Prabhumoye, Kazuma Hashimoto, Yingbo Zhou, Alan W Black, Ruslan Salakhutdinov.
In the proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2021.

Case Study: Deontological Ethics in NLP

Shrimai Prabhumoye*, Brendon Boldt*, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2021.

Exploring Controllable Text Generation Techniques

Shrimai Prabhumoye, Alan W Black, Ruslan Salakhutdinov.
Proceedings of the 28th International Conference on Computational Linguistics (COLING) 2020.
Selected for oral presentation

Topological Sort for Sentence Ordering

Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of Association for Computational Linguistics Conference (ACL) 2020.

Politeness Transfer: A Tag and Generate Approach

Aman Madaan*, Amrith Setlur*, Tanmay Parekh*, Barnabas Poczos, Graham Neubig, Yiming Yang, Ruslan Salakhutdinov, Alan W Black, Shrimai Prabhumoye.
In the proceedings of Association for Computational Linguistics Conference (ACL) 2020.

I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

Shrimai Prabhumoye*, Margaret Li*, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam.
Published on arxiv, 2020.

Generating Interactive Worlds with Text

Angela Fan*, Jack Urbanek*, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktaschel, Arthur Szlam, Jason Weston.
In the Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence.

Principled Frameworks for Evaluating Ethics in NLP Systems

Shrimai Prabhumoye, Elijah Mayfield, Alan W Black.
Widening NLP Workshop at ACL 2019.

"My Way of Telling a Story": Persona based Grounded Story Generation

Shrimai Prabhumoye*, Khyathi Chandu*, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of Storytelling Workshop at ACL 2019.

Equity Beyond Bias in Language Technologies for Education

Elijah Mayfield, Michael Madaio, Shrimai Prabhumoye, David Gerritsen, Brittany McLaughlin, Ezekiel Dixon-Román, Alan W Black.
In the Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications at ACL 2019.