CV | Yifan Peng

Work

2026.03 - Present

Santa Clara, CA, USA
Senior Research Scientist

NVIDIA

Project: Multimodal Nemotron LLMs, Nemotron VoiceChat (full-duplex speech-to-speech LLMs)
2025.06 - 2026.02

Santa Clara, CA, USA
Research Scientist

NVIDIA

Project: Multimodal Nemotron LLMs, Nemotron VoiceChat (full-duplex speech-to-speech LLMs)
2024.05 - 2024.08

Santa Clara, CA, USA
AI Research Intern

NVIDIA

Project: Speech-text language models with multi-turn mixed-modal chat capabilities
2023.05 - 2023.08

Seattle, WA, USA
Research Scientist Intern

Meta Fundamental AI Research (FAIR)

Project: Speech LLMs for voice-preserved textless speech-to-speech translation
2022.05 - 2022.08

Remote, USA
Speech Recognition Intern

ASAPP

Project: Speech model compression and encoder architecture design

Education

2020.08 - 2025.05

Pittsburgh, PA, USA
Doctor of Philosophy in Electrical and Computer Engineering

Carnegie Mellon University

Advisor: Prof. Shinji Watanabe

Thesis: Towards effective and efficient open speech foundation models

Research areas: Speech foundation models, speech recognition

Open source: Contributor and maintainer of ESPnet
2016.08 - 2020.06

Beijing, China
Bachelor of Engineering in Electronic Information Science and Technology

Tsinghua University

GPA: 3.96/4.00, Ranking: 2/262

Advisor: Prof. Liangrui Peng

Thesis: Deep learning-based semi-supervised transfer learning for handwritten text recognition

Awards

2025.8.21

ISCA Award for Best Student Paper at INTERSPEECH 2025

International Speech Communication Association (ISCA)

For the first-authored paper: OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
2024.12.4

IEEE SLT 2024 Best Paper Award

2024 IEEE Spoken Language Technology Workshop

For the paper: Contextualized Automatic Speech Recognition with Dynamic Vocabulary
2024.11.14

EMNLP 2024 Best Paper Award

The 2024 Conference on Empirical Methods in Natural Language Processing

For the paper: Towards Robust Speech Representation Learning for Thousands of Languages
2023.6.10

ICASSP 2023 Top 3% Paper Recognition

IEEE International Conference on Acoustics, Speech and Signal Processing

For two first-authored papers and one co-authored paper
2020.2.20

SPIE Medical Imaging 2020 Best Student Paper Award Finalist

The International Society for Optics and Photonics (SPIE)

For the first-authored paper: Microcalcification localization and cluster detection using unsupervised convolutional autoencoders and structural similarity index

Languages

	Chinese
	Native speaker

	English
	Professional working proficiency

Services

Organizer	INTERSPEECH 2024 Special Session: Spoken Language Models for Universal Speech Processing
Conference reviewer	ACL: 2024 (ARR Feb 2024), 2025 (ARR Feb 2025); EMNLP: 2023, 2024 (ARR Jun 2024); NAACL: 2024 (ARR Dec 2023), 2025 (ARR Oct 2024); AAAI: 2025; ICASSP: 2023, 2024, 2025, 2026; INTERSPEECH: 2024, 2025; SLT: 2022, 2024; ASRU: 2023, 2025; IJCNN: 2025; SynData4GenAI: 2024; WiNLP: 2024, 2025; EUSIPCO: 2025
Journal reviewer	IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP); Computer Speech & Language (CSL); Speech Communication (SPECOM); ACM Transactions on Embedded Computing Systems (TECS)
Mentor	SLT-CODE Hackathon at SLT 2022

Teaching

2024.08 - 2024.12
Graduate Teaching Assistant

Carnegie Mellon University

Course: 18-781/11-751 Speech Recognition and Understanding

Instructor: Prof. Shinji Watanabe
2023.08 - 2023.12
Graduate Teaching Assistant

Carnegie Mellon University

Course: 18-781/11-751 Speech Recognition and Understanding

Instructor: Prof. Shinji Watanabe
2022.08 - 2022.12
Graduate Teaching Assistant

Carnegie Mellon University

Course: 18-781/11-751 Speech Recognition and Understanding

Instructor: Prof. Shinji Watanabe
2021.08 - 2021.12
Graduate Teaching Assistant

Carnegie Mellon University

Course: 18-781/11-751 Speech Recognition and Understanding

Instructor: Prof. Ian Lane and Prof. Shinji Watanabe
