Yifan Peng
PhD Candidate, Carnegie Mellon University
⭐ Now seeking full-time positions in speech and language processing (expected to start in Summer 2025) ⭐
I am a final-year Ph.D. student in the Department of Electrical and Computer Engineering at Carnegie Mellon University. I am fortunate to be supervised by Prof. Shinji Watanabe (Sep 2021 - present) and Prof. Ian Lane (Aug 2020 - Aug 2021; now at UC Santa Cruz). I received my bachelor's degree from the Department of Electronic Engineering at Tsinghua University in 2020.
In Summer 2024, I was an AI Research Intern at NVIDIA NeMo, where I worked on joint speech-text language models. In Summer 2023, I was a Research Scientist Intern at Meta AI (FAIR), working on speech language models for voice-preserving textless speech-to-speech translation. In Summer 2022, I was a Speech Recognition Intern at ASAPP, working on speech model compression.
My research area is speech and language processing. My Ph.D. thesis focuses on developing effective and efficient open speech foundation models. I have led the Open Whisper-style Speech Models (OWSM) project at CMU WAVLab, which developed the first large-scale, fully open speech foundation model from academia. More recently, I have also become interested in integrating speech capabilities into large language models.
Throughout my Ph.D. program, I have published first-authored papers at top-tier ML/NLP/speech conferences, including ICML, ACL, ICASSP, INTERSPEECH, ASRU, and SLT. I am also a contributor to ESPnet, a widely used speech processing toolkit. In particular, I have been the primary contributor to several major projects:
- Novel speech encoder architecture: Branchformer (ICML’22), E-Branchformer vs Conformer (INTERSPEECH’23)
- Speech model compression: I3D (ICASSP’23 Top 3%), HJ-Pruning (ICASSP’23 Top 3%), DPHuBERT (INTERSPEECH’23)
- Open speech foundation models: OWSM (ASRU’23), OWSM v3.1 (INTERSPEECH’24), OWSM-CTC (ACL’24)
- Speech language models: SpeechLM analysis, MSLM-S2ST, VoiceTextBlender, and more to follow
News
| Date | News |
|---|---|
| Nov 14, 2024 | A co-authored paper received the Best Paper Award at EMNLP 2024 |
| Aug 30, 2024 | 3 papers are accepted at IEEE SLT 2024 |
| Jun 04, 2024 | 4 papers (1 first-authored) are accepted at INTERSPEECH 2024 |
| May 16, 2024 | 1 first-authored paper, OWSM-CTC, is accepted at ACL 2024 (main) |
| May 13, 2024 | Joining NVIDIA NeMo Speech in Santa Clara as AI Research Intern |
| Jan 01, 2024 | We are hosting a special session at INTERSPEECH 2024: Spoken Language Models for Universal Speech Processing (Official Site) |
| Dec 13, 2023 | Check out the webpage for our Open Whisper-style Speech Models (OWSM) |
| Dec 13, 2023 | 3 papers are accepted at ICASSP 2024 |
| Sep 22, 2023 | 2 papers (1 first-authored) are accepted at IEEE ASRU 2023 |
| Jun 04, 2023 | 3 papers (2 first-authored) are recognized among the top 3% of all papers accepted at ICASSP 2023 |
| May 22, 2023 | Joining Meta AI (FAIR) in Seattle as Research Scientist Intern |
| May 17, 2023 | 5 papers (2 first-authored) are accepted at INTERSPEECH 2023 |
| Feb 17, 2023 | 4 research papers (2 first-authored) and 3 co-authored challenge papers are accepted at ICASSP 2023 |
| Sep 30, 2022 | 2 papers (1 first-authored) are accepted at IEEE SLT 2022 |
| Jul 17, 2022 | Attending ICML 2022 in Baltimore, Maryland, USA |
| May 31, 2022 | Joining ASAPP remotely as Speech Recognition Intern |
| May 15, 2022 | 1 first-authored paper is accepted at ICML 2022 |
Selected Publications
- Towards Robust Speech Representation Learning for Thousands of Languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 2024. (Best Paper Award)