CV

Work

  • 2025.06 - Present

    Santa Clara, CA, USA

    Research Scientist
    NVIDIA NeMo
    Project: Multimodal large language models, full-duplex speech-to-speech models
  • 2024.05 - 2024.08

    Santa Clara, CA, USA

    AI Research Intern
    NVIDIA NeMo
    Project: Speech-text language models with multi-turn mixed-modal chat capabilities
  • 2023.05 - 2023.08

    Seattle, WA, USA

    Research Scientist Intern
    Meta AI (Fundamental AI Research, FAIR)
    Project: Speech large language models for voice-preserved textless speech-to-speech translation
  • 2022.05 - 2022.08

    Remote, USA

    Speech Recognition Intern
    ASAPP
    Project: Speech model compression and encoder architecture design

Education

  • 2020.08 - 2025.05

    Pittsburgh, PA, USA

    Doctor of Philosophy in Electrical and Computer Engineering
    Carnegie Mellon University
    Advisor: Prof. Shinji Watanabe
    Thesis: Towards effective and efficient open speech foundation models
    Research areas: Speech foundation models, speech recognition
    Open source: Contributor and maintainer of ESPnet
  • 2016.08 - 2020.06

    Beijing, China

    Bachelor of Engineering in Electronic Information Science and Technology
    Tsinghua University
    GPA: 3.96/4.00, Ranking: 2/262
    Advisor: Prof. Liangrui Peng
    Thesis: Deep learning-based semi-supervised transfer learning for handwritten text recognition

Awards

  • 2025.08.21
    ISCA Award for Best Student Paper at INTERSPEECH 2025
    International Speech Communication Association (ISCA)
    For the first-authored paper: OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
  • 2024.12.04
    IEEE SLT 2024 Best Paper Award
    2024 IEEE Spoken Language Technology Workshop
    For the paper: Contextualized Automatic Speech Recognition with Dynamic Vocabulary
  • 2024.11.14
    EMNLP 2024 Best Paper Award
    The 2024 Conference on Empirical Methods in Natural Language Processing
    For the paper: Towards Robust Speech Representation Learning for Thousands of Languages
  • 2023.06.10
    ICASSP 2023 Top 3% Paper Recognition
    IEEE International Conference on Acoustics, Speech and Signal Processing
    For two first-authored papers and one co-authored paper
  • 2020.02.20
    SPIE Medical Imaging 2020 Best Student Paper Award Finalist
    The International Society for Optics and Photonics (SPIE)
    For the first-authored paper: Microcalcification localization and cluster detection using unsupervised convolutional autoencoders and structural similarity index

Languages

Chinese
Native speaker
English
Professional working proficiency

Services

Organizer
Conference reviewer
  • ACL: 2024 (ARR Feb 2024), 2025 (ARR Feb 2025)
  • EMNLP: 2023, 2024 (ARR Jun 2024)
  • NAACL: 2024 (ARR Dec 2023), 2025 (ARR Oct 2024)
  • AAAI: 2025
  • ICASSP: 2023, 2024, 2025
  • INTERSPEECH: 2024, 2025
  • SLT: 2022, 2024
  • ASRU: 2023, 2025
  • IJCNN: 2025
  • SynData4GenAI: 2024
  • WiNLP: 2024, 2025
  • EUSIPCO: 2025
Journal reviewer
Mentor

Teaching

  • 2024.08 - 2024.12
    Graduate Teaching Assistant
    Carnegie Mellon University
    Course: 18-781/11-751 Speech Recognition and Understanding
    Instructor: Prof. Shinji Watanabe
  • 2023.08 - 2023.12
    Graduate Teaching Assistant
    Carnegie Mellon University
    Course: 18-781/11-751 Speech Recognition and Understanding
    Instructor: Prof. Shinji Watanabe
  • 2022.08 - 2022.12
    Graduate Teaching Assistant
    Carnegie Mellon University
    Course: 18-781/11-751 Speech Recognition and Understanding
    Instructor: Prof. Shinji Watanabe
  • 2021.08 - 2021.12
    Graduate Teaching Assistant
    Carnegie Mellon University
    Course: 18-781/11-751 Speech Recognition and Understanding
    Instructor: Prof. Ian Lane and Prof. Shinji Watanabe