Publications

Publications in reverse chronological order.

I have published papers in several research areas:

  • Speech Foundation Model (SFM)
  • Speech Model Architecture
  • Efficient Speech Models
  • Speech Applications
    • Automatic Speech Recognition (ASR)
    • Speech Translation (ST)
    • Spoken Language Understanding (SLU)

Please check my Google Scholar or Semantic Scholar page for more information.

2024

  1. SLT
    ASR
    Contextualized Automatic Speech Recognition with Dynamic Vocabulary
    Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe
    In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Dec 2024
  2. SLT
    Others
    ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration
    Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, and Shinji Watanabe
    In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Dec 2024
  3. SLT
    ASR
    Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
    Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, and Shinji Watanabe
    In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Dec 2024
  4. INTERSPEECH
    Foundation Model
    OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
    Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
  5. INTERSPEECH
    ASR
    Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
    Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
  6. INTERSPEECH
    Foundation Model
    On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models
    Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
  7. INTERSPEECH
    Architecture
    Multi-Convformer: Extending Conformer with Multiple Convolution Kernels
    Darshan Prabhu, Yifan Peng, Preethi Jyothi, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
  8. ACL
    Foundation Model
    OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
    Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Aug 2024
  9. arXiv
    ASR
    4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders
    Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, and Shinji Watanabe
    arXiv, Jun 2024
  10. arXiv
    Foundation Model
    Towards Robust Speech Representation Learning for Thousands of Languages
    William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, and Shinji Watanabe
    arXiv, Jun 2024
  11. NAACL
    SLU
    UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions
    Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan S. Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, and Shinji Watanabe
    In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), Jun 2024
  12. ICASSPW
    ASR
    Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
    Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe
    In IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Apr 2024
  13. ICASSP
    Foundation Model
    VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks
    Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024
  14. ICASSP
    Foundation Model
    Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech
    Chien-yu Huang, Ke-Han Lu, Shi Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, and Hung-yi Lee
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024
  15. ICASSP
    ASR
    Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search
    Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024
  16. arXiv
    Foundation Model
    MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
    Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, and Hongyu Gong
    arXiv, Mar 2024
  17. arXiv
    Foundation Model
    An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis
    Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, and Hongyu Gong
    arXiv, Mar 2024
  18. arXiv
    Foundation Model
    SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
    Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, and Ruihua Song
    arXiv, Jan 2024

2023

  1. ASRU
    Foundation Model
    Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data
    Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, and Shinji Watanabe
    In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2023
  2. ASRU
    Foundation Model
    Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning
    William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, and Shinji Watanabe
    In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2023
  3. ASR
    End-to-end integration of online and offline encoders using auxiliary losses for automatic speech recognition
    Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe
    In JSAI Technical Report, Type 2 SIG (人工知能学会第二種研究会資料), Nov 2023
  4. INTERSPEECH
    Efficient Model
    DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
    Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
  5. INTERSPEECH
    Architecture
    A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
    Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
  6. INTERSPEECH
    Foundation Model
    Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute
    William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
  7. INTERSPEECH
    SLU
    Tensor decomposition for minimization of E2E SLU model toward on-device processing
    Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
  8. INTERSPEECH
    ASR
    Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training
    Yui Sudo, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
  9. ACL Demo
    ST
    ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
    Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), System Demonstrations, Jul 2023
  10. IWSLT
    ST
    CMU’s IWSLT 2023 Simultaneous Speech Translation System
    Brian Yan, Jiatong Shi, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, and Shinji Watanabe
    In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), Jul 2023
  11. ICASSP
    Efficient Model
    I3D: Transformer Architectures with Input-Dependent Dynamic Depth for Speech Recognition
    Yifan Peng, Jaesong Lee, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Top 3% of all papers accepted), Jun 2023
  12. ICASSP
    Efficient Model
    Structured Pruning of Self-Supervised Pre-Trained Models for Speech Recognition and Understanding
    Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Top 3% of all papers accepted), Jun 2023
  13. ICASSP
    SLU
    A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward STOP Quality Challenge
    Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023
  14. ICASSP
    ASR
    Improving Massively Multilingual ASR with Auxiliary CTC Objectives
    William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Top 3% of all papers accepted), Jun 2023
  15. ICASSP
    SLU
    The Pipeline System of ASR and NLU with MLM-based Data Augmentation Toward STOP Low-Resource Challenge
    Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023
  16. ICASSP
    Foundation Model
    SpeechLMScore: Evaluating Speech Generation Using Speech Language Model
    Soumi Maiti, Yifan Peng, Takaaki Saeki, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023
  17. ICASSP
    SLU
    E-Branchformer-Based E2E SLU Toward STOP On-Device Challenge
    Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023
  18. SLT
    SLU
    A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
    Yifan Peng*, Siddhant Arora*, Yosuke Higuchi, Yushi Ueda, Sujay S. Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, and Shinji Watanabe
    In Proceedings of the 2022 IEEE Spoken Language Technology Workshop (SLT), Jan 2023
  19. SLT
    Architecture
    E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition
    Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, and Shinji Watanabe
    In Proceedings of the 2022 IEEE Spoken Language Technology Workshop (SLT), Jan 2023

2022

  1. INTERSPEECH
    Architecture
    Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR
    Takashi Maekaku, Yuya Fujita, Yifan Peng, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2022
  2. ICML
    Architecture
    Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
    Yifan Peng, Siddharth Dalmia, Ian Lane, and Shinji Watanabe
    In Proceedings of the International Conference on Machine Learning (ICML), Jul 2022
  3. IWSLT
    ST
    CMU’s IWSLT 2022 Dialect Speech Translation System
    Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, and Shinji Watanabe
    In International Workshop on Spoken Language Translation (IWSLT), May 2022
  4. ICASSP
    SLU
    ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet
    Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay S. Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W. Black, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022

2021

  1. TBME
    Others
    Anomaly Detection of Calcifications in Mammography Based on 11,000 Negative Cases
    Rui Hou, Yifan Peng, Lars J. Grimm, Yinhao Ren, Maciej A. Mazurowski, Jeffrey R. Marks, Lorraine M. King, Carlo C. Maley, Eun-Sil Shelley Hwang, and Joseph Y. Lo
    IEEE Transactions on Biomedical Engineering, Nov 2021

2020

  1. SPIE
    Others
    Microcalcification localization and cluster detection using unsupervised convolutional autoencoders and structural similarity index
    Yifan Peng, Rui Hou, Yinhao Ren, Lars J. Grimm, Jeffrey R. Marks, E. Shelley Hwang, and Joseph Y. Lo
    In Proceedings of the SPIE Medical Imaging 2020: Computer-Aided Diagnosis (Robert F. Wagner Best Student Paper Award Finalist), May 2020