Publications
Publications in reverse chronological order.
I have published papers in several research areas:
- Speech Foundation Model (SFM)
- Speech Model Architecture
- Efficient Speech Models
- Speech Applications
  - Automatic Speech Recognition (ASR)
  - Speech Translation (ST)
  - Spoken Language Understanding (SLU)
Please check my Google Scholar or Semantic Scholar page for more information.
2025
- [NAACL Demo] [Foundation Model] ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems. In Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Demo Track Poster (NAACL Demo) (accepted), Apr 2025
- [NAACL Demo] [Foundation Model] ESPnet-SpeechLM: An Open Speech Language Model Toolkit. In Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Demo Track Poster (NAACL Demo) (accepted), Apr 2025
- [NAACL] [Foundation Model] VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning. In Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL) (accepted), Apr 2025
- [ICLR] [Foundation Model] Context-aware Dynamic Pruning for Speech Foundation Models. In Proceedings of the Thirteenth International Conference on Learning Representations (ICLR) (accepted), Apr 2025
- [AAAI] [ASR] Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization. In Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI), Mar 2025
- [TASLP] [ASR] Joint Beam Search Integrating CTC, Attention, and Transducer Decoders. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), Jan 2025
2024
- [SLT] [Others] ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Dec 2024
- [SLT] [ASR] Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Dec 2024
- [INTERSPEECH] [ASR] Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
- [INTERSPEECH] [Foundation Model] On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
- [INTERSPEECH] [Architecture] Multi-Convformer: Extending Conformer with Multiple Convolution Kernels. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
- [NAACL] [SLU] UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), Jun 2024
- [ICASSPW] [ASR] Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. In IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Apr 2024
- [ICASSP] [Foundation Model] VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024
- [ICASSP] [Foundation Model] Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024
- [ICASSP] [ASR] Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024
2023
- [ASRU] [Foundation Model] Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2023
- [ASR] End-to-end integration of online and offline encoders using auxiliary losses for automatic speech recognition. In Technical Report of the Japanese Society for Artificial Intelligence Type 2 SIG Meeting (人工知能学会第二種研究会資料), Nov 2023
- [INTERSPEECH] [Foundation Model] Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
- [INTERSPEECH] [SLU] Tensor decomposition for minimization of E2E SLU model toward on-device processing. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
- [INTERSPEECH] [ASR] Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
- [IWSLT] [ST] CMU’s IWSLT 2023 Simultaneous Speech Translation System. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), Jul 2023
- [ICASSP] [SLU] The Pipeline System of ASR and NLU with MLM-based Data Augmentation Toward STOP Low-Resource Challenge. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023
- [ICASSP] [Foundation Model] SpeechLMScore: Evaluating Speech Generation Using Speech Language Model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023
- [ICASSP] [SLU] E-Branchformer-Based E2E SLU Toward STOP On-Device Challenge. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023
- [SLT] [Architecture] E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. In Proceedings of the 2022 IEEE Spoken Language Technology Workshop (SLT), Jan 2023
2022
- [INTERSPEECH] [Architecture] Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2022
- [IWSLT] [ST] CMU’s IWSLT 2022 Dialect Speech Translation System. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), May 2022
- [ICASSP] [SLU] ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022
2021
- [TBME] [Others] Anomaly Detection of Calcifications in Mammography Based on 11,000 Negative Cases. IEEE Transactions on Biomedical Engineering (TBME), Nov 2021