Publications
Publications in reverse chronological order.
I have published papers in several research areas:
- Speech Foundation Model (SFM)
- Speech Model Architecture
- Efficient Speech Models
- Speech Applications
  - Automatic Speech Recognition (ASR)
  - Speech Translation (ST)
  - Spoken Language Understanding (SLU)
Please check my Google Scholar or Semantic Scholar page for more information.
2024
- [SLT] [ASR] Contextualized Automatic Speech Recognition with Dynamic Vocabulary. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Dec 2024
- [SLT] [Others] ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Dec 2024
- [SLT] [ASR] Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Dec 2024
- [EMNLP] [Foundation Model] Towards Robust Speech Representation Learning for Thousands of Languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (Best Paper Award), Nov 2024
- [INTERSPEECH] [ASR] Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
- [INTERSPEECH] [Foundation Model] On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
- [INTERSPEECH] [Architecture] Multi-Convformer: Extending Conformer with Multiple Convolution Kernels. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
- [arXiv] [ASR] 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders. In arXiv, Jun 2024
- [NAACL] [SLU] UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), Jun 2024
- [ICASSPW] [ASR] Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. In IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Apr 2024
- [ICASSP] [ASR] Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024
- [arXiv] [Foundation Model] SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition. arXiv, Jan 2024
2023
- [ASRU] [Foundation Model] Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2023
- [ASR] End-to-end integration of online and offline encoders using auxiliary losses for automatic speech recognition. In 人工知能学会第二種研究会資料 (JSAI Technical Report, Type 2 SIG), Nov 2023
- [INTERSPEECH] [SLU] Tensor decomposition for minimization of E2E SLU model toward on-device processing. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
- [INTERSPEECH] [ASR] Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023
- [IWSLT] [ST] CMU’s IWSLT 2023 Simultaneous Speech Translation System. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), Jul 2023
- [ICASSP] [SLU] A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward Stop Quality Challenge. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023
- [ICASSP] [ASR] Improving Massively Multilingual ASR with Auxiliary CTC Objectives. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Top 3% of all papers accepted), Jun 2023
- [ICASSP] [SLU] The Pipeline System of ASR and NLU with MLM-based data Augmentation Toward Stop Low-Resource Challenge. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023
- [ICASSP] [SLU] E-Branchformer-Based E2E SLU Toward Stop on-Device Challenge. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2023
2022
- [INTERSPEECH] [Architecture] Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2022
- [IWSLT] [ST] CMU’s IWSLT 2022 Dialect Speech Translation System. In International Workshop on Spoken Language Translation (IWSLT), May 2022
2021
- [TBME] [Others] Anomaly Detection of Calcifications in Mammography Based on 11,000 Negative Cases. IEEE Transactions on Biomedical Engineering, Nov 2021