Publications
Under Review
- Bing Han, Anbai Jiang, Xinhu Zheng, Wei-Qiang Zhang, Jia Liu, Pingyi Fan, and Yanmin Qian. Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection. Submitted to TASLP.
- Haiyang Sun, Bing Han, Zheng Lian, Leying Zhang, Chenda Li, Chenyang Le, Ye Bai, Yi Zhao, Yanmin Qian. ContextSpeech: A Large-Scale Real-Human Speech Corpus with Context-Aware Descriptions. Submitted to NeurIPS.
- Chenyang Le, Bing Han, Jinshun Li, Chen Songyong, Yanmin Qian. SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation. Submitted to NeurIPS.
- Pingyi Fan, Anbai Jiang, Shuwei Zhang, Zhiqiang Lv, Bing Han, Xinhu Zheng, et al. FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation. Paper
- Haiyang Sun, Shujie Hu, Shujie Liu, Lingwei Meng, Hui Wang, Bing Han, et al. Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling. Paper
Journals
- Bing Han, Zhengyang Chen, Yanmin Qian. Self-supervised learning with cluster-aware-DINO for high-performance robust speaker verification. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 529-541. Paper
- Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian. Attention-based encoder-decoder end-to-end neural diarization with embedding enhancer. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 1636-1649. Paper
- Shuai Wang, Zhengyang Chen, Bing Han, Hongji Wang, Chengdong Liang, Binbin Zhang, Xu Xiang, Wen Ding, Johan Rohdin, Anna Silnova, Yanmin Qian, Haizhou Li. Advancing speaker embedding learning: WeSpeaker toolkit for research and production. Speech Communication 162, 103104. Paper
Conferences
- Xinhu Zheng, Anbai Jiang, Bing Han, Yanmin Qian, Pingyi Fan, Jia Liu, Wei-Qiang Zhang. Improving anomalous sound detection via low-rank adaptation fine-tuning of pre-trained audio models. 2024 IEEE Spoken Language Technology Workshop (SLT), 969-974. Paper
- Wen Huang*, Bing Han*, Zhengyang Chen, Shuai Wang, Yanmin Qian. Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification. 2024 IEEE 14th International Symposium on Chinese Spoken Language Processing (ISCSLP), 383-387. Paper
- Bing Han, Wen Huang, Zhengyang Chen, Anbai Jiang, Pingyi Fan, Cheng Lu, Zhiqiang Lv, Jia Liu, Wei-Qiang Zhang, Yanmin Qian. Data-Efficient Low-Complexity Acoustic Scene Classification via Distilling and Progressive Pruning. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1-5. Paper
- Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei. Autoregressive speech synthesis without vector quantization. ACL 2025 main. Paper
- Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan. AnoPatch: Towards better consistency in machine anomalous sound detection. INTERSPEECH 2024, 107-111. Paper
- Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanmin Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei. VALL-E R: Robust and efficient zero-shot text-to-speech synthesis via monotonic alignment. NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation. Paper
- Bing Han, Zhiqiang Lv, Anbai Jiang, Wen Huang, Zhengyang Chen, Yufeng Deng, Jiawei Ding, Cheng Lu, Wei-Qiang Zhang, Pingyi Fan, Jia Liu, Yanmin Qian. Exploring large scale pre-trained models for robust machine anomalous sound detection. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1326-1330. Paper
- Wen Huang, Bing Han, Shuai Wang, Zhengyang Chen, Yanmin Qian. Robust cross-domain speaker verification with multi-level domain adapters. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 11781-11785. Paper
- Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li. Leveraging in-the-wild data for effective self-supervised pretraining in speaker recognition. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 10901-10905. Paper
- Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song. InstructME: An instruction guided music edit and remix framework with latent diffusion models. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. 2024. Paper
- Bing Han, Wen Huang, Zhengyang Chen, Yanmin Qian. Improving DINO-based self-supervised speaker verification with progressive cluster-aware training. 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). Paper
- Bing Han, Zhengyang Chen, Yanmin Qian. Exploring binary classification loss for speaker verification. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1-5. Paper
- Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian. Attention-based encoder-decoder network for end-to-end neural speaker diarization with target speaker attractor. INTERSPEECH 2023, 3552-3556. Paper
- Zhengyang Chen, Yao Qian, Bing Han, Yanmin Qian, Michael Zeng. A comprehensive study on self-supervised distillation for speaker representation learning. 2022 IEEE Spoken Language Technology Workshop (SLT), 599-604. Paper
- Tao Liu, Xu Xiang, Zhengyang Chen, Bing Han, Kai Yu, Yanmin Qian. The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022. 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP), 498-501. Paper
- Zhengyang Chen*, Bing Han*, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian. Build a SRE challenge system: Lessons from VoxSRC 2022 and CNSRC 2022. INTERSPEECH 2023, 3202-3206. Paper
- Bei Liu, Zhengyang Chen, Shuai Wang, Haoyu Wang, Bing Han, Yanmin Qian. DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design. INTERSPEECH 2022, 296-300. Paper
- Bing Han, Zhengyang Chen, Yanmin Qian. Self-supervised speaker verification using dynamic loss-gate and label correction. INTERSPEECH 2022, 4780-4784. Paper
- Wei Wang, Xun Gong, Yifei Wu, Zhikai Zhou, Chenda Li, Wangyou Zhang, Bing Han, Yanmin Qian. The SJTU system for multimodal information based speech processing challenge 2021. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 9261-9265. Paper
- Bing Han, Zhengyang Chen, Yanmin Qian. Local information modeling with self-attention for speaker verification. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6727-6731. Paper
- Bing Han, Zhengyang Chen, Bei Liu, Yanmin Qian. MLP-SVNet: A multi-layer perceptrons based network for speaker verification. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7522-7526. Paper
- Chenpeng Du, Bing Han, Shuai Wang, Yanmin Qian, Kai Yu. SynAug: Synthesis-based data augmentation for text-dependent speaker verification. ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5844-5848. Paper
- Bing Han, Zhengyang Chen, Zhikai Zhou, Yanmin Qian. The SJTU System for Short-Duration Speaker Verification Challenge 2021. INTERSPEECH 2021, 2332-2336. Paper