DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice
Leying Zhang, Tingxiao Zhou, Haiyang Sun, Mengxiao Bi, Yanmin Qian
Shanghai Jiao Tong University · Department of Computer Science & Engineering
Auditory Cognition & Computational Acoustics

The article reviews the 2025 achievements of Shanghai Jiao Tong University's Auditory Cognition and Computational Acoustics Lab in research, talent cultivation, and academic exchange, including paper publications, model releases, team-building activities, and awards.

Professor Yanmin Qian received the Second Ruiyuan Youth Science and Technology Award in Information and Space Technology for his achievements in auditory artificial intelligence. His research addressed the long-standing 'cocktail party problem', laying a technical foundation for the large-scale application of auditory processing and speech interaction technologies.

Shanghai Jiao Tong University, in collaboration with several universities and companies, placed first in both the Low-Complexity Acoustic Scene Classification and Industrial Machine Anomalous Sound Detection tasks, and third in the Automated Audio Captioning task, at the DCASE 2024 International Challenge.

This article explores audio watermarking technology in the era of generative speech, analyzing its potential applications in traceability, anti-cloning, and compliance. It details the components of watermarking systems, deployment methods, challenges, and threat models, along with evaluation criteria and key findings. Finally, it identifies core shortcomings and improvement directions for watermarking technology deployment.
Leying Zhang, Tingxiao Zhou, Haiyang Sun, Mengxiao Bi, Yanmin Qian
Bei Liu, Yanmin Qian
Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe
Wangyou Zhang, Zhengyang Chen, Chenda Li, Yanmin Qian
Haiyang Sun, Shujie Hu, Shujie Liu, Lingwei Meng, Hui Wang, Bing Han, Yifan Yang, Yanqing Liu, Sheng Zhao, Yan Lu, Yanmin Qian
Chenda Li, Wei Wang, Samuele Cornell, Bing Han, Leying Zhang, Zhengyang Chen, Shinji Watanabe, Yanmin Qian
Signal processing techniques that enhance and separate speech signals, such as speech enhancement, separation, dereverberation, and robust acoustic feature extraction.
Methods for transcribing and interpreting speech, including automatic speech recognition, speech translation, and contextual spoken language understanding.
Generative models for producing speech and audio from text, semantic representations, or other modalities, including text-to-speech and expressive speech synthesis.
Professor
Dr. Yanmin Qian is a Full Professor at Shanghai Jiao Tong University, China. He received his PhD from the Department of Electronic Engineering at Tsinghua University, China, in 2012, and was an Associate Researcher in the Speech Group at the Cambridge University Engineering Department, UK, from 2015 to 2016. He is a senior member of IEEE, a member of ISCA, and one of the founding members of the Kaldi Speech Recognition Toolkit. He has published more than 300 papers on speech and language processing, with 20,000 citations, and has been granted more than 120 patents in China and the US. He has led teams to first place in international challenges six times. He has received several awards, including the IEEE SPS Best Paper Award, the Elsevier Speech Communication Best Paper Award, and Best Paper Awards from IEEE ISCSLP'24, IEEE ASRU'19, and IEEE ISCSLP'16. He has also been honored with several high-level talent awards in China, including the Chang Jiang Scholars Program of the Ministry of Education, Excellent Youth Scientists of the National Natural Science Foundation of China, and the First Prize of the Wu Wenjun Artificial Intelligence Science and Technology Award. He is currently a member of the IEEE Signal Processing Society Speech and Language Technical Committee. His research interests include speech recognition and translation, speaker and language recognition, speech separation and enhancement, natural language understanding, and multimedia signal processing.
Bing Han
PhD
Chenyang Le
PhD
Haibin Yu
PhD
Leying Zhang
PhD
Wei Wang
PhD
Xun Gong
PhD
Haoyu Wang
MS
Siyi Zhao
MS
Tingxiao Zhou
MS
Wen Huang
MS
Xin Zhou
MS
We welcome applications from PhD, Master's, and postdoctoral candidates.