Hao Huang (黄浩)
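Professor, Doctoral Supervisor
Research areas: speech and acoustic signal processing, natural language processing, machine learning
Office & Lab: Room A518, Information Building, Boda Campus, Xinjiang University
Email: hwanghao@gmail.com, huanghao@xju.edu.cn
Phone: (+86)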
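Education
2004.09-2008.11 Shanghai Jiao Tong University, Department of Electronic Engineering, Ph.D.
2001.09-2004.07 Xinjiang University, School of Electrical Engineering, Master's
1995.09-1999.07 Shanghai Jiao Tong University, Department of Information and Control Engineering, Bachelor's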
Work Experience
2017.01-present Xinjiang University, School of Computer Science and Technology, Professor
2009.10-2016.12 Associate Professor
2008.01-2009.09 Lecturer
1999.09-2008.01 Xinjiang University, School of Electrical Engineering, Lecturer
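Professional Service
Member of IEEE, the International Speech Communication Association (ISCA), the China Computer Federation (CCF), and the Acoustical Society of China
Reviewer for the following journals and international conferences: IEEE/ACM TASLP, Speech Communication, ICASSP, INTERSPEECH, ICME, ASRU, SLT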
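Research
Long-term research on audio, speech and language information processing, multimedia information processing, machine learning, and human-computer interaction. Current research interests include speech recognition, speech synthesis and voice conversion, speech signal processing, spoken keyword spotting, dialogue systems and voice interaction, and speech content and audio scene analysis.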
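Academic Collaboration
Long-standing academic collaboration and student exchange visits with Prof. Eng Siong Chng of Temasek Laboratories at Nanyang Technological University, Singapore; Dr. Haihua Xu of ByteDance, Singapore; and Dr. Sheng Li of NICT, Japan.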
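Funded Projects
China Telecom Beijing Research Institute, 2021 Industry-Customized AI Capability Research technical service (spoken keyword search and speech quality assessment), industry-sponsored project, 2021.10-2022.3, RMB 386,900, Principal Investigator
Xinjiang Radio and Television Network intelligent speech review platform project (spoken keyword search), industry-sponsored project, RMB 700,000, 2021.11-2022.11, Technical Lead
Key Technologies for Speech Content Review in Complex Acoustic Scenes, open project of the Xinjiang Multilingual Information Technology Laboratory, RMB 200,000, 2021.1-2023.12, Principal Investigator
National Key R&D Program project "Technology Integration and Application Demonstration for the Inheritance, Development, and Utilization of Ethnic and Folk Cultural Resources" (project no. 2017YFB1402100), Subproject 1: "Collection of Ethnic and Folk Cultural Resources and Dissemination of Ethnic Languages and Cultures" (subproject no. 2017YFB1402101), Subproject Leader, 2017.12-2020.11, RMB 1,000,000
National Natural Science Foundation of China, Regional Science Fund, "Spoken Language Understanding and Human-Machine Dialogue Behavior Based on Unsupervised Learning" (2017.1-2020.12), RMB 420,000
National Natural Science Foundation of China, Regional Science Fund, "Application of Spoken Dialogue System Technology to Free-Expression Language Learning: The Case of Mandarin Learning by Ethnic Minority Students in Xinjiang" (2014.1-2017.12), RMB 450,000
National Natural Science Foundation of China, Regional Science Fund, "Automatic Mispronunciation Detection Methods for Chinese Language Learning by Ethnic Minorities in Xinjiang" (2010.1-2012.12), RMB 240,000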
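Academic Output
Selected Publications
More than 40 papers published in leading journals and international conferences in speech, acoustics, and natural language processing, including IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), Journal of the Acoustical Society of America (JASA), Speech Communication, EMNLP, ICASSP, ICME, and INTERSPEECH: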
2024
Minjie Tang, Hao Huang*, Wenbo Zhang, Liang He, Phase continuity-aware self-attentive recurrent network with adaptive feature selection for robust VAD. The 49th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, April 14-19, 2024. (Oral Presentation)
Sijie Feng, Haoxiang Su, Hongyan Xie, Di Wu, Hao Huang*, Wushour Silamu. Fact-aware summarization with contrastive learning for few-shot dialogue state tracking. The 49th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, April 14-19, 2024.
Di Wu, Liting Jiang, Lili Yin, Kai Wang, Haoxiang Su, Zhe Li, Hao Huang. Dual Level Intent-Slot Interaction For Improved Multi-Intent Spoken Language Understanding. The 49th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, April 14-19, 2024.
Haoxiang Su, Sijie Feng, Hongyan Xie, Di Wu, Hao Huang*, Zhongjiang He, Shuangyong Song, Ruiyu Fang, Xiaomeng Huang, Wushour Silamu. Domain-Slot Aware Contrastive Learning For Improved Dialogue State Tracking. The 49th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, April 14-19, 2024.
2023
Haoxiang Su, Hongyan Xie, Hao Huang*, Shuangyong Song, Ruiyu Fang, Xiaomeng Huang, Sijie Feng. Scalable-DSC: A Structural Template Prompt Approach to Scalable Dialogue State Correction, The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Singapore, December 6–10, 2023. (Long Paper, CCF Rank B)
Hao Huang, Lin Wang, Jichen Yang, Ying Hu, Liang He. W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2023(1): 45. (SCI)
Kai Wang, Jingjing Liu, Yizhou Peng, Hao Huang*. Neural RAPT: Deep Learning-based Pitch Tracking with Prior Algorithmic Knowledge Instillation. International Journal of Speech Technology, 2023. (EI Compendex)
Rui Li, Zhiwei Xie, Haihua Xu, Yizhou Peng, Hexin Liu, Hao Huang*, Eng Siong Chng. Self-supervised Learning Representation based Accent Recognition with Persistent Accent Memory. Interspeech 2023. (CCF Rank C)
Yachao Guo, Zhibin Qiu, Hao Huang*, Chng Eng Siong. Improved Keyword Recognition Based on Aho-Corasick Automaton. 2023 International Joint Conference on Neural Networks (IJCNN). IEEE, 2023: 1-7. (CCF Rank C)
Jichen Yang, Yi Zhou*, Hao Huang*. Mel-S3R: Combining Mel-Spectrogram and Self-Supervised Speech Representation with VQ-VAE for Any-to-Any Voice Conversion. Speech Communication, (151) 52-63, 2023. (SCI, CCF Rank B)
Zhibin Qiu, Yachao Guo, Mengfan Fu, Hao Huang*, Ying Hu, Liang He, Fuchun Sun. CRA-DiffuSE: Improved Cross-Domain Speech Enhancement Based on Diffusion Model with T-F Domain Pre-Denoising. IEEE International Conference on Multimedia & Expo (ICME), July 10-14, 2023, Australia. (CCF Rank B)
Yuhang Yang, Haihua Xu, Hao Huang*, Eng Siong Chng , Sheng Li. Speech-Text based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 4-10, Greece, 2023. (CCF Rank B)
Zhibin Qiu, Mengfan Fu, Yinfeng Yu, LiLi Yin, Fuchun Sun, Hao Huang*. SRTNet: Time Domain Speech Enhancement via Stochastic Refinement. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 4-10, Greece, 2023. (CCF Rank B, Oral Presentation)
Saierdaer Yusuyin, Hao Huang*, Junhua Liu, Cong Liu. Investigation into Phone-Based Subword Units for Multilingual End-To-End Speech Recognition. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 4-10, Greece, 2023. (CCF Rank B, Oral Presentation)
Lili Yin, Di Wu, Zhibin Qiu, Hao Huang*. Mitigating Domain Dependency for Improved Speech Enhancement via SNR Loss Boosting. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 4-10, Greece, 2023. (CCF Rank B)
Kai Wang, Yuhang Yang, Hao Huang*, Ying Hu, Sheng Li. SpeakerAugment: Data Augmentation for Generalizable Source Separation via Speaker Parameter Manipulation. The 48th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 4-10, Greece, 2023. (CCF Rank B)
2022
Hongyan Xie, Haoxiang Su, Shuangyong Song, Hao Huang*, Bo Zou, Kun Deng, Jianghua Lin, Zhihui Zhang and Xiaodong He. Correctable-DST: Mitigating Historical Context Mismatch between Training and Inference for Improved Dialogue State Tracking. The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022), Abu Dhabi, December 7–11, 2022. (CCF Rank B, Long Paper, Oral Presentation)
Guodong Ma, Pengfei Hu, Nurmemet Yolwas, Shen Huang and Hao Huang*, Boosted Phone-mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition. Interspeech 2022. (Top Conference, CCF Rank C).
Yizhou Peng, Jicheng Zhang, Haihua Xu, Hao Huang*, Eng Siong Chng, Minimum Word Error Training for Non-Autoregressive Transformer-Based Code-Switching ASR. The 47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 22-27, Singapore, 2022. (Top Conference, CCF Rank B)
Kai Wang, Yizhou Peng, Hao Huang*, Ying Hu, Sheng Li. Mining Hard Samples Locally And Globally for Improved Speech Separation. The 47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 22-27, Singapore, 2022. (Top Conference, CCF Rank B)
2021
Tingzhi Mao, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Hao Huang*, Aishan Wumaier, and Eng Siong Chng. Enriching Under-Represented Named-Entities To Improve Speech Recognition Performance. In: Proc. of the 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Dec. 14-17, 2021, TOKYO, JAPAN.
Yizhou Peng, Jicheng Zhang, Haobo Zhang, Haihua Xu, Hao Huang*, and Eng Siong Chng. A multilingual approach to joint Speech and Accent Recognition with DNN-HMM framework. 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Dec. 14-17, 2021, TOKYO, JAPAN.
Guodong Ma, Pengfei Hu, Jian Kang, Nurmemet Yolwas, Shen Huang*, Hao Huang*. Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition. Interspeech 2021. (Top Conference, CCF Rank C).
Jicheng Zhang, Yizhou Peng, Pham Van Tung, Haihua Xu, Hao Huang *, Eng Siong Chng. E2E-based Multi-task Learning Approach to Joint Speech and Accent Recognition. Interspeech 2021. (Top Conference, CCF Rank C).
Kai Wang, Hao Huang*, Ying Hu, Zhihua Huang, Sheng Li. End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain. Interspeech 2021. (Top Conference, CCF Rank C).
Xiao Kang, Hao Huang*, Ying Hu, Zhihua Huang. Connectionist temporal classification loss for vector quantized variational autoencoder in zero-shot voice conversion. Digital Signal Processing (2021): 103110. (SCI)
Hao Huang*, Kai Wang, Ying Hu, Sheng Li. Encoder-Decoder based Pitch Tracking and Joint Model Training for Mandarin Tone Classification. The 46th IEEE International Conference on Acoustics, Speech and Signal Processing, June, 6-11, Toronto, Canada, 6943-6947, 2021. (Top Conference, 2021, CCF Rank B)
Weiqi Gao, Hao Huang*. A gating context-aware text classification model with BERT and graph convolutional Networks. Journal of Intelligent and Fuzzy Systems. vol. 40, no. 3, pp. 4331-4343, 2021. (SCI)
Tingzhi Mao, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Hao Huang*, Eng Siong Chng. Approaches to Improving Recognition of Underrepresented Named Entities in Hybrid ASR Systems. The 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2021.
2020
Haobo Zhang, Haihua Xu, Van Tung Pham, Hao Huang*, Eng Siong Chng. Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-switching Speech Recognition. The 21st Annual Conference of the International Speech Communication Association, (INTERSPEECH), 2392-2396, 2020. (Top conference, CCF Rank C)
Zhong Ying, Ying Hu*, Hao Huang, and Wushour Silamu. A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition. Proc. Interspeech 2020 (2020): 3331-3335.
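Xinglei Dong, Ying Hu*, Hao Huang, Wushour Silamu. Monaural speech separation based on sparse convolutive non-negative matrix partial co-factorization (in Chinese). Acta Automatica Sinica, 2020.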
2019 and earlier
Hao Huang*, Haihua Xu, Ying Hu, Gang Zhou, A transfer learning approach to goodness of pronunciation based automatic mispronunciation detection. Journal of the Acoustical Society of America (JASA). 142(5), 2017. (TOP Journal, CCF Rank B)
Haihua Xu, Hang Su, Chongjia Ni, Xiong Xiao, Hao Huang, Eng Siong Chng and Haizhou Li. Semi-supervised and Cross-lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models under Low-resource Conditions. INTERSPEECH 2016. (Top conference, CCF Rank C)
Hao Huang*, Haihua Xu, Xianhui Wang, Wushour Silamu. Maximum F1-Score Discriminative Training Criterion for Automatic Mispronunciation Detection. IEEE/ACM Transactions on Audio, Speech and Language Processing, 23(4), 787-797, 2015. (SCI, Top journal, CCF Rank B)
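Hao Huang*, Haihua Xu, Xianhui Wang, Wushour Silamu. A discriminative feature compensation training algorithm based on the maximum F1-score criterion for automatic mispronunciation detection (in Chinese). Acta Electronica Sinica, 2015, 43(7): 1294-1299.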
Hao Huang, Wang J, Abudureyimu H. Maximum F1-score discriminative training for automatic mispronunciation detection in computer-assisted language learning. Thirteenth Annual Conference of the International Speech Communication Association (INTERSPEECH), 2012. Oral Presentation (Top conference, CCF Rank C).
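Hao Huang*, Binghu Li, Wushour Silamu. Decision-tree-based acoustic context modeling for discriminative model combination (in Chinese). Acta Automatica Sinica, no. 9, 1449-1458, 2012. (EI, journal article)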
Hao Huang*, Binghu Li. Lattice Based Discriminative Model Combination Using Automatically Induced Phonetic Contexts. 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, Florence, 2011/8/29, Poster Presentation. (Top conference, CCF Rank C)
Hao Huang*, Binghu Li. Automatic context induction for tone model integration in Mandarin speech recognition. The Journal of China Universities of Posts and Telecommunications, 19(1), 94-100, 2012/1/20.
Xiong Y, Zhu J, Huang Hao, Haihua Xu. Minimum tag error for discriminative training of conditional random fields. Information Sciences, 2009, 179(1): 169-179. (SCI CCF Rank B)
Huang Hao, Zhu J. Discriminative incorporation of explicitly trained tone models into lattice based rescoring for Mandarin speech recognition[C]. Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on. IEEE, 2008: 1541-1544.(Top conference, CCF Rank B)
HUANG H, Jie ZHU. Discriminative tonal feature extraction method in mandarin speech recognition. The Journal of China Universities of Posts and Telecommunications, 2007, 14(4): 126-130. (EI, journal article)
Huang H, Zhu J. Minimum phoneme error based filter bank analysis for speech recognition. 2006 IEEE International Conference on Multimedia and Expo (ICME), 2006: 1081-1084 (CCF Rank B).
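Honors and Awards
Hao Huang (1/4), "A Transfer Learning Approach to Goodness of Pronunciation for Automatic Mispronunciation Detection", Second Prize, 15th Autonomous Region Outstanding Natural Science Paper Award, ranked first. Awardees: Hao Huang, Haihua Xu, Ying Hu, Gang Zhou
Hao Huang (1/4), "Maximum F1-Score Discriminative Training Criterion for Automatic Mispronunciation Detection", First Prize, 14th Autonomous Region Outstanding Natural Science Paper Award, ranked first. Awardees: Hao Huang, Haihua Xu, Xianhui Wang, Wushour Silamu
Hao Huang (1/2), "Automatic context induction for tone model integration in Mandarin speech recognition", Third Prize, 12th Autonomous Region Outstanding Natural Science Paper Award, 2013.9, ranked first. Awardees: Hao Huang, Binghu Li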
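System Showcase
The group has carried out system-level development, with partial real-world deployment, of multilingual speech recognition (Chinese, Uyghur, Kazakh, Arabic, Russian, Malay, Thai, Indonesian), speech synthesis (Chinese, Uyghur), audio content analysis and keyword detection (Chinese, Uyghur, Kazakh), and intelligent spoken-dialogue customer service systems. Please feel free to contact us.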
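Students
The group currently has 19 master's and doctoral students. During their studies, students have the opportunity to spend six months to two years on collaborative research, exchange, or internships with partner universities and research institutions in China and abroad. We welcome applications for doctoral and master's study from motivated young people who are interested in the research directions above, self-driven, undaunted by difficulty, willing to challenge themselves, and who aspire to become scientists or industry practitioners in speech and language intelligence.
Recent student research visits (internships) and employment:
Junchao Wang (王俊超; 2016 cohort, internships at Nanyang Technological University and Baidu, speech synthesis, joined the Baidu speech group)
Wenjie Li (李文杰; 2017 cohort, internships at Nanyang Technological University and ByteDance, speech synthesis, joined the Baidu speech group)
Haobo Zhang (张皓博; 2018 cohort, internship at Nanyang Technological University, speech recognition, joined Mobvoi)
Tingzhi Mao (茆廷志; 2018 cohort, internships at Nanyang Technological University and Alibaba, speech recognition, joined iFLYTEK)
Yizhou Peng (彭亦周; 2019 cohort, internship at Nanyang Technological University, speech recognition, pursuing a Ph.D. at Nanyang Technological University, ICASSP Student Travel Grant)
Jicheng Zhang (张记成; 2019 cohort, internship at Nanyang Technological University, speech recognition, joined AISpeech)
Guodong Ma (麻国栋; 2019 cohort, internship at Tencent, speech recognition, joined NetEase)
Hongyan Xie (谢红岩; 2019 cohort, internship at JD.com, dialogue systems, joined the JD.com human-computer interaction technology department)
Yuhang Yang (杨宇航; 2020 cohort, currently enrolled, internship at Nanyang Technological University, speech recognition)
Yachao Guo (郭亚超; 2020 cohort, currently enrolled, internship at Nanyang Technological University, keyword search)
Rui Li (李睿; 2021 cohort, currently enrolled, remote internship at Nanyang Technological University, multilingual speech recognition)
Zhiwei Xie (谢志伟; 2021 cohort, currently enrolled, remote internship at Nanyang Technological University, robust speech recognition)
Hongli Yang (杨洪利; 2022 cohort, currently enrolled, internship at The Chinese University of Hong Kong, Shenzhen, multilingual speech recognition)
Haowen Yin (尹皓文; 2022 cohort, currently enrolled, internship at The Chinese University of Hong Kong, Shenzhen, speech synthesis and anti-spoofing)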