Multi-agent ad-hoc speech recognition

Affiliation:

School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China

About the authors:

陈俊淇 (CHEN Junqi) (1998-), male, master's student; main research interests: pattern recognition and intelligent systems. Email: jqchen@mail.nwpu.edu.cn.
张晓雷 (ZHANG Xiaolei) (1983-), male, Ph.D., professor, doctoral supervisor; main research interests: pattern recognition and intelligent systems.

Abstract:

Speech perception is an important component of unmanned systems. Most existing work focuses on speech perception by a single agent, whose performance is bounded by factors such as noise and reverberation. It is therefore necessary to study multi-agent speech perception, in which agents self-organize and cooperate to improve perception performance. Assuming that each agent outputs one channel of speech, this paper proposes a multi-agent ad-hoc speech system that jointly exploits all channels. Taking speech recognition as an example, it further proposes a channel selection method that scales to large numbers of agents. Specifically, a stream attention mechanism for end-to-end speech recognition based on the Sparsemax operator sets the weights of noisy channels to zero, which gives stream attention the ability to select channels. Sparsemax, however, sets the weights of too many channels to zero. Scaling Sparsemax is therefore proposed, which zeroes only the weights of strongly noisy channels. In addition, a multilayer stream attention structure is proposed to reduce computational complexity. Experiments with a Conformer-based recognition system in an unmanned-system environment with up to 30 agents show that, in test scenarios with mismatched channel numbers, Scaling Sparsemax lowers the word error rate (WER) compared with Softmax by more than 30% on simulated data sets and by more than 20% on semi-real data sets.
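The difference between the three weighting operators named in the abstract can be illustrated with a small pure-Python sketch. It is only an illustration under stated assumptions: `sparsemax` follows the standard simplex-projection definition, while `scaled_sparsemax` simply divides the scores by a fixed, hypothetical `scale` before applying Sparsemax. The abstract does not give the paper's exact Scaling Sparsemax formula, so the scaling rule, the function names, and the example scores below are all assumptions.

```python
import math


def softmax(scores):
    """Dense weights: every channel gets a strictly positive weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex.

    Channels whose score falls below the threshold tau get weight exactly 0.
    """
    cumsum = 0.0
    tau = 0.0
    for k, s in enumerate(sorted(scores, reverse=True), start=1):
        cumsum += s
        # The support condition holds for a prefix of the sorted scores,
        # so the last k that satisfies it determines the threshold.
        if 1.0 + k * s > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


def scaled_sparsemax(scores, scale):
    """ASSUMPTION: shrink the scores by a fixed factor before Sparsemax.

    Smaller score gaps push fewer channels below the threshold. This is an
    illustrative stand-in, not the paper's definition of Scaling Sparsemax.
    """
    return sparsemax([s / scale for s in scores])


# Hypothetical attention scores for four channels; the last one is very noisy.
channel_scores = [2.0, 1.5, 1.2, -1.0]

dense = softmax(channel_scores)                    # no channel is dropped
sparse = sparsemax(channel_scores)                 # drops the two weakest channels
mild = scaled_sparsemax(channel_scores, scale=2.0) # drops only the noisiest channel

for name, weights in [("softmax", dense), ("sparsemax", sparse), ("scaled", mild)]:
    print(name, [round(w, 3) for w in weights])
```

With these example scores, Softmax keeps all four channels, Sparsemax keeps two, and the scaled variant keeps three, which mirrors the motivation stated in the abstract: zero out only the strongly noisy channels rather than too many of them.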

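Cite this article

陈俊淇 (CHEN Junqi), 张晓雷 (ZHANG Xiaolei). Multi-agent ad-hoc speech recognition [J]. 太赫兹科学与电子信息学报 (Journal of Terahertz Science and Electronic Information Technology), 2023, 21(9): 1163-1170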

History
  • Received: 2021-06-14
  • Last revised: 2021-08-09
  • Published online: 2023-09-27