Language Information Access Technology Team Seminar (Talk by Dr. Bin Wu, NAIST).

Thu, 18 Aug 2022 14:00 - 15:00 JST

Free admission


This is an online seminar. Registration is required.
【Language Information Access Technology Team】
【Date】2022/August/18 (Thu) 14:00-15:00 (JST)

【Speaker】 Bin Wu, Nara Institute of Science and Technology (NAIST)

Title: DPGMM-RNN Hybrid Model: Towards Universal Acoustic Modeling to ASR at Different Supervised Levels

Abstract: Methods for unsupervised and supervised learning have developed independently. As a result, unsupervised phoneme discovery and supervised speech recognition are treated differently. Yet both tasks rely on acoustic modeling to find the patterns that form perceptual units such as phonemes and words, and they differ only in their level of supervision. It is therefore reasonable to regard unsupervised phoneme discovery as unsupervised ASR, that is, ASR that finds units in speech without text. We propose universal acoustic modeling for supervised and unsupervised ASR, instead of separate models, covering the whole process from the acoustic waveform to speech units.
This study aims to construct universal acoustic modeling for speech recognition at different levels of supervision. Specifically, it proposes a hybrid model that combines a Dirichlet process Gaussian mixture model and a recurrent neural network (DPGMM-RNN). The proposed approach is used (1) to improve phoneme categorization by alleviating the fragmentation problem and (2) to extract perceptual features that improve ASR performance.
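The sketch below illustrates only the unsupervised (DPGMM) side of the idea. It clusters frame-level features with a Dirichlet-process Gaussian mixture. It is not the speaker's DPGMM-RNN, and several choices are assumptions made for illustration:

- The RNN component is omitted entirely.
- Synthetic two-dimensional data stands in for real acoustic features such as MFCCs.
- The DPGMM is scikit-learn's `BayesianGaussianMixture`, which is a truncated, variationally fitted DP mixture.

The point it shows is that the number of discovered "units" is not fixed in advance. `n_components` is only an upper bound, and the Dirichlet-process prior lets unneeded components shrink toward zero weight.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic stand-in for frame-level acoustic features (assumption: real
# systems would use e.g. MFCC frames). Three well-separated "phoneme-like"
# clusters of 200 frames each in a 2-D feature space.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 8.0]])
frames = np.vstack([c + rng.normal(scale=0.5, size=(200, 2)) for c in centers])

# Truncated Dirichlet-process GMM fitted by variational inference.
# n_components is an upper bound on the number of clusters; the DP prior
# lets the model leave surplus components with negligible weight.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,
    covariance_type="full",
    max_iter=500,
    random_state=0,
)
labels = dpgmm.fit_predict(frames)  # one cluster ("unit") label per frame

# Components with non-negligible weight act as the discovered units.
active = np.flatnonzero(dpgmm.weights_ > 0.01)
print("active components:", len(active))
print("frame labels (first 10):", labels[:10])
```

The concentration prior (`weight_concentration_prior`) controls how readily the model opens new components. Smaller values favor fewer active clusters, and larger values favor more.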

Public events of RIKEN Center for Advanced Intelligence Project (AIP)