AUDIO-VISUAL SPEECH RECOGNITION USING AAM‑BASED VISUAL FEATURES
As one of the techniques for robust speech recognition under noisy environments, audio-visual speech recognition (AVSR) using lip dynamic scene information together with audio information is attracting attention, and the research has made strides in recent years. However, in visual speech recognition (VSR), when a face turns sideways, the shape of the lip as viewed by the camera changes and the recognition accuracy degrades significantly. Therefore, many of the conventional VSR methods are limited to situations in which the face is viewed from the front. This paper proposes a VSR method to convert faces viewed from various directions into faces that are viewed from the front using Active Appearance Models (AAM). In the experiment, even when the face direction changes about 30 degrees relative to a frontal view, the recognition accuracy improves significantly.
audio-visual, speech recognition, face direction.