Crafting Multi-Modal Interactions on Voice Assistants

Ashlesha Vishnu Kadam

doi:10.47363/JEAST/2022(4)188

Authors

Ashlesha Vishnu Kadam, Amazon.com, LLC, Amazon Music, Seattle, WA, USA

DOI:

https://doi.org/10.47363/JEAST/2022(4)188

Keywords:

AI, ASR, Human Computer Interaction, NLU, Multi Modal, Voice Assistant

Abstract

Voice assistants are gaining adoption because they are easy and intuitive to use. While voice remains the dominant mode of human interaction with voice assistants, some embodiments also offer alternative modalities, most commonly visual or touch interfaces. This paper first describes the end-to-end working of a multi-modal voice assistant, then examines the challenges specific to multi-modal voice assistants and presents mitigation strategies for the challenges that arise in multi-modal interaction scenarios. It then provides architectural and design guidelines intended to deliver a seamless user experience. Finally, it identifies areas for future research.

Author Biography

Ashlesha Vishnu Kadam, Amazon.com, LLC, Amazon Music, Seattle, WA, USA

Journal of Engineering and Applied Sciences Technology
