Why 2020 Can Be the Year of Multi-modal AI
Consumer apps are seeing increased user interaction with rich media, and these interactions can be enhanced by multimodal AI-powered systems. For example, imagine a voice interface to a photo-sharing and storage service, where a user issues a voice query like "show me the video of the magic show from my daughter's birthday party". I will discuss the technology required to make such advanced user experiences possible, using a combination of voice recognition (to convert the voice query to text), natural language understanding (to parse the text query), multimodal retrieval (to retrieve videos of the birthday party), object identification (to recognize which video features the daughter), activity recognition (to determine which part of the video contains the magic show), etc. Such advanced multimodal AI models can be resource-intensive, both during training and serving. I will discuss how engineering advances in specialized AI chips and accelerators are making it feasible to train large multimodal ML models, while consumer devices are becoming powerful enough to run inference for multimodal ML models on the device itself. This synergy of consumer need with advances in research and engineering is well-poised to make 2020 the year of multimodal AI.
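The query pipeline described above can be sketched as a chain of stages. The following is a minimal illustrative sketch only: every function, field name, and data item below is a hypothetical stub standing in for a real ML component (a speech recognizer, a trained parser, retrieval and recognition models), not part of any actual system discussed in the talk.

```python
# Hypothetical sketch of the multimodal query pipeline.
# All names and data structures here are illustrative assumptions.

def speech_to_text(audio):
    # Stage 1: voice recognition converts the spoken query to text.
    # (Stubbed: a real system would run an ASR model on the waveform.)
    return audio["transcript"]

def parse_query(text):
    # Stage 2: natural language understanding extracts structured intent.
    # (Stubbed with keyword matching; a real system would use a trained parser.)
    return {
        "media_type": "video" if "video" in text else "photo",
        "event": "birthday party" if "birthday" in text else None,
        "activity": "magic show" if "magic show" in text else None,
        "person": "daughter" if "daughter" in text else None,
    }

def retrieve(intent, library):
    # Stage 3: multimodal retrieval narrows the library by media type and event.
    return [m for m in library
            if m["type"] == intent["media_type"] and intent["event"] in m["tags"]]

def identify_person(candidates, person):
    # Stage 4: object/face identification keeps media featuring the person.
    return [m for m in candidates if person in m["people"]]

def localize_activity(videos, activity):
    # Stage 5: activity recognition finds the segment showing the activity.
    for v in videos:
        for seg in v["segments"]:
            if seg["label"] == activity:
                return v["id"], seg["start"], seg["end"]
    return None

def answer_query(audio, library):
    # Compose the five stages end to end.
    intent = parse_query(speech_to_text(audio))
    hits = identify_person(retrieve(intent, library), intent["person"])
    return localize_activity(hits, intent["activity"])
```

Each stage in a production system would be a separate learned model; the point of the sketch is only to show how their outputs compose into one multimodal answer.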
Dr. Shalini Ghosh is a Principal Scientist and Leader of Machine Learning Research at the Visual Display Intelligence Lab of Samsung Research America, where she leads a group working on Multi-modal AI (i.e., learning from vision, language, and speech). Before this, she was the Director of AI Research at Samsung Research America. She has extensive experience and expertise in Machine Learning (ML), especially Deep Learning, and has worked on applications of ML to multiple domains. Before joining Samsung Research, Dr. Ghosh was a Principal Computer Scientist in the Computer Science Laboratory at SRI International, where she was the Principal Investigator/Tech Lead of several impactful DARPA and NSF projects. She was also a Visiting Scientist at Google Research in 2014-2015, where she worked on applying deep learning (Google Brain) models to dialog systems and natural language applications. Dr. Ghosh has a Ph.D. in Computer Engineering from the University of Texas at Austin. She has won several grants and awards for her research, including a Best Paper award and a Best Student Paper Runner-up award for applications of ML to dependable computing. Dr. Ghosh is also an area chair of ICML and serves on the program committees of multiple impactful conferences and journals in ML and AI (e.g., NIPS, KDD, AAAI, IJCAI). She has served as an invited panelist on multiple panels and has been an invited guest lecturer at UC Berkeley multiple times. Her work has been covered in an interview by the ReWork Women in AI program.