Recent approaches to conversational agents focus on training very large models on large amounts of text-based conversations (Adiwardana et al., 2020; Roller et al., 2020; Brown et al., 2020). Despite substantial progress, these models suffer from various shortcomings that call their intelligence into question. I categorize these shortcomings into three groups.
The first group concerns how reliable current dialogue systems are in interactions with human users. This group includes: (1.1) the lack of factual consistency throughout a conversation, (1.2) the lack of in-depth knowledge and inference in responses, (1.3) the lack of specificity and engaging use of knowledge, (1.4) the lack of controllability and explainability, and (1.5) the lack of the persona consistency needed to earn users' trust and retain their long-term confidence.
The second group concerns the efficiency of these agents. This group includes: (2.1) the lack of efficiency in terms of model size, as these agents are trained end-to-end on a huge amount of text-to-text data using a large number of GPUs for a long time, (2.2) the lack of user privacy, as these models are too large to run on edge devices and so users’ data cannot stay on the device, and (2.3) the lack of fairness, as these agents turn out to be biased with respect to gender and social groups.
The last group concerns the generalizability of these agents. This group includes: (3.1) the lack of continual learning to update their knowledge, (3.2) the lack of multilingual understanding and response generation, as not all users speak English, and (3.3) the lack of a standard approach to measuring success, as human evaluation is expensive.
I believe that overcoming these shortcomings will lead to more engaging and useful conversations with open-domain conversational agents.