I haven't found any of the "sources" compelling yet with respect to true AI.
Yes, if you ask simple questions or give simple commands in a structured manner, they can provide structured answers, but only in a very limited way. None are yet very conversational in the sense of retaining context across a longer exchange. They're fine for simple commands. We're just scratching the surface. There's a looong way to go, especially with respect to natural language processing, adaptive learning/intelligence, user personalization, history-based understanding and diagnosis of complex requests, etc.
Video augmentation will be taken for granted in the future and will not be viewed as augmentation at all. What's out there now is designed to hit low cost targets, and adding a display makes those targets difficult to meet. Humans, however, have eyes as well as ears. Many types of information, especially those that are complex or that yield many choices/options, are better served by a display than by a voice reading out choices over tens of seconds: train schedules, maps (including status, not just geography), photographs, lists, options, and on and on. These are results that would be difficult to comprehend aurally due to the limits of human short-term memory, or that are clearly visual in nature.
Yes, a desktop/laptop computer, iPad/iPhone, etc. could potentially handle that. But that's far from ideal in a quick-access, always-on home personal assistant context.