Yes, but why wouldn't you do this on your computer or phone? When I'm looking for train/bus times, I'm either planning the next day or getting ready to leave. In either case I'm already at my computer or my iPhone is in my pocket. I don't see where an Echo+screen would be more beneficial than a computer in this case -- either way you have to go to the screen. The computer or phone you already have. The Echo+screen is something else to buy. It's very different from the speaker/mic-only Echo that you bark commands and questions at.
I think Phil isn't so much arguing for a new Alexa-like device with a screen, as he's pointing out the limitations of voice-only devices. The ecosystem is the thing - the Internet of Things.
Despite all the Siri-bashing, a key point is that Apple has a huge head start in this - most people are rarely far from their phones, tablets, or computers. Those devices have displays and audio, so they address a wider range of needs and use cases. That is extended even further if they use Apple TV, Home, Watch, and/or CarPlay. There will likely be more wearables, in other form factors.
When it comes to voice input, the closer the user is to the microphone, the more accurate the response can be. The more background sound there is, the harder it is to distinguish commands, so a "close mic" is far more useful than a distant one. The same is true for gesture-based input - a distant camera has to distinguish command gestures from other forms of movement, so the wider the field of view, and the more people (and pets) there are within it, the more challenging it becomes. Direct input (touch screens, keyboards, pointing devices, etc.) is simpler to implement and interpret than gesture-based or voice-based input.
So, the more sensors and I/O (input/output) devices there are distributed around a person's environment, the better. Sensors and I/O devices that travel with the user can be more effective than stationary sensors - fewer may be required. All in all, Apple already has far more of these in play, in far more form factors, than anyone else with an ecosystem. That's Phil's real point, IMO.