Siri tries to understand what you want done, not merely convert voice to a command. For example, the usual command for saving is "save," but what if I said "store"? Siri would understand my intent, not just the literal word tied to a command.
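A rough sketch of that difference, not a description of how Siri actually works: the command table, the synonym table, and both function names below are made up for illustration. A literal matcher only knows the exact word, while even a very simple intent matcher can map "store" onto the save command.

```python
# Hypothetical command table: the exact words a literal matcher knows.
COMMANDS = {"save": "SAVE_DOCUMENT", "open": "OPEN_DOCUMENT"}

# Hypothetical synonym table: other words a speaker might use for the same intent.
SYNONYMS = {"store": "save", "keep": "save", "launch": "open"}


def literal_match(word):
    """Return a command only if the spoken word is exactly a known command word."""
    return COMMANDS.get(word.lower())


def intent_match(utterance):
    """Return the first command whose word, or a synonym of it, appears in the utterance."""
    for word in utterance.lower().split():
        canonical = SYNONYMS.get(word, word)  # fall back to the word itself
        if canonical in COMMANDS:
            return COMMANDS[canonical]
    return None
```

With these tables, literal_match("store") finds nothing, while intent_match("please store this") resolves to the save command. A real assistant would need far more than a synonym lookup, but the contrast is the same.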
Nuance can convert voice to text with pretty good accuracy, 70-85%. That is not so good for editing or creating a document, but it is usable for matching a small set of commands in an OS, where accuracy is probably closer to 99%. This is great for speaking commands, provided of course that you know the commands.
Siri is so much more, and we are just seeing the tip of the iceberg. If I say I want to watch the last episode of season 1 of "Once Upon a Time," the intelligence needs to understand that this is a TV show, and then either get it if it is stored in iTunes as a download, ask whether I have it on DVD, ask whether I want to buy or rent it, or know that it is being rerun tomorrow on TMC. This is probably the complexity that Jobs cracked, not merely "voice recognition," along with the business model that goes with it.
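To make that chain of choices concrete, here is a minimal sketch, assuming the request has already been understood as a show, season, and episode. The Request type, the resolve function, the two lookup tables, and the order in which sources are tried are all hypothetical; nothing here reflects Apple's actual design.

```python
from dataclasses import dataclass


@dataclass
class Request:
    show: str
    season: int
    episode: str  # e.g. "last", kept as text for simplicity


def resolve(request, library, schedule):
    """Return (action, detail) from the first source that can satisfy the request."""
    key = (request.show.lower(), request.season, request.episode)
    if key in library:
        # Already downloaded: just play it.
        return ("play_download", library[key])
    if key in schedule:
        # Not owned, but a rerun is coming up: mention it.
        return ("offer_rerun", schedule[key])
    # Nothing found: fall back to asking the user.
    return ("ask_user", "Do you have it on DVD, or do you want to buy or rent it?")


# Example data, invented for illustration.
wanted = Request("Once Upon a Time", 1, "last")
library = {}
schedule = {("once upon a time", 1, "last"): "tomorrow on TMC"}
action, detail = resolve(wanted, library, schedule)
```

Here the library is empty, so the sketch falls through to the schedule and offers the rerun. The hard part, turning the spoken sentence into that Request in the first place, is exactly what this sketch leaves out.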
It is not just a better remote control, or better in the way the iPod, iPhone, iPad, and Mac OS were better than what came before them: the MP3 player, the cell phone, the tablet, and DOS.