So the iPhone would need to be always listening and processing what it hears for the word "Siri". Your battery would last 11 minutes.
It's not just a matter of battery consumption.
Having an "open microphone" (where the microphone is always listening) makes accurate voice recognition very difficult.
Probably the best example of an "open microphone" system in a consumer product is the voice recognition in Microsoft's Kinect for Xbox 360.
It uses a few tricks to maximise how often it is correct:
Users are strongly encouraged to calibrate the system. The console collects a lot of data about the room (by "echoing" sound off the walls, furniture etc.) as well as by listening to the user talk. Sometimes the user might be talking to the console when the room is silent, but they might also be talking to it during a loud movie or game. The user could be in different parts of the room (different chairs, standing up, moving around etc.). All of these complex scenarios have to be considered.
This is very different to the way voice recognition on a mobile phone works. A phone can't do the same calibration for every environment, but it avoids some of the above problems by simplifying things:
-Relatively fixed position (you hold the phone and either talk into it using a headset or you speak at no more than arm's length)
-Relatively quiet environment (you wouldn't really make a phone call in a loud environment, so you probably wouldn't try to use voice recognition either)
Limited command vocabulary:
At any one time, Kinect will only be listening for the user to say a few specific words. Anything else and it will simply ignore the user. The more things you want the system to do, the more likely it becomes that it will "overhear" something and perform an action when it shouldn't have done.
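To get a rough feel for why a bigger vocabulary means more accidental triggers, here is a toy calculation. It assumes (purely for illustration) that every command has the same small, independent chance of being wrongly matched against a stray bit of conversation. Real recognisers don't behave this neatly, and the function name and the 0.1% rate are made up for the example.

```python
def false_trigger_probability(per_command_rate: float, vocabulary_size: int) -> float:
    """Chance that at least one command wrongly matches a stray utterance.

    Toy assumption: each command has the same, independent chance
    (per_command_rate) of being falsely matched.
    """
    # Probability that no command matches, subtracted from 1.
    return 1.0 - (1.0 - per_command_rate) ** vocabulary_size


if __name__ == "__main__":
    # Hypothetical 0.1% false-match rate per command, per stray utterance.
    for size in (5, 50, 500):
        print(size, false_trigger_probability(0.001, size))
```

Under that assumption the chance of a false trigger climbs steadily as commands are added, which is the trade-off described above.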
Phones get around this by using a button, so that the phone is only listening when you want it to. Siri supports a lot of different commands and phrasings, and is very powerful. The number of things it can do means that it would be far more likely to "overhear" something if it were listening all of the time.
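The button approach can be sketched as a simple gate in front of the recogniser. This is only an illustration of the idea, not how Siri is actually implemented. The class and method names are hypothetical, and utterances are plain strings standing in for recognised audio.

```python
class PushToTalkRecognizer:
    """Hypothetical recogniser that only listens after a button press."""

    def __init__(self, commands):
        # Store commands in lower case for case-insensitive matching.
        self.commands = {c.lower() for c in commands}
        self.listening = False

    def press_button(self):
        # The user explicitly asks the phone to listen.
        self.listening = True

    def hear(self, utterance):
        # Anything said while the button has not been pressed is ignored.
        if not self.listening:
            return None
        # Stop listening after one utterance, matched or not.
        self.listening = False
        text = utterance.strip().lower()
        return text if text in self.commands else None


if __name__ == "__main__":
    phone = PushToTalkRecognizer(["call mum", "set an alarm"])
    print(phone.hear("call mum"))   # overheard without a button press
    phone.press_button()
    print(phone.hear("Call Mum"))   # spoken after a button press
```

However large the command list gets, background conversation can't trigger anything, because nothing is matched until the button has been pressed.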