The microphone would typically be receiving sound from up to 8 separate audio channels sent to 8 separate speakers, each arriving with a different delay depending on that speaker's distance from the microphone, plus additional echoes from inside the room.
It's not that simple, even assuming that the TV itself sees the audio signal, which may not be true.
The button on the remote makes more sense, or perhaps Apple should copy Microsoft's Kinect and put a camera on the TV so that specific hand gestures could get SiriTV's attention.