Also, how will it know not to listen to the dialog in TV shows, music, etc.?
Yeah, that'll be a big problem to overcome. Apple will have to find a way to tell voices coming out of the speakers apart from real-life voices in the room.
It is really quite simple: since the TV itself is producing those sounds, it already knows what they are and can use an internal feedback loop to take them out of what the microphone hears. Kind of the opposite of noise cancellation headphones. The biggest issue is adjusting for the volume of the sound as it actually reaches the microphone, so that the inverse signal fed back cancels it properly.
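For anyone curious what that feedback loop could look like, here is a rough sketch using a normalized LMS (NLMS) adaptive filter, which is a standard textbook approach to echo cancellation. To be clear, this is only an illustration and not a claim about what Apple does. The function names, the made-up room response, and the sine wave standing in for a spoken command are all assumptions for the demo. The filter keeps adjusting its guess of how the TV audio arrives at the microphone (volume and delay included) and subtracts that guess from the mic signal.

```python
import numpy as np

def nlms_echo_cancel(mic, ref, taps=32, mu=0.1, eps=1e-8):
    """Subtract an adaptively estimated copy of `ref` (what the TV played) from `mic`."""
    w = np.zeros(taps)      # current guess of the speaker-to-microphone path
    buf = np.zeros(taps)    # most recent reference samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        echo_est = w @ buf                       # predicted TV sound at the mic
        err = mic[n] - echo_est                  # what is left after subtraction
        w += mu * err * buf / (buf @ buf + eps)  # normalized LMS weight update
        out[n] = err
    return out

def demo(seed=0, n=8000):
    """Hypothetical test signal: TV noise through a made-up room, plus a short 'voice'."""
    rng = np.random.default_rng(seed)
    tv = rng.standard_normal(n)                   # stand-in for TV audio
    room = np.array([0.0, 0.6, 0.3, -0.2, 0.1])   # assumed room response (gain + delay)
    echo = np.convolve(tv, room)[:n]              # TV sound as the mic hears it
    voice = np.zeros(n)
    voice[6000:6400] = 0.5 * np.sin(2 * np.pi * 0.02 * np.arange(400))
    mic = echo + voice
    cleaned = nlms_echo_cancel(mic, tv)
    return mic, cleaned, voice

if __name__ == "__main__":
    mic, cleaned, voice = demo()
    print("TV power at mic before:", np.mean(mic[4000:6000] ** 2))
    print("TV power at mic after: ", np.mean(cleaned[4000:6000] ** 2))
```

Once the filter has settled, the TV sound is mostly gone from the output while the "voice" segment comes through largely intact. A real living room has a much longer and messier response than five taps, so treat the numbers here as a toy case.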
Now, if you live in a busy subway station, you will have other background noise issues, but it should work quite well in a typical living room.