That’s pretty much any computing problem, though.

Until someone invented software to take a surround sound signal, use the positioning information of the source and the targets to “agree” on a “center,” and then deliver custom audio streams to each of the two targets in real time so that surround sound is simulated... the iPhone couldn’t do it! And Apple’s already doing computational audio, using the iPhone microphones to capture stereo and perform Audio Zoom, so it’s within their area of focus.
But, I’d imagine, nearer term, something like the imaginary RØDE i16 would work nicely, LOL