This is not true.
When two microphones are used, each channel is a single, complete monophonic signal documenting every bit of the particular sound event, but each from a slightly different position.
Stereo speakers can transmit this better than a single speaker can.
Separating the speakers is no different than standing outside the sweet spot of any stereo system. So try this: with your surround-sound or stereo system, stand off to the side, where the sound isn't as full and doesn't give the same feeling of depth. Then switch between mono and stereo/surround. Even without being in the sweet spot (little to no speaker separation), it still sounds better in stereo.
Again, this is why the HTC One has such good sound compared to the iPhone.
There are two kinds of loss that can occur when summing a stereo signal to mono: phase cancellation, which occurs when the same sound arrives at different microphones at different times, and the loss of the spatial positioning information carried by the stereo sound field.
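To see why that first kind of loss happens, here's a minimal sketch (the 3 ms delay is just an illustrative assumption): when two copies of the same sine wave are summed with a time offset tau, the result has amplitude 2·|cos(π·f·tau)|, so frequencies at odd multiples of 1/(2·tau) cancel completely.

```python
import math

def mono_sum_gain(freq_hz, delay_s):
    """Relative amplitude (0..2) of two unit sine waves summed to mono,
    where one copy arrives delay_s seconds after the other.
    Zero whenever freq_hz is an odd multiple of 1/(2*delay_s)."""
    return 2 * abs(math.cos(math.pi * freq_hz * delay_s))

# A 3 ms inter-mic delay (roughly 1 m of extra path length at ~343 m/s):
tau = 0.003
for f in (100, 1 / (2 * tau), 1 / tau, 500):
    print(f"{f:7.1f} Hz -> gain {mono_sum_gain(f, tau):.2f}")
```

The uneven gains across frequency are the "comb filter" that muddies a mono sum of spaced mics.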
An incredibly small percentage of stereo recordings are made with two spaced microphones (which is the technique most likely to produce phase cancellation). Most are either made with far more mics, or a combination of mics and electronic sources. In almost all cases, artificial panning is used to provide positioning in the sound field. Directional mics, close-micing, overdubbing, gobo boards, isolation booths - all are techniques that minimize phase cancellation. Those essentially isolated, independent sources, no matter how they've been panned in the stereo field, can be summed to mono with impunity and still maintain musical integrity - every note and overtone from every instrument will be there. What's lost is spatial positioning.
But what about those "true stereo" recordings? There are two principal approaches to two-mic stereo recording - "spaced pair" and "coincident pair."
Spaced pair (say, two omnidirectional mics ten feet apart) distorts reality in various odd ways - are your ears ten feet apart? Your speakers may be ten feet apart (in which case, there's not much harm at all), but that won't be the case if you're wearing headphones. But more important, because the mics are separated, the sound from an instrument on the far left arrives at the right-hand mic later than it arrives at the left-hand mic. While the time delay does add a certain spaciousness (similar to reverberation), it also means that, if combined to mono, some frequencies will be out of phase and therefore cancel, generally muddying the sound. In that case, indeed, something is lost by summing to mono.
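For a rough feel for how bad the spaced-pair case gets, here's a sketch (the spacing, angle, and speed-of-sound figures are illustrative assumptions): the extra path to the far mic is spacing·sin(angle), and the resulting delay puts nulls at odd multiples of 1/(2·delay) in the mono sum.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate at room temperature

def arrival_delay_s(spacing_m, angle_deg):
    """Extra arrival delay at the far mic for a distant source
    angle_deg off the pair's center axis."""
    return spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND

def null_freqs_hz(delay_s, count=4):
    """First few frequencies that cancel completely when summed to mono."""
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

# Ten-foot (~3.05 m) spaced pair, instrument 60 degrees off-axis:
tau = arrival_delay_s(3.05, 60)
print(f"delay = {tau * 1000:.2f} ms")
print("nulls (Hz):", [round(f) for f in null_freqs_hz(tau)])
```

With a delay of several milliseconds, the first nulls land well down in the musical range, which is why the mono sum sounds muddy rather than just slightly colored.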
Coincident micing uses two directional mics in the same location (often in a common enclosure), to minimize the time-arrival differences that produce phase cancellation, and to provide a more natural-sounding result (at least, more natural-sounding in my opinion). To be perfectly positioned to mimic your ears, the mics should be several inches apart, but even that results in some cancellation if summed. The dummy-head technique takes that last approach, avoiding cancellation by placing a sound-absorbent baffle between the spaced mics, but it is less favored, as theory hasn't quite measured up to practice.
Listening outside of the sweet spot in a speaker system simply means you will be receiving an out-of-balance effect. If you sit in front of the left speaker, you'll hear very little from the right speaker. If, say, the clarinets are full right (not that they normally are), you'll hear very little of the clarinets from your position on the left. In fact, you'd hear a lot more of the clarinets if they'd been summed to mono.
The perception of stereo as being "better" in those circumstances comes from the complex sound field in the listening room - reflections off the walls and ceiling, etc. Pipe a mono signal through a pair of spaced speakers in a moderately reflective listening space, and you'll regain the impression of stereo - the same sound will be arriving at your ears at different times, and your brain localizes those differing sources. This is an old, old recording technique - the famous EMT reverb plate had a mono input and two spaced pickups to produce a stereo output. And when digital reverb arrived on the scene? Mono in still produces stereo out.
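To illustrate the "mono in, spatial impression out" point, here's a small sketch (the room geometry is an illustrative assumption): one mono signal fed to two spaced speakers reaches an off-center listener at two different times, and that arrival-time difference is what the brain uses to localize.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def path_delay_ms(left_speaker, right_speaker, listener):
    """Arrival-time difference (ms) of one mono signal played from
    two speakers, at a listener position. All points are (x, y) in meters."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return abs(dist(left_speaker, listener) - dist(right_speaker, listener)) \
        / SPEED_OF_SOUND * 1000

# Speakers 3 m apart; listener 2 m back and 1 m left of center:
print(f"{path_delay_ms((-1.5, 0), (1.5, 0), (-1.0, 2.0)):.2f} ms")
```

A few milliseconds of difference is enough for the ear to register spaciousness, before even counting the wall and ceiling reflections mentioned above.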
You can add a bit of spaciousness to the mono output of an iPhone simply by placing it on a hard, reflective surface (you'll also gain a boost in overall volume, as more sound energy is directed your way). If the iPhone had a pair of speakers? You'd be hard-pressed to localize sound sources in the regular stereo field, but you would gain some "air" due to the slight differences in arrival time. A bit of DSP could be applied to broaden the sound field, though.
I'm with most everyone else - adding stereo speakers to an iPhone is not worth the trade-offs (such as a smaller battery, smaller speakers, etc.). On a larger device, like an iPad? That's a horse of a different color.
(And, as to the superior sound of a particular Walkman? Not hard to achieve when there are no other functions in need of space or power, and truly, superior sound is essential to achieve when, in Alton Brown's words, it's a "mono-tasker.")