Do you have more than two ears? If you are like most humans, you localize sound based on differences between how the audio is received in each ear. A pair of speakers with a good processing model can generate spatial audio (see Dolby Atmos for headphones as an example). It is certainly easier with more speakers in more locations (less processing is needed), but one can get pretty amazing results with just two speaker positions.
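To make the "differences between each ear" idea concrete, here is a minimal Python sketch of the two classic cues: an interaural time difference (ITD), estimated with Woodworth's spherical-head approximation, and an interaural level difference (ILD). The head radius, the speed of sound, and the 6 dB maximum level difference are assumptions, and `pan_binaural` is a hypothetical helper. This is a crude illustration of the cues, not an HRTF renderer and not how Dolby Atmos actually processes audio.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 °C)
HEAD_RADIUS = 0.0875    # m, assumed average head radius

def woodworth_itd(azimuth_deg: float) -> float:
    """ITD in seconds from Woodworth's spherical-head approximation.

    Azimuth 0 is straight ahead, +90 is directly to the right; valid for |azimuth| <= 90.
    Positive values mean the sound reaches the right ear first.
    """
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))

def pan_binaural(mono, sample_rate, azimuth_deg, max_ild_db=6.0):
    """Hypothetical crude renderer: delay and attenuate the far-ear channel.

    Returns (left, right) lists of samples. max_ild_db is an assumed level
    difference at 90 degrees, scaled by sin(azimuth) elsewhere.
    """
    theta = math.radians(azimuth_deg)
    delay = round(abs(woodworth_itd(azimuth_deg)) * sample_rate)  # whole samples
    gain = 10 ** (-max_ild_db * abs(math.sin(theta)) / 20)         # far-ear gain
    near = list(mono) + [0.0] * delay                # pad so both channels match
    far = [0.0] * delay + [s * gain for s in mono]   # far ear hears it later, quieter
    return (far, near) if azimuth_deg > 0 else (near, far)

# Usage: a 10 ms, 1 kHz tone placed 60 degrees to the listener's right.
rate = 48_000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(480)]
left, right = pan_binaural(tone, rate, 60)
print(f"ITD at 60 degrees: {woodworth_itd(60) * 1e6:.0f} microseconds")
```

Real spatial-audio processing replaces the fixed delay and gain with direction-dependent filters, which is what lets two speakers convey more than left versus right.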
Before you get out over your skis here, realize that your ability to tell where a sound is coming from also depends on the geometry of your head and ears. The difference in arrival time between one ear and the other is only part of your brain's equation when processing sounds. How do you think you can tell the difference between a sound coming from 5 feet directly in front of you, 5 feet directly behind you, and 5 feet over your head? In all three cases the sound reaches both ears at the same instant, roughly 0.0044 seconds after leaving the source, so there is no arrival-time difference at all. Yet you can still locate the origin, because your outer ears and head filter the sound differently depending on the direction it comes from. Since the HomePod cannot direct audio along the vertical axis, it's incapable of emulating height channels.
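The arithmetic behind the front/behind/above example can be checked with a short Python sketch. It models the ears as two points 17.5 cm apart (an assumed head width), uses an assumed speed of sound of 343 m/s, and measures straight-line distances, ignoring the fact that sound bends around the head. The helper `arrival_times` is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed (air at about 20 °C)
EAR_OFFSET = 0.0875      # m, assumed half-width of the head
FIVE_FEET = 5 * 0.3048   # 1.524 m

# Coordinates: x = right, y = forward, z = up; head centred at the origin.
LEFT_EAR = (-EAR_OFFSET, 0.0, 0.0)
RIGHT_EAR = (EAR_OFFSET, 0.0, 0.0)

def arrival_times(source):
    """Straight-line travel time (s) from source to each ear; ignores the head itself."""
    return (math.dist(source, LEFT_EAR) / SPEED_OF_SOUND,
            math.dist(source, RIGHT_EAR) / SPEED_OF_SOUND)

sources = {
    "front": (0.0, FIVE_FEET, 0.0),
    "behind": (0.0, -FIVE_FEET, 0.0),
    "above": (0.0, 0.0, FIVE_FEET),
    "right": (FIVE_FEET, 0.0, 0.0),  # off the median plane, for contrast
}

for name, pos in sources.items():
    left, right = arrival_times(pos)
    print(f"{name:>6}: left {left * 1000:.2f} ms, right {right * 1000:.2f} ms, "
          f"difference {(left - right) * 1000:.3f} ms")
```

The three sources on the median plane (front, behind, above) give identical arrival times at both ears, while the source to the right gives a difference of roughly half a millisecond. Because the straight-line model ignores the head, the real off-axis difference is somewhat larger, but the point stands: timing alone cannot separate front, behind, and above.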
Look up info on binaural recordings for more on how humans process sound.