Agreed - this is far more impressive than most people realize. The Pixel phones and the iPhone XR achieve this with a single lens by leveraging focus pixels: the slightly offset views they capture provide just enough parallax to generate a depth map. The SE doesn’t have enough focus pixels for that, so it’s relying purely on machine learning, estimating a depth map from a single 2D image.
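For anyone curious what pure single-image depth estimation looks like in practice, here’s a rough sketch using the open-source MiDaS model in Python. To be clear, this is not Apple’s actual on-device pipeline (that’s proprietary), just the same general idea, and the image filename is a placeholder:

    import cv2
    import torch

    # Load a small pretrained monocular depth-estimation model (MiDaS) from torch.hub,
    # plus the matching input transform.
    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
    midas.eval()
    midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
    transform = midas_transforms.small_transform

    # Read an ordinary 2D photo - no dual lenses, no focus-pixel disparity.
    img = cv2.imread("photo.jpg")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    input_batch = transform(img)

    with torch.no_grad():
        prediction = midas(input_batch)
        # Resize the predicted (inverse) depth map back to the original image size.
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=img.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze()

    # depth_map holds per-pixel relative depth, which is what you’d feed into
    # a synthetic background blur for a portrait-mode effect.
    depth_map = prediction.cpu().numpy()

The model has never seen anything but flat 2D pixels, yet it learns cues like relative size, occlusion, and blur to guess how far away things are, which is essentially what the SE is doing.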
I’m surprised it was more cost effective to do it this way rather than just make the changes necessary to fit the XR rear camera module. That really speaks to the expense of retooling manufacturing.
Software solutions are almost always more cost-effective, especially considering Apple’s scale.