I get it now!
Those attributes you mentioned (depth, color, dynamic range), are those software or hardware things? If Apple wanted, could they allow you, through software, to turn off all the processing and present a photo of what you're actually seeing?
That makes sense. To paraphrase, it's more of an "I was there!" that's desirable, as opposed to "This is the best looking processing pipeline!"?
It's a combination of both. Smartphone camera sensors (and lenses) are absolutely minuscule compared to most purpose-built cameras, so you run into the problem of not being able to capture much information, full stop. That information encompasses all the attributes that make a photograph a photograph: light, color, texture, etc.
For the most part you can't get around the physics of a small sensor and lens. The pace of advancement in those kinds of physical sciences is a lot slower than the pace of software development, so naturally the solution was to build massive photo processing pipelines to overcome the inherent weaknesses of smartphone cameras.
The bulk of that processing comes from HDR and various sharpening/smoothing algorithms.
HDR was introduced to combat the poor dynamic range of smartphone camera sensors. Dynamic range is the range of light a sensor can pick up without overexposing (where parts of the image turn pure white) or underexposing (where parts of the image turn pure black). To work around it, the camera takes multiple photos in quick succession, each at a different exposure (roughly, how much light that shot records), to capture 'highlights' (for example clouds in a bright sky), mids, and shadow areas (say, the shaded area under a tree), and then combines those photos to virtually extend the dynamic range of the final output. This was introduced on the iPhone 4. Here's an example of HDR on the iPhone 4; you can see parts of the highlights are no longer overexposed and the detail has been brought back:

One potential consequence of HDR is you kill off the subtle qualities of shadow detail that give the image a sense of depth in the first place. The above photo is a good example actually because while HDR successfully recovers highlight detail it comes at the expense of making her face look a little flatter and less '3D.' Smartphones are obviously better at HDR now vs. the iPhone 4 from a decade and a half ago but those subtle shadow qualities are still very hard to retain.
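If it helps to see the bracket-and-merge idea in code, here's a toy sketch in plain Python. To be clear, this is not how Apple (or anyone) actually implements HDR; the "scene" is just four made-up brightness values, the three exposure values are arbitrary, and names like `capture` and `merge_hdr` are my own. It only shows why several clipped shots can add up to one unclipped result.

```python
def capture(scene, exposure, full_well=1.0):
    """Simulate one shot: scale the scene's light by the exposure, then clip
    to what the (hypothetical) sensor can hold."""
    return [min(max(v * exposure, 0.0), full_well) for v in scene]


def well_exposed_weight(pixel, black=0.02, white=0.98):
    """Trust mid-tones the most; give crushed or blown pixels zero weight."""
    if pixel <= black or pixel >= white:
        return 0.0
    return 1.0 - abs(pixel - 0.5) * 2.0  # triangle that peaks at mid-gray


def merge_hdr(shots, exposures):
    """Weighted average of each shot's estimate of the real scene brightness."""
    merged = []
    for pixels in zip(*shots):
        num = 0.0
        den = 0.0
        for p, e in zip(pixels, exposures):
            w = well_exposed_weight(p)
            num += w * (p / e)  # undo the exposure to estimate scene light
            den += w
        if den > 0:
            merged.append(num / den)
        else:
            # Every shot was clipped here: fall back to a plain average.
            merged.append(sum(p / e for p, e in zip(pixels, exposures)) / len(pixels))
    return merged


# Made-up scene: deep shadow, mid-tone, bright wall, bright sky.
scene = [0.01, 0.25, 1.0, 6.0]
exposures = [0.125, 1.0, 8.0]  # short, normal, long (arbitrary values)
shots = [capture(scene, e) for e in exposures]
single = capture(scene, 1.0)   # what one normal shot would record
merged = merge_hdr(shots, exposures)
```

In the single normal shot the sky value is stuck at the sensor's ceiling, while the merged result gets it back from the short exposure. Real pipelines then have to squeeze that wider range back into what a screen can show, and that compression step is where the flattening of shadows tends to creep in.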
Another major trick is using sharpening, smoothing, and noise-reduction functions to compensate for the lack of detail from a small sensor and for the sensor noise that comes from bumping up the ISO (the adjustable light sensitivity of the sensor). One way to adjust exposure is to adjust the 'aperture', which is the physical size of the lens opening, but most smartphones, unlike the Xiaomi 14 Ultra, do not have an adjustable aperture. If you cannot adjust the aperture, you have to adjust the ISO (or the shutter speed) instead. Increasing the ISO typically results in more visible noise from the sensor, which you then try to get rid of with noise reduction.
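The aperture/ISO trade-off is easy to put rough numbers on. The sketch below uses the standard "stops" arithmetic (each stop halves or doubles the light) and the usual rule of thumb that the noise from light itself gives a signal-to-noise ratio of about the square root of the photons collected. The f-numbers, base ISO, and photon count are illustrative assumptions, not measurements from any phone.

```python
import math


def stops_between(f_wide, f_narrow):
    """Stops of light lost going from a wider to a narrower f-number.
    Light gathered scales with 1 / f_number**2."""
    return 2 * math.log2(f_narrow / f_wide)


def iso_to_compensate(base_iso, stops_lost):
    """ISO needed to keep the same image brightness after losing light."""
    return base_iso * 2 ** stops_lost


def shot_noise_snr(photons):
    """Photon (shot) noise only: SNR is roughly sqrt(photon count)."""
    return math.sqrt(photons)


stops = stops_between(2.0, 4.0)             # f/2 -> f/4
iso = iso_to_compensate(100, stops)         # starting from ISO 100
snr_before = shot_noise_snr(10_000)         # assumed photons per pixel
snr_after = shot_noise_snr(10_000 / 2 ** stops)
```

With these example numbers you lose 2 stops, the ISO has to go from 100 to 400 to keep the picture equally bright, and the signal-to-noise ratio halves. The ISO bump itself isn't really what creates the noise; there's simply less light to work with, and the amplification makes that visible.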
The problem with anti noise is you're somewhat forced to remove image detail, so to give the illusion of bringing it back you introduce sharpening and smoothing functions. You might sharpen the edges of subjects and smooth/saturate some of the other elements to exaggerate the colors and shape of a subject. Apple have many names for this process, like 'Smart HDR' or 'Photonic Engine', and it's caused a lot of controversy. In the past you could do what you're proposing and turn off Smart HDR to get more realistic images, but Apple removed that option starting with the 12 Pro.
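Here's a tiny illustration of that denoise-then-sharpen loop, again just a sketch: a 1D row of made-up pixel values, a simple box blur standing in for noise reduction, and a basic unsharp mask standing in for sharpening. None of this is Apple's actual pipeline (which isn't public), and the function names are mine.

```python
def box_blur(signal, radius=1):
    """Crude noise reduction: replace each value with its local average."""
    out = []
    n = len(signal)
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def unsharp_mask(signal, radius=1, amount=1.0):
    """Basic sharpening: add back the difference between the signal and a
    blurred copy of it, which exaggerates edges."""
    blurred = box_blur(signal, radius)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]


# Made-up row of pixels: dark area with fine texture, then a bright area
# with fine texture, separated by a hard edge.
signal = [0.0, 0.1, 0.0, 0.1, 1.0, 0.9, 1.0, 0.9]
denoised = box_blur(signal)
sharpened = unsharp_mask(denoised, amount=1.5)
```

After the blur, the fine wiggle in the dark area is mostly gone, which is the lost detail. After sharpening, the values next to the edge overshoot past the brightest and darkest values in the original, which is the halo you see around subjects. The fine texture doesn't come back; you just get a punchier edge, and pushed far enough that's the 'oil painting' look.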
Apple engineers decided the overall 'artistic' look of a final uncropped image on a smartphone is more important to their users than realism or preserving cropped detail. In an interview Apple said they model iPhone images after the look of oil paintings, ironic considering many iPhone customers have been complaining about an 'oil painting effect' for many years.
The photo below is a good example of the shift Apple have taken in image processing. One crop is from the 13 Pro, which radically boosted the image processing; the other is from the 11 Pro, which takes a more moderate approach to post processing. The 11 Pro has more grain but the image is more realistic and closer to what a typical camera might take. To a casual observer viewing the full image the 13 Pro may look more aesthetically pleasing (to a point), and indeed many portrait shots I took with my 13 Pro looked a lot nicer on my phone screen than with my 11 Pro.
One targets an 'artistic' look, the other targets a 'realism' look. A lot of people like the artistic look, they want things to look better than real life.

Ultimately there is a balance that must be struck. The Xiaomi is interesting because the camera hardware is a lot more capable than most competitors, primarily because the main camera uses a "1 inch" sensor (not actually 1 inch lol, that's just the name of that size class), it has an adjustable aperture (hardly any phones have this), and naturally the lens is bigger, so the phone's dynamic range straight from the sensor (without HDR) is a lot wider. More capable hardware means you have the freedom to tone down post processing and HDR. Yes, the Xiaomi is doing a lot of post processing, but it doesn't have to be as dramatic, both because the hardware enables it and because the engineers chose to target a more natural look vs. the highly stylized approach taken by Apple, Google, and Samsung. Leica, Xiaomi's partner, are famous for their '3D look' and realistic, rich colors, so they also likely had a big say in Xiaomi's image target.
I prefer what Xiaomi and Leica are doing. I hope that if Apple use larger sensors in the future they can tone down some of the 'oil painting effect' and HDR processing to make the image look a bit more realistic rather than artistic. Again, it's a balance, I want the best of both worlds and I think it's possible to get there. At the very least more control over the process on a shot by shot basis would be nice.