Because it's not ready and up to Apple standards yet? I guess you don't understand the huge difference between implementing some random speech-to-text feature (which Macs have had for literally decades btw!) and delivering reliable transcription in a stable, usable implementation in the context of video, i.e. an NLE.
Once Apple introduces it, it'll most certainly make that turd Premiere look that much worse (if that's even possible). Never mind that you can already get it for FREE and easily, with a few clicks, via endless Whisper-based transcription apps. Whisper is something that Apple would (fortunately) never consider using. They will ultimately, as always, have the far superior solution. Something I have no issue waiting for.
With "Apple Intelligence" not even being official yet, what tf are they going to base it on??