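But this is the standard approach. It's called "intention analysis" (you will also see it called "intent recognition" or "intent classification"). You essentially turn a user's sentence into a variable (a label) that identifies the intention behind it.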
For example:
- Turn on alarm
- Siri, turn on alarm
- I want to turn on the alarm
- I want the alarm to be turned on
All these sentences can be converted into the variable TURN_ON_ALARM, for example, and then Siri will proceed to the next step: asking the user what time they want to set the alarm for. Usually, you train a model on thousands of variations, and eventually Siri will be able to recognize sentences even if they never appeared among the training examples.
For example, suppose the user instead says:
Siri, I want you to turn on the alarm
Notice that none of the sentences listed earlier exactly matches the user's new sentence. But because the new sentence is similar to sentences the model has already seen, there is a high chance it will correctly infer that "Siri, I want you to turn on the alarm" is TURN_ON_ALARM.
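To make the idea concrete, here is a minimal sketch in Python. It is only a toy and not how Siri actually works: it compares the words of a new sentence to a handful of labeled examples using cosine similarity and returns the label of the closest one. The TURN_OFF_ALARM and GET_WEATHER intents, their example sentences, and the function names are assumptions added for illustration. A real system would use a trained model and far more data.

```python
import math
import re
from collections import Counter

# Hypothetical labeled examples: (sentence, intent label).
# Only the TURN_ON_ALARM sentences come from the list above;
# the other intents are made up so the classifier has something to reject.
EXAMPLES = [
    ("Turn on alarm", "TURN_ON_ALARM"),
    ("Siri, turn on alarm", "TURN_ON_ALARM"),
    ("I want to turn on the alarm", "TURN_ON_ALARM"),
    ("I want the alarm to be turned on", "TURN_ON_ALARM"),
    ("Turn off alarm", "TURN_OFF_ALARM"),
    ("I want to turn off the alarm", "TURN_OFF_ALARM"),
    ("What is the weather today", "GET_WEATHER"),
    ("Will it rain tomorrow", "GET_WEATHER"),
]

def vectorize(sentence):
    """Lowercase the sentence and count its words (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(sentence):
    """Return the intent label of the most similar known example."""
    query = vectorize(sentence)
    _, label = max(EXAMPLES, key=lambda ex: cosine(query, vectorize(ex[0])))
    return label

print(classify("Siri, I want you to turn on the alarm"))
```

The new sentence is not in the list, but it shares most of its words with "I want to turn on the alarm", so that example wins. Word overlap alone is fragile ("turn on" and "turn off" differ by a single word), which is one reason real systems train a model on thousands of variations instead.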
This is also roughly how our own heads work, by the way. We compare what someone is saying to the sentences we already know in order to infer their intention (but in a much more complex, nuanced way).