What exactly is the reason for limits anyway? I'm genuinely curious if someone could explain. In my mind, it doesn't seem like "data" is a limited commodity like water or gas or something, is it? Always wondered why there's a monthly cap on internet usage (not just mobile data).
Actually, data is limited in much the same way as processor power: by time. In computing, there is a concept of a "time slot" (often called a time slice). When you are trying to do three things on your processor but you only have two cores, the operating system hands out time slots on the processor to the different processes. By default, it tries to balance the allocations so that each process gets time at the same rate. Sometimes, certain processes get classified as a higher or lower priority, which adjusts how often they get time slots. There is an entire branch of mathematics dealing with this called queueing theory.
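If it helps to see the idea, here is a rough sketch of weighted round-robin slot allocation. It is a toy model, not how any real operating system scheduler is written: it assumes a single core, and the names `allocate_slots` and `weights` are made up for illustration.

```python
from collections import Counter
from itertools import cycle

def allocate_slots(weights, total_slots):
    """Hand out total_slots time slots in proportion to each process's weight."""
    # Weighted round-robin: a process with weight 2 appears twice per round.
    rotation = [name for name, w in weights.items() for _ in range(w)]
    counts = Counter()
    for _, name in zip(range(total_slots), cycle(rotation)):
        counts[name] += 1
    return dict(counts)

# Three processes at equal priority: each ends up with a third of the slots.
equal = allocate_slots({"A": 1, "B": 1, "C": 1}, 300)

# Process A bumped to a higher priority: it gets half, B and C a quarter each.
boosted = allocate_slots({"A": 2, "B": 1, "C": 1}, 400)
```

Nobody is ever locked out here. Raising a priority only changes how often a process comes up in the rotation.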
Modern data radios such as the ones that run cell towers work in much the same way. By default, each station (handset, cell modem, tablet, whatever) associated with a single tower gets the same priority. If one station transfers a lot of data all at once, the tower can usually give almost all of the time slots to that user to finish the transfer quickly. When two stations both want to transfer a lot of data, the tower has to slow each of them down and determine who gets to talk when. When one transfer finishes, all of the time slots can go to the other one.
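To put numbers on that, here is a small simulation of a tower splitting its slots evenly among whichever stations still have data queued. The function name, the "slots per tick" unit, and the even split are all assumptions for illustration. Real radio schedulers are far more involved, and this sketch does not redistribute slots a station leaves unused in the tick where it finishes.

```python
def share_slots(demands, slots_per_tick):
    """Return the tick at which each station finishes its transfer.

    demands maps station name -> slots of data it wants to move.
    Each tick, the tower splits its slots evenly among stations still waiting.
    """
    remaining = dict(demands)
    finished = {}
    tick = 0
    while remaining:
        tick += 1
        share = slots_per_tick / len(remaining)
        for name in list(remaining):
            remaining[name] -= share
            if remaining[name] <= 1e-9:
                finished[name] = tick
                del remaining[name]
    return finished

# One station alone gets every slot and is done in 10 ticks.
alone = share_slots({"phone": 100}, slots_per_tick=10)

# Two stations at once each run at half speed. When the tablet finishes
# (tick 10), the phone gets all the slots and finishes at tick 15.
contended = share_slots({"tablet": 50, "phone": 100}, slots_per_tick=10)
```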
Certain stations can get higher priority and certain tasks can get higher priority. As a specific example, a call to 911 in the United States is given almost top priority by the tower's time sharing system. If a tower is using all of its capacity and a station connected to it calls 911, the tower will drop another call or data transfer to make room for the 911 call. Of course, if some things get higher priority, others can also get lower priority. For example, a station that has used a certain amount of data within a month can be classified as lower-priority. That station will still get available time slots, but when there are more requests than there are time slots to service them, it gets a smaller share than other stations. This is demand-based throttling.
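Here is one way demand-based throttling could be modeled, as a weighted fair split that only kicks in under contention. This is a sketch under my own assumptions: the 3:1 weights, the station names, and the `allocate` function are hypothetical and not taken from any carrier's actual policy.

```python
def allocate(requests, weights, capacity):
    """Split capacity slots among stations; weights only matter under contention."""
    alloc = {s: 0.0 for s in requests}
    active = {s for s, r in requests.items() if r > 0}
    left = float(capacity)
    while active and left > 1e-9:
        total_w = sum(weights[s] for s in active)
        # Offer each waiting station its weighted share of what is left.
        offers = {s: left * weights[s] / total_w for s in active}
        satisfied = {s for s in active if requests[s] - alloc[s] <= offers[s]}
        if not satisfied:
            # Everyone wants more than their share: hand out the shares and stop.
            for s in active:
                alloc[s] += offers[s]
            break
        # Fully serve stations that need less than their share, then re-split the rest.
        for s in satisfied:
            left -= requests[s] - alloc[s]
            alloc[s] = float(requests[s])
        active -= satisfied
    return alloc

weights = {"normal": 3, "heavy": 1}  # "heavy" has passed its monthly threshold

# Busy tower: both want 80 of 100 slots, so the heavy user gets the smaller share.
busy = allocate({"normal": 80, "heavy": 80}, weights, capacity=100)

# Quiet tower: the heavy user is alone and gets everything it asked for.
quiet = allocate({"normal": 0, "heavy": 80}, weights, capacity=100)
```

In the busy case the split comes out to 75 and 25. In the quiet case the deprioritized station is not slowed down at all, which is the whole point of doing it this way.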
You can also set up rules that say that a given station can never be given more than a certain percentage of time slots on a tower, even if the tower is otherwise idle. This is a different type of throttling and this is the kind people tend to get upset about. From a technological standpoint, there is almost never justification for this type.
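The difference between the two kinds of throttling shows up most clearly on an idle tower. The sketch below is deliberately simplified: it treats the throttled station as getting whatever other traffic leaves unused, and the 25% cap and both function names are invented for the example.

```python
def demand_based(request, capacity, other_demand):
    """Throttled station gets whatever the other traffic leaves unused."""
    return min(request, max(capacity - other_demand, 0))

def hard_cap(request, capacity, other_demand, cap_fraction):
    """Same, but never more than cap_fraction of the tower's slots."""
    return min(demand_based(request, capacity, other_demand),
               capacity * cap_fraction)

# Idle tower with 100 slots, throttled station wants 80 of them.
idle_demand = demand_based(80, capacity=100, other_demand=0)
idle_capped = hard_cap(80, capacity=100, other_demand=0, cap_fraction=0.25)
```

With demand-based throttling the station gets all 80 slots. With the hard cap it gets 25, and the rest of the tower's capacity simply goes unused.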