Generally speaking, it is hard to hit the rate limit in ordinary conversational use.
Scenario 1:
Google currently enforces a daily usage limit on the free plan of the Gemini API. Once the limit is reached, I either have to replace the key in BoltAI with one from another account, or create several custom models that call the Gemini API in advance and switch to another model when the limit is hit. I don't think this is very elegant.
Scenario 2:
Some people share their key with friends or family members, so the key can briefly hit a rate or quota limit when several people use it at once. A simple load-balancing mechanism would prevent this.
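To make the idea in Scenarios 1 and 2 concrete, here is a minimal sketch of round-robin key rotation that skips a key once it reports a limit. It is only an illustration, not how BoltAI works: `KeyRotator`, `QuotaExceeded`, and the `send` callback are hypothetical names, and a real client would have to map the provider's actual rate-limit response onto that exception.

```python
import itertools


class QuotaExceeded(Exception):
    """Hypothetical error: the key hit a rate or quota limit."""


class KeyRotator:
    """Rotate through API keys, skipping any key that reports a limit."""

    def __init__(self, keys):
        self._keys = list(keys)
        self._cycle = itertools.cycle(self._keys)

    def call(self, send):
        # Try each key at most once per request, continuing the rotation
        # from wherever the previous request stopped.
        for _ in range(len(self._keys)):
            key = next(self._cycle)
            try:
                return send(key)
            except QuotaExceeded:
                continue  # this key is exhausted, try the next one
        raise QuotaExceeded("all keys are exhausted")
```

With this in place, the user would configure one model with several keys, and requests would be spread across them instead of requiring a manual switch.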
Scenario 3:
At work, you might use a key provided by your company, but at home that key may not work because you are outside the company network. You then have to switch to a model that is usable at home. If both entries point to the same underlying model, this just adds meaningless duplicates to the model list. If a simple priority setting were added, with simple load balancing among entries for the same model at the same priority, the model list would be cleaner and clearer.
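The priority idea in Scenario 3 could look roughly like the sketch below. It assumes each configured backend has a numeric priority (lower is tried first), that backends sharing a priority are shuffled to spread load, and that an unreachable backend raises an error. `Endpoint`, `Unavailable`, `pick_order`, and `call_with_fallback` are hypothetical names for illustration only.

```python
import random
from dataclasses import dataclass


class Unavailable(Exception):
    """Hypothetical error: the endpoint cannot be reached or is over quota."""


@dataclass
class Endpoint:
    name: str
    priority: int  # lower number is tried first


def pick_order(endpoints, rng=random):
    """Order endpoints by priority, shuffling within each priority tier."""
    tiers = {}
    for ep in endpoints:
        tiers.setdefault(ep.priority, []).append(ep)
    order = []
    for priority in sorted(tiers):
        tier = list(tiers[priority])
        rng.shuffle(tier)  # simple load balancing among equal priorities
        order.extend(tier)
    return order


def call_with_fallback(endpoints, send, rng=random):
    """Try endpoints in priority order and return the first success."""
    last_error = None
    for ep in pick_order(endpoints, rng):
        try:
            return send(ep)
        except Unavailable as err:
            last_error = err  # fall through to the next endpoint
    raise Unavailable("no endpoint reachable") from last_error
```

In this arrangement the company key would sit at the highest priority and the home keys one level below, so the model list shows a single entry and the fallback happens automatically when the company network is out of reach.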