Text Generation WebUI needs no introduction of course, but for TabbyAPI:
TabbyAPI is a really nice model server for ExLlamaV2 that provides an OpenAI-compatible API. It's by far the easiest to use and most reliable ExLlamaV2 server I've found. ExLlamaV2 offers highly performant, efficient serving of exl2 models with a cutting-edge KV caching system that lets you run models with large contexts (great for coding!) on limited VRAM.
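Because it's OpenAI-compatible, the inference side should already work with any OpenAI client just by changing the base URL. A minimal sketch, assuming TabbyAPI's default port (5000), a placeholder API key, and a placeholder model name:

```ts
import OpenAI from "openai";

// Point the standard OpenAI client at a local TabbyAPI instance.
// Port 5000 is TabbyAPI's default; the API key comes from its config.
const client = new OpenAI({
  baseURL: "http://localhost:5000/v1",
  apiKey: process.env.TABBY_API_KEY ?? "placeholder",
});

const completion = await client.chat.completions.create({
  model: "my-exl2-model", // whatever model is currently loaded
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);
```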
It has a model-management API that works well for loading models, and they also provide a Gradio-based app for listing, selecting, and loading models with a given set of parameters.
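Roughly, what Bolt would need to call looks like the sketch below. This is just how I understand it from the TabbyAPI docs: the endpoint paths, the `x-admin-key` header, and parameter names like `max_seq_len` / `cache_mode` are my assumptions and would need checking against the current API.

```ts
const TABBY_URL = "http://localhost:5000";
const ADMIN_KEY = process.env.TABBY_ADMIN_KEY ?? "placeholder";

// List the models available in TabbyAPI's model directory.
// (Endpoint path assumed from the TabbyAPI docs.)
async function listModels(): Promise<unknown> {
  const res = await fetch(`${TABBY_URL}/v1/model/list`, {
    headers: { "x-admin-key": ADMIN_KEY },
  });
  return res.json();
}

// Ask TabbyAPI to load a model with a given set of parameters.
// Parameter names here are illustrative, not exhaustive.
async function loadModel(name: string): Promise<unknown> {
  const res = await fetch(`${TABBY_URL}/v1/model/load`, {
    method: "POST",
    headers: { "content-type": "application/json", "x-admin-key": ADMIN_KEY },
    body: JSON.stringify({ model_name: name, max_seq_len: 32768, cache_mode: "Q4" }),
  });
  return res.json();
}
```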
I was wondering how hard it might be to add to Bolt the ability to list, select, and load models with configured parameters?
I know it's a bit different from the feature set Bolt currently has, so I'm aware it's something of an ask, but I also think there's genuinely quite an opportunity here.