1 – Please update the system parameters to reflect the extended context capacities of modern LLMs, which now support context windows of 256k and even 1 million tokens.

1.2 – Context Limit → Add a function to set the context limit as a number of tokens. To ensure broad compatibility with upcoming models, the supported range should ideally span 256 to 4,000,000 tokens.

2 – Add a user-controllable max_tokens output parameter

It would be valuable to introduce a user-configurable setting for the max_tokens limit of generated responses, allowing:
• concise outputs (e.g., SMS, micro-summaries),
• or, conversely, extended and exhaustive responses (e.g., technical reports, documentation generation).

This parameter should (see the sketch after this list):
• be configurable directly by the user through the UI/UX,
• be overridable during assistant calls or manual regenerations,
• remain optional, with a default fallback value when not explicitly set.
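
To make the two requests concrete, here is a minimal sketch of how such settings could be validated and resolved. It is illustrative only: the names GenerationSettings, resolve_max_tokens, and DEFAULT_MAX_OUTPUT_TOKENS are hypothetical and not part of any existing API; the 256 to 4,000,000 token bounds come from item 1.2 above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds taken from item 1.2: 256 .. 4,000,000 tokens.
MIN_CONTEXT_TOKENS = 256
MAX_CONTEXT_TOKENS = 4_000_000

# Illustrative fallback used when the user leaves max_tokens unset (item 2).
DEFAULT_MAX_OUTPUT_TOKENS = 1024


@dataclass
class GenerationSettings:
    """User-level settings exposed through the UI (names are illustrative)."""
    context_limit: int = 128_000          # item 1.2: context window in tokens
    max_tokens: Optional[int] = None      # item 2: optional output cap

    def __post_init__(self) -> None:
        # Validate the context limit against the requested compatibility range.
        if not MIN_CONTEXT_TOKENS <= self.context_limit <= MAX_CONTEXT_TOKENS:
            raise ValueError(
                f"context_limit must be between {MIN_CONTEXT_TOKENS} "
                f"and {MAX_CONTEXT_TOKENS} tokens"
            )
        if self.max_tokens is not None and self.max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")


def resolve_max_tokens(settings: GenerationSettings,
                       per_call_override: Optional[int] = None) -> int:
    """Precedence: per-call override > user setting > default fallback."""
    if per_call_override is not None:
        return per_call_override
    if settings.max_tokens is not None:
        return settings.max_tokens
    return DEFAULT_MAX_OUTPUT_TOKENS


if __name__ == "__main__":
    settings = GenerationSettings(context_limit=256_000)        # a 256k-token model
    print(resolve_max_tokens(settings))                         # -> 1024 (fallback)
    print(resolve_max_tokens(settings, per_call_override=64))   # -> 64 (SMS-length reply)
```

The precedence order (per-call override, then the user setting, then the fallback) mirrors the "overridable during assistant calls or manual regenerations" and "default fallback value" requirements in item 2.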