Document better understood by AI
Kinkazma
Add support for generating multiple embeddings from a long document, using Ollama-compatible embedding models like granite-embedding, nomic-embed-text, snowflake-arctic-embed2:568m, etc.
Expected behavior:
• A document (e.g. 65 pages) is automatically split into segments (e.g. per paragraph, page, or fixed-size chunks with overlap)
• Each chunk is processed independently to produce a separate embedding vector
• Output is a list of vectors, one per chunk
• Optionally:
  • Export to JSON or CSV
  • Show token count and chunk preview
  • Use for similarity search or RAG
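The requested pipeline (fixed-size chunks with overlap, one vector per chunk) could be sketched roughly as below. The `/api/embed` endpoint and the `nomic-embed-text` model name come from the public Ollama REST API; the chunk sizes and the `chunk_text` helper are illustrative assumptions, not part of this request.

```python
# Sketch only: character-based chunking with overlap, then one embedding
# per chunk from a local Ollama server (endpoint assumed from Ollama's docs).
import json
import urllib.request


def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]


def embed_chunks(chunks, model="nomic-embed-text",
                 host="http://localhost:11434"):
    """Return one embedding vector per chunk via Ollama's /api/embed endpoint."""
    req = urllib.request.Request(
        f"{host}/api/embed",
        data=json.dumps({"model": model, "input": chunks}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response carries a list of vectors, one per input chunk.
        return json.load(resp)["embeddings"]
```

A paragraph- or page-based splitter could replace `chunk_text` without changing the embedding call, since `/api/embed` accepts a list of inputs in one request.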
Do not compute one embedding for the whole document — the goal is to allow semantic lookup from fine-grained vectors, not a single blurred one.
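To illustrate why fine-grained vectors matter, a minimal cosine-similarity lookup over the per-chunk vectors might look like this (pure-Python sketch; `top_k` and its signature are hypothetical names, not an existing API):

```python
# Sketch: semantic lookup over per-chunk embedding vectors.
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_k(query_vec, chunk_vecs, k=3):
    """Indices of the k chunks most similar to the query vector."""
    order = sorted(range(len(chunk_vecs)),
                   key=lambda i: cosine(query_vec, chunk_vecs[i]),
                   reverse=True)
    return order[:k]
```

With one vector per chunk, `top_k` returns the specific passages relevant to a query; a single whole-document vector could only say whether the entire 65-page document is vaguely related.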
Kinkazma
I don't know how many people would be interested, but I think the application has everything to gain from this feature, whatever the demand turns out to be.