I don't remember the name of the TA I discussed this with this morning, but I finally got RAG working with ollama: https://github.com/LGabAnnell/inlp-tp/blob/master/main.py
The differences between models are quite funny:
- mistral:latest doesn't call the tool unless told to multiple times, and instead invents document source ids (and therefore the answer as well) to satisfy the system prompt
- qwen3:latest does OK but ignores the system prompt's formatting instructions (it outputs markdown and doesn't always cite sources, but the tool is called and the info is mostly retrieved correctly)
- gpt-oss:latest sometimes works, but often just calls the tool repeatedly until the context window fills up, then gives up and says "Hello, how may I help you today"...
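For anyone curious, the tool-calling loop looks roughly like this (a minimal sketch with a toy in-memory corpus; the `search_documents` tool, the corpus, and the prompts are placeholders for illustration, not the actual code from main.py):

```python
# Toy in-memory "corpus"; the real code retrieves from actual documents.
DOCS = {
    "doc1": "Ollama chat models can call tools during generation.",
    "doc2": "RAG retrieves relevant documents before answering.",
}

def search_documents(query: str) -> str:
    """Tool: return documents matching the query, tagged with source ids."""
    words = query.lower().split()
    hits = [f"[{doc_id}] {text}" for doc_id, text in DOCS.items()
            if any(w in text.lower() for w in words)]
    return "\n".join(hits) or "no results"

def answer(question: str, model: str = "qwen3:latest") -> str:
    # Imported lazily so search_documents can be tested without ollama.
    import ollama

    messages = [
        {"role": "system",
         "content": "Use search_documents and cite source ids in plain text."},
        {"role": "user", "content": question},
    ]
    resp = ollama.chat(model=model, messages=messages, tools=[search_documents])
    # Feed each tool call's result back as a "tool" message, then
    # ask the model again for the final answer.
    if resp.message.tool_calls:
        messages.append(resp.message)
        for call in resp.message.tool_calls:
            messages.append({
                "role": "tool",
                "content": search_documents(**call.function.arguments),
                "name": call.function.name,
            })
        resp = ollama.chat(model=model, messages=messages)
    return resp.message.content
```

Whether the model actually emits a tool call at all (and respects the plain-text/citation instructions) is exactly where the three models above diverge.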
