Given this is just calling the OpenAI / ollama network APIs using reqwest (as far as I can tell,
reading the code), care to comment on why you would use it?
As with most things “langchain”-esque, where the library generates the prompt for you, it’s hard to see what value is really being offered, and they’re fragile against trivial changes in model and model version.
There’s some mention of agents but I can’t see any examples.
Broadly speaking, what’s the elevator pitch here?
If I’m interacting with either of those two APIs via the network, client libraries are a dime a dozen… but, probably, you’re better off hand-crafting your own prompts.
If you wanted to embed an LLM, you wouldn’t rely on a network endpoint and you could just use a llama.cpp binding.
I read it as wanting to embed LLM functionality, but my reading might have been incorrect—and by the client library reference I understood it was referring to generic HTTP clients, but that reading might have also been incorrect :).
Per the examples this looks like an LLM-over-network library, and there don't seem to be many of those available for Rust.
Reasonable question. BTW, agents were part of this, but I removed them temporarily. There is somewhat agentic behavior in magic-cli, which you can take a look at.
My main motivation was a gap in the Rust ecosystem for this, as well as a desire to have reasonable abstractions for model alignment, agents and structured response generation with error correction.
In addition, Ollama is a first-class citizen, so local LLMs are supported (it calls the locally hosted APIs which Ollama exposes).
And as a last point, it’s just a fun project to hack on.
If you have suggestions for similar abstractions I missed, please let me know!
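For context on the Ollama point: a minimal, dependency-free sketch of the request body that Ollama's local /api/generate endpoint accepts. The model name and prompt are placeholders, and the JSON is hand-formatted here to keep the sketch free of crates; a real client would POST this to http://localhost:11434/api/generate with something like reqwest.

```rust
// Build the JSON body Ollama's /api/generate endpoint expects.
// "stream": false asks for a single JSON response instead of a stream.
fn generate_body(model: &str, prompt: &str) -> String {
    format!(
        r#"{{"model":"{}","prompt":"{}","stream":false}}"#,
        model, prompt
    )
}

fn main() {
    let body = generate_body("llama3", "Why is the sky blue?");
    // A client library would now POST `body` to the local Ollama server.
    println!("{body}");
}
```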
I’m not sure it’s a good abstraction if it generates a prompt.
Generating good prompts is a Very Hard Problem, and machine-generated prompts are almost always worse than hand-crafted ones.
I think if you’re serious, you should look at how you can build these systems so the user can use them with entirely hand-crafted prompts.
Look at your library from that perspective; if the “generates prompt” part doesn’t exist, what parts are still left?
For example, imagine an agent sandbox where the agent has a set of “tools” like web, command line, code editor and has to pick between tools and craft structured arguments to invoke the various tools.
Given that a) the prompts have to be hand-crafted, with tweaks per LLM target, b) the set of tools is entirely configurable by the library user, and c) at runtime you can pick the set of available tools and the LLM to use… that’s an abstraction worth using.
…but it’s hard.
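The tool-picking sandbox described above can be sketched in Rust. The trait, the tool names, and the dispatch step below are hypothetical illustrations of the shape such an abstraction might take, not any existing library's API; in a real system, the tool name and arguments would come out of the LLM's structured response.

```rust
use std::collections::HashMap;

// A tool the agent can invoke: web search, command line, code editor, etc.
trait Tool {
    fn name(&self) -> &str;
    fn invoke(&self, args: &str) -> String;
}

// A stand-in "web" tool; a real one would perform an actual search.
struct WebSearch;
impl Tool for WebSearch {
    fn name(&self) -> &str { "web" }
    fn invoke(&self, args: &str) -> String {
        format!("searched for: {args}")
    }
}

// The set of available tools is configured by the library user at runtime.
struct Toolbox {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl Toolbox {
    fn new() -> Self {
        Toolbox { tools: HashMap::new() }
    }

    fn register(&mut self, tool: Box<dyn Tool>) {
        let name = tool.name().to_string();
        self.tools.insert(name, tool);
    }

    // The LLM's structured response names a tool and its arguments;
    // dispatch looks the tool up and invokes it, or returns None.
    fn dispatch(&self, tool_name: &str, args: &str) -> Option<String> {
        self.tools.get(tool_name).map(|t| t.invoke(args))
    }
}

fn main() {
    let mut toolbox = Toolbox::new();
    toolbox.register(Box::new(WebSearch));
    println!("{:?}", toolbox.dispatch("web", "rust llm libraries"));
}
```

The point of the design is that swapping the prompt, the LLM target, or the set of registered tools requires no changes to the dispatch machinery.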
Some other ideas:
- Agent back-off/retry for API outages.
- Agents voting on the best solution.
- An overseer agent that checks the output of another agent; the library automatically generates a new response if the overseer rejects the first one.
- An agent that generates code, which the library then parses and executes.
- Agents with different system prompts, like “Civ 5 advisors”, that can suggest different ways of solving a problem.
- Multiple API endpoints to distribute requests across.
- “High and low” agents, where an agent can ask for help from a more powerful LLM if it gets stuck (e.g. for coding, if the generated code fails too many times).
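The overseer idea above can be sketched as a plain retry loop. The generate and accept functions here are stand-ins for real LLM calls, and the retry limit is an arbitrary illustration.

```rust
// Stand-in for an LLM call producing a candidate response.
fn generate(attempt: u32) -> String {
    format!("draft #{attempt}")
}

// Stand-in for the overseer agent's accept/reject judgment.
// Here it arbitrarily accepts only the third draft.
fn overseer_accepts(response: &str) -> bool {
    response.ends_with("#3")
}

// Keep regenerating until the overseer accepts or attempts run out.
fn generate_with_overseer(max_attempts: u32) -> Option<String> {
    for attempt in 1..=max_attempts {
        let response = generate(attempt);
        if overseer_accepts(&response) {
            return Some(response);
        }
    }
    None
}

fn main() {
    println!("{:?}", generate_with_overseer(5));
}
```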
Thanks. You can take a look at the alignment module (there’s an example but it’s not in the README), it implements the “overseer” concept.
And the prompts are mostly customizable, except for some hard-coded ones.
I have, but it was because there was some weird issue between my inference engine, GPU drivers, and kernel version. Switching from Rocky to Ubuntu and updating Ollama and ROCm (all at once; there were a few weeks between attempts) fixed my issue.
I don't. There were a lot of moving parts that I ended up changing at once to resolve the issue.
For what it's worth though, I was using a very similar configuration on my 6650 XT at the time, and that was working fine as long as I set the environment variable: HSA_OVERRIDE_GFX_VERSION=10.3.0. The issue I was seeing was on a machine that had a 7900 XTX, and IIRC I was not setting that environment variable. I suspect that somewhere in the chain something thought the card was one thing, when it was actually another, similar thing.
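For reference, the workaround mentioned above amounts to exporting the override before starting the inference server. The value 10.3.0 is the one from the 6650 XT case; whether it applies to other cards is a guess.

```shell
# Tell ROCm to treat the card as gfx1030 (10.3.0), a version with
# official ROCm support (workaround from the 6650 XT case above).
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Then start the inference server in the same environment, e.g.:
#   ollama serve
echo "$HSA_OVERRIDE_GFX_VERSION"
```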
The behavior was really weird though. It would have tons of typos, but it was still readable. It would also make the same typo for the same word over and over again, and it would use really weird punctuation and whitespace.