Given this is just calling the OpenAI / ollama network APIs using reqwest (as far as I can tell,
reading the code), care to comment on why you would use it?
As with most things “langchain”-esque, where the library generates the prompt for you, it’s hard to see what value is really being offered, and they’re fragile against trivial changes in model and model version.
There’s some mention of agents but I can’t see any examples.
Broadly speaking, what’s the elevator pitch here?
If I’m interacting with either of those two APIs via the network, client libraries are a dime a dozen… but, probably, you’re better off hand-crafting your own prompts.
If you wanted to embed an LLM, you wouldn’t rely on a network endpoint and you could just use a llama.cpp binding.
I read it as wanting to embed LLM functionality, but my reading might have been incorrect—and by the client library reference I understood it was referring to generic HTTP clients, but that reading might have also been incorrect :).
Per the examples this looks like an LLM-over-network library, and there don't seem to be many of those available for Rust.
Reasonable question. BTW, agents were part of this, but I removed them temporarily. There is somewhat agentic behavior in magic-cli, which you can take a look at.
My main motivation was a gap in the Rust ecosystem for this, as well as a desire to have reasonable abstractions for model alignment, agents and structured response generation with error correction.
In addition, Ollama is a first-class citizen, so local LLMs are supported (it calls the locally hosted APIs which Ollama exposes).
And as a last point, it’s just a fun project to hack on.
If you have suggestions for similar abstractions I missed, please let me know!
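For context on the Ollama point: a minimal, dependency-free sketch of the request body that Ollama's local /api/generate endpoint accepts. The model name and prompt are placeholders, and the JSON is hand-formatted here to keep the sketch free of crates; a real client would POST this to http://localhost:11434/api/generate with something like reqwest.

```rust
// Build the JSON body Ollama's /api/generate endpoint expects.
// "stream": false asks for a single JSON response instead of a stream.
fn generate_body(model: &str, prompt: &str) -> String {
    format!(
        r#"{{"model":"{}","prompt":"{}","stream":false}}"#,
        model, prompt
    )
}

fn main() {
    let body = generate_body("llama3", "Why is the sky blue?");
    // A client library would now POST `body` to the local Ollama server.
    println!("{body}");
}
```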
I’m not sure it’s a good abstraction if it generates a prompt.
Generating good prompts is a Very Hard Problem, and machine-generated prompts are almost always worse than hand-crafted ones.
I think if you’re serious, you should look at how you can build these systems so the user can use them with entirely hand-crafted prompts.
Look at your library from that perspective; if the “generates prompt” part doesn’t exist, what parts are still left?
For example, imagine an agent sandbox where the agent has a set of “tools” like web, command line, code editor and has to pick between tools and craft structured arguments to invoke the various tools.
Given that a) the prompts have to be hand-crafted, with tweaks per LLM target, b) the set of tools is entirely configurable by the library user, and c) at runtime you can pick the set of available tools and the LLM to use… that’s an abstraction worth using.
…but it’s hard.
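The tool-picking sandbox described above can be sketched in Rust. The trait, the tool names, and the dispatch step below are hypothetical illustrations of the shape such an abstraction might take, not any existing library's API; in a real system, the tool name and arguments would come out of the LLM's structured response.

```rust
use std::collections::HashMap;

// A tool the agent can invoke: web search, command line, code editor, etc.
trait Tool {
    fn name(&self) -> &str;
    fn invoke(&self, args: &str) -> String;
}

// A stand-in "web" tool; a real one would perform an actual search.
struct WebSearch;
impl Tool for WebSearch {
    fn name(&self) -> &str { "web" }
    fn invoke(&self, args: &str) -> String {
        format!("searched for: {args}")
    }
}

// The set of available tools is configured by the library user at runtime.
struct Toolbox {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl Toolbox {
    fn new() -> Self {
        Toolbox { tools: HashMap::new() }
    }

    fn register(&mut self, tool: Box<dyn Tool>) {
        let name = tool.name().to_string();
        self.tools.insert(name, tool);
    }

    // The LLM's structured response names a tool and its arguments;
    // dispatch looks the tool up and invokes it, or returns None.
    fn dispatch(&self, tool_name: &str, args: &str) -> Option<String> {
        self.tools.get(tool_name).map(|t| t.invoke(args))
    }
}

fn main() {
    let mut toolbox = Toolbox::new();
    toolbox.register(Box::new(WebSearch));
    println!("{:?}", toolbox.dispatch("web", "rust llm libraries"));
}
```

The point of the design is that swapping the prompt, the LLM target, or the set of registered tools requires no changes to the dispatch machinery.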
Some other ideas:
- Agent back-off/retry for API outages.
- Agents voting on the best solution.
- An overseer agent that checks the output of another agent; the library automatically generates a new response if the overseer rejects the first one.
- An agent that generates code, which the library then parses and executes.
- Agents with different system prompts, like “Civ 5 advisors”, that can suggest different ways of solving a problem.
- Multiple API endpoints to distribute requests across.
- “High and low” agents, where an agent can ask for help from a more powerful LLM if it gets stuck (e.g. for coding, if the generated code fails too many times).
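The overseer idea above can be sketched as a plain retry loop. The generate and accept functions here are stand-ins for real LLM calls, and the retry limit is an arbitrary illustration.

```rust
// Stand-in for an LLM call producing a candidate response.
fn generate(attempt: u32) -> String {
    format!("draft #{attempt}")
}

// Stand-in for the overseer agent's accept/reject judgment.
// Here it arbitrarily accepts only the third draft.
fn overseer_accepts(response: &str) -> bool {
    response.ends_with("#3")
}

// Keep regenerating until the overseer accepts or attempts run out.
fn generate_with_overseer(max_attempts: u32) -> Option<String> {
    for attempt in 1..=max_attempts {
        let response = generate(attempt);
        if overseer_accepts(&response) {
            return Some(response);
        }
    }
    None
}

fn main() {
    println!("{:?}", generate_with_overseer(5));
}
```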
Thanks. You can take a look at the alignment module (there’s an example but it’s not in the README), it implements the “overseer” concept.
And the prompts are mostly customizable, except for some hard-coded ones.
I have, but it was because there was some weird issue between my inference engine, GPU drivers, and kernel version. Switching from Rocky to Ubuntu and updating Ollama and ROCm (all at once; there were a few weeks between attempts) fixed my issue.
I don't. There were a lot of moving parts that I ended up changing at once to resolve the issue.
For what it's worth though, I was using a very similar configuration on my 6650 XT at the time, and that was working fine as long as I set the environment variable: HSA_OVERRIDE_GFX_VERSION=10.3.0. The issue I was seeing was on a machine that had a 7900 XTX, and IIRC I was not setting that environment variable. I suspect that somewhere in the chain something thought the card was one thing, when it was actually another, similar thing.
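For reference, the workaround mentioned above amounts to exporting the override before starting the inference server. The value 10.3.0 is the one from the 6650 XT case; whether it applies to other cards is a guess.

```shell
# Tell ROCm to treat the card as gfx1030 (10.3.0), a version with
# official ROCm support (workaround from the 6650 XT case above).
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Then start the inference server in the same environment, e.g.:
#   ollama serve
echo "$HSA_OVERRIDE_GFX_VERSION"
```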
The behavior was really weird though. It would have tons of typos, but it was still readable. It would also make the same typo for the same word over and over again, and it would use really weird punctuation and whitespace.