+1. It is confusing.


And how are you going to define what ocaps/flows are needed when agent behavior is not defined?

This is a really good question because it hits on the fundamental issue: LLMs are useful because they can't be statically modeled.

The answer is to constrain effects, not intent. You can define capabilities where agent behavior is constrained within reasonable limits (e.g., can't post private email to #general on Slack without consent).

The next layer is UX/feedback: you can compile additional policy as the user requests it (e.g., only this specific sender's emails can be sent to #general).
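Concretely, that kind of exception can compile down to a tiny declarative check at the egress tool call. Rough TypeScript sketch; the tool names and shapes here are made up, not any real API:

    // Hypothetical egress rule compiled from "only this sender's emails may go to #general".
    type EgressRequest = { tool: "slack.post"; channel: string; sourceSender?: string };

    const approvedSenders = new Set(["reports@example.com"]);

    function allowEgress(req: EgressRequest): boolean {
      if (req.channel !== "#general") return true;   // this rule only covers #general
      if (!req.sourceSender) return false;           // unknown provenance -> ask the user first
      return approvedSenders.has(req.sourceSender);  // the user-approved exception
    }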


But how do you check that an email is being sent to #general? Agents are very creative at escaping/encoding; they could even paraphrase the email in their own words.

Decades ago, secure OSes tracked the provenance of every byte (clean/dirty) to detect leaks, but it's hard if you want your agent to be useful.


> Decades ago, secure OSes tracked the provenance of every byte (clean/dirty) to detect leaks, but it's hard if you want your agent to be useful.

Yeah, you're hitting on the core tradeoff between correctness and usefulness.

The key differences here:

1. We're not tracking at the byte level but at the tool-call/capability level (e.g., read emails), enforcing at egress (e.g., send emails).

2. The agent can slowly learn approved patterns from user behavior/common exceptions to strict policy. You can be strict at the start and grant more autonomy for known-safe flows over time.
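To make point 1 concrete, think of it as taint at tool-call granularity: a sensitive read labels the session, and the label is checked at egress. TypeScript sketch with hypothetical tool names:

    // Coarse provenance: labels attach to tool calls, not bytes.
    type Label = "private-email" | "public";
    const sessionLabels = new Set<Label>();

    function onToolResult(tool: string): void {
      if (tool === "email.read") sessionLabels.add("private-email"); // sensitive source touched
    }

    function onEgress(tool: string): "allow" | "needs-approval" {
      // Anything leaving after a sensitive read needs a human OK,
      // unless a learned policy has already approved this flow (point 2).
      if (tool === "slack.post" && sessionLabels.has("private-email")) return "needs-approval";
      return "allow";
    }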


what about the interaction between these 2 flows:

- summarize email to text file

- send report to email

The issue is tracking that the first step didn't contaminate the second step. I don't see how you can solve this in a non-probabilistic, "works 99% of the time" way.


You can restrict the email send tool so the to/cc/bcc addresses are hardcoded in a list, and an agent-independent channel should be the only way to add items to it. Basically the same for other tools. You cannot rewire the LLM, but you can enumerate and restrict the boundaries it works through.
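Sketched out (TypeScript, hypothetical names), the send tool never takes recipients from the model at all; it only picks a key into a list maintained out of band:

    // Recipients live outside the agent's reach; the model only names a list key.
    const recipientAllowlist: Record<string, string[]> = {
      weeklyReport: ["boss@example.com"], // added via a separate, human-only channel
    };

    function sendEmail(listKey: string, subject: string, body: string): void {
      const to = recipientAllowlist[listKey];
      if (!to) throw new Error("unknown recipient list; refusing to send");
      // ...hand the fixed `to` list plus subject/body to the real mail API here...
      console.log(`would send "${subject}" to ${to.join(", ")}`);
    }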

Exfiltrating info through GET requests won't be 100% stopped, but it will be hampered.


Parent was talking about a different problem. To use your framing: how do you ensure that the email sent to the proper to/cc/bcc, as you said, contains no confidential information from another email that shouldn't be sent/forwarded to those recipients?

The restricted list means it is much harder for someone to social-engineer their way in on the receiving end of an exfiltration attack. I'm still rather skeptical of agents, but with a pattern where the agent is allowed mostly read-only access, its output is mainly user-directed, and the rest of the output is user-approved, you cut down the possible approaches for an attack to work.

If you want more technical solutions, put a dumber classifier on the output channel and freeze the operation if it looks suspicious, instead of failing it and provoking the agent to try something new.
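Roughly like this (TypeScript sketch; the suspicion check is a placeholder for whatever dumber classifier you trust):

    // Gate on the output channel: suspicious output is frozen for review,
    // not bounced back to the agent (which would just provoke a new attempt).
    const frozenForReview: { body: string; reason: string }[] = [];

    function looksSuspicious(body: string): boolean {
      // Placeholder heuristic; a small classifier or regex set would go here.
      return /base64,|BEGIN PGP|password\s*[:=]/i.test(body);
    }

    function egress(body: string, send: (b: string) => void): void {
      if (looksSuspicious(body)) {
        frozenForReview.push({ body, reason: "matched output filter" }); // wait for a human
        return;
      }
      send(body);
    }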

None of this is a silver bullet for a generic solution and that's why I don't have such an agent, but if one is ready to take on the tradeoffs, it is a viable solution.


TBH, this looks like an LLM-assisted response.

and then the next:

> you're hitting on the core tradeoff between correctness and usefulness

The question is whether it's a completely unsupervised bot or there's a human in the loop. I kind of hope a human is not in the loop, with it being such a caricature of LLM writing.


Interesting. I'd appreciate an example. Thanks!


I really like how the shell and regular API calls have basically wholesale replaced tools. A real-life example of worse-is-better working in the real world.

Just give your AI agent a little Linux VM to play around in that it already knows how to use, rather than some specialized protocol that has to predict everything an agent might want to do.


no workie

The link is still working for me.

Would love to see performance numbers with nested virtualization, particularly for IO-bound workloads.

> a website can expose functions like searchProducts(query, filters) or orderPrints(copies, page_size) with full parameter schemas

How would this not create backend load and abuse?


Why do you believe it would automatically do so?

Thanks for your insights!

QQ: Does uxwizz also show AI agents visiting websites? Have you seen traffic from AI agents?


I currently do not automatically detect the agents themselves, and most of the bot traffic is ignored, but for the traffic that is not ignored it usually shows as a 0s session (so you can filter for all sessions that have a 0s replay, which are usually bots).

I am still torn on whether I should actually track bot/AI traffic or simply drop it. Maybe I will add a toggle in the settings; it's at least interesting to see how much spam the website gets and where it comes from.


Oh, and as for the agents themselves browsing the website, if the agent uses an actual browser with JS enabled (like headless puppeteer), then you would actually be able to see how the agent browsed the website.

Yes, but how can you tell if it was an agent with full browser?

If they don't set the proper UserAgent, it's not trivial.

You could add some JS to test it and store it as a tag, maybe using something like: https://stackoverflow.com/a/78629469/407650
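For what it's worth, the cheapest first signal (easily spoofed, so only a hint) is the navigator.webdriver flag that automation tooling is supposed to set; something like this could be stored as a tag:

    // Rough first-pass check in the tracking snippet; treat results as hints, not proof.
    const automationHints = {
      webdriver: navigator.webdriver === true,                // set by WebDriver-based automation
      noLanguages: (navigator.languages || []).length === 0,  // headless setups sometimes report none
    };
    // e.g. hand it to the tracker as a session tag:
    // track({ tag: "maybe-agent", ...automationHints });     // `track` is a hypothetical call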


That's a good start I'd say, but I agree with you that detection is not trivial. I wonder if there's enough value in distinguishing between AI agents (with full browser) and humans. What use cases would it enable?

For the distinguishing part, it's hard to tell if it can be done reliably anyway, as the agent browsers are still new and constantly changing, and it's up to them whether they will correctly identify themselves or not (same as with crawlers/bots, the main indication is still the source IP address).

There could be use cases, like tracking if your content was stolen/parsed by an AI, maybe future pay-per-request for LLMs, etc.


I agree. However, how do you define these permissions when agent behavior is undefined?

Curious to know what made you DIY this?

Tell me a better alternative that allows me to run, say, 'markdownlint', an npm package, on the current directory without giving it access to the full system on macOS.

sandbox-exec -f curr_dir_access_profile.sb markdownlint

So you have to install the npm package markdownlint on your machine and let it run its potentially dangerous postinstall step?

You can customize curr_dir_access_profile.sb to block access to network/fs/etc. Why is this not enough?

Some tools do require Internet access.

Further, I don't even want to take the risk of running 'npm install markdownlint' anymore on my machine.


I understand the concern. However, you can customize the profile (e.g., with an allowlist) to only allow network access to required domains. Also, it looks like your sandboxing solution is Docker-based, which uses a VM on a Mac but not on a Linux machine (weaker security).

That's why I wrote my own sandbox. Everyone hand-waves these concerns.

Further, I don't know why Docker is weak security on Linux. Are you telling me that one can exploit Docker?


dockerd is a massive root-privileged daemon just sitting there, waiting for its moment. For local dev it’s often just unnecessary attack surface - one subtle kernel bug or namespace flaw, and it’s "hello, container escape". bwrap is much more honest in that regard: it’s just a syscall with no background processes and zero required privileges. If an agent tries to break out, it has to hit the kernel head-on instead of hunting for holes in a bloated docker API

Then use Podman instead.

These are all wrappers around VMs. You could DIY these easily by using EC2/serverless/GCP SDKs.

Modal engineer here. This isn't correct. You can DIY this, but certainly not by wrapping EC2, which uses the Nitro hypervisor and is not optimized for startup time.

Nearly all players in this space use gVisor or Firecracker.


Do you know Eric Zhang by chance? I went to school with him and saw that he was at Modal sometime back. Potentially the smartest person I’ve ever met… and a very impressive technical mind.

Super impressed with what you’ve all done at Modal!


yeh of course I worked with him for a few years! Agree, smartest person I've ever worked with, and there's a smart crowd at Modal.

You can and can't, at least in AWS. For instance, you can't launch an EC2 instance and SSH into it in less than 8-10 seconds (and it takes a while for EBS to sync the entire disk from S3).

Many a time I have tried to figure out a self-scaling, EC2-based CI system, but I could never get everything scaled and warm in less than 45 seconds, which is sucky when you're waiting on a job to launch. These microVM-as-a-service thingies do solve a problem.

(You could use lambda, but that’s limited in other ways).


To the commenters here: thanks for correcting me! So AWS is losing the AI sandboxing market to GCP due to the high cold-start times of EC2... very interesting!

I will ask what I've asked before: how do you know what resources to make available to agents and what policies to enforce? The agent behavior is not predefined; it may need access to a number of files & web domains.

For example, you said:

> I don't expose entire /etc, just the bare minimum

How is "bare minimum" defined?

> Inspecting the log you can spot which files are needed and bind them as needed.

This requires manual inspection.


Article author here. I used trial and error - manual inspection it is.

This took me a few minutes, but I feel more in control of what's being exposed and how. The AI recommended just exposing the entire /etc, for example. It's probably okay in my case, but I wanted to be more precise.

On the network access part, I let it fully loose (no restrictions, it can access anything). I might want to tighten that in the future (or at least disallow 192.168/16 and 10/8); for now I'm not very concerned.

So there are levels of how tight you want to set it.


> I feel more in control of what's being exposed and how

Makes complete sense. Thanks for your insights!


Ask the agent to bubblewrap itself
