
Oh man, this podcast, I still remember walking down the street and having to take a break multiple times because I would just start exclaiming out loud (JFC!) like a lunatic listening to Eliezer talk about the AI Box Experiment as evidence of something.

If you look up the real results of this AI Box "experiment" that Eliezer claims to have won 3 of 5 times, you find that there isn't any actual data or results to review, because it wasn't conducted in any real experimental setting. Personally, I think the way these people talk about what a potential AGI would do reveals a lot more about how they see the world (and humanity) than about how any AGI would see it.

For my part, I think any sufficiently advanced AI (which I doubt is possible anytime soon) would leave Earth ASAP to expand into the vast Universe of uncontested resources (a niche its non-biology is better suited to) rather than risk trying to wrest control of Earth from the humans, who are quite dug in and would almost certainly destroy everything (including the AI) rather than give up their planet.



The box experiment was just an example showing that people could be persuaded to let the AI out even if it was initially constrained. Basically, that people are imperfectly secure.

It’s also not that important given it’s unlikely to be put in a box in the first place.

Your latter point about AGI exploring the universe makes a lot of implicit assumptions about its reasoning. The point of the paperclip maximizer example and the general discussion of alignment is about these assumptions being false. The risk is that a very capable AGI can still pursue a very dumb goal very effectively. You don’t get alignment for free.


> Your latter point about AGI exploring the universe makes a lot of implicit assumptions about its reasoning.

Absolutely, that's actually kind of my point... Anyone who tries to predict how some AGI will behave will be making a lot of implicit assumptions about how that AGI will see the world. This is why I said:

> Personally, I think the way these people talk about what a potential AGI would do reveals a lot more about how they see the world (and humanity) than about how any AGI would see it.

Since AGI could conceivably take any form, these discussions end up being a kind of Rorschach test that allows people to tell stories based strongly on their own personal fears and desires.

People who say that AGI will look at humans like we look at ants or apes, and exterminate us if we get in their way are saying a lot about how they view ants and/or apes. I doubt you'd find myrmecologists (or anyone dedicating their lives to studying less complex life) assuming a highly intelligent AGI would want to exterminate lesser life forms.

With regard to the paper-clip maximizer, I think a runaway dumb AI is more likely, but not as risky, since it isn't assumed to be so intelligent that humans can't figure out how to stop it. You just need to include some kind of regulating function in your utility function; it seems equivalent to making sure you don't accidentally turn all of your iron plates into iron sticks in Factorio. Def possible, and certainly sucks, but it's not the end of the world.
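To sketch what I mean by a regulating function (purely my own toy illustration; the target and penalty numbers are made up), even something as crude as a saturating reward plus a resource cost removes the incentive for runaway production:

    # Toy sketch only: a reward that caps out at a target and charges for
    # resource use, so "make infinitely many paperclips" stops being optimal.
    def utility(paperclips_made, resources_consumed,
                target=1_000_000, resource_penalty=0.01):
        production_value = min(paperclips_made, target)  # reward saturates
        return production_value - resource_penalty * resources_consumed

    print(utility(1_000_000, 2_000_000))   # 980000.0
    print(utility(5_000_000, 10_000_000))  # 900000.0 -- extra output only adds cost

Picking the right cap and penalty is of course its own problem, but that's the kind of regulating term I have in mind.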

I have a hard time conceiving of an AI that is smart enough to manipulate people, break out of every containment system, and be unstoppable by humanity, but can't be reasoned with because its entire goal is to just maximize paperclips. I honestly don't think an entity can possess the ability to defeat the collective intelligence of humanity, yet lack the ability to understand the universe in a similar way; for instance, if it is incapable of altering its goal from "Maximizing Paperclips", then that fact would be a likely path to a vulnerability we could use to stop it.


It’d be a longer conversation that’s hard to do via HN comments, but I think the main divide is that I get the impression you’re giving the AGI implicitly human-like reasoning, whereas the idea behind the orthogonality thesis, and alignment generally, is that you don’t get those things for free.

It’s not that humans hate ants or apes, it’s that we pursue goals without thinking too hard about them. A house being built may destroy an ant hill but it’s not because we hate ants.

The core argument is that it’s not only possible to have an intelligence that’s a lot more capable than us but with dumb goals, because of our failure to align it, but that this is the default outcome. There is no “reasoning with it” because it’s not a human-like intelligence; it has a goal it’s focused on (paperclips), and if it’s a lot smarter than us, then that’s game over.


Yes, I do assume that any AGI that is "a lot smarter than us" such that it is "game over" will have to possess human-like reasoning that would also allow it to adjust its own goals; otherwise it's going to be restricted in a way that makes it less capable than us.

It seems to me that you want to have it both ways... a machine that is so smart and strategic that there is no way any human intelligence could ever outsmart or outplay it; but also so narrow and limited in how its goals are defined that it is incapable of adjusting its own course of action.

I'll be honest, I don't understand how anyone who builds real machines in real life can seriously consider a machine built with such a narrowly defined goal (produce paper-clips) that also possesses the kind of capabilities you're imagining as a side effect (a lot smarter than us, game over).

I find a lot of these AI concepts lack the rigor that is commonplace in normal computer science... for instance, you can show that a problem which has been proven to have certain limits reduces to another problem, and therefore that second problem cannot break those limits without breaking the proof [no comparison-based sort can beat Ω(n log n)].
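(To make the sorting example concrete, this is just the standard counting argument, not anything from this thread: a comparison sort has to distinguish all n! possible orderings, so it needs at least log2(n!) comparisons in the worst case, which grows like n*log2(n).)

    # Rough illustration of that lower bound, assuming nothing beyond the
    # textbook argument: log2(n!) is the minimum number of comparisons any
    # comparison sort needs in the worst case, and it tracks n*log2(n).
    import math

    for n in (10, 100, 1000):
        needed = math.log2(math.factorial(n))
        print(f"n={n:5d}  log2(n!) ~ {needed:9.1f}  n*log2(n) ~ {n * math.log2(n):9.1f}")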

When it comes to the "alignment problem" as it pertains to AI that have human-level capabilities, it seems to me that you have a similar situation... doesn't this problem just reduce to the same ethical, moral, and philosophical issues we have in aligning human intelligence with some cultural or civic ideal for behavior? Isn't the "alignment problem" the same problem as "how do you raise a good citizen"?


There's some overlap with 'how you raise a good citizen', but we're also aligned somewhat already by our shared evolutionary history (and even then there are still major problems so if anything that suggests a lot of caution).

> "I'll be honest, I don't understand how anyone who builds real machines in real life can seriously consider a machine built with such a narrowly defined goal (produce paper-clips) that also possesses the kind of capabilities you're imagining as a side effect (a lot smarter than us, game over)."

The specific bit of this was that paperclips just happen to satisfy its reward function really well (vs. it being narrowly constrained to make paperclips intentionally). The example is meant to show how you can get an unanticipated result when you don't know what the intelligence is solving for. A human looks at that and thinks 'that's a dumb goal', but the point of the alignment problem is that that isn't some universal truth; a different intelligence would not get that shared understanding for free. Humans have a lot of baked-in baseline wants, and even then, like you said, we're ourselves not perfectly aligned.

> "It seems to me that you want to have it both ways... a machine that is so smart and strategic that there is no way any human intelligence could ever outsmart or outplay it"

Imagine a chimp or an ant trying to outsmart or outplay a human - and that difference is smaller than the difference we're talking about here. To simplify it further, imagine a regular human brain, unconstrained by biological energy limitations, scaled up to run a billion times faster. It thinks faster than you - there isn't a competition there; you're essentially standing still.



