ÆPIC Leak: Architecturally leaking uninitialized data from the microarchitecture [pdf] (aepicleak.com)
118 points by triska on Aug 9, 2022 | hide | past | favorite | 17 comments


This is super neat.

The researchers built an informal taxonomy of all the CPU bugs we've seen in the past several years, using the CWE (bleh) as a reference. They noticed that anywhere they found a transient execution vulnerability (like a cache side channel) for a given CWE, they'd also find an architectural vulnerability (one exposed directly through the ISA) --- except for CWE-665, Improper Initialization, where their survey only identified transient attacks.

Working from a hypothesis that these kinds of attacks always come in pairs, they set out to find a CWE-665 vulnerability. They did something clever: they built a scanner. From one hardware thread, they kept known data patterns cached. From another, they used ring 0 code to systematically map and read the I/O address space; they could then check all those reads for canary values from the first grooming process to detect leaks.
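The detection loop can be sketched in a few lines (a hypothetical Python simulation, not the researchers' actual ring-0 tool; the buffer, canary value, and leak offset here are all made up for illustration):

```python
# Hypothetical simulation of the canary-based leak scanner described above.
# The real scanner runs in ring 0 and reads mapped I/O; here a plain byte
# buffer stands in for the I/O address space, with a "leak" planted at one
# offset as if the grooming thread's data had been left behind.

CANARY = 0xDEADBEEF

def groom(memory: bytearray, leak_offset: int) -> None:
    """Simulate stale canary data left in a buffer the read path reuses."""
    memory[leak_offset:leak_offset + 4] = CANARY.to_bytes(4, "little")

def scan(memory: bytes) -> list[int]:
    """Read the 'address space' in 4-byte steps; flag reads returning the canary."""
    hits = []
    for off in range(0, len(memory) - 3, 4):
        value = int.from_bytes(memory[off:off + 4], "little")
        if value == CANARY:
            hits.append(off)
    return hits

space = bytearray(4096)          # stand-in for a mapped MMIO page
groom(space, leak_offset=0x8F0)  # plant stale data as the grooming thread would
print(scan(space))               # → [2288]  (i.e. 0x8F0)
```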

And they found one! The APIC interface (which routes interrupts) exposes successive 32-bit values (sometimes as 4-byte chunks of 8- or 32-byte data structures), which are always aligned on 128-bit boundaries; in other words, each 4-byte APIC value is embedded in a 16-byte-wide range of addresses. Reads past the first 4 bytes of those ranges are undefined. When their scanner read them, it caught canary values their grooming process had written.
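A quick sketch of that layout (illustrative Python built from the description above, not from the Intel manuals): each 32-bit register occupies the first 4 bytes of a 16-byte-aligned slot, and the remaining 12 bytes of each slot are architecturally undefined.

```python
# Sketch of the APIC MMIO layout as described above: 32-bit registers on
# 128-bit (16-byte) boundaries, with bytes 4..15 of each slot undefined.

APIC_STRIDE = 16  # registers aligned on 128-bit boundaries

def is_defined(offset: int) -> bool:
    """True if a byte offset into the APIC page falls in a defined register."""
    return offset % APIC_STRIDE < 4

# The undefined bytes in any one slot are offsets 4..15 within the slot:
undefined = [off for off in range(APIC_STRIDE) if not is_defined(off)]
print(undefined)  # → [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```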

The theory here is that the cache system in these CPUs, which is organized around queues of buffers (to asynchronously handle loads from the L2 cache, with line fill buffers, and from the last-level cache, with fill buffers in the "superqueue"), is also used by the APIC to satisfy reads: reads from the APIC are serviced through the superqueue. Their grooming process fills the whole superqueue with canary values, and the APIC reads fail to clear stale data out of the superqueue buffers before reusing them.
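That hypothesis can be caricatured as a toy model (made-up Python, not a claim about the real microarchitecture): a shared transfer buffer reused for an APIC read only has its first 4 bytes overwritten, so the other 12 bytes return whatever the previous occupant left there.

```python
# Toy model of the stale-superqueue-buffer hypothesis (not real hardware
# behavior): an APIC read reuses a 16-byte shared buffer but overwrites only
# the 4 architecturally defined bytes, so bytes 4..15 leak stale data.

def apic_read(buffer: bytearray, reg_value: bytes) -> bytes:
    """Simulate an APIC read that fills only the defined 4 bytes of the buffer."""
    buffer[0:4] = reg_value    # the defined register bytes are written...
    return bytes(buffer)       # ...but stale bytes 4..15 come along for the ride

shared = bytearray(b"SECRETSECRETSECR")  # left behind by a previous transfer
leaked = apic_read(shared, b"\x00\x01\x02\x03")
print(leaked[4:])  # → b'ETSECRETSECR'
```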

You can only launch this attack from ring 0 (you need access to physical memory). But that's enough to fatally break SGX, whose whole purpose is running compute on CPUs that don't trust their ring 0.

Looks like it only works on Sunny Cove Intel.


I've been a long time away from CPU architecture, but isn't it time to add instructions that tell the caches to fetch only the size of the data allocated? That way there's no leaking of errant memory. Or did I completely misunderstand the problem set (which is likely)?

So far in our languages we have two bits of information, the start pointer and the size (the latter being stored as a variable, or intrinsically as a block size), whereas the OS/framework itself often only needs the pointer...

If there was a system that allowed the tightly coupled block of memory to be represented as an op code for the caches, wouldn't that fix up all of this?


An instruction wouldn't fix that, right? The cacheline fetched by the memory controller would still contain leaked data. Perhaps a fill-cacheline-with-zero-on-invalidate would help...
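Continuing the toy-model framing from upthread (made-up Python, not a claim about what any hardware actually does), zeroing a shared buffer before reuse would make the undefined bytes benign:

```python
# Toy illustration of the "fill cacheline with zero on invalidate" idea:
# if the 16-byte shared buffer is zeroed before reuse, a partial 4-byte
# fill can no longer expose another context's stale data.

BUF_SIZE = 16

def apic_read_zeroing(buffer: bytearray, reg_value: bytes) -> bytes:
    """Simulate an APIC read through a buffer that is zeroed on reuse."""
    buffer[:] = bytes(BUF_SIZE)  # zero the whole buffer on (re)allocation
    buffer[0:4] = reg_value      # then fill only the defined register bytes
    return bytes(buffer)

shared = bytearray(b"SECRETSECRETSECR")  # stale data from a previous transfer
leaked = apic_read_zeroing(shared, b"\x00\x01\x02\x03")
print(leaked[4:])  # → twelve zero bytes instead of stale secrets
```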


How many times has SGX been broken now?

When (if ever) could a reasonable person think "ok, now they must have got rid of all the vulnerabilities, time to trust this!" ?

...and then Intel will add another new architectural feature that will interact with SGX in some unforeseen way and break it yet again. SGX is surely just too fragile?


> How many times has SGX been broken now?

Quite a few. L1TF/Foreshadow was pretty catastrophic, and Plundervolt was just funny. In name and execution. Plus various others.

> When (if ever) could a reasonable person think "ok, now they must have got rid of all the vulnerabilities, time to trust this!" ?

Never.

Nor should one trust Intel chips for sensitive computations, given how badly they leak and how bad Intel seems to be at reasoning about this. I'm getting rid of the last of my Intel systems this month (beyond random compute nodes for BOINC stuff; I run them with 'mitigations=off' for max performance because they have literally nothing at all sensitive on them, not even passwords - I use different passwords for them).


SGX’s per-chip private keys are the “flag” in Intel’s unintentional CPU vulnerability CTF.

Those keys have been popped so many times.

Reading the original, they explicitly state that side channels are out of scope for their threat model. As a result, they designed a system that was essentially maximally vulnerable to side channels (and other architectural issues). It’s really hard to trust SGX as a result. Otoh, if I were in a bind and needed to do trusted computation on an untrusted host, are there really other options that don’t suffer similar (or worse) issues?


Not really (at least not on 'commodity' CPUs). It is still an active (and very interesting) area of research. There is a lot of work going on into both attacks and practical defenses for this kind of stuff. Whether it is truly solvable in practical settings remains unclear. I do believe it raises the bar enough for it to be useful in many scenarios though.


>How many times has SGX been broken now?

So frequently that I believe it's deprecated nowadays.


Not true, it is still being pushed for datacenter applications/servers (e.g. where you have an 'untrusted' cloud provider).


Partly true, it is deprecated in client devices https://www.bleepingcomputer.com/news/security/new-intel-chi...


Quoting from the accompanying repository https://github.com/IAIK/AEPIC :

"AEPIC Leak is the first architectural CPU bug that leaks stale data from the microarchitecture without using a side channel. It architecturally leaks stale data incorrectly returned by reading undefined APIC-register ranges."


This behaviour sounds similar to reading from what is traditionally called an "open" or "floating" bus.

> Our end-to-end attack extracts AES-NI, RSA, and even the Intel SGX attestation keys from enclaves within a few seconds.

Good. May all attempts at DRM and user-hostility die in the same manner. You're not the true owner if you can't inspect and modify the entire state of the machine. And fuck remote attestation too -- the true enemy of your freedom.


It bugs me that they classify Alder Lake as being Sunny Cove. It is not. The code name for Alder Lake is Golden Cove / Gracemont for performance/efficiency cores.

In fact it is quite strange that the attack skips Tiger Lake (Willow Cove), which changes almost nothing from Sunny Cove besides L2 and L3 cache sizes, but shows up again in Alder Lake (which has two types of cores: does the attack work on both efficiency and performance cores?)


> which changes almost nothing from Sunny Cove besides L2 and L3 cache sizes

AFAICT the attack targets buffers that sit between L2 and L3, so it isn't surprising to me that it just happens not to work with a slightly different cache geometry.


Golden Cove has the same sizes as Willow Cove, though different geometry (10 vs 20-way L2, for example). However, Golden Cove's L3 has incredibly high latency compared to predecessors, which might be what makes it work by forcing data to stay in the superqueue longer.


Oh, that's interesting (well, to a dummy like me); it suggests that the logic could be comparably broken, but we just can't observe it to be exploitable because the latency masks it off.

