Demystifying the i-Device NVMe NAND

mi100hael · on Nov 17, 2016

    > In order to read the NVMe, I therefor developped a PCIe card with a Zero
    > Insertion Force reader. I brought the JTAG part to 20pin header. The hard
    > pard in here is the signal integrity of the differential pairs. In order
    > to do so, I had to use multi layer PCB, and have the impedence match by
    > knowing the stackup, materials used for prepeg and so on..

Posts like this are very humbling. They serve as a good reminder that no matter how far I've come and how much I've learned, there will always be someone out there who knows vastly more than I do, and knows it like the back of their hand.

akuma73 · on Nov 17, 2016

Don't feel too bad.

Fully understanding the complete stack of a modern computer system is outside the scope of almost everyone. These things are complicated and we've built abstractions, interfaces and modules to manage the complexity.

People specialize in their own fields. I am trained in digital integrated circuit design, but don't ask me to build a file system.

djsumdog · on Nov 17, 2016

Exactly. This is super impressive work, but also very specialized. Even for non-EE majors, most IT people could at least understand the basics of what he was doing.

pjc50 · on Nov 17, 2016

This side of the business is actually pretty straightforward to learn, and your PCB assembly house and design software will guide you through it. Ask the PCB fab for the stackup, punch 90 ohms into the calculator, and get a number in mils for the PCB software.

And the dirty secret is that for short runs, testbeds and hacking tools you can often cheat on the controlled impedance a bit to produce abominations like USB-over-FFC and three-ended ethernet cables.

Now, hot air rework, that's a serious manual skill that I respect.

(I suppose the difference versus learning software development work is that failures are expensive...)
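
For readers who have not seen the "calculator" step, here is a rough sketch of what such a tool does. It uses the widely quoted IPC-2141 closed-form approximations for an edge-coupled surface microstrip pair. The stackup numbers in the usage note (4 mil dielectric, 1.4 mil copper, relative permittivity 4.2, 5 mil gap) are made-up assumptions, not values from the article, and a real fab's field solver will give somewhat different results.

```python
import math

def microstrip_z0(w, h, t, er):
    """Single-ended surface microstrip impedance in ohms (IPC-2141 approximation).

    w = trace width, h = dielectric height, t = copper thickness (same units),
    er = relative permittivity. Only reasonable for roughly 0.1 < w/h < 2.
    """
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h / (0.8 * w + t))

def diff_microstrip_z(w, s, h, t, er):
    """Differential impedance of an edge-coupled pair with gap s (approximation)."""
    z0 = microstrip_z0(w, h, t, er)
    return 2.0 * z0 * (1.0 - 0.48 * math.exp(-0.96 * s / h))

def width_for_target(target_ohms, s, h, t, er, lo=1.0, hi=50.0):
    """Bisect for the trace width that gives the target differential impedance.

    Impedance falls as the trace gets wider, so a simple bisection is enough.
    """
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if diff_microstrip_z(mid, s, h, t, er) > target_ohms:
            lo = mid   # impedance too high: the trace must be wider
        else:
            hi = mid   # impedance too low: the trace must be narrower
    return (lo + hi) / 2.0
```

Calling `width_for_target(90.0, s=5.0, h=4.0, t=1.4, er=4.2)` returns the width in mils that you would then type into the PCB tool's differential pair rule for that hypothetical stackup.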

erichocean · on Nov 17, 2016

> Ask the PCB fab for the stackup, punch 90 ohms into the calculator, and get a number in mils for the PCB software.

This sounds like gibberish to me.

(I know it's not, just that the notion that it's "straightforward to get into" is maybe off the mark.)

rfrank · on Nov 17, 2016

The stackup is what controls the layers of a bare PCB; signal, power, ground, etc [1]. An engineer at your board supplier is likely involved in determining the stackup with the EE and PCB designer. The stackup and material selection of the bare board is one of the things that controls min/max trace width, proximity, etc. Once you have that info, you can plug it into your CAD tool of choice, for instance in Mentor Xpedition it's CES [2]. Makes things like differential pairs a lot easier.

1. http://blog.optimumdesign.com/hdi-layer-stackups-for-large-d...

2. https://www.mentor.com/pcb/xpedition/constraint-manager/

jdietrich · on Nov 17, 2016

Good SMT rework is 60% flux, 20% isopropyl alcohol, 10% magnification and 10% practice. The trick is surface tension - if the surfaces are properly wetted, surface tension pulls the part into alignment as if by magic. Achieving good wetting is as simple as getting the solderable surfaces spotlessly clean and deoxidized, then using copious amounts of flux. If your preparation is good, the parts almost solder themselves.

Fine-pitch BGA rework can be genuinely challenging, but the core skills are remarkably straightforward. A complete novice can rework QFNs and 0603 passives with very little practice if they're taught the correct techniques. I actively prefer working with SMT over through-hole.

Kliment · on Nov 18, 2016

I teach a course (at various hacking events) titled "Surface mount electronics assembly for terrified beginners" and the thing that basically every participant reports after is that they expected it to be harder. We go from "never knowingly touched a PCB" down to 0.5mm pitch QFNs and 0402 passives in the space of an hour or so, so what you're saying about novices is absolutely correct. (if you want a course like that near you, contact me)

setq · on Nov 17, 2016

All these things are just different worlds. I'm the same when it comes to medicine or tree surgery or a multitude of different things. I have no idea how a security guard even works!

On this subject though, I did EE in the past and now do software, and after a few years of practice it's roughly as complicated as throwing some code together. Except that the design rules, physics and CAD software are, perhaps ironically, somewhat better defined than what the software industry has managed! Many an EE has looked in awe at the software that we write, too.

stavros · on Nov 17, 2016

It's not even that hard, I managed to design a simple PCB after only a few months of tinkering with hardware. It's really not bad, and very fun. I recommend it to everyone.

TTPrograms · on Nov 17, 2016

I enjoy PCB design too - though it results in me spending an inordinate amount of time tweaking trace placement and ground planes for mostly aesthetic purposes :)

stavros · on Nov 18, 2016

That is 90% of the fun :P

setq · on Nov 17, 2016

Recommended too.

For one-offs you can entirely forgo the PCB design if you so desire as well and just use the PCB stock and rats nest it.

One of my creations: http://imgur.com/mcS79lU - fugly but very robust, functional, free of nasty parasitics and from paper to powered up and working is around 20 minutes work.

dom0 · on Nov 17, 2016

> I have no idea how a security guard even works!

Hm. Can someone shed some light on this?

alexbeloi · on Nov 17, 2016

I bet he plays jazz piano too.

revelation · on Nov 17, 2016

That's the bread and butter of digital EEs. In fact a digital EE would be very happy if all he had to do was a 4 layer board with a single 70 ball chip :)

dom0 · on Nov 17, 2016

The layout here shouldn't be difficult at all - but the other feats are quite impressive, still :)

honkhonkpants · on Nov 17, 2016

Some big words mixed in there, but if you try your hand at this I think you'd find that making impedance-controlled differential traces on a PCB isn't much of a trick at all. It's a difficult engineering challenge if you intend to mass produce, but it's not a challenge if you intend to make 10 boards and you aren't cost sensitive and you don't care if 9 boards don't work.

cushychicken · on Nov 17, 2016

I help design PCIe cards and interfaces, and this guy's work is blowing my fucking mind.

sounds · on Nov 17, 2016

The gold is at the bottom:

  The idea here would be to see if it was possible to control the NVMe
  over jtag in order to ask it to perform a DMA read over the PCIe Bus.
  In order to do so, the PCI_COMMAND_BUS_MASTER has to be set to 1. We
  can assume that since the chip is using remote RAM, it is allowed to
  act as a master over PCIe. Here is a snippet of the probing function
  of the kernel driver.

(code)

  Our goal here is to force the DMA to happen just by controlling the
  ARM of the NVMe over JTAG, in order to ask it to dump the region we
  alloc'd in kernel and see if we get the data out of it.

In other words, full root exploit of the phone from the NVMe JTAG pins.
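
As a small illustration of the bit being discussed: in PCI configuration space the 16-bit Command register sits at offset 0x04, and Bus Master Enable is bit 2 (mask 0x0004). The sketch below only manipulates an in-memory copy of a config-space header. The constant name follows the quoted article, and the helper names are hypothetical rather than taken from any driver.

```python
import struct

PCI_COMMAND_OFFSET = 0x04        # 16-bit Command register in PCI config space
PCI_COMMAND_BUS_MASTER = 0x0004  # bit 2: Bus Master Enable

def bus_master_enabled(config_space: bytes) -> bool:
    """Return True if the Bus Master Enable bit is set in this config-space copy."""
    (command,) = struct.unpack_from("<H", config_space, PCI_COMMAND_OFFSET)
    return bool(command & PCI_COMMAND_BUS_MASTER)

def with_bus_master(config_space: bytes) -> bytes:
    """Return a copy of the config space with Bus Master Enable set to 1."""
    cfg = bytearray(config_space)
    (command,) = struct.unpack_from("<H", cfg, PCI_COMMAND_OFFSET)
    struct.pack_into("<H", cfg, PCI_COMMAND_OFFSET, command | PCI_COMMAND_BUS_MASTER)
    return bytes(cfg)
```

Until that bit is set, a device is not supposed to issue memory requests of its own, which is why the article checks it before attempting the DMA read.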

josephg · on Nov 17, 2016

> In other words, full root exploit of the phone from the NVMe JTAG pins.

Sure, except the data on the device will be encrypted. The keys are kept in the iPhone's tamper-resistant Secure Enclave. Android and iOS both have full disk encryption enabled out of the box these days. (On iOS I don't think you can disable it.)

honkhonkpants · on Nov 17, 2016

That isn't the point. This exploits the host, which holds plaintext data in memory.

AceJohnny2 · on Nov 17, 2016

No, it doesn't exploit the host. The "Host" in the case of such a NAND device is the Baseband SoC, not the ARM controller on the NAND device. You can be pretty sure that Apple did not include decryption on the NAND chip itself. That would defeat the point.

honkhonkpants · on Nov 17, 2016

That isn't what he's describing. He is commanding the storage controller to initiate DMA from host memory over the PCI-Express bus.

AceJohnny2 · on Nov 17, 2016

Ah good point.

However, he does this by using the NAND as a PCIe master, which implies that the peer (main SoC) would be switched to device mode, and the whole PCIe handshake happens properly.

While I can't dismiss that possibility entirely, I'm not holding my breath for it to work out.

cnvogel · on Nov 17, 2016

No, you don't have to switch "Master" and "Device" mode (what does that even mean in PCIe?).

You might be surprised to hear that your networking card, the AHCI-SATA interface and even the sound card soldered onto your computer's mainboard are allowed to, and do, read from and write to your computer's memory whenever you are using them. (Those are just the ones I've had to dig into in the past; almost any modern peripheral uses DMA.)

If the operating system needs a block from your hard disk to be read and stored in a certain memory location, it will actually tell the AHCI controller the physical memory location and tell it to write the contents into memory on its own. The controller will signal an interrupt when it has finished transferring the data.

Your networking card will have "ring descriptors" which tell the card a list of buffers (in RAM) into which to save new incoming packets. It will raise an interrupt after it has finished writing to the first buffer.

Your sound card will likewise have a list of buffers from which it reads, and to which it writes, just as you are listening to MP3s or having a phone conversation with Google Hangouts.

The worst offender would probably be FireWire which, if not configured to block this (modern OSes do), allows these reads from/writes to memory to be triggered by any connected external device, and from/to addresses supplied by this external device. It can be a huge help for debugging, though.

On typical computers, there's no safeguard against this happening. But some have an IOMMU which can be used to limit DMA and redirect DMA accesses with respect to physical memory.
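
The ring-descriptor idea and the IOMMU restriction described above can be sketched as a toy model. Everything here is hypothetical (`ToyNic`, `ToyIommu`, `RxDescriptor`, the `bytearray` standing in for host RAM); real hardware uses packed descriptor layouts and page-table based IOMMUs, so treat this purely as an illustration of who writes where.

```python
from dataclasses import dataclass

@dataclass
class RxDescriptor:
    addr: int           # "physical" address of a receive buffer in host RAM
    length: int         # size of that buffer in bytes
    done: bool = False  # set by the device once the buffer has been filled
    used: int = 0       # number of bytes the device actually wrote

class ToyIommu:
    """Permits DMA only inside explicitly mapped (start, end) address windows."""
    def __init__(self, windows):
        self.windows = list(windows)

    def allows(self, addr, length):
        return any(start <= addr and addr + length <= end
                   for start, end in self.windows)

class ToyNic:
    """A device that writes incoming packets straight into host RAM."""
    def __init__(self, host_ram, ring, iommu=None):
        self.ram = host_ram   # bytearray standing in for host memory
        self.ring = ring      # descriptors the driver handed to the device
        self.iommu = iommu    # None models a system without an IOMMU
        self.head = 0

    def dma_write(self, addr, data):
        if self.iommu is not None and not self.iommu.allows(addr, len(data)):
            raise PermissionError("DMA blocked by IOMMU")
        self.ram[addr:addr + len(data)] = data

    def receive(self, packet):
        desc = self.ring[self.head]
        if desc.done or len(packet) > desc.length:
            raise BufferError("no free receive buffer large enough")
        self.dma_write(desc.addr, packet)   # the CPU is not involved here
        desc.used, desc.done = len(packet), True
        self.head = (self.head + 1) % len(self.ring)
        # A real card would raise an interrupt at this point.
```

Without the `iommu` argument the device can write to any address the descriptor names, which is the situation described for typical computers. With it, a descriptor pointing outside the mapped windows is rejected before host RAM is touched.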

johncolanduoni · on Nov 17, 2016

To what degree can misbehaving PCIe devices (as in the device as a whole being malicious, not simply manipulated) override IOMMU protections? Can they pretend to be another device or otherwise send unexpected signals to confuse this defense?

honkhonkpants · on Nov 17, 2016

Violating the timing specs for PCI signalling would put the whole system into an undefined state. It's possible you could glitch the thing on the other end into doing whatever you want. Who knows?

wtallis · on Nov 18, 2016

PCIe devices can identify themselves to the host with whatever vendor and device IDs they want.

Since PCIe is a point to point link rather than a shared bus, it is not really possible for a malicious peripheral to pretend to be connected through a different port. PCIe switches may weaken this security somewhat.

AstralStorm · on Nov 17, 2016

It is a similar attack vector to root-achieving pendrives, except PCIe devices are even less verified.

TCB verification in TPM and firmware to verify the code running on the microcontroller is required to prevent such an attack.

qb45 · on Nov 17, 2016

No, root pendrives must exploit bugs in USB/block/fs/UI subsystems of the OS. PCIe devices can read/write RAM directly as they please unless isolated behind IOMMU.

tmzt · on Nov 18, 2016

Unless they are USB-C devices and the host supports Thunderbolt 3 without an IOMMU. An innocuous-looking device could perform a DMA exploit.

revelation · on Nov 17, 2016

The JTAG angle is unnecessary, and difficult to do in practice with this LGA70 chip face-down soldered onto the logic board.

It really means there is a Cortex-A (so lots of grunt) with a firmware update mechanism that has 1) direct access to the application processor RAM and 2) direct access to the plentiful permanent storage.

huslage · on Nov 17, 2016

He made a board so that he can do JTAG in-line. Not sure how else he could tell the processor to do something.

AstralStorm · on Nov 17, 2016

By flashing custom unverified firmware, of course.

CountSessine · on Nov 18, 2016

Flashing custom unverified but somehow-signed-with-Apple's-secret-key firmware?

walterbell · on Nov 17, 2016

Does the phone's ARM CPU have an IOMMU?

huslage · on Nov 17, 2016

kanwisher · on Nov 17, 2016

Refreshing to see a deep tech article on HN. I really liked how he debugged the code on the controller

iuuuuu145 · on Nov 17, 2016

>It looks like to reduce the size needed, the NVMe core uses the host DDR in order to work. Therefor, apple is not strictly following the specification regarding the initialisation.

Yikes.

djsumdog · on Nov 17, 2016

Doesn't surprise me; Apple is known for not doing anything by standards. When you control all the hardware, I guess it doesn't matter. You can have broken ACPI and driver implementations all over the place.

I've started to give up on ARM embedded boards for the same reason. You need to build out images for all the different potential ARM systems if they don't use device trees. There are Intel Atom/AMD Geode Pi/BeagleBoard clones that will boot up just like a desktop, and you can install most Linux distributions right on them without modification.

If ARM ever decides to start selling an architecture spec instead of just a SoC spec, I think it would go a long way toward making it a better platform. I'm pretty sure Apple would still ignore it f

StillBored · on Nov 17, 2016

It's happening (slowly): SBSA, SBBR, etc. are out there and being implemented. It's just taking a while for EDK2, Linux, etc. to get to the point where they can run on a random ARM SoC. Once that happens, though, suddenly you have a machine that can install off-the-shelf Linux distros. Right now there are only a couple of systems that work like you would expect (mostly AMD Seattle machines, although there are a couple of others), but over the next few years I would expect the firmware guys to start expanding the device compatibility.

Eventually (particularly with ACPI) it will become apparent to the ARM ecosystem that they can make their devices compatible with the standards, or they can spend millions on engineering effort to build their own firmware/OS stacks that will perpetually be behind the capabilities of the rest of the ecosystem.

codebook · on Nov 17, 2016

NVMe added the HMB (Host Memory Buffer) feature, which lets a device use host DDR rather than internal RAM. So I think this chip uses the HMB feature.

drv · on Nov 17, 2016

It's presumably not standard Host Memory Buffer, since the spec says "The controller shall function properly without host memory resources."

misnome · on Nov 17, 2016

It's not as though they are manufacturing these for anybody else to use.

nimish · on Nov 17, 2016

Apple's purchase of Anobit is paying dividends!

digi_owl · on Nov 17, 2016

Sometimes I wonder how many companies Apple has bought that now simply exist to make parts of Apple products.

threeseed · on Nov 17, 2016

Somewhere in the vicinity of 20 companies.

What's interesting is that about a third of those companies we have yet to see the output of. Expect Apple to get into AR/VR in a big way in 2017/2018.

mmastrac · on Nov 17, 2016

Has anyone managed to capture the text of this article? It doesn't appear to be in a Google cache AFAICT.

arm · on Nov 17, 2016

https://archive.is/FGIiC

condescendence · on Nov 18, 2016

Definitely one of the cooler and more in depth posts this year, what a great read.

athiercelin · on Nov 17, 2016

Very good stuff!