
There is another technical angle: RC4 is usually far less CPU-intensive than the available alternatives. Not using RC4 can easily mean stuttering video playback, greatly diminished battery life, and even lockups. Very few users will accept that issues like that are "better" for them.

Many RC4 deprecation efforts have been rolled back because of issues like this, especially on hard-to-fix embedded devices (think TVs, cars, and phones) with comparatively weak CPUs.



There are two solutions: use hardware with the AES-NI instruction set, which makes AES blazing fast, or use a better stream cipher like Salsa20. On my machine, which has an Intel i5-3570K, Salsa20 is about 25% faster (edit: than RC4).

Unfortunately, neither solution is easy: only the very latest chips have AES-NI instructions, and not many clients support Salsa20 yet (OpenSSL does not, for example, and it powers a lot of SSL stuff).


Does any TLS stack support Salsa20? I know Adam Langley has a draft for ChaCha+Poly1305, but that's not Salsa20.

Either way, I don't think Salsa20 is a realistic suggestion for improving TLS performance.


Yes. GnuTLS does: http://www.gnutls.org/manual/gnutls.html#Encryption-algorith...

Anyway, TLS is in a tough spot. It's such a widely adopted standard, with so many implementations, that making radical changes is exceptionally difficult. AES-NI leaves the standard mostly alone but requires new(ish) hardware; implementing newer, faster primitives (like Salsa20), on the other hand, essentially requires turning the massive boat that is TLS.

There are no easy solutions, at least as far as I can see.


That's true of all new ciphersuite proposals, isn't it? The Salsa20+Poly1305 proposal just replaces AES, CTR, and GHASH with Salsa20 and Poly1305.
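For anyone unfamiliar with the MAC half of that pairing, here is a minimal Python sketch of Poly1305 in its one-time-key form. It assumes the 32-byte (r, s) key layout and clamping mask as later written down in RFC 7539; the function name is my own, nothing here is taken from the TLS draft itself, and it is not constant-time, so treat it as an illustration only.

```python
# Sketch of one-time-key Poly1305 (assumed layout: key = r || s, 16 bytes each).
P1305 = (1 << 130) - 5                       # the prime 2^130 - 5
R_CLAMP = 0x0ffffffc0ffffffc0ffffffc0fffffff  # clears the bits r must not use


def poly1305_mac(msg: bytes, key: bytes) -> bytes:
    if len(key) != 32:
        raise ValueError("Poly1305 needs a 32-byte one-time key")
    r = int.from_bytes(key[:16], "little") & R_CLAMP
    s = int.from_bytes(key[16:], "little")
    acc = 0
    for i in range(0, len(msg), 16):
        # Each block is read as a little-endian number with an extra 0x01
        # byte appended, then folded into the accumulator.
        n = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = (acc + n) * r % P1305
    # Add s and keep the low 128 bits as the tag.
    return ((acc + s) & ((1 << 128) - 1)).to_bytes(16, "little")
```

The whole MAC is multiply-and-add modulo a prime, which is why it does well in software where GHASH needs carry-less multiplication support to be fast.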

The problem is getting the installed base up to TLS 1.2.


Oh, yes - definitely. I just picked Salsa20 as an example because I already had benchmark data for my machine, and I am familiar with it.

But even TLS 1.2 won't help, because 1.2 doesn't include ciphers that are screaming-fast without hardware acceleration. AES-GCM is faster than AES-128-CBC/HMAC-SHA1, but Salsa20-256/HMAC-SHA1 is still twice as fast on my machine. If the AES-NI instruction set is available, then AES-GCM handily beats everything by a large margin. (Of course, with hardware acceleration, AES-128-CBC/HMAC-SHA1 is marginally faster than Salsa20-256/HMAC-SHA1, again on my machine.)

The ultimate point is that, without the AES-NI instruction set, new ciphers are just about the only way to get really good TLS performance.


Does AES-GCM with AES-NI and PCLMULQDQ beat Salsa20+Poly1305 with lots of sessions? I know it's got excellent cycles/byte for a single session, but TLS implementations also need agility.


I'm afraid you've exhausted the limits of my precomputed benchmarks. :)

I don't know the answer offhand, but I would suspect that hardware-accelerated AES-GCM would win. It certainly does in single-threaded, "one-session"-esque tests, and the margin of its victory makes me think that hardware-accelerated GCM would be hard for anything to beat.

On my machine, a single thread/core running nothing but AES-GCM can encrypt/decrypt 8192-byte blocks of data at 1.32 GiB/s (this is using OpenSSL's benchmarking feature). Yes, that's gigabytes, not gigabits. It's literally faster than IO for my SSD. (Salsa20, without a MAC, can do the same at about 0.64 GiB/s.)

When I told OpenSSL to use four threads in parallel, it ranked at 5.01 GiB/s, which is absolutely crazy.
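To put the two figures quoted above in more familiar units, here is a small Python sketch of the arithmetic. The helper names are mine, and the only inputs are the 1.32 GiB/s and 5.01 GiB/s numbers from this comment; nothing is re-measured.

```python
GIB = 2 ** 30  # bytes in one GiB


def gib_per_s_to_gbit_per_s(gib_per_s: float) -> float:
    """Convert GiB/s to decimal gigabits per second."""
    return gib_per_s * GIB * 8 / 1e9


def parallel_speedup(single_thread: float, multi_thread: float) -> float:
    """How many times faster the multi-threaded run was."""
    return multi_thread / single_thread


single, four = 1.32, 5.01  # GiB/s, as quoted above
print(f"single thread: {gib_per_s_to_gbit_per_s(single):.2f} Gbit/s")
print(f"four threads:  {gib_per_s_to_gbit_per_s(four):.2f} Gbit/s")
print(f"speedup:       {parallel_speedup(single, four):.2f}x on 4 threads")
```

So the single-thread number is a bit over 11 Gbit/s, and four threads scale almost linearly.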

That said, beyond a general leaning towards AES-GCM (simply because it is so fast with hardware acceleration), I don't have any hard data on which would be the victor. But I may just construct some benchmarks to test that out, because it's an interesting question.


(Disclaimer: I'm one of the authors of the Salsa20-in-TLS draft.)

Note that the suggestion of using Salsa20 is to replace RC4 not only to get better performance, but because RC4 is broken (as you know).

Salsa20 (and ChaCha) can be implemented on constrained devices and reach RC4-like performance. On modern architectures, the algorithms' word-based operations make better use of the hardware than RC4 does, and can reach better performance.
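To make "word-based" concrete, here is a minimal Python sketch of the Salsa20 quarterround as I understand it from the Salsa20 specification. The helper names are my own, and this is only the innermost building block, not the full cipher.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def salsa20_quarterround(y0: int, y1: int, y2: int, y3: int):
    # Only 32-bit add, rotate, and xor: no lookup tables and no
    # data-dependent memory accesses.
    y1 ^= rotl32((y0 + y3) & MASK32, 7)
    y2 ^= rotl32((y1 + y0) & MASK32, 9)
    y3 ^= rotl32((y2 + y1) & MASK32, 13)
    y0 ^= rotl32((y3 + y2) & MASK32, 18)
    return y0, y1, y2, y3
```

Everything stays in registers as whole 32-bit words, and independent quarterrounds can be run side by side with SIMD, whereas RC4 works a byte at a time through a table.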

Yes, AES with hardware support such as AES-NI can provide really good performance too. But then we _only_ have AES (and DES/3DES). Do we want to reduce SSL/TLS to a single symmetric encryption primitive? And no stream ciphers?


There is at least an RFC out for a TLS stream cipher using Salsa20, if I'm remembering current events correctly. Of course publication of a spec will precede implementation in hardware and software by some years, I would imagine.



Google's Adam Langley has just proposed ChaCha20 with Poly1305 for TLS, and says the performance is pretty good (~5x faster than AES-GCM in software).

https://www.imperialviolet.org/2013/10/07/chacha20.html


Actually, RC4 is pretty memory hungry (with a state of 256 bytes) and performs a lot of read operations that hit main store in small devices.
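A minimal Python sketch of textbook RC4 shows where that state and those reads come from. The function names are my own, and this is for illustration only, not for protecting anything.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return n bytes of RC4 keystream for the given key."""
    # Key scheduling: S is the 256-entry permutation that makes up
    # almost all of RC4's state.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]

    # Output generation: every keystream byte costs three reads of S
    # (S[i], S[j], and the output lookup) plus two writes for the swap,
    # all at data-dependent positions.
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by xoring data with the keystream."""
    keystream = rc4_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))
```

Because the table indices depend on the key and on previous state, the accesses cannot be predicted or vectorised, which is the cost being described here.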



