Running a hosting server for onion services, as was done in this case, is a terrible idea. It greatly increases the risk of deanonymization. The question is less how this hosting service was discovered and more how it ever stayed up long enough to become so notorious. Here's why:

1. Each hidden service chooses a "guard" relay to serve as the first hop for all connections.

2. A server running multiple hidden services has a guard for each of them. Each new guard is another chance to choose a guard run by the adversary.

3. An adversary running a fraction p of the guards (by bandwidth) has a probability p of being chosen by a given hidden service. A hosting service with k hidden services is exposed to k guards and thus has ~kp probability of choosing an adversary's guard. With, say, 50 hidden services, an adversary with only 2% of guards has nearly 100% chance of being chosen by one of those 50 hidden services.

4. The adversary can tell when it is chosen as a guard by connecting to the hidden service as a client and looking for a circuit with the same pattern of communication as observed at the client. Bauer et al. [0] showed a long time ago that this works even using only the circuit construction times.

5. The adversary's guard can observe the hidden service's IP directly.

The risk of deanonymization with onion services in general (i.e. even not using an onion hosting service) is significant against an adversary with some resources and time. Getting 1% of guard bandwidth probably costs <$500/month using IP transit providers (e.g. relay 8ac97a37 currently has 0.3% guard probability with only ~750Mbps [1]). And every month or so a new guard is chosen, yielding another chance to choose an adversarial guard. Not to mention the risk of choosing a guard that isn't inherently malicious but is subject to legal compulsion in a given jurisdiction (discovering the guard of a hidden service has always been and remains quite feasible with little time or money, as demonstrated by Øverlier and Syverson [2]).

[0] "Low-Resource Routing Attacks Against Tor" by Kevin Bauer, Damon McCoy, Dirk Grunwald, Tadayoshi Kohno, and Douglas Sicker. In the Proceedings of the Workshop on Privacy in the Electronic Society (WPES 2007), Washington, DC, USA, October 2007.

[1] <https://metrics.torproject.org/rs.html#details/014E24C0CD21D...

[2] "Locating Hidden Servers" by Lasse Øverlier and Paul Syverson. In the Proceedings of the 2006 IEEE Symposium on Security and Privacy, May 2006.


This is some great info for the less technically knowledgeable about Tor (like me!). However, I think your math in #3 is wrong.

Assuming random assignment/selection of the guards, each time one is chosen it has a 98% chance of not being "caught" by choosing an adversary's guard. Going with 50 services as you said would be .98^50=.364, meaning the chance of getting caught is 1-.364=.636 - 63.6%. This is vastly different from being nearly 100%.


Fair enough! I was using as a heuristic the expected number of compromised guards, which would be 0.02*50 = 1. Moreover, things degrade exponentially over time. If half the guards rotate every month, the chance of choosing a bad guard after 2 months is >86%, after 4 months >95%, and after 6 months >98%.
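
For concreteness, here is a minimal Python sketch of that arithmetic, assuming (as above) that guard choices are independent, the adversary holds 2% of guard bandwidth, there are 50 onion services on the host, and half of the guards rotate each month:

    # Minimal sketch of the arithmetic above (independent guard choices,
    # adversary with 2% of guard bandwidth, half the guards rotating monthly).
    def p_compromised(adversary_fraction, services, months, rotation_per_month=0.5):
        choices = services * (1 + rotation_per_month * months)  # total guard selections so far
        return 1 - (1 - adversary_fraction) ** choices

    for m in (0, 2, 4, 6):
        print(m, round(p_compromised(0.02, 50, m), 3))
    # 0 0.636, 2 0.867, 4 0.952, 6 0.982  (the >86%, >95%, >98% figures above)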


There was a post a week or two ago from a person running a legit Tor service who was analyzing all of the attacks he received.

He said something seemed to be dos'ing the guard nodes, causing his service to automatically choose a new guard, in an attempt to get his service to connect to a guard node controlled by the adversary. He said in one case, they found his server's actual IP address and dos'd it.

Could that be what happened?


I assume you refer to [0]. He says "If [the adversary] can knock me off enough guards, my tor daemon will eventually choose one of his guards. Then he can identify my actual network address and directly attack my server. (This happened to me once.)" I question how the author is sure this is what happened to him. But he may be right, and moreover that attack may have been performed against the "dark web tycoon" that is the subject of this post. However, it does seem to be somewhat challenging to perform, as Tor keeps trying to use all of its recently chosen guards, and so you'd have to simultaneously make all of them unresponsive until a malicious guard is selected.

[0] http://www.hackerfactor.com/blog/index.php?/archives/868-Dea...


> 5. The adversary's guard can observe the hidden service's IP directly.

So does the guard know that it is a guard and that the traffic comes from a hidden service? I thought Tor worked by jumping from node to node, and that each node didn't know whether the traffic came from the original client/service or from another node in the chain. So each time you make a connection over Tor you're essentially telling a guard node "here's my real IP, send this traffic to this hidden service and return the response please" and you have to trust that they keep it a secret? I feel like I'm missing something here.


The Tor protocol doesn't explicitly signal the guard relay that it is in the guard position. However, the guard relay (call it R) can use several indicators to conclude that the preceding hop (call it S) is indeed the source (e.g. the onion service):

1. S is at an IP address that is not a public Tor relay as listed in the Tor consensus (a toy sketch of this check follows the list). It's not impossible that S is a bridge (i.e. private Tor relay), but statistically unlikely because using a bridge isn't all that common.

2. During circuit construction, S extends the circuit beyond R two times. I don't see why Tor couldn't easily create dummy circuit extensions to fool R, but it doesn't (probably because there are so many other indicators that this change alone wouldn't solve the problem).

3. R observes what appear to be HTTP-level request-response pairs between it and S at about the same round-trip time (RTT) as the RTT R observes between it and S at the TCP layer, which should only happen if there were no more hops beyond S.
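
To make indicator 1 concrete, here is a toy Python sketch (not Tor code, and the consensus parsing is deliberately simplified) of the check a guard operator could run: if the previous hop's IP address does not appear on any "r" line of a cached network-status consensus, the previous hop is almost certainly not a public relay and is therefore likely the client or onion service itself. The "cached-consensus" filename is just an assumption for illustration.

    def relay_ips(consensus_path):
        # Collect the IP address field of every "r" line (one per listed relay).
        ips = set()
        with open(consensus_path) as f:
            for line in f:
                if line.startswith("r "):
                    fields = line.split()
                    if len(fields) >= 7:
                        ips.add(fields[6])
        return ips

    def previous_hop_is_listed_relay(prev_hop_ip, consensus_path="cached-consensus"):
        # If this returns False, the previous hop is very likely the source itself.
        return prev_hop_ip in relay_ips(consensus_path)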

If I recall correctly, Kwon et al. [0] describe several more statistical indicators of being a guard for an onion service.

Also, you are right that a client doesn't tell the guard node the destination (e.g. the onion service) of its traffic. The guard node is not trusted with that because it already directly observes the client, and so giving it the other side would deanonymize the connection.

[0] https://www.usenix.org/conference/usenixsecurity15/technical...


I always assumed the issue was not just finding the servers, but that they are often in countries that are hostile to US law enforcement.

You can do fancy attacks all you want, if the server is in Russia they're probably not going to be honoring any MLATs


Wasn't this 2013??

It's 2020 now, so much has to have changed. Tor sucked 7 years ago.


Tor has made some improvements that would reduce the threat of deanonymizing an onion service, but none affect the above analysis (or rather, the above analysis has taken them into account). The main improvements, in my opinion, have been:

1. The biggest improvement is that (in 2014 or 2015?) they reduced the number of entry guards from 3 to 1 [0], reducing the risk of a malicious guard by a factor of 3 (a back-of-the-envelope sketch combining this with improvement 2 follows the list).

2. The time until a guard choice expires was increased from 2–3 months to 3–4 [1] (this maybe happened 3 years ago?). This increases by ~40% the expected time an adversary would need to passively wait to have his relay selected as a guard by a victim.

3. The bandwidth threshold to become a guard relay was raised from 250KB/s to 2000KB/s [2] (looks like in 2014). However, 2000KB/s=16Mbit/s is still a very low bar, and, moreover, for an adversary that can run relays above the threshold, this change increases the adversarial guard fraction as there are fewer guards above the threshold to compete with.

4. A new guard-selection algorithm was implemented that prevents a denial-of-service attack from forcing a large number of guards (i.e. > 20) to be selected in a short period of time [3]. I believe this merged in 2017. If an adversary can force guard reselection by an attack, you are still extremely vulnerable, though, as a limit of 20 still provides a 20x risk multiplier.
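
As a rough back-of-the-envelope illustration of how improvements 1 and 2 combine (an illustrative model, not Tor's actual selection logic): with g guards each rotating roughly every T months, a victim makes new guard selections at rate g/T, and an adversary holding a small guard-bandwidth fraction p needs about 1/p selections on average before being picked, so the expected passive wait is roughly T/(g*p).

    # Back-of-the-envelope sketch of improvements 1 and 2 (illustrative model,
    # not Tor's actual selection logic).
    def expected_passive_wait_months(p, num_guards, rotation_months):
        return rotation_months / (num_guards * p)

    before = expected_passive_wait_months(0.02, num_guards=3, rotation_months=2.5)
    after  = expected_passive_wait_months(0.02, num_guards=1, rotation_months=3.5)
    print(round(before), round(after), round(after / before, 1))
    # ~42 months, ~175 months, ~4.2x longer passive wait for the adversary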

[0] https://trac.torproject.org/projects/tor/ticket/12688

[1] https://trac.torproject.org/projects/tor/ticket/8240

[2] https://trac.torproject.org/projects/tor/ticket/12690

[3] https://trac.torproject.org/projects/tor/ticket/19877


These are well-known attacks. In the case of Freedom Hosting this may have been how the server was found. Mitigations exist. Today big illegal darknet websites run lots of Tor servers on their own. You can also manually set trusted guards or other nodes in the chain so no malicious node will ever be part of your path through the network.


Yes, if you manually and wisely choose your own guard nodes, then you can avoid these attacks. You should be sure that those guards can't themselves be linked to you, either.


Interesting. Looking for more info on what you were talking about (with regard to "guards"), I dug up this post[1] which has some info too.

[1]: https://blog.torproject.org/announcing-vanguards-add-onion-s...


This is probably the best description of how Tor uses guards: https://gitweb.torproject.org/torspec.git/tree/guard-spec.tx....


The page you link describes "vanguards" which apply the guard logic to positions beyond the first hop. They are only available as a plug-in that you must separately download and configure. My understanding is that no plans currently exist to integrate vanguards into Tor due to the engineering challenges and costs that appear if everybody were to use them (especially how they would affect load balancing).


Thanks for the follow up info and additional explanation!


That only leads you to the server though, not to the person managing it.


In this case, the main question is how the server was discovered, not how the operator was then deanonymized. As the article describes, after the server was discovered to be in France and run by OVH, authorities used legal treaties ("MLATs") to obtain the subscriber information, leading them to the person who recently pleaded guilty in court.


This seems incredibly naive. Who would register a VPS hosting different kinds of the most illegal content imaginable using their real name or IP address? Even if they thought hidden services were impenetrable, there are always other possible slip-ups you could make which could disclose the server's real IP, and of course they'd be ignorant to think any security measure is impenetrable, including Tor.

DPR made extremely careless mistakes, too, to the point that even a random amateur investigator could've identified him, using only Google.

It's shocking how many of these people aren't caught sooner when they don't even know OPSEC 101.


To people who were paying attention to the wishful thinking at the time about Tor's security guarantees, it doesn't seem so incredible.


Sure, but even if you assumed Tor was perfectly secure, there are still other ways of being exposed (like someone causing your web server to issue a network request to a host they control).

No matter one's assumptions, it makes no sense to me that someone would register a VPS with their own information when it's pretty trivial to do so anonymously. Especially if you're running an illegal content hosting empire.

DPR's mistakes at least made sense to me; they're something anyone could have overlooked, even if they were still very naive mistakes. But I doubt DPR used his personal information when paying for servers. That's well beyond "unrealized mistake" into pure incomprehensibility.


They supposedly caught on to him by connecting an email address associated with DPR to his real-world identity. Wouldn't surprise me if that was an ex post facto lie concocted to conceal the true method, though.


But that's all they need though.

A simple national security letter (NSL) without even needing to get a warrant and BOOM you can tap the server and get all info about the person running it.


Not if the server is paid for anonymously and you only connect to it over Tor. That connection isn't through a hidden service and so isn't vulnerable to this attack.


A national security letter cannot compel someone to tap a server for the government or allow the government to tap a server. An NSL can only request existing collected records. So for example an NSL could request any logs a service provider has regarding who paid for the server or any access logs they retain regarding the server. If they do not have any logs, an NSL can't compel them to start collecting them. An NSL which requests actions or information outside of the scope allowed by law can be challenged in court.


That's a very good explanation!


Saving this answer, thanks!


I support Brave's vision for the Web, but it currently seems to represent a step backwards for privacy. Making payments to providers essentially involves sending your Web browsing history to Brave. The FAQ states that "we do not know which BAT wallet is associated with the lists of sites that you choose to support". I believe that is false.

I think it works like this: (1) Brave Browser submits its transactions to a Brave server to exchange a BAT for an Anonize ballot (anonize.org), (2) the browser adds to each ballot the name of a site you visited, chosen randomly with probability proportional to the frequency of site visits, and (3) the ballots are sent to a Brave server. Key here is that the token and ballot submissions are sent directly (e.g. not through a proxy or Tor). In addition, I believe the ballots may be submitted as a batch (i.e. at one point in time). Therefore, it is easy for Brave to see your votes for your visited websites, all coming at once, all from your IP address. That IP address may well be the same one used to exchange the BAT for ballots as well.
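
A toy sketch of the selection step in (2), as I understand it (this is my reading of the design, not Brave's actual code; the site names and visit counts are made up):

    import random

    # Each ballot is labeled with one visited site, drawn with probability
    # proportional to visit frequency (illustrative only).
    def assign_ballots(visit_counts, num_ballots):
        sites = list(visit_counts)
        weights = [visit_counts[s] for s in sites]
        return random.choices(sites, weights=weights, k=num_ballots)

    print(assign_ballots({"news.example": 120, "blog.example": 30, "shop.example": 10}, 20))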

There are additional problems regarding visits to unusual and identifying websites that I feel like Brave hasn't begun to consider, either. Suppose that every time Brave receives a ballot for your personal website, it also receives a ballot for some unpopular and sensitive website (and it receives ballots for that site at no other time). Brave can then conclude that the owner of the website also visits that sensitive site.

These problems must be addressed before Brave can be considered seriously by privacy-conscious users.


No history sent to Brave - did you assume this, or read it somewhere?

We use ANONIZE2 based on https://anonize.org/ to blind ourselves to your history. Can't be evil > Don't be evil. We see only zero-knowledge proofs that say how many votes go to sites or YouTube or Twitch accounts. These proofs do not link to user id or to one another (so no fingerprint by clustering). They go over an IP address masking service to our accounting server, while your monthly budget goes in a single token transaction.

Note Google and other ad tech powers do track your history. Logging into Chrome even gives your history over for ad targeting. Blendle, Flattrplus, other such services also see your history. But we do not.


I understand that Anonize is used for anonymous ballots. I understand that Brave used to submit its ballots via a single-hop proxy. My understanding is now that Brave no longer uses this proxy, which wasn't a good solution anyway because the proxy sees the user's entire set of ballots (aka browsing history). Thus Brave is now given all the ballots directly from the user, and thereby learns the user's browsing history. I do agree that other browsers and services also track users around the Web. Eliminating that is a goal that I support and that I think Brave does as well. I think that it is failing to achieve that goal. Either you don't realize the technical reality of your solution, or you are being misleading.


No, we do not see any user id. IP address we do see for any “tokens sent to user wallet” cases, for antifraud and per terms & privacy policy, but that is not a useful id and (more important) we do not use it for other purposes per GDPR. See GDPR’s “purpose limitation”. We would face 4% of global revenue fine if we violated this, and we are holding FB, G, and others to same standard.

For IP masking in the case where you buy your own tokens, we have two options: 1/ relaying at IP level where we would not see your IP address and the partner would not see any encrypted payloads; 2/ Tor, which is already integrated. More to do but you led with “we see user history” and that is just false in all these cases. We do not see history of sites visited or supported on a linkable to user basis.


An IP address is an identifier. If it weren't, there would be much less reason to use a VPN or Tor.

Suppose I understand you correctly and you do see the network IPs and timestamps of submitted tokens and ballots. Is your argument then that you can be trusted to follow your privacy policy? If we rely on trusting you to follow policy, then why not get rid of your zero knowledge proofs entirely?

By saying that you "have two options", it sounds like you are saying that there are two mitigations for the privacy problem that you could use but do not yet.

(1) is the one-hop proxy, which used to be used in the form of the Private Internet Access service, but it seems like it is not currently being used by Brave. If you did use such a service and encrypted the publisher identities under Brave's public key, then that would be an improvement, although still not really private because Brave would receive the results in a batch from Private Internet Access. Browsing histories are essentially fingerprints for each user. The ten sites I visit each week are almost certainly not shared by any other Brave user on the planet, and moreover they are frequently identifiable (consider sites for individuals, companies, sports leagues, schools, etc.). From [0]: "Our results show that for a majority of users (69 %), the browsing history is unique and that users for whom we could detect at least four visited websites were uniquely identified by their histories in 97 % of cases."

(2) has the same batching problem as (1). It would be superior, though, because it would be harder for Brave and the proxy system to collude or (more likely) be forced to cooperate with some authority.

To handle the batching problem, you should at least choose to upload each Anonize ballot at a uniformly random time in each month and on a separate connection (i.e. TCP connection or Tor circuit). You should also explain how this works in a technical document to give people the ability to understand what exactly they are signing up for when they enable payments in Brave. Ideally you would use a cryptographic protocol more suited to strong anonymity than a proxy network, such as a verifiable mix network or a secure-multiparty-computation protocol.

[0] Olejnik et al., "On the uniqueness of Web browsing history patterns", 2014, <https://link.springer.com/article/10.1007/s12243-013-0392-5>


Please read up on GDPR "purpose limitation". We cannot use IP address except for antifraud, so it is not legally viable for us to try to link zero-knowledge proofs into a profile based on IP address. Also, my home AT&T IP address wanders often, so do many others; mobile is even more variable. But my main point here is purpose-limitation where we take IP address for antifraud. Which we must do, or our user growth pool would be quickly taken by fraudsters.

As we are all open source and will get annual audits when scaled beyond trials, I think you are mistrusting prematurely.

On linkability for users who buy their own BAT and so do not require the antifraud terms: as noted in my item 1, we are talking to PIA about using an IP relay (not full VPN). This got delayed by their work on handshake.org but we're restarting it.

Tor (item 2) is better and batching is not an issue. We do not make cross-site/channel linkable batches in any event. Each ANONIZE session paying a given domain or YouTube/Twitch account is separate from every other. Putting these through separate Tor circuits is possible, as we also randomly space them out in time.

I don't know why you are telling us to do things we already do. Did you find a bug in the open source? We pay bounties.


Separate thing, not promising it on a schedule yet so really FWIW: our BAT roadmap's "Apollo" phase aspires to decentralize as much as possible. This could certainly include p2p flows with ZKPs in state channels or better. We are looking at OMG's Plasma implementation.

So the ultimate goal is to get away from ANONIZEd traffic to a blind accounting server. But as I say, lots of problems to solve before promising this. Yet with Ethereum scaling and anonymity support, for users who buy their own BAT (where I claim your objection to IP address has most merit), we could go p2p on-chain for decentralization w/o fraud risk for bring-your-own-BAT users.


Interesting, but decentralization does not equal privacy. Indeed, it might make privacy worse by sharing the data more widely and making it even easier to get copies of the data. Consider, for example, BitTorrent, which has a pretty decentralized distribution protocol that also makes it easier for third parties like the MPAA to observe who is sending and receiving the files.

Even using a privacy-enhanced blockchain isn't necessarily sufficient. Blockchains do not provide anonymous messaging. Therefore, a recipient R of a transaction can identify the sender S if R can observe S sending the transaction. Yes, this problem affects Bolt, Zcash, Monero, etc.


Yeah, I noted blockchain issues in my latest (in time) reply. But at least you won't have IP addresses to worry about any longer :-P.


What if one were to run one of these "privacy-enhanced blockchain"s from a VPS (paid for with these same anonymized tokens)?

In case it's not clear, I'm earnestly asking this question


That's basically using a proxy, and so it has the same security. If the proxy is/goes bad (say, your VPS provider reveals your IP to some interested guys with guns), then you lose (anonymity). If your proxy remains good, then all that can be learned is that the transactions originated at the proxy. However, the proxy does serve as a potential pseudonym, and so if the collection of transactions reveals something identifying, even if any one transaction doesn't, then you lose.


> Please read up on GDPR "purpose limitation".

I am reasonably familiar with the contents of GDPR, having looked into it more after attending a lecture on the subject [0].

> We cannot use IP address except for antifraud, so it is not legally viable for us to try to link zero-knowledge proofs into a profile based on IP address.

If your users must rely on you obeying a policy, then please just say that. Right now, it seems to me that you claim to use technical means to prevent Brave from learning browsing histories [1].

> my home AT&T IP address wanders often, so do many others; mobile even more variable.

IP addresses can be so identifying that they have been ruled as personally-identifiable information by the European Court of Justice [2].

> I think you are mistrusting prematurely. But as noted in my item 1, we are talking to PIA about using an IP relay (not full VPN). This got delayed by their work on handshake.org but we're restarting it.

Thank you for stating clearly that you aren't using PIA (aka "IP masking") at the moment for Brave Payments. You might consider your users who are worried about data breaches and compromised servers as much as they are worried about Brave's intentions. Please don't take my criticisms personally.

> Putting these through separate Tor circuits is possible, as we also randomly space them out in time.

Oh, you do randomly delay ballot submissions? I have not been able to find any such logic in the code but would be happy to be pointed to it. The specific way in which you choose delays is, of course, crucial to the security it provides.

[0] <https://petsymposium.org/2018/program.php>

[1] <https://brave.com/faq-payments/#anonymous-contributions>

[2] <https://www.irishtimes.com/business/technology/european-cour....


> If your users must rely on you obeying a policy, then please just say that.

I did just say that, several times -- but with conditions that do not make it a matter of "policy" only. We agree IP address should be masked for self-funded users. Working on it!

We are very familiar with how any user log can be used as a history, but anonized proofs that can't link to a user id except illegally to IP address are not on the same level of threat as the Blendle, Flattrplus, etc. histories taken in the clear -- never mind Google et al. surveillance. To equate the tech and not make any distinctions does a disservice to us in my view. I'm not sure you did equate, but see next paragraph.

Tech alone is never enough for anything like what we are doing. Addresses matter, if not IP then on blockchain. There are side channels. There will be bugs. IMHO you have to include the social and legal constraints, too. Even a p2p with ZKP solution has some risk due to the blockchain addresses, which need purpose-limited terms under GDPR too.

On road, will get you links to code for randomizing time between ANONIZE sessions as soon as I can.


> Tech alone is never enough for anything like what we are doing.

You'd be surprised how far you can get. For example, protocol designs exist that provide strong message anonymity: mixnets, DC-nets, and secure multiparty computation (MPC). Tor is great at its goals, but it accepts weaker security in exchange for low latency, a trade-off Brave doesn't need. Tor is also unfortunately blocked in many companies and countries by technology and/or policy. Mixnets are freely available [0]. MPC is sold commercially [1]. (I have no personal or professional connection to either project.)

> On road, will get you links to code for randomizing time between ANONIZE sessions as soon as I can.

Looking forward to it.

Many thanks for the serious engagement. I look forward to recommending Brave to my friends and colleagues in the not-distant future!

[0] <https://katzenpost.mixnetworks.org> [1] <https://www.unboundtech.com/>


Here's a non-tech issue that already causes some idealists to scorn us: most publishers want to be paid in fiat. That means AML/KYC/anti-sanction-list/etc. Pretty unideal but we are not waiting for others to achieve Utopia. We want to help creators get users funding them sooner, and while some take crypto, most (esp. of size) want fiat.


Honestly, I don't know why you need a blockchain in the first place. Just run your own accounting servers, which you already are doing for the Anonize ballots. It is certainly possible to take in money from identified (i.e. non-anonymous) users and pay out money to identified publishers without being able to determine which users are responsible for which payments to the publishers. However, I also don't see any problem with using Ethereum for transferring money.


Oh, that q ("why you need a blockchain") is easy. Fiat only would require us to be an MSB or MTA, heavy licensing lift and no ability to grant users for free from the user growth pool we precreated before the BAT sale. The user growth pool is the number one reason in my view.

Also we like macro-auditability from funding wallet to omnibus settlement, to show we took the %ages we promised.


BTW I'm aware of mixnets, talking to Harry Halpin and others, but also cautious about new tech. Will look at your refs, thx.


Code links:

JS implementation in Muon-based https://github.com/brave/browser-laptop product:

1. https://github.com/brave-intl/bat-client/blob/master/index.j...

2. https://github.com/brave-intl/bat-client/blob/master/index.j...

3. https://github.com/brave-intl/bat-client/blob/master/index.j...

From Marshall Rose: "the first deals with the delay in before asking for ballots (after a contribution) and the second deals with each delay between submitting each ballot."

New "brave-core" (chromium refork to get front end but w/o Google accounts/sync), the new implementation; narration by Serg Zhukovsky:

"""

here we make a first call https://github.com/brave-intl/bat-native-ledger/blob/reconci...

which calls that function https://github.com/brave-intl/bat-native-ledger/blob/reconci...

after the random wait it calls that function https://github.com/brave-intl/bat-native-ledger/blob/reconci...

and in it’s callback it goes all over again if there are still votes to send

if you are interested in the function that does the randomization, it’s here: https://github.com/brave-intl/bat-native-ledger/blob/reconci... """

The C++ is all new code, not yet released -- bug reports welcome! Thanks.


Thanks! The protection of the delay-based ballot-mixing looks somewhat weak.

I see that the delay from one ballot batch to the next is set to a uniformly-random time between 10 and 60 seconds. I also see that the ballot batch size is 10.

Let's assume that you have fixed the problem of the ballots being submitted in the clear and that they are instead sent through a proxy on different TCP connections (or through Tor on different circuits). Let's also say that the news reports are correct and Brave has 4 million active users, and moreover that all have enabled payments (this is generous). Furthermore, assume that (1) there is no reason payments would start being submitted on any particular day or time, and (2) say, 20 publishers are paid 20 tokens each on average. Then the average number of users uploading ballots in any given minute - let's call them "neighbors" - is roughly (20 ballots/publisher / 10 ballots/batch)(20 publishers/month)(4e6 users)/(30 days/month)/(24 hours/day)/(60 minutes/hour) = 3704 neighbors. That's not a very big anonymity set, but it's not nothing. Early adopters got screwed, though.
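
The same estimate written out as a quick sketch (all inputs are the assumptions stated above, not measured Brave data):

    users             = 4_000_000   # assumed active users with payments enabled
    publishers        = 20          # publishers paid per user per month
    ballots_per_pub   = 20          # tokens/ballots per publisher
    batch_size        = 10          # ballots per batch
    minutes_per_month = 30 * 24 * 60

    batches_per_user = publishers * (ballots_per_pub / batch_size)  # 40 batches/month
    neighbors = users * batches_per_user / minutes_per_month
    print(round(neighbors))   # ~3704 users uploading a batch in any given minute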

However, it gets worse. Based on BatClient::prepareVoteBatch() (bat_client.cc), it looks like each batch has ballots for a single publisher (if I'm wrong, privacy erodes even further). How many of those 3704 simultaneous users are uploading to the exact same domain? I don't know how to guess (to be conservative, we should assume none). Moreover, unless the number of ballots for a publisher happens to be a multiple of 10, the last batch will be unique in having a size less than 10. Both of these things make it easy in many cases to determine when you are finished uploading ballots for one publisher and are moving on to the next. If many of your neighbors have split a publisher's votes across multiple batches, then it seems unlikely that they will move to the next publisher at the same time, making your ballot submissions more linkable across publishers, exactly the problem we were worried about.

In addition, the time of each batch isn't independent because the delay is applied after the last one. For example, if one batch happens to be sent quickly by chance, then the next one is more likely to be sent at a time close to the initial batch. Thus, to link two batches, you only need to consider the batches that end in the 10-60 seconds preceding the start of the second one. How many of your 3704 neighbors ended a batch then? The longer a batch upload takes (due to latency and bandwidth), the less likely that the batch of any other user has ended in that short time frame.

There is also the issue of semantic linking. If I'm in a relatively rare group, the sites I visit are likely to be linkable to each other as distinct from those of my neighbors. For example, suppose I speak Catalán and am involved in regional politics, or that I am a teenage boy in Iceland. How many of my sites are linkable based on those characteristics? And this linking can happen across "gaps" in the ballot-upload stream, where there is uncertainty about how the uploads are linked together, allowing you to get the inference "back on track".

Fortunately, the timing stuff seems fixable. Hiding the IP and separating ballots across connections is an essential first step, of course. But to handle timing issues after that, simply do the following: (1) submit all ballots individually; (2) schedule each ballot upload independently of the other ballots and at a uniformly-random time over, say, the next week or two, and if the user isn't online, reschedule.
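
A minimal sketch of that scheduling fix (illustrative only; the two-week window and function names are my own, not Brave's):

    import random, time

    TWO_WEEKS = 14 * 24 * 3600  # seconds

    def schedule_ballots(ballots, now=None):
        # Each ballot gets its own upload time, drawn uniformly over the next
        # two weeks, independently of every other ballot.
        now = time.time() if now is None else now
        return sorted(((now + random.uniform(0, TWO_WEEKS), b) for b in ballots),
                      key=lambda pair: pair[0])

    def reschedule(ballot, now=None):
        # If the user is offline when a ballot's time arrives, draw a fresh
        # time rather than uploading immediately on reconnect.
        now = time.time() if now is None else now
        return (now + random.uniform(0, TWO_WEEKS), ballot)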

The semantic issues with small populations seem harder. I think Brave should prevent itself from learning about votes for publishers unless enough people vote for the publisher. I don't see how to do this without a more powerful cryptographic protocol (it is absolutely doable, though, using MPC for example).


Thanks. Marshall Rose said in reply "That is a very good analysis about the traffic pattern. You are right about the batching of ballots for a similar publisher. That is an optimization to reduce the total voting period."

We will work on improving the system. Please mail me first at brave dot com if you want to correspond further. Thanks again.


By the way, please do let me know if I'm wrong and Brave does provide good privacy while enabling payments. I have turned off Brave Payments because of the privacy issue, but I would like to be able to re-enable them. Also, if my understanding is incorrect and there are written descriptions of Brave's technical design, I would love to read them. I have read and understood the Anonize IEEE S&P paper.


After reading my latest reply above, wdyt?


The problem seems well-advertised to me. From the Tor FAQ (https://www.torproject.org/docs/faq.html.en#AttacksOnOnionRo...): "it is possible for an observer who can view both you and either the destination website or your Tor exit node to correlate timings of your traffic as it enters the Tor network and also as it exits. Tor does not defend against such a threat model."

I think that Tor does already implement some padding protections, specifically against correlation via NetFlow records. See the spec at <https://gitweb.torproject.org/torspec.git/tree/padding-spec.... and an implementation history at <https://trac.torproject.org/projects/tor/ticket/16861>.

The effectiveness of further padding protections isn't clear unless you go to full padding, which is very expensive (probably impossible for mobile clients, for example). Tor is successful, in my opinion, because it understands that reducing performance reduces users and thus actually harms anonymity.


This is a pretty uncharitable comment. A nicer (and, in my opinion, more correct) take would be that their articles are written for people that enjoy reading and language.

And the sentences you cite make perfect sense to me: the legislature may be relatively functional and courteous, but it still has many irrational members who are ignorant and backwards.


My understanding is that Facebook runs an onion service (aka hidden service) primarily because it allows them to easily manage their anonymous users separately from other users. "Management" might include separate security logic to identify fraudulent login attempts and avoiding the accidental blockage that occurs sometimes via automated blacklisting of Tor exits. They also get the benefits of a secure name lookup (unlike DNS), and, as you mention, end-to-end encryption that doesn't rely on the Certificate Authority system.

Other "notable" onion services include OnionShare [0], which sets up an onion service to enable simple anonymous file sharing, Ricochet, which is a P2P anonymous chat service that sets up an onion services for each chat participant, and SciHub [2], which provides most academic papers for free. Each of these has been widely reported in the mainstream press.

[0] https://onionshare.org/ [1] https://ricochet.im/ [2] https://scihub22266oqcxt.onion


>My understanding is that Facebook runs an onion service (aka hidden service) primarily because it allows them to easily manage their anonymous users separately from other users. "Management" might include separate security logic to identify fraudulent login attempts and avoiding the accidental blockage that occurs sometimes via automated blacklisting of Tor exits.

I'd be shocked if they didn't have Tor exit tracking already, literally everyone else in the space does.

>They also get the benefits of a secure name lookup (unlike DNS), and, as you mention, end-to-end encryption that doesn't rely on the Certificate Authority system.

The security of the name lookup relies on the crypto, but even without secure name lookups an attacker would still have to break TLS to defeat HSTS.

>Other "notable" onion services include OnionShare [0], which sets up an onion service to enable simple anonymous file sharing, Ricochet, which is a P2P anonymous chat service that sets up an onion services for each chat participant, and SciHub [2], which provides most academic papers for free. Each of these has been widely reported in the mainstream press.

Onionshare and Ricochet aren't widely used, scihub is still accessible over the clearnet.


I'm not sure what you're arguing any more. Your argument started as that only Silk Road was a "notable" onion service, which you appeared to define as having "publicity". Then the argument became that Facebook doesn't really need to run an onion service. Now the argument seems to be that there may be some reasonable alternatives to running an onion service for some notable use cases and that few people use the other notable onion services (and I don't see how you can be so sure of that - I and many people I know use them not infrequently).

But I think your original point has been effectively rebutted: there are several notable onion services other than Silk Road, and some of these are quite beneficial.


>Your argument started as that only Silk Road was a "notable" onion service

I never made such an argument; I said the dark net markets are, as they're really the only sites receiving large amounts of .onion traffic (besides, of course, botnets).

> which you appeared to define as having "publicity".

We're talking about onionland in the media here, publicity seems like it would be one of the metrics that a journo would use when selecting notable examples of onion sites.

>Then the argument became the Facebook doesn't really need to run an onion service.

This seems to be a case of selective reading. I specifically stated,

>Facebook has no need to hide their origin servers, so their use of .onions is symbolic at best (besides as a TLS alternative) as any tor users would be better off browsing the clearnet version of the site.

I've highlighted the relevant part for you.

Let's say someone even manages to find the facebook onion address, which isn't a particularly easy task since seemingly the only part of their site where it's listed is the blog post mentioning it. For example https://www.facebook.com/help/ is of no use.

Now, let's say someone that's already using Facebook over Tor finds this address. Do you think they'll switch to it over facebook.com? I didn't, and I seriously doubt very many others did either. All it does is massively increase load times; modern browsers will already have FB certs pinned.

>But I think your original point has been effectively rebutted: there are several notable onion services other than Silk Road, and some of these are quite beneficial.

I'll agree on the other notable onion services, for example AlphaBay is far bigger and better than SR ever was.


> The security of the name lookup relies on the crypto, but even without secure name lookups an attacker would still have to break TLS to defeat HSTS.

Which nobody enables for most websites because it's insane to pin your certificate if you're not Google.

> Onionshare and Ricochet aren't widely used, scihub is still accessible over the clearnet.

"clearnet" doesn't mean anything. Just because you can access it using DNS doesn't mean that the fact it has an onion address is irrelevant. Onion addresses provide several security benefits, and only one of them is "anonymity of the server". As for "not widely used", you appear to have redefined "only notable hidden services". Notable means "important" or "significant". I consider Ricochet to be quite significant.


>Which nobody enables for most websites because it's insane to pin your certificate if you're not Google.

Why?

>"clearnet" doesn't mean anything. Just because you can access it using DNS doesn't mean that the fact it has an onion address is irrelevant.

I think it kind of does when you can just type in "facebook.com" instead of "facebookcorewwwi.onion" and receive a significantly faster browsing experience while not missing out on anything. That's what most users will do. Not only that, the onion is hardly documented (the only mention I could quickly find on facebook.com was in a blogpost!)

> Onion addresses provide several security benefits, and only one of them is "anonymity of the server".

I am well aware, none of which are worth the extra 3 hops.

>As for "not widely used", you appear to have redefined "only notable hidden services". Notable means "important" or "significant". I consider Ricochet to be quite significant.

Ricochet is experimental, unreviewed and nobody should really be using it for sensitive communications at this time.

And why is ricochet particularly significant? It's just glorified torchat, not bitcoin.


> I don't really buy the comparison that what CERT did is similar to a university-sponsored DDoS. I think a better parallel is the Dan Egerstad case.

Here's why it's worse: they inserted a plaintext encoding into the response from the onion-address lookup relay, and so anybody observing the user (e.g. the ISP) could detect what onion address the user was connecting to. This applies after the fact to recorded traffic as well. Thus the researchers had no control over who got deanonymized, to whom they were deanonymized, and when they were deanonymized.

> I do wish both sides would acknowledge this is a tricky issue. On the one hand, if I run a tor exit node or relay, it is my node and it seems like I'm allowed to do with it as I please.

You actually are not allowed to do with your relay as you please. At least in the US, the legal theory protecting relay operators (i.e. safe harbor) also makes it illegal to observe user traffic content except in certain cases (e.g. to improve network performance).

> One other thing to keep in mind here is that SEI is a DoD funded center.

This doesn't seem very relevant. All researchers have an obligation to consider and mitigate possible harms that occur during their research (source: I work in a military research laboratory). These researchers clearly did not fulfill that obligation, and I'm sure their institution is reviewing or has reviewed their procedures to make sure it doesn't happen again.


Let me try to understand your position a little better.

Are you saying the problem here is simply that the effects of the attack were observable by others? If this were not the case, you'd have been fine with it?

And since you seem to be arguing that researchers shouldn't examine user traffic, do you also think that what Egerstad did was also wrong? Do you agree with his arrest?

And one more thing sort of related to this. What's your opinion on research like Arvind's Netflix deanonymization attack? Do you think the work that research involved was also unethical?

> All researchers have an obligation to consider and mitigate possible harms that occur during their research

This is nice idealism and I'm totally in support of it. But I can't help think this is pie-in-the-sky thinking, especially when organizations like the DoD are involved.


If by "darknet" you mean Tor hidden services, then exit relays are not used. The circuits are client->guard->middle->middle->middle->middle->guard->hidden service. The bandwidth bottleneck for hidden services is probably guards, because all relays can be used as middles. Because relays with the Exit flag are used exclusively for exiting (due to the position weights [0], e.g. Wgd/Wed/Wmd), an estimate of the guard bandwidth is the weight of those with only the Guard flag, or ~40Gbps.

[0] https://gitweb.torproject.org/torspec.git/plain/dir-spec.txt


There is no magic bullet here. Here are the things you were probably thinking of and why they won't work:

1. Allow relays to apply individual hidden service (HS) blacklists: HS addresses are not necessarily public, can require authentication to connect to, and are trivial to generate (these are all extremely important properties for anonymous publishing in general). So these CP sites will go even more "dark" once the relay blacklists start being an annoyance. Not to mention that relay blacklists open up an obvious DoS opportunity.

2. Require credentials for HSes and revoke them if they are discovered to be serving CP: There is no apparent way to make identity creation costly in an anonymous world where we must be able to support relatively poor users (e.g. without much CPU, memory, bandwidth, money).

3. Allow authorities to selectively deanonymize certain users or services: There is no way this is going to work in a world where nobody agrees on who the authorities are or what constitutes a legitimate request.

The Tor Project is doing one thing about this problem that is consistent with their mission. They are making accessible safe but useful information about the world of hidden services. In fact, they have a whole funded project on it <https://trac.torproject.org/projects/tor/wiki/org/sponsors/S.... Note that this project includes such useful things as improved crawling support, global HS statistics, and discovering public .onion addresses.


to be honest I wasn't even thinking as specifically as these suggestions - not that any clear solutions occur to me either. but they should, at the very least, recognize that there is a problem that needs to be addressed. I'd like to think there's a less fatalist & more morally empowered approach available besides "forget it, jake, it's anonymous". side note, it's good to see someone here considering the needs of poorer users


I think it's cool that you're making usable security software.

I do worry that "usable" has gotten more thought than "security", and providing a system that doesn't deliver the security it promises could be worse than not having the software at all. It may end up conveniently serving up those at most risk to their adversaries.

As others have noted, anonymity is hard to get right, and the approach here has some serious flaws:

1. It seems that the pseudonymous author of posts can easily be determined by connecting a bunch of Sybils (i.e. multiple clients) to as many other peers as possible and observing who is the first to send new posts by the target pseudonym. And you really can't have a forum without pseudonyms. Users will create them on their own (by including a nickname in their posts) even if you don't build it in.

2. There is an easy so-called "intersection attack" in which the sets of users that are connected at any given time a pseudonymous entity posts are intersected. The actual author will always be present, and the other participants won't be static, and so eventually only the author will remain in the intersection. (A toy illustration of this attack follows the list.)

3. There is no apparent protocol obfuscation. Despite the use of TLS, the protocol traffic patterns of this new protocol are likely to be highly identifying. They can then be easily confirmed by an active attacker directly connecting to the suspected participant. In addition, it doesn't seem that the list of participants is protected, and so an adversary can just connect to the network to discover who to block or punish. Tor will not solve the problem here if users have to be able to receive incoming connections. And if you're using Tor, then you are relying on an external system that has censorship issues of its own (e.g. access from China is currently extremely limited) and does rely on servers.

4. The bootstrap IPs can obviously be easily blocked.

5. The votes are not anonymous, which is unlikely to be clear to users and which are nearly as sensitive as authorship itself.

6. Denial-of-service here is as simple as flooding the network with "forwarded" posts and votes.
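
A toy illustration of the intersection attack in point 2, using made-up data:

    # Intersect the sets of peers online at each time the pseudonym posts; the
    # author survives every round, bystanders eventually drop out.
    def intersection_attack(online_sets):
        suspects = set(online_sets[0])
        for online in online_sets[1:]:
            suspects &= set(online)
        return suspects

    observations = [            # peers observed online when each post appeared
        {"alice", "bob", "carol", "dave"},
        {"alice", "carol", "erin"},
        {"alice", "dave", "frank"},
    ]
    print(intersection_attack(observations))   # {'alice'}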

Here are some suggestions for designing a system that is secure and that people can trust as being secure:

1. Write a white paper describing the design! This is not a detailed protocol spec - it's a description of how the protocol works at a higher level along with arguments establishing its security properties. This allows others to understand and critique the design.

2. Check out some of the related system designs [0-6]. They have had to deal with the same issues, and you can learn from them. You can get all these papers and more at <http://freehaven.net/anonbib/>. As you can see at that site, people have been thinking about these issues for a while and have figured out a lot!

3. Submit your white paper to a computer security conference. Even if it doesn't get in, you will get feedback from experts.

As it is currently, I wouldn't trust my communication to this system. You really need a large and diverse user base to provide anonymity, and so you will have to work at convincing people that this is something they can trust. Good luck!

[0] "Membership-concealing overlay networks" by Vasserman et al. CCS09

[1] "Crowds: anonymity for Web transactions" by Reiter and Rubin. TISSEC 1998.

[2] "Freenet: A Distributed Anonymous Information Storage and Retrieval System" by Clarke et al. PET 2000.

[3] "Traffic Analysis: Protocols, Attacks, Design Issues and Open Problems" by Jean-François Raymond. PET 2000.

[4] "ScrambleSuit: A Polymorphic Network Protocol to Circumvent Censorship" by Winter et al. WPES 2013.

[5] "Drac: An Architecture for Anonymous Low-Volume Communications" by Danezis et al. PETS 2010.

[6] "Tor: The Second-Generation Onion Router" by Dingledine et al. USENIX Security 2004.


Now, this is the comment I came to HN for. Thanks for this, I have a few readings to do.

For your points, while I don't have a full-scale refutation, here are a few addendums, in order. All of this is wrapped in a giant 'If I understand you correctly'.

> And you really can't have a forum without pseudonyms. Users will create them on their own (by including a nickname in their posts) even if you don't build it in.

That's human self-incrimination. As long as this is safe for a one-time user that opens the app in an internet cafe, posts something and goes away, I have some basic semblance of security I can build upon. That does not mean it is secure, it just means it's secure for something—and that's a start. (It might not be actually secure for even that, let me know if you know it not to be so)

> There is an easy so-called "intersection attack" in which the sets of users that are connected at any given time a pseudonymous entity posts are intersected. The actual author will always be present, and the other participants won't be static, and so eventually only the author will remain in the intersection.

The actual author won't always be present. The posts start at a point, but they do not need the author to be present to continue distribution. When Alice posts something and Bob gets the post, from then on Alice can disappear forever. If a post is below a threshold of availability on nodes Bob is connected to, Bob will flag it as neutral post (to make that distribution not count as an upvote) and start distributing it on his own to prevent post extinction. That said, this doesn't prevent intersection attacks, it just makes them less viable.

> Tor will not solve the problem here if users have to be able to receive incoming connections.

The users do not need to accept incoming connections. There are some very restrictive routers that refuse to be UPNP port mapped, and Aether works fine on them.

> and so an adversary can just connect to the network to discover who to block or punish.

For this, the roadmap is to have a 'protected' node which refuses all connections from nodes except those who are explicitly marked as trusted.

> The bootstrap IPs can obviously be easily blocked.

It does not rely on the bootstrap IP. If you have installed the application, it asks you in the onboarding process for the IP and port of a friend that you know to be online. If you give it that, it'll use it. In fact, I'm planning to turn the bootstrap node off or just make it a redirect to some other random node in the future.

> The votes are not anonymous, which is unlikely to be clear to users and which are nearly as sensitive as authorship itself.

They point to node id's, which are not users, but machines. This is an inherent tradeoff, in that I have to have some data to gauge the popularity of a post. As far as I know, there is no way out of this without implicit trust in a third party.

> Denial-of-service here is as simple as flooding the network with "forwarded" posts and votes.

Well, those posts won't get upvoted, and will get stuck in spam filters and upvote thresholds of users. None of those are implemented yet, of course, but this doesn't seem to be a structural problem.

> As it is currently, I wouldn't trust my communication to this system.

Please, for the love of god, don't trust Aether (yet). This is barely alpha level code.

For the rest, thank you. Much appreciated. I'll be reading.


I love reading threads like this. It is all too rare to see people engaging with a spirit of humility and learning. Can't wait to see how this project progresses. Good luck dude!


> > And you really can't have a forum without pseudonyms. Users will create them on their own (by including a nickname in their posts) even if you don't build it in.

> That's human self–incrimination. As long as this is safe for an one–time user... I have some basic semblance of security

So this is not anonymous reddit, then. That is much less useful, and it had better be extremely clear to users that they should only use it in that way.

> > There is an easy so-called "intersection attack"... The actual author will always be present,... and so eventually only the author will remain in the intersection.

> The actual author won't always be present. The posts start at a point, but they do not need the author to be present to continue distribution.

In this attack, the adversary would need to be one of Alice's peers most of the time. If he isn't, though, because Alice only connects to a few peers consistently, then he can at least identify one of those consistent peers. That serves as a focus for attack, say by denial of service.

> > Tor will not solve the problem here if users have to be able to receive incoming connections.

> The users do not need to accept incoming connections. There are some very restrictive routers that refuse to be UPNP port mapped, and Aether works fine on them.

So to actually be undetectable as using Aether, you can't accept connections. Then you have to hope that enough users are connecting for the anonymity and not the undetectability, or you'll have to provide some infrastructure nodes.

> > and so an adversary can just connect to the network to discover who to block or punish.

> For this, the roadmap is to have a 'protected' node which refuses all connections from nodes except those who are explicitly marked as trusted.

Great, if you promise undetectability, then this should be the default. Of course, that makes connectivity a challenge (what if everybody you trust doesn't accept connections because they also want to remain undetectable?).

> > The bootstrap IPs can obviously be easily blocked.

> It does not rely on the bootstrap IP. If you have installed the application, it asked you in the onboarding process IP and port of a friend

Sounds good!

> > The votes are not anonymous, which is unlikely to be clear to users and which are nearly as sensitive as authorship itself.

> They point to node id's, which are not users, but machines.

I don't understand the distinction being made here. In any case, the upvote is observed as coming directly from some IP. That is the identifier to worry about. As far as privately gauging the popularity of the post, I don't exactly understand what you need here, but there may be some crypto solutions that could work. Unfortunately, post popularity seems easily spoofed to me.

> > Denial-of-service here is as simple as flooding the network with "forwarded" posts and votes.

> Well, those posts won't get upvoted, and will get stuck in spam filters and upvote thresholds of users. None of those are implemented yet, of course, but this doesn't seem to be a structural problem.

What about the mechanism to prevent extinction of a post? Doesn't that spread a post without upvotes? And why can't I create a network of Sybils to upvote my spam posts? Also, spam filters are a UI mechanism, if I understand what you mean. I am talking about consuming network and memory via protocol flooding.


> So this is not anonymous reddit, then. That is much less useful, and it had better be extremely clear to users that they should only use it in that way.

Depends on how you emphasize that sentence. It's reddit, but its anonymity is weaker on certain fronts and stronger on others. If used as a one-shot device, it's pretty good. Otherwise, there are the issues you mentioned (which I plan to fix, to the best of my ability).

> So to actually be undetectable as using Aether, you can't accept connections. Then you have to hope that enough users are connecting for the anonymity and not the undetectability, or you'll have to provide some infrastructure nodes.

Correct.

> Great, if you promise undetectability, then this should be the default.

I do not promise undetectability, but it exists under certain circumstances. I will explicitly note those circumstances and mark undetectability as a side benefit only under those conditions.

> I don't understand the distinction being made here. In any case, the upvote is observed as coming directly from some IP. That is the identifier to worry about.

The distinction is largely academic as you said. If you have a cryptographic solution to that, I'd love if you could point me to the right direction.

> And why can't I create a network of Sybils to upvote my spam posts?

You can, but users can also block your nodes, or (we're really going into the medium-term future here) your nodes would be placed in blocklists, whose users—people who accepted them—would refuse your connections. (This is a half-baked idea as of now: who maintains those lists, etc.) This is a thorny problem. By spam filters, I meant less an actual after-the-fact spam filter and more a "block this guy out, refuse connections" kind of filter. Sorry for the wrong choice of words.

All in all, very fair points I need to work on. If you would be interested in taking a look once in a while to point out where the logic holes are, I'd really appreciate your voice in development. If you'd be interested in helping out, send a mail to me (burak@nehbit.net)— I would try to run more important things by you before implementing to see if there are any obvious holes.


Two problems with unscheduled communications: 1. It allows the adversary to disrupt communications by continuously sending junk. Solving this problem was a major goal of Dissent not adequately handled by previous designs based on Dining Cryptographers networks. 2. Without a schedule telling everybody when to send something, the first guy to talk is obviously the sender, destroying anonymity.
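
For context on problem 2, here is a toy single round of a 3-party DC-net in Python (illustrative only, not Dissent's actual protocol): because every participant broadcasts in every scheduled round, there is no "first guy to talk", and the message only emerges from the XOR of all broadcasts.

    import secrets

    def dc_net_round(message, sender, n=3, length=8):
        # Pairwise shared one-time pads between every pair of participants.
        pads = {(i, j): secrets.token_bytes(length)
                for i in range(n) for j in range(i + 1, n)}
        def xor(a, b):
            return bytes(x ^ y for x, y in zip(a, b))
        broadcasts = []
        for i in range(n):
            share = bytes(length)  # all zeros
            for pair, pad in pads.items():
                if i in pair:
                    share = xor(share, pad)
            if i == sender:
                share = xor(share, message)  # only the sender folds in the message
            broadcasts.append(share)
        # Every pad appears in exactly two broadcasts, so XORing them all
        # cancels the pads and leaves just the message, with no hint of
        # which participant sent it.
        result = bytes(length)
        for b in broadcasts:
            result = xor(result, b)
        return result

    print(dc_net_round(b"hi there", sender=1))   # b'hi there'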

