Please read up on GDPR "purpose limitation". We cannot use IP address except for...

BrendanEich · on Oct 6, 2018

Separate thing, not promising it on a schedule yet so really FWIW: our BAT roadmap's "Apollo" phase aspires to decentralize as much as possible. This could certainly include p2p flows with ZKPs in state channels or better. We are looking at OMG's Plasma implementation.

So the ultimate goal is to get away from ANONIZEd traffic to a blind accounting server. But as I say, lots of problems to solve before promising this. Yet with Ethereum scaling and anonymity support, for users who buy their own BAT (where I claim your objection to IP address has most merit), we could go p2p on-chain for decentralization w/o fraud risk for bring-your-own-BAT users.

ohmygodel · on Oct 6, 2018

Interesting, but decentralization does not equal privacy. Indeed, it might make privacy worse by sharing the data more widely and making it even easier to get copies of the data. Consider, for example, BitTorrent, which has a pretty decentralized distribution protocol that also makes it easier for third parties like the MPAA to observe who is sending and receiving the files.

Even using a privacy-enhanced blockchain isn't necessarily sufficient. Blockchains do not provide anonymous messaging. Therefore, a recipient R of a transaction can identify the sender S if R can observe S sending the transaction. Yes, this problem affects Bolt, Zcash, Monero, etc.

BrendanEich · on Oct 6, 2018

Yeah, I noted blockchain issues in my latest (in time) reply. But at least you won't have IP addresses to worry about any longer :-P.

18pfsmt · on Oct 7, 2018

What if one were to run one of these "privacy-enhanced blockchain"s from a VPS (paid for with these same anonymized tokens)?

In case it's not clear, I'm earnestly asking this question

ohmygodel · on Oct 7, 2018

That's basically using a proxy, and so it has the same security. If the proxy is or goes bad (say, your VPS provider reveals your IP to some interested guys with guns), then you lose (anonymity). If your proxy remains good, then all that can be learned is that the transactions originated at the proxy. However, the proxy does serve as a potential pseudonym, and so if the collection of transactions reveals something identifying, even if any one transaction doesn't, then you lose.

ohmygodel · on Oct 6, 2018

> Please read up on GDPR "purpose limitation".

I am reasonably familiar with the contents of GDPR, having looked into it more after attending a lecture on the subject [0].

> We cannot use IP address except for antifraud, so it is not legally viable for us to try to link zero-knowledge proofs into a profile based on IP address.

If your users must rely on you obeying a policy, then please just say that. Right now, it seems to me that you claim to use technical means to prevent Brave from learning browsing histories [1].

> my home AT&T IP address wanders often, so do many others; mobile even more variable.

IP addresses can be so identifying that they have been ruled as personally-identifiable information by the European Court of Justice [2].

> I think you are mistrusting prematurely. But as noted in my item 1, we are talking to PIA about using an IP relay (not full VPN). This got delayed by their work on handshake.org but we're restarting it.

Thank you for stating clearly that you aren't using PIA (aka "IP masking") at the moment for Brave Payments. You might consider your users who are worried about data breaches and compromised servers as much as they are worried about Brave's intentions. Please don't take my criticisms personally.

> Putting these through separate Tor circuits is possible, as we also randomly space them out in time.

Oh, you do randomly delay ballot submissions? I have not been able to find any such logic in the code but would be happy to be pointed to it. The specific way in which you choose delays is, of course, crucial to it providing security.

[0] <https://petsymposium.org/2018/program.php>

[1] <https://brave.com/faq-payments/#anonymous-contributions>

[2] <https://www.irishtimes.com/business/technology/european-cour....

BrendanEich · on Oct 6, 2018

> If your users must rely on you obeying a policy, then please just say that.

I did just say that, several times -- but with conditions that do not make it a matter of "policy" only. We agree IP address should be masked for self-funded users. Working on it!

We are very familiar with how any user log can be used as a history, but anonized proofs that can't link to a user id except illegally to IP address are not on the same level of threat as the Blendle, Flattrplus, etc. histories taken in the clear -- never mind Google et al. surveillance. To equate the tech and not make any distinctions does a disservice to us in my view. I'm not sure you did equate, but see next paragraph.

Tech alone is never enough for anything like what we are doing. Addresses matter, if not IP then on blockchain. There are side channels. There will be bugs. IMHO you have to include the social and legal constraints, too. Even a p2p with ZKP solution has some risk due to the blockchain addresses, which need purpose-limited terms under GDPR too.

On road, will get you links to code for randomizing time between ANONIZE sessions as soon as I can.

ohmygodel · on Oct 6, 2018

> Tech alone is never enough for anything like what we are doing.

You'd be surprised how far you can get. For example, protocol designs exist that provide strong message anonymity: mixnets, DC-nets, and secure multiparty computation (MPC). Tor is great at its goals, but it accepts weaker security in exchange for low latency that Brave doesn't need. Tor is also unfortunately blocked in many companies and countries by technology and/or policy. Mixnets are freely available [0]. MPC is sold commercially [1]. (I have no personal or professional connection to either project.)

> On road, will get you links to code for randomizing time between ANONIZE sessions as soon as I can.

Looking forward to it.

Many thanks for the serious engagement. I look forward to recommending Brave to my friends and colleagues in the not-distant future!

[0] <https://katzenpost.mixnetworks.org> [1] <https://www.unboundtech.com/>

BrendanEich · on Oct 6, 2018

Here's a non-tech issue that already causes some idealists to scorn us: most publishers want to be paid in fiat. That means AML/KYC/anti-sanction-list/etc. Pretty unideal but we are not waiting for others to achieve Utopia. We want to help creators get users funding them sooner, and while some take crypto, most (esp. of size) want fiat.

ohmygodel · on Oct 6, 2018

Honestly, I don't know why you need a blockchain in the first place. Just run your own accounting servers, which you already are doing for the Anonize ballots. It is certainly possible to take in money from identified (i.e. non-anonymous) users and pay out money to identified publishers without being able to determine which users are responsible for which payments to the publishers. However, I also don't see any problem with using Ethereum for transferring money.

BrendanEich · on Oct 6, 2018

Oh, that q ("why you need a blockchain") is easy. Fiat only would require us to be an MSB or MTA, heavy licensing lift and no ability to grant users for free from the user growth pool we precreated before the BAT sale. The user growth pool is the number one reason in my view.

Also we like macro-auditability from funding wallet to omnibus settlement, to show we took the %ages we promised.

BrendanEich · on Oct 6, 2018

BTW I'm aware of mixnets, talking to Harry Halpin and others, but also cautious about new tech. Will look at your refs, thx.

BrendanEich · on Oct 6, 2018

Code links:

JS implementation in Muon-based https://github.com/brave/browser-laptop product:

1. https://github.com/brave-intl/bat-client/blob/master/index.j...

2. https://github.com/brave-intl/bat-client/blob/master/index.j...

3. https://github.com/brave-intl/bat-client/blob/master/index.j...

From Marshall Rose: "the first deals with the delay before asking for ballots (after a contribution) and the second deals with each delay between submitting each ballot."

New "brave-core" (chromium refork to get front end but w/o Google accounts/sync), the new implementation; narration by Serg Zhukovsky:

"""

here we make a first call https://github.com/brave-intl/bat-native-ledger/blob/reconci...

which calls that function https://github.com/brave-intl/bat-native-ledger/blob/reconci...

after the random wait it calls that function https://github.com/brave-intl/bat-native-ledger/blob/reconci...

and in its callback it goes all over again if there are still votes to send

if you are interested in the function that does the randomization, it’s here: https://github.com/brave-intl/bat-native-ledger/blob/reconci... """

The C++ is all new code, not yet released -- bug reports welcome! Thanks.

ohmygodel · on Oct 7, 2018

Thanks! The protection of the delay-based ballot-mixing looks somewhat weak.

I see that the delay from one ballot batch to the next is set to a uniformly-random time between 10 and 60 seconds. I also see that the ballot batch size is 10.

Let's assume that you have fixed the problem of the ballots being submitted in the clear and that they are instead sent through a proxy on different TCP connections (or through Tor on different circuits). Let's also say that the news reports are correct and Brave has 4 million active users, and moreover that all have enabled payments (this is generous). Furthermore, assume that (1) there is no reason payments would start being submitted on any particular day or time, and (2) say, 20 publishers are paid 20 tokens each on average. Then the average number of users uploading ballots in any given minute - let's call them "neighbors" - is roughly (20 ballots/user / 10 ballots/batch)(20 publishers/month)(4e6 users)/(30 days/month)/(24 hours/day)/(60 minutes/hour) = 3704 neighbors. That's not a very big anonymity set, but it's not nothing. Early adopters got screwed, though.
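The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: every input is one of the assumptions stated in the comment (not a measurement), the function and constant names are made up for illustration, and the result is really the number of batch uploads per minute, which the comment treats as the number of neighbors.

```python
# Inputs are the assumptions from the comment above, not measured values.
USERS = 4_000_000                 # assumed active users, all with payments enabled
PUBLISHERS_PER_MONTH = 20         # assumed publishers paid per user per month
BALLOTS_PER_PUBLISHER = 20        # assumed tokens (ballots) per publisher
BATCH_SIZE = 10                   # ballots per batch, as read from the code
MINUTES_PER_MONTH = 30 * 24 * 60  # 30-day month


def neighbors_per_minute(users=USERS,
                         publishers=PUBLISHERS_PER_MONTH,
                         ballots=BALLOTS_PER_PUBLISHER,
                         batch_size=BATCH_SIZE,
                         minutes=MINUTES_PER_MONTH):
    """Average number of batch uploads per minute, spread evenly over a month."""
    batches_per_user = (ballots / batch_size) * publishers
    return users * batches_per_user / minutes


print(round(neighbors_per_minute()))  # 3704
```

Changing `USERS` to a smaller number shows how quickly the anonymity set shrinks for early adopters.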

However, it gets worse. Based on BatClient::prepareVoteBatch() (bat_client.cc), it looks like each batch has ballots for a single publisher (if I'm wrong, privacy erodes even further). How many of those 3704 simultaneous users are uploading to the exact same domain? I don't know how to guess (to be conservative, we should assume none). Moreover, unless the number of ballots for a publisher happens to be a multiple of 10, the last batch will stand out by having fewer than 10 ballots. Both of these things make it easy in many cases to determine when you are finished uploading ballots for one publisher and are moving on to the next. If many of your neighbors have split a publisher's votes across multiple batches, then it seems unlikely that they will move to the next publisher at the same time, making your ballot submissions more linkable across publishers, exactly the problem we were worried about.
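To make the short-final-batch point concrete, here is a toy model in Python. It assumes, as the comment does, that each batch holds ballots for a single publisher and that the batch size is 10. The function names are hypothetical and this is not Brave's code.

```python
def batch_sizes(ballots_for_publisher, batch_size=10):
    """Sizes of the batches needed to upload one publisher's ballots."""
    full, rest = divmod(ballots_for_publisher, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def upload_stream(ballots_per_publisher, batch_size=10):
    """Batch sizes an observer would see for one user, in upload order."""
    sizes = []
    for n in ballots_per_publisher:
        sizes.extend(batch_sizes(n, batch_size))
    return sizes


def visible_boundaries(sizes, batch_size=10):
    """Indices of batches that reveal the end of a publisher (size < batch_size)."""
    return [i for i, s in enumerate(sizes) if s < batch_size]


stream = upload_stream([23, 7, 20])
print(stream)                      # [10, 10, 3, 7, 10, 10]
print(visible_boundaries(stream))  # [2, 3]
```

In this example the observer learns two of the three publisher boundaries from batch sizes alone; only the publisher with exactly 20 ballots ends without a telltale short batch.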

In addition, the time of each batch isn't independent because the delay is applied after the last one. For example, if one batch is sent quickly by chance, then the next one is more likely to be sent at a time close to the initial batch. Thus, to link two batches, you only need to consider the batches that end in the 10-60 seconds preceding the start of the second one. How many of your 3704 neighbors ended a batch then? The longer a batch upload takes (due to latency & bandwidth), the less likely that the batch of any other user has ended in that short time frame.
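The linking step described here amounts to a simple filter over observed batch end times. The sketch below is illustrative only: the 10 and 60 second bounds come from the delay range mentioned earlier in this comment, while the function name and the sample timestamps are invented.

```python
def candidate_predecessors(batch_end_times, next_start,
                           min_delay=10.0, max_delay=60.0):
    """Observed batch end times that could precede a batch starting at next_start.

    A batch can only follow one that ended between min_delay and max_delay
    seconds earlier, so everything outside that window is ruled out.
    """
    return [t for t in batch_end_times
            if min_delay <= next_start - t <= max_delay]


# Hypothetical end times (seconds) seen by the server across all users.
ends = [0.0, 35.0, 80.0, 95.0, 130.0]
print(candidate_predecessors(ends, next_start=100.0))  # [80.0]
```

The fewer end times that fall inside the window, the more confidently two batches can be attributed to the same user.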

There is also the issue of semantic linking. If I'm in a relatively rare group, the sites I visit are likely to be linkable to each other as distinct from those of my neighbors. For example, suppose I speak Catalán and am involved in regional politics, or that I am a teenage boy in Iceland. How many of my sites are linkable based on those characteristics? And this linking can happen across "gaps" in the ballot-upload stream, where there is uncertainty about how the uploads are linked together, allowing you to get the inference "back on track".

Fortunately, the timing stuff seems fixable. Hiding the IP and separating ballots across connections is an essential first step, of course, but to handle timing issues after that, simply do the following: (1) submit all ballots individually; (2) schedule each ballot upload independently of the other ballots and at a uniformly-random time over, say, the next week or two, and if the user isn't online, reschedule.
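A minimal sketch of that two-step fix, in Python. Everything here is an assumption for illustration: the one-week window is one of the values suggested above, the function names are hypothetical, and a real client would also need to persist the schedule and send each ballot over its own connection.

```python
import random

WEEK_SECONDS = 7 * 24 * 60 * 60
_SYSTEM_RNG = random.SystemRandom()  # OS-backed randomness, not the seeded default


def schedule_ballots(ballot_ids, now, window=WEEK_SECONDS, rng=_SYSTEM_RNG):
    """Give every ballot its own upload time, uniform over the next `window` seconds."""
    return {b: now + rng.uniform(0, window) for b in ballot_ids}


def take_due(schedule, now, online, window=WEEK_SECONDS, rng=_SYSTEM_RNG):
    """Return ballots to upload now, one at a time, or reschedule them if offline."""
    due = [b for b, t in schedule.items() if t <= now]
    if online:
        for b in due:
            del schedule[b]
        return due
    # Offline: draw a fresh independent time for each missed ballot.
    for b in due:
        schedule[b] = now + rng.uniform(0, window)
    return []
```

Because each ballot's time is drawn independently, the end of one upload says nothing about when the next one will happen, which removes the 10-60 second linking window discussed above.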

The semantic issues with small populations seem harder. I think Brave should prevent itself from learning about votes for publishers unless enough people vote for the publisher. I don't see how to do this without a more powerful cryptographic protocol (it is absolutely doable, though, using MPC for example).

BrendanEich · on Oct 7, 2018

Thanks. Marshall Rose said in reply "That is a very good analysis about the traffic pattern. You are right about the batching of ballots for a similar publisher. That is an optimization to reduce the total voting period."

We will work on improving the system. Please mail me first at brave dot com if you want to correspond further. Thanks again.