The attack you suggest is ruled out by differential privacy. The precise guarantee is a bit subtle. The first thing to note is that the output of a differentially private mechanism must be random. The guarantee is then that the probability of any particular output changes by very little whether or not you are included in the dataset. In other words, even if you were omitted from the dataset, the chance that the algorithm would have produced the same result is essentially unchanged.
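
Concretely, writing M for the mechanism, D and D' for two datasets that differ in one person's data, and eps for the privacy parameter (notation introduced here just to state the definition), the standard guarantee is:

    Pr[M(D) in S] <= exp(eps) * Pr[M(D') in S]    for every set of possible outputs S

So no output can become dramatically more or less likely because your data was included.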

This definition rules out the attack you suggest. In particular, if you were removed from the dataset, the probability of the output in question (i.e., that a ride starts in the region) would drop from very large to very small, which the definition forbids. Therefore, the algorithm you describe (i.e., adding noise to the start location) is not actually differentially private.

The confusion arises because oftentimes adding noise is sufficient. For example, the average of n real numbers in [0,1] changes by at most 1/(n-1) if you delete one point from the dataset. Therefore, you can just add a little bit of noise and the released average becomes differentially private.
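
A minimal sketch of that case (the function name, the use of numpy, and the choice of epsilon are just for illustration; it assumes n >= 2):

    import numpy as np

    def private_average(xs, epsilon):
        # xs: values in [0, 1]. Deleting one point changes the average
        # by at most 1/(n-1), so Laplace noise with scale sensitivity/epsilon
        # is enough for epsilon-differential privacy.
        n = len(xs)
        sensitivity = 1.0 / (n - 1)
        return np.mean(xs) + np.random.laplace(scale=sensitivity / epsilon)

The larger n is, the less noise you need, which is why aggregate statistics over big datasets can be released quite accurately.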

For the dataset you describe, a sibling comment proposed the correct mechanism -- you have to add noise to the count returned by the query, not to the start location. (Technically I think you could just add noise to the start location like you propose, but the amount of noise would have to be large enough that all the start locations overlap by a sufficient amount.)
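
For the ride example, the sibling comment's approach would look roughly like this (private_ride_count, region_contains, and epsilon are hypothetical names, not anything from a real system):

    import numpy as np

    def private_ride_count(start_locations, region_contains, epsilon):
        # Adding or removing one person's ride changes the count by at most 1,
        # so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
        true_count = sum(1 for loc in start_locations if region_contains(loc))
        return true_count + np.random.laplace(scale=1.0 / epsilon)

A query that returns something like 0.7 or 1.9 rides looks odd, but that fuzziness is exactly what hides whether any single ride was in the data.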



Thank you! Makes sense.



