>> Now I go looking for a job and someone in HR decides that my political views are too risky, so I don't even get an interview.
It's not ideal, but basing hiring decisions on real information (even if 'private') strikes me as better than using uninformed prejudice. In the example you give, is it better or worse to have the HR person say "Homoiconic? Sounds gay to me!"? I don't see much increased risk in offering a deeper pool of inherently unverifiable information, even if this information might be used badly.
>> The author's views are that Netflix can provide the benefits you desire without compromising the privacy I desire. I think the debate should be around whether the authors are correct.
This is a good debate, but for me it's a secondary one. Even if a privacy-safe solution is theoretically possible, I fear the only likely outcome will be that Netflix buries a 'release authorization' somewhere deep within their terms of service and is dissuaded from offering similar data sets in the future.
I also think that for my needs 'anonymize and release' is the only feasible solution. While online systems might be better than nothing, the approaches I'm interested in (kNN on GPU's, massively parallel cross-validations) really require local data and full control over data layout. But perhaps there is indeed a solution that works for everyone.
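To make the computational needs concrete, here is a minimal sketch of user-based kNN rating prediction, assuming NumPy, an invented toy ratings matrix, and cosine similarity over co-rated movies; none of this is the commenter's actual code:

```python
import numpy as np

# Toy user-by-movie rating matrix (0 = unrated); the data, k, and the
# cosine-similarity choice are all assumptions made for illustration.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

def cosine(a, b):
    """Cosine similarity over the movies both users rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    return float(a[mask] @ b[mask]
                 / (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask]) + 1e-12))

def predict(R, user, movie, k=2):
    """Predict R[user, movie] from the k most similar users who rated it."""
    rated = R[:, movie] > 0
    rated[user] = False                    # never use the target's own cell
    candidates = np.where(rated)[0]
    sims = np.array([cosine(R[user], R[c]) for c in candidates])
    top = np.argsort(sims)[-k:]            # indices of the k nearest raters
    w, neighbors = sims[top], candidates[top]
    if w.sum() == 0:
        return float(R[neighbors, movie].mean())
    return float(w @ R[neighbors, movie] / w.sum())
```

Scaling this to a full Netflix-sized matrix is exactly where the GPU and local-data requirements come from: the similarity computations are embarrassingly parallel, but they need the whole ratings matrix resident locally and laid out to suit the hardware.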
I'm going to try to paraphrase your concern, that the privacy fears are overblown and we're just losing something valuable. I hope you don't mind; please tell me if I got it wrong:
>> It's not ideal, but basing hiring decisions on real information (even if 'private') strikes me as better than using uninformed prejudice. In the example you give, is it better or worse to have the HR person say "Homoiconic? Sounds gay to me!"? I don't see much increased risk in offering a deeper pool of inherently unverifiable information, even if this information might be used badly.
Paraphrased: "They have poor information sources for their prejudice now, I don't see what harm better information sources driving their prejudice could do."
>> This is a good debate, but for me it's a secondary one. Even if a privacy-safe solution is theoretically possible, I fear the only likely outcome will be that Netflix buries a 'release authorization' somewhere deep within their terms of service and is dissuaded from offering similar data sets in the future.
Paraphrased: "The debate on privacy dangers is secondary; if Netflix refuses to release useful data in the future, that's the primary danger."
>> I also think that for my needs 'anonymize and release' is the only feasible solution. While online systems might be better than nothing, the approaches I'm interested in (kNN on GPU's, massively parallel cross-validations) really require local data and full control over data layout. But perhaps there is indeed a solution that works for everyone.
Paraphrased: "I was working on a project with the data, and the FTC ruling appears to have put a stop to that."
>> They have poor information sources for their prejudice now
Yes to that part. We should base our choices on our current situation, not an idealized view.
>> I don't see what harm better information sources driving their prejudice could do.
That loses too much nuance. I can see that it could do harm, but think that increased information could also reduce prejudice. In the particular case of movie reviews, I see the added risk as low, and the potential benefit as small but significant.
>> The debate on privacy dangers is secondary, if Netflix refuses to release useful data in the future then that's the primary danger
No, I mean secondary in the sense that we need to sort out what the actual privacy dangers are before we try to come up with workarounds. If the outcome is that Netflix embeds some small print requiring all users to allow the release of 'suitably anonymized' rating data, has this changed anything? We still need to determine the actual risks. Perhaps the privacy advocates are so right about the dangers that even an opt-in system should be prevented.
>> I was working on a project with the data, and the FTC ruling appears to have put a stop to that.
People already using the data set internally will likely be unimpacted. The Netflix terms of release presumably still apply, and prevent most commercial usages. But research papers that need to cite a publicly available data set will be adversely impacted, and the legal 'cloud of fear' will likely prevent the release of future data sets.
Thanks for responding. It helps; I was sorely tempted to argue with your points as I had misread them.
I feel strongly about the primacy of privacy because the problems of prejudice are so severe. But the transparent-world approach has a lot of depth to it, and I'm nervous and excited to see where it leads us, since, like it or not, the technical and social changes we're seeing are creating so much data.
Here are the issues I think we have with this particular data release. I understand they may be resolvable, and while I argue against accepting privacy problems like this right now, I would definitely love to find out that we can shift to a more transparent society without reservation and without the level of abuse that I fear:
-- Information asymmetry: most people don't keep the statistical skills, computing power, or secondary data sets needed for cross joins and comparisons on hand. Information asymmetry is a widely recognized concern in social behavior and commercial law (e.g. insider trading and real estate contracts)
-- Ethical concerns seem to constrain us to taking people's own privacy expectations into account. Originally, privacy loss didn't seem to be a problem, but the situation changed when evidence to the contrary emerged
-- History shows us that governments frequently overstep their bounds. Rather than argue recent events, which are more clouded by emotion and the narrative of the day, I think most people would agree that the FBI in the 60s was quite abusive in its gathering and use of data
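The "secondary data sets for cross joins" point is the crux of the re-identification worry, and it can be sketched in a few lines. Every pseudonym, name, and record below is invented; this only illustrates the general shape of a linkage attack, not any specific dataset:

```python
# Hypothetical linkage attack: an 'anonymized' release keeps (movie, rating,
# date) tuples per pseudonym, while a public source of signed reviews exposes
# the same kind of tuples with real names attached.
anonymized = {
    "user_217": [("MovieA", 5, "2006-03-01"), ("MovieB", 1, "2006-03-04")],
    "user_512": [("MovieA", 3, "2006-02-11"), ("MovieC", 4, "2006-02-12")],
}
public = {
    "alice": [("MovieA", 5, "2006-03-01"), ("MovieB", 1, "2006-03-04")],
}

def link(anonymized, public, threshold=2):
    """Re-identify pseudonyms whose (movie, rating, date) tuples overlap
    a named public profile in at least `threshold` records."""
    matches = {}
    for pseudonym, recs in anonymized.items():
        for name, pub_recs in public.items():
            if len(set(recs) & set(pub_recs)) >= threshold:
                matches[pseudonym] = name
    return matches

print(link(anonymized, public))  # {'user_217': 'alice'}
```

Even a handful of matching tuples can be enough to single out one profile, which is why 'suitably anonymized' would be doing a lot of work in any opt-in clause.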
That said, transparent world all the way. How do we get there?
"I'm going to try to paraphrase your concern, that the privacy fears are overblown and we're just losing something valuable. I hope you don't mind; please tell me if I got it wrong:"
"I'm going to just rewrite your argument" is not a valid form of debate. Please don't do that, ever.
I'm not so sure you're correct. I was polite, and I wanted to know.
When you're on the Internet it's hard to know where people are coming from. I can see now how I misread him; I wish I hadn't, but I did. How else do I politely find out what the terms of a debate are in a public forum? Is it truly better not to attempt true comprehension of a person's words? Is it all just throwaway, a comment on an aggregation site?
And this was a tough thing to ask politely, simply because I had a presumption that was wrong. But I tried to do it right. I think the response I got back was pretty clear: the OP knows the fundamentals of the privacy debate, has an opinion on the privacy issue itself, and is not too offended by my best take on a gentle nudge at the bias question. A pretty safe guess from that is that the OP understands questions of bias as a safety net for intellectual thought rather than as an offensive gesture.
I didn't know all of that, and so I asked. That's all it takes.
It's hard to communicate well online. It's a new thing for the human mind, without 100,000 years of conditioning behind it. Every day I log on to Hacker News for two things: it's shockingly educational in many disciplines, and I want to learn to communicate so I can better participate in society in a positive way. What a challenging place to do it. In the process I've found out that my communication needs a lot of work. I've been lurking since '88, and lurking didn't teach me to communicate; what gives ;-).
But I'm pretty sure I took a reasonable tack here. Listening comprehension, interest in knowing what people really mean in short form communication, and trying to find out why people say things that we disagree with seem to be critical needs in this complex medium.
I voted him up because I'm taking him at his word that his goal is to paraphrase, and that I wasn't very clear in my initial writing. If both speakers have equal opportunity to respond, saying back in your own words what you think you are hearing is a great way to reach understanding. If nothing else, it made me reread what I wrote to see if I could state things more clearly.