The thing I really don't understand is that this seems like such a simple thing to test! Search for X in our data set vs. the competitor's data set: flag for review if we are not within a 1 km radius. Don't release until the number and scope of flags has been reduced to an acceptable margin of error (you don't even have to test the full data set; a good enough statistician can tell you how bad your overall data is based on your tested sample).
Two problems here. First, can you use a competitor's data for comparison and for making your own data better? Second, calculating the difference is difficult, and the number of differences could be overwhelming. But the first issue is the main problem.
2. Difficult how? You don't have to run it on items that can't be searched for automatically. Then you take the difference in coordinates, square it, and check whether you're over a kilometer.
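A minimal sketch of that check, assuming each record is a (lat, lon) pair. One caveat: raw subtract-and-square on lat/lon isn't quite right because degrees aren't uniform distances, so this uses the haversine great-circle formula instead. The function names here are hypothetical, not from any particular codebase:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def flag_for_review(ours, theirs, threshold_km=1.0):
    """True if our record and the competitor's disagree by more than threshold_km."""
    return haversine_km(ours[0], ours[1], theirs[0], theirs[1]) > threshold_km
```

Run that over a random sample of matched records and the flag rate gives you the error estimate the parent comment describes.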
Not if the terms of use of the "someone else" explicitly prohibit such activity. This is particularly common with local information providers (e.g. business/event listings).