> it seems linking to a copy that claims the dataset is public domain, would be problematic copyright-wise.
Would it? Sounds to me like the blame lies on the person uploading the dataset under that license, unless there is some reasonable person standard applied here like 'everyone knows Harry Potter, and thus they should know it is obviously not CC0'
> unless there is some reasonable person standard applied here like 'everyone knows Harry Potter, and thus they should know it is obviously not CC0'
Yes, there's an expectation that you put in some minimum amount of effort. The license issue here is not subtle: the Kaggle page says they just downloaded the eBooks and converted them to txt. The author is clearly familiar enough with HP to know that it's not old enough to be public domain, and the Kaggle page makes it pretty clear that they didn't get some kind of special permission.
If you want to get more specific on the legal side, copyright infringement does not require that you _knew_ you were infringing. It's still infringement either way, and you can be made to pay damages. It's entirely on you to verify the license.
I'm not a copyright expert, and if you told me that Harry Potter was in the public domain I'd probably be a bit surprised but wouldn't think it's crazy. The first book came out 30 years ago, after all. On further research, copyright laws are way more aggressive than that (a bit too much if you ask me), but 30 years doesn't seem quick. Patents expire after 20 years.
I find this fascinating, as I keep observing that there are pretty widespread differences between what people believe copyright does and what the law actually says.
The Berne Convention (author's life + 50 years) is the baseline for the copyright laws in most countries. Many countries have a longer copyright period than Berne.
I think even people who don't care about how broken the copyright system is understand intuitively that huge commercial properties that are contemporaneous with themselves are protected. They don't need to know any details to know that these properties belong to massive companies and aren't free for the taking.
How many people think they can rip off Disney characters even if they don't know how much Disney lobbied to extend their ownership? People can observe that no one but Disney gets to use them and understand, even if not consciously, that those are Disney's to use.
^ Probably poorly written; no time to proofread because of time constraints.
It is a media franchise for children, and there are many elements, and trademarks in addition to copyrights. I think most fans understand the bright line that stops them copying an entire book or film work, unless their dad has a Roku at home.
But there are over 34,000 images uploaded to the Fandom.com site alone. There are character bios and generous quotes from films and books. Countless fans are using elements in memes and avatars and social media posts.
Fan-fiction abounds, where the characters and scenarios are endlessly remixed and mashed up with other fandoms.
Quidditch... simulated... is a collegiate sport, but they had to rename it.
Even on the official Wizarding World site, you can make custom downloadable stuff. Not long ago, you could freely download wallpapers. You can get free clips and trailers on any video site.
News outlets had a difficult time explaining the "Public Domain" status of Mickey Mouse and Betty Boop with the new years, because Mickey Mouse and Betty Boop, the characters, aren't the things which are copyrighted, and the characters' status didn't change with the new year.
I would bet that the typefaces in the official books have their own copyrights, and the book binding processes are patented.
The article author and the uploader should _BOTH_ be sentient enough to engage brain and not just ignore it because they feel "it's an abstract concept I'd not get in trouble for when not working in the US or EU".
Copyright infringement is a strict liability tort in the US. Willful infringement can result in harsher penalties, but being mistaken about the copyright status is not a valid defense.
I don't know if you're trying to say that, in the realm of tort law, it is only strict liability, or if you are saying that copyright infringement is only a tort. If it's the latter, it's completely untrue, as there are criminal copyright infringement statutes.