
whenever i go through my bookmarks, i tend to find maybe 5-10% are now 404.

this is why i like the archive.ph project so much and am using it more and more as a kind of bookmarking service.



What’s the benefit of using archive.ph instead of archive.org (Internet Archive)? Seems like the latter is much more likely to be around for a while.


i find archive.ph does a better job of preserving the page as is (it also takes a screenshot) compared to internet archive which can be flaky at best.

i also find archive.ph much faster at searching, and the browser extension is really useful too.

the faq does a great job of explaining too https://archive.ph/faq


archive.today does that by rewriting the page to mostly static HTML at the time of capture.
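As a rough illustration of what "rewriting to mostly static HTML" could look like, here is a minimal Python sketch that re-emits a page with `<script>` elements and inline `on*` event handlers removed. This is not archive.today's actual pipeline, which isn't public as far as I know; `StaticRewriter` and `to_static_html` are made-up names, and a real capture would also have to inline stylesheets, images, and the rendered DOM.

```python
from html import escape
from html.parser import HTMLParser


class StaticRewriter(HTMLParser):
    """Re-emit HTML, dropping <script> elements and on* event attributes.

    Hypothetical helper for illustration; comments and doctypes are also
    dropped because no handler re-emits them.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # pieces of the rewritten document
        self.skip_depth = 0    # > 0 while inside a <script> element

    def _render_start(self, tag, attrs):
        # Keep every attribute except inline event handlers (onclick, onload, ...).
        kept = [(k, v) for k, v in attrs if not k.startswith("on")]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        return f"<{tag}{rendered}>"

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.skip_depth += 1
            return
        if not self.skip_depth:
            self.out.append(self._render_start(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit once, with no closing tag.
        if tag != "script" and not self.skip_depth:
            self.out.append(self._render_start(tag, attrs))

    def handle_endtag(self, tag):
        if tag == "script":
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped (convert_charrefs=True), so escape it again.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def to_static_html(html_text):
    """Return html_text with scripts and inline event handlers removed."""
    rewriter = StaticRewriter()
    rewriter.feed(html_text)
    rewriter.close()
    return "".join(rewriter.out)
```

For example, `to_static_html('<p onclick="x()">Hi</p><script>alert(1)</script>')` keeps only the paragraph markup and its text.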

archive.org indexes all URLs first-class and presents as close to what was originally served as possible. It also stores arbitrary binary files and captures JS and Flash interactivity with remarkable fidelity.

When logged in, the archive.org Save Page Now interface gains options for taking a screenshot and for saving all linked pages (non-recursively). I can't see why these are limited to logged-in users; the more saved, the better, right?

archive.org has a browser extension too.


Isn't archive.ph/today the one with questionable funding sources and backing? Who is behind it and can it be trusted for longevity?


In this case the less we know, the longer it will last. Notice how this site ignores robots.txt and copyright claims by litigious companies that would like to see their past erased.

The data saved on your NAS will outlast this site regardless of who owns/funds it.


Their explanation for ignoring robots.txt makes sense: they say their crawler only runs when a human enters a URL to archive, so it isn't a free-roaming bot. They also point to Google as an example of the same practice.
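To make the distinction concrete, here is a small sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` contents, the `ExampleBot` user agent, and the `should_fetch` policy function are all assumptions for illustration; the policy just mirrors the reasoning described above and is not taken from the site's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a publisher might serve.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def should_fetch(robots_txt, user_agent, url, user_initiated):
    """Illustrative policy: a fetch requested by a human for one specific URL
    skips the robots.txt check; an automated crawl consults it."""
    if user_initiated:
        return True
    return crawler_may_fetch(robots_txt, user_agent, url)
```

With the sample rules, an automated crawl of `https://example.com/private/page` is refused, while the same URL submitted by a person goes through.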


How do you figure?


What do you mean? There’s a line of companies waiting to sue anyone involved with that site. That’s been the case for many years.


A site devoted to duplicating content from elsewhere online, and with a significant use-case of defeating paywalls, would be a very likely candidate for lawsuits.

Concealing ownership would tend to help avoid this / minimise consequences.

That might still be a brittle defence.


yeah funding is a grey area...

fwiw the website is only accessible via VPN in a lot of countries, which says a lot to me... and i don't think they've taken down any content, although i can't say for sure.


So ... what is known about the operator(s) / funding?


Likely Slavic:

• WHOIS points to a "Denis Petrov" in Prague

• Share menu has buttons for Reddit, VKontakte, Twitter, Pinboard, and Livejournal. Eyebrow raising. VK is Russian, and so is LJ nowadays. Pinboard (notable as successor of del.icio.us) is American, coincidentally founded by a Polish immigrant.

• With a sizable dose of confirmation bias, the mistakes in the English of the site and blog do feel appropriately Slavic

It's stated to be privately funded, with costs around US$4000/month, and began accepting donations in 2016 (https://wiki.archiveteam.org/index.php/Archive.today#Vital_S...)


Thanks. That's largely the sense I've had.

Motivation / utility is a question that's occurred to me more than once.


archive.ph = Russian Federation website. Blocked by most firewalls by default.



