
In my experience, every single one of these results carries the `.html` filetype as part of its URL. This is likely a consequence of the user-agent-based switcheroo technique they use to fool Google.

Just blanket block the lot with the following uBlock Origin filter:

    google.*##.g:has(a[href*=".it"][href$=".html"])
Google ain't going to fix itself ;)


Blanket banning a whole TLD is stupid. Blocking some obscure TLD like ".su" is one thing, but .it? It's just too big, and arguably unwise if you are in Europe, where having to connect to Italian websites or services isn't a remote possibility.


This merely hides Google search results in my browser.

No network connections are blocked...


Yes, you hid all Italian Google search results - arguably not an ideal solution.


I'm sure there are plenty of non-spam html pages based in Italy too


Considering the crowd here, that trade-off seemed too obvious to mention.


cool!

now s/\.it/every TLD/ and you solved domain spam forever.

/s

You might not know that 99.99% of .it URLs ending in .html are completely legit, including some official government ones.


Since uBlock is run on the client, unless you’re Italian or interested in Italian sites it doesn’t really seem like much of an issue.

I could block all .it sites on my network and I’d likely never even notice.


yeah, right, unless you're american, why should you care about .com domains?

  ¯\_(ツ)_/¯

The problem is not .it domains; it's clearly stated in the linked post:

> A large number of spam pages are indexed when searching by our product name. It’s very similar to Japanese Keyword hack, but the difference is that our site is not hacked

So it's definitely an indexing issue: those .it domains are being indexed for the Japanese keyword hack for some reason. It's not that .it domains are particularly spammy per se.

Your "solution" would filter the vast minority of the abusers at the cost of banning an entire TLD, not much different than turning off the internet connection entirely.

Most of the spam on the internet comes from .com domains anyway, even more so because registering a .com domain is much easier than getting a .it one.

Are you willing to ban .com too?


> Your "solution" would filter the vast minority of the abusers at the cost of banning an entire TLD, not much different than turning off the internet connection entirely.

Again, we’re talking about client-side filtering. The original comment about blocking .it domains was talking about a uBlock Origin rule. No one’s talking about blocking .it domains from the web.

Yes, as an American, I could block all .it domains on my end and my web experience likely wouldn’t change at all. I rarely, if ever, need to visit .it domains. So maybe I will.


This visually hides HTML elements on Google Search, and only for me. There is no networking involved, so Italian TLDs are still reachable.

This is a personal solution to an extremely disruptive and long-standing problem, and it only affects those who choose to employ it. It's not hurting anyone.


.com implies spam - it's commercial, so let's go ahead. If it's not .org I'm not playing. /s


And yet, here you are, and not on ycombinator.org? ;-)


Nah. I've been reading the docs on Spatialite (the spatial extension for SQLite) at http://www.gaia-gis.it/ the last couple days. It has both a "spam" TLD and a design from 1998.


But not many of the official government ones.


"Official government" in Italy also means cities, towns, hospitals, universities, public schools, etc.

There are 8 thousand towns in Italy, each with its own .it website.


In addition to this, if one runs unbound as the DNS resolver on their home router and blocks DoH, then one can add

    local-zone: "it" always_nxdomain
to NXDOMAIN all requests for the .it TLD and protect non-browser devices. I use this method to stay off sanctioned countries' TLDs and to remove the cheap/free spammy domains and TLDs that often contain more malware than anything useful.
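For context, here's a minimal sketch of where that rule lives in unbound.conf (the second TLD is just an illustrative example of repeating the pattern):

    server:
        # return NXDOMAIN for every name under .it
        local-zone: "it" always_nxdomain
        # the same pattern works for any other TLD
        local-zone: "su" always_nxdomain

After a `unbound-control reload`, queries for any name under those TLDs will fail to resolve.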


What’s this useragent switcheroo?


Browsers and other programs can use the User-Agent[1] header to send along a bit of information about themselves with each request.

This and other information is then used to filter out various types of visitor.

In this case, requests claiming to be a Google Search crawler will receive a boring page with lots of text that it can index and use as search results.

Most browsers' devtools let you change your user-agent string, and a listing of the ones used by Google crawlers is publicly available. Not saying that you should, but you could check this out for yourself... entirely at your own risk of course :)

https://en.wikipedia.org/wiki/User_agent

https://developers.google.com/search/docs/advanced/crawling/...
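Mechanically, the check is just an HTTP request with a spoofed User-Agent header. A minimal standard-library Python sketch, assuming a placeholder target URL and one of Google's publicly documented crawler user-agent strings:

```python
import urllib.request

# One of Google's publicly documented Googlebot user-agent strings
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

# Placeholder URL: swap in the page you suspect of cloaking
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": GOOGLEBOT_UA},
)

# urllib normalizes header names to "Capitalized" form internally
print(req.get_header("User-agent"))
```

To actually test a site, fetch the same URL once with this request and once with your browser's normal user agent; a cloaking site serves different content to each.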


Or use Brave Search, which, honestly, in my experience is much better.



