You can make it technologically impossible, but they can also come and arrest you just for using such technology. So it's not really a technical problem; it's a social/political one.
I don't understand this take. There is no real way in which a private person can make law enforcement "more expensive". The government can always find means as long as it is supported by a sufficiently big fraction of its people.
Sure, they won't go out and arrest all one million, but from an individual perspective it's basically security by obscurity.
Once that's the case, otherwise legal activities (e.g. protesting, or making political statements) run the risk of making you a target. Law enforcement can then punish you for your legal activity by selectively enforcing this other law.
The resulting situation is one where everyone knows to some extent that "you better shut up if you know what's good for you", which has a chilling effect on otherwise legal forms of civic engagement.
You might point out that there are already laws on the books that let them do this, but I'm sure they wouldn't mind another.
Privacy-conscious apps and communications tools need to be developed, and we need to build the consensus that privacy is important.
edit: Anyone know why Briar doesn't have the feature for known contacts to be a "courier" for other contacts?
Background: Briar is the encrypted messaging app that works over Tor, local Wi-Fi and Bluetooth. If Alice sends a message to Charles but she isn't connected, the app will hold it until it detects Alice and Charles are in proximity.
My desired feature: If Bob is a verified contact with both Alice and Charles, Briar should be able to hand the message from Alice to Bob, and then deliver it to Charles.
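To make the requested courier behaviour concrete, here is a minimal store-and-forward sketch. It is not how Briar works and uses no Briar API; the `Node`/`Envelope` names are hypothetical, and it assumes each message is already end-to-end encrypted for the recipient, so the courier only ever holds ciphertext it cannot read. A message is handed to a peer only if the peer is a verified contact of the current holder and the peer itself has the recipient as a verified contact.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Envelope:
    sender: str
    recipient: str
    ciphertext: bytes  # assumed: already end-to-end encrypted for the recipient


@dataclass
class Node:
    name: str
    verified_contacts: set[str] = field(default_factory=set)
    outbox: list[Envelope] = field(default_factory=list)
    inbox: list[Envelope] = field(default_factory=list)

    def send(self, recipient: str, ciphertext: bytes) -> None:
        # Messages wait in the outbox until a suitable peer comes into range.
        self.outbox.append(Envelope(self.name, recipient, ciphertext))

    def meet(self, peer: "Node") -> None:
        """Exchange queued envelopes with a peer that is now in proximity."""
        for env in list(self.outbox):  # iterate over a copy while removing
            if env.recipient == peer.name:
                # Direct delivery to the intended recipient.
                peer.inbox.append(env)
                self.outbox.remove(env)
            elif self._can_courier(peer, env):
                # Hand off to a mutual verified contact, who stores and forwards it.
                peer.outbox.append(env)
                self.outbox.remove(env)

    def _can_courier(self, peer: "Node", env: Envelope) -> bool:
        # Courier must be verified by the current holder, must know the
        # recipient, and must not be the original sender (avoids ping-pong).
        return (
            peer.name in self.verified_contacts
            and env.recipient in peer.verified_contacts
            and peer.name != env.sender
        )
```

With Alice, Bob and Charles all verified with each other, `alice.send("charles", ...)` followed by `alice.meet(bob)` and later `bob.meet(charles)` delivers the message without Alice and Charles ever being in range of each other. A real design would also need expiry, deduplication and some limit on how much a courier is willing to store.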
I don't think there's a way with a phone that people would actually be willing to use. At some point the data has to be decrypted to be displayed to the user, and there's always a chance of a flaw somewhere in the stack (hardware, OS, app, etc.) that leaves a gap for exfiltrating it.
Avoiding centralised services is generally a good start. You could also do something like encrypting messages with PGP even if the service you're using is already "e2e encrypted", like iMessage or Signal.
There are no technical solutions to human problems. This has been explained over and over again, most famously in Randall Munroe's xkcd comic where the secret police resort to hitting someone with a $5 wrench until they give up the password.
If you're in a repressive state and you're worried about your data being exfiltrated the best security practice of all is not to create records of illegal activity. If you have to store such material, don't keep it on a communications device, put it on an external storage device, hide it somewhere outside your home, and don't tell anyone about it.
They don't have to make it illegal. They can just create all kinds of barriers like only allowing government approved OSes for essential services, and then using custom software can become grounds for suspicion and subject you to searches, etc.
I'm certain this is the direction we are all heading, unfortunately.
Governments will sanction the major proprietary OSes and compel Apple, Google, and Microsoft to participate in their surveillance programs. Those OSes will have remote integrity attestation and will most likely be the only hardware and software you can use to access essential services and the internet as a whole.
The usage of alternative software won't be outright illegal, but will get you on a watchlist. Like you said, they don't need to make other software illegal, just make circumventing the blocks illegal.
They can't arrest everyone, but it's one more gray-area thing that can and will be used against you should the government ever decide it has a bone to pick with you specifically. You can get away with it for a long time, until suddenly you don't.
Once you get oil, dust, or moisture on a surface, dust starts to build up. It's downhill from there. Hard surfaces are easier to keep clean; just don't leave them damp after cleaning, but completely dry and slippery.
The close folding furniture is probably great for holding back dust buildup.
Context size limits are usually the reason. Most websites I want to scrape end up being over 200K tokens. Tokenization for HTML isn't optimal because symbols like '<', '>', '/', etc. end up being separate tokens, whereas whole words can be one token if we're talking about plain text.
Possible approaches include converting the HTML to Markdown or minimizing the HTML (e.g., removing script tags, comments, etc.).
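A minimal sketch of the "minimize the HTML" approach, using only Python's standard `html.parser`. The set of skipped elements and the attribute allow-list are assumptions chosen for illustration, not a recommendation from any particular tool; this does not convert to Markdown, it just drops scripts, styles, comments and noisy attributes while keeping the tag structure.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumed lists: elements whose contents are dropped entirely, and attributes worth keeping.
SKIP_ELEMENTS = {"script", "style", "noscript", "template", "svg"}
KEEP_ATTRS = {"href", "src", "alt", "title"}
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class HTMLMinimizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_ELEMENTS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        # Drop class, style, data-*, event handlers, etc.; keep only the allow-list.
        kept = "".join(
            f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
            if k in KEEP_ATTRS and v is not None
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_ELEMENTS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in VOID_ELEMENTS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth:
            return
        # Collapse runs of whitespace; re-escape text since charrefs were decoded.
        text = re.sub(r"\s+", " ", data)
        if text:
            self.out.append(escape(text, quote=False))

    # handle_comment is not overridden, so HTML comments are silently dropped.


def minimize_html(html: str) -> str:
    parser = HTMLMinimizer()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s{2,}", " ", "".join(parser.out)).strip()
```

On typical pages most of the savings come from inline scripts, styles and long `class` attribute lists; visible text is untouched, so the result can still be fed to a Markdown converter afterwards if you want the other approach too.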
Indeed, Safari's Reader already switches to using the rendered page, but even it fails on more esoteric pages, e.g. those using lazy-loaded content (which doesn't load until you scroll to it) or (god forbid) virtualized scrolling, which unloads content once it is out of view.
It's a big web out there, there's even more heinous stuff. Even identifying what the main content is can be a challenge.
And reader mode has the benefit of being run by the user. Identifying when to run a page-simplifying action on some headlessly loaded URL can be tricky. I imagine it would need to be like: load the URL, await the load event, scroll to the bottom of the page, and wait for the network to be idle (and possibly for long tasks/animations to finish, too).
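The steps above can be sketched roughly like this. It assumes `page` is a Playwright sync-API `Page` (it only calls `goto`, `evaluate`, `wait_for_timeout`, `wait_for_load_state` and `content`); the function name, the scroll limit and the pause length are hypothetical choices for illustration, not tuned values.

```python
def load_and_settle(page, url: str, max_scrolls: int = 20, pause_ms: int = 500) -> str:
    """Return rendered HTML after trying to trigger lazy-loaded content.

    `page` is assumed to be a Playwright sync-API Page (or any object
    exposing the same methods).
    """
    # 1. Load the URL and wait for the load event.
    page.goto(url, wait_until="load")

    # 2. Scroll to the bottom repeatedly until the page stops growing,
    #    giving lazy-loaded sections a chance to appear.
    last_height = page.evaluate("document.body.scrollHeight")
    for _ in range(max_scrolls):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(pause_ms)
        height = page.evaluate("document.body.scrollHeight")
        if height == last_height:
            break
        last_height = height

    # 3. Wait for the network to go idle before snapshotting the DOM.
    page.wait_for_load_state("networkidle")
    return page.content()
```

With Playwright you would obtain `page` via `sync_playwright()`, `p.chromium.launch()` and `browser.new_page()`. This still won't rescue virtualized scrolling: rows that have been scrolled out of view are gone from the DOM by the time `content()` runs, so those pages would need per-site handling.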
If you're curious, the book The Master and His Emissary by Iain McGilchrist goes over many of the differences in "personality" of each hemisphere, which is not as simple a divide as is commonly claimed. It is not so much a logic vs. feeling/art split but maybe more of an isolation/abstraction vs. broad/networked default mode for each.
TypeSpec is great, but if you're working with Rust and you're about to write a new project that will require an OpenAPI spec sooner or later, I'd like to recommend a web framework that has spec generation baked in:
All you need to do is derive a trait on your response structs and in return you get an almost perfectly generated spec. Unions, objects, and enums are first-class citizens.
Also, if you're coming from PHP, the controllers feel very much like Symfony controllers.
P.S. Please do recommend an ORM that would feel closer to Doctrine. I miss Doctrine.
Every single time I see these scraping discussions I get the same thoughts:
Businesses use data from the user. The Business does additional crunching on that data to derive new interesting data for the user. Who owns the data? The user or the app?
At the very least the user partially owns the data, and as such, I'd argue that the user should have the right to share the data between different applications however they see fit. However, businesses tend to think that they somehow have the legal (moral even?) right to keep that data in their walled gardens. As long as this (imo unfair) stance is common, I think that data extraction using anti-bot bypassing technologies is fair game.
If the data is public or semi-public then chances are I consented to display that data there. I consented to it being used on that site, for the purpose of that site. Not for random other companies.
And most scraping isn't done by users. It's done by companies. For profit. Often for less than enlightened reasons.
LinkedIn is a good example: I want my data displayed to people on that job site. I don't want it harvested by every recruiter under the sun who will then spam me. I certainly don't want that data sold between those recruiters long after I deleted my account on LinkedIn. Tinder and sites like that are also an obvious example: yes it's (semi-)public, but I also wouldn't want it to be scraped and harvested by some company – I just want it to be shown temporarily to a limited set of people.
In general, I don't think people should have a moral right to decide where and how the data that they made public is used, or to decide if it can get scraped or not.
And, in general, I take the fact that you published something on the Internet as tacit moral consent for the rest of the world to use it however they want.
This comes with a couple of big asterisks, because (1) Copyright law exists, and I generally try to not break the law, even if I don't agree with it. But the discussion in this thread is mostly separate from copyright: for instance, I don't think a court would see someone scraping and redistributing data from someone's LinkedIn profile as a copyright infringement case.
And (2) because I think that in some specific cases, using published data can be morally wrong, but not as a general rule.
I somewhat agree; people volunteer it when posting anything online. But they also volunteer their advertising ID on their phones (even if they don't know it), just as they don't know (and don't care) that they are the product on websites like Facebook.
I feel the 'antibot' stuff is more related to the adtech industry than to site scrapers. Remember getting a dedicated server and having friends click on links just to pay for it? With GeoCities and all those free websites, the biggest costs were bandwidth and storage (not that they aren't now).
Since the AI boom, there's just more hype over people wanting 'credit' (or money) for something they posted on a forum X units of time ago.
It's called the World Wide Web for a reason. Keep it open, even if it's to 'a bot'; you never know when somebody's 'bot software' is reading your webpage for somebody who has some disadvantage and needs assistance.
The technology would also be killer in real estate markets. Imagine being able to take a video and give potential tenants the ability to see the apartment/house/etc in detail.
I cynically think it wouldn't be a killer, because real estate agents prefer to have photos carefully taken with fish-eye lenses that make rooms look bigger.
I was about to throw some money their way, but apparently the Steam version does not work on Mac M1/M2. I'm too young/uninitiated/whatever/visually_oriented to play the older version.
I bought the Steam version and ran it in Wine on both an Intel and an M1 Mac. I ended up refunding the Steam version and buying it on itch.io so I could directly download the game files and updates instead of having to mess with Steam inside Wine (the Mac Steam client does not let you download Windows files). Also, you get a Steam key with itch.io, so when they release a proper Mac version I can still add it to my library.
But you need the actual files downloaded obviously. So you either have to have a PC where you can download the game files from Steam (Windows VM works too) and then copy over to your Mac. Or you can buy the game from itch.io (it's the same but you also get self-extracting archive + you get a Steam key as well) https://kitfoxgames.itch.io/dwarf-fortress
Is there really no way we can make it technologically impossible for them to exfiltrate user data?