Also it's not like dplyr is anything close to a "port" of SQL. You could in theory collect dplyr verbs and compile them to SQL, sure. That's what ORMs typically do, and what the Spark API does (and its descendants such as Polars).
"Porting" SQL to your language usually means inventing a new API for relational and/or tabular data access that feels ergonomic in the host language, and then either compiling it to SQL or executing it in some kind of array processing backend, or DataFusion if you're fancy like that.
dplyr straightforwardly transpiles to SQL through the dbplyr package, so it's semantically pretty close to a port, even though the syntax is a bit different (better).
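To make "collect verbs, then compile them to SQL" concrete, here's a toy sketch in Python. To be clear, this is not how dbplyr (or any real ORM) is implemented: the `Query` class, the `flights` table, and its columns are all made up for illustration, and the filter conditions are raw SQL fragments rather than a real expression compiler.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical lazy query: each verb returns a new Query, nothing runs until collect()."""
    table: str
    columns: tuple = ()
    conditions: tuple = ()
    params: tuple = ()

    def filter(self, condition, *params):
        # Record the condition (a raw SQL fragment with ? placeholders) for later.
        return Query(self.table, self.columns,
                     self.conditions + (condition,), self.params + params)

    def select(self, *columns):
        return Query(self.table, columns, self.conditions, self.params)

    def to_sql(self):
        # "Compile" the recorded verbs into a single SQL string.
        cols = ", ".join(self.columns) or "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.conditions)
        return sql

    def collect(self, conn):
        return conn.execute(self.to_sql(), self.params).fetchall()


# Made-up data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (carrier TEXT, delay INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?)",
                 [("AA", 5), ("UA", 42), ("AA", 60)])

late = Query("flights").filter("delay > ?", 30).select("carrier", "delay")
sql_text = late.to_sql()
rows = late.collect(conn)
```

The verbs read like a pipeline, but what actually executes is one SQL statement, which is the sense in which this kind of API is "semantically close" to SQL even when the surface syntax is not.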
> There's no reason to invent a completely new API for them
Yes there is: SQL is one of many possible ways to interact with tabular data, why should it be the only one? R data frames literally pioneered an alternative API. dplyr is fantastic for many reasons, one of those being that people like the verb-based approach.
Furthermore I argue that dplyr is not particularly similar to SQL in the way you actually use it and how it's actually interpreted/executed.
As for the rest I feel like you're just stating your preferences as fact.
data.table "simplicity" is actually a huge set of features, they just have a clever and compact way to express those features in code. At the same time, there is effectively no standard-eval programmatic interface for it, which makes it a headache for building programs rather than scripting with. data.table is amazing, but it is anything but simple IMO.
I hate to be the "you're holding it wrong" guy but 90% of "Pandas bad!" posts I find are either outright misinformed or mischaracterizing one person's particular opinion as some kind of common truth. This one is both!
> That comes from the assumption that there is almost always a meaningful index (timestamps)
The index can be literally any unique row label or ID. It's idiosyncratic among "data frames" (SQL has no equivalent concept, and the R community has disowned theirs), but it's really not such a crazy thing to have row labels built into your data table. Excel supports this in several different ways (frozen columns, VLOOKUP) and users expect it in just about any table-oriented GUI tool.
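A quick sketch of what I mean by row labels, assuming a recent pandas. The product names and numbers are invented for the example.

```python
import pandas as pd

# Hypothetical data: the index holds a row label (a product name), not a timestamp.
prices = pd.Series([10.0, 20.0], index=["apple", "pear"], name="price")
quantities = pd.Series([3, 2], index=["pear", "apple"], name="qty")

# Lookup by label, roughly what VLOOKUP does in a spreadsheet.
pear_price = prices.loc["pear"]

# Arithmetic aligns on labels, not on position, so the differing row order is harmless.
totals = prices * quantities
```

Here `totals` pairs each price with the quantity that carries the same label, even though the two Series list their rows in a different order.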
> having to write index=False every single time you write to disk
If you're actually using the index as it's meant to be used, you'd see why this isn't the default setting.
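For illustration, a minimal round trip, assuming a recent pandas and a made-up `ticker` index. If the index carries real information, dropping it on write throws data away.

```python
import io

import pandas as pd

# Hypothetical frame whose index is a meaningful row label.
df = pd.DataFrame(
    {"price": [1.5, 2.5]},
    index=pd.Index(["AAPL", "MSFT"], name="ticker"),
)

# Default behaviour: the index is written as the first column.
with_index = io.StringIO()
df.to_csv(with_index)
text_with_index = with_index.getvalue()

# Reading it back with index_col restores the labels.
restored = pd.read_csv(io.StringIO(text_with_index), index_col="ticker")

# With index=False the labels never reach the file.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
text_without_index = without_index.getvalue()
```

So `index=False` is the right call only when the index is a throwaway `RangeIndex`, which is exactly the case where you aren't really using it.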
> functions seemingly randomly returning dataframes with column data as the index
I assume you're talking about the behavior of .groupby() and .rolling()? It's never been random. The group_keys= and related options are under-documented and hard to reason about, yes. But not random.
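A small sketch of the options in question, assuming a recent pandas. The `team` and `score` columns are invented, and this only covers `as_index=` and `group_keys=`, not every related knob.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"team": ["a", "a", "b"], "score": [1, 2, 3]})

# Default: the grouping column becomes the index of the result.
by_index = df.groupby("team")["score"].sum()

# as_index=False keeps the grouping column as a regular column.
by_column = df.groupby("team", as_index=False)["score"].sum()

# With apply(), group_keys controls whether the group label is prepended to the index.
keyed = df.groupby("team", group_keys=True)["score"].apply(lambda s: s.nlargest(1))
unkeyed = df.groupby("team", group_keys=False)["score"].apply(lambda s: s.nlargest(1))
```

`keyed` ends up with a two-level index (team, original row label) while `unkeyed` keeps only the original row labels. Deterministic, just not obvious until you've been bitten once.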
> appending the index to the Series numpy data leading to incredibly confusing bugs
I've been using Pandas professionally almost daily since 2015 and I have no idea what this means.
I think the commenter you are replying to might well understand these nuances. The point is not that Pandas is inscrutable, but instead that it's annoying to use in many common use-cases.
It's a rhetorical device that dates back to the ancient Greeks (meiosis). It's absolutely a lot more writing to enumerate the ways in which Elon Musk is problematic.
In a sane world it would read that way. Unfortunately, we live in a world where such nondescript descriptors (“problematic”, “objectionable”, “unprofessional”, “toxic”, “extremist”, “far-$SIDE”, a few others depending on usage) have been used, and overused, to accuse or smear people without taking on much of a burden of proof or making any statements specific enough to be falsifiable.
They now provoke instinctive revulsion when used in culture-war-adjacent contexts even when, as here, their usage is entirely legitimate (you presuppose a vague but mutually understood allegation rather than nebulously introducing a fresh one). I think only “controversial” has escaped this fate, but it might be too weak for your purposes.
(To be clear, I am only trying to explain why your phrasing might cause your interlocutor to momentarily recoil even when—as in my case—they don’t actually have any problem with the contents of your statement. What you do with this explanation is up to you: I don’t believe these terms are short-term salvageable at this point, but neither will I begrudge others their choice of hopeless cause; I certainly have my own fair share of those.)
The problem is not that it's finite, the problem is that by the time prices rise enough to discourage people from using it frivolously, you might already be dangerously low on it.
Don't count on it. There's a lot of money in killing other businesses, or even just keeping prices high. Even if the high prices are an accident, there is always someone looking to take advantage of any situation for profit.
I have to agree. You only have to look at car and junk food inflation from after covid.
The prices make no sense, but that doesn't matter, they got away with it and are fighting to hold onto high prices, even as consumers balk. Their solution? Ditch poorer consumers. New cars and (branded) junk foods are luxury items now, apparently.
It's one thing if they have a shadow profile on you (and dozens of companies almost certainly do), but it's another thing if you give them meaningful info about you to enrich that profile with. They can figure out roughly what block you live on, OK fine, but unless you're in a rural area with no neighbors they might not be able to do much better than that.
> They can figure out roughly what block you live on
It's nothing to do with the specific house you live in, and everything to do with the activity being grouped together with all other activity you have done, which they know from fingerprinting and IP addresses.
They don't need to know where you live to have a very accurate personal and psychological profile on you, and switching browsers is not going to help that in the slightest, I'm afraid.
Yes and no. If you block LinkedIn SDK scripts on 3rd party sites, it's likely that LinkedIn specifically doesn't actually have a good profile on you.
Realistically you're probably exposed and identified. But if you're meticulous and careful, you might not be, or at least not as completely as someone who is unaware or not careful. But it's not at all the same as if, say, a state actor was motivated to spy on you specifically.
Almost certainly they are using that for audience segmentation and ad targeting. Clever and disgusting. This isn't the invention of some evil moustache-twirling executive, this was the invention of an employee or group of employees who value money more than morals. We should think of such employees as henchmen.
if they do a better job at showing me an ad that might be relevant to me, how is that disgusting? if I have to see an ad at all I at least want them to give it their best shot
I can't believe that people still have the attitude that the trillions of dollars being invested in all this technology and tracking is just to give them a more relevant ad.
Do people really not remember scandals like Cambridge Analytica, and realise that these ads combined with social media feeds can be used to literally control and manipulate people's decisions and behaviour?
There's a reason Facebook and YouTube just got sued for being intentionally addictive attention machines.
You're glossing over the nuance of the Cambridge Analytica scandal or at least I don't see how it's connected here.
Facebook was a party, but not the protagonist.
- a Cambridge researcher (Aleks Kogan) created a personality quiz FB app advertised as academic research
- users had to consent to download the app
- the app nefariously scraped users' friends' data (300k users unlocked 87 million users' data)
- the information was sold to Cambridge Analytica
- who then used the information to profile American voters
LinkedIn already has all of this information from the information you feed it. Scanning for more information provides more refined views, but LinkedIn already has your graph.
> if they do a better job at showing me an ad that might be relevant to me, how is that disgusting?
To me that signalled that the author of the comment doesn't really care what is going on behind the scenes if the result is a better and more relevant ad.
I see this attitude often from people who don't seem to understand the severity and seriousness of online tracking, which leads to psychological profiling, which leads to manipulation.
> who then used the information to profile American voters
You seem to have missed off the most serious bit at the end.
Cambridge Analytica then used the data to profile millions of voters, and purposefully target divisive and flammable political material to specific suggestible people in order to manipulate outcomes.
This same thing is done all the time by all tracking and ad companies. I think this thread has gone beyond just LinkedIn scanning your browser extensions.
I agree that it could come off as gross negligence to not care about what happens with your data.
My point is that LinkedIn already has enough information (We've willingly given them!) to manipulate outcomes and if they're doing something nefarious, then it's already too late.
Whereas Cambridge Analytica involved bad actors (not Facebook) duping customers and re-selling their data. I don't think those elements are necessarily in play here.
is the manipulation of decisions and behavior not just a way of saying sales and marketing? I agree that it def can be used for bad things, but so can most tools/systems
Imagine if someone was following you around with a clipboard writing down everything you do, then rifling through your bookshelf to make note of certain books on the bookshelf, and then using that to target ads at you.
You'd say that's a ridiculous and illegal thing to do without your explicit consent, right?
Maybe you personally don't mind and would be happy to offer that consent. But they're doing it without your consent, regardless of whether you want it or not.
$$$, one of the classic bad faith motives. Most of tech nowadays is subsidized by advertising and profiling to some degree, often quite a large degree.
Aside from the fact that no one is asking for that, there is no law that prevents that ad targeting data from being sold to the government for the purposes of…whatever they want.
It's not just about ads. The same data and tech is also about locking you up and identifying you for deportation if this admin thinks you are in the USA without permission.
And laundering responsibility. If the government uses a contractor to identify deportation candidates using this data, and they get it wrong, the government can at least try to shrug it off and blame the contractor, whose job is in part to absorb public outrage for these sorts of things. Whereas if the FBI wiretaps you and still gets it wrong, it's a lot harder to deflect blame.