
trws · 2026-03-12T16:25:34 1773332734

Everything else in the siblings is true, but remember that the language and std types in rust all do this already. Most of the time it’s better to use a native enum or optional/result because they do this in the compiler/lib. It’s only really worth it if you need more than a few types or need precise control of the representation for C interop or something.

VorpalWay · 2026-03-12T18:18:44 1773339524

To expand on the sibling answer: sort of! Rust will do niche optimisation, but for references and NonNull pointers this is limited to "the value 0 is invalid and can thus be used as a niche". But Rust does not (currently) take advantage of alignment niches in pointers. Nor does it use high bit on architectures where you know your whole theoretical address space isn't actually in use.

Is doing that manually worth it? Usually not, but for some core types (classical example is strings) or in language runtimes it can be.

Would it be awesome if this could be done automatically? Absolutely, but I understand it is a large change, and the plan is to later build upon the pattern types that are currently work in progress (and would allow you to specify custom ranged integer types).
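For anyone wondering what "doing it manually" looks like, here is a minimal sketch of an alignment niche used by hand. The `Tagged` type and `TAG_MASK` constant are made up for illustration, and the plain `as` casts assume you are fine with exposed-provenance semantics; a production version would want the strict-provenance pointer APIs instead.

```rust
use std::mem::align_of;

/// Low bits that are always zero in a well-aligned `*const u64`.
pub const TAG_MASK: usize = align_of::<u64>() - 1;

/// Hypothetical tagged pointer: an aligned `*const u64` with a small tag
/// stored in the otherwise unused low bits.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tagged(usize);

impl Tagged {
    /// Returns `None` if the tag does not fit or the pointer is misaligned.
    pub fn new(ptr: *const u64, tag: usize) -> Option<Tagged> {
        let addr = ptr as usize;
        if tag > TAG_MASK || addr & TAG_MASK != 0 {
            return None;
        }
        Some(Tagged(addr | tag))
    }

    /// The tag hidden in the low bits.
    pub fn tag(self) -> usize {
        self.0 & TAG_MASK
    }

    /// The original pointer, with the tag bits cleared again.
    pub fn ptr(self) -> *const u64 {
        (self.0 & !TAG_MASK) as *const u64
    }
}
```

Note that the compiler knows nothing about the spare bits here, so `Option<Tagged>` gets no niche from them; that is the part pattern types might eventually help with.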

tialaramex · 2026-03-12T17:23:03 1773336183

I mean, kinda, sorta? Rust's guaranteed niche optimisation means Option<&T> [which might be Some(&T) or just None] is promised to be the same size in memory as &T the reference to a T

So that's one tiny use of this sort of idea which is guaranteed unnecessary in Rust, and indeed although it isn't guaranteed the optimiser will typically spot less obvious opportunities so that Option<Option<bool>> which might be None, or Some(None) or Some(Some(true)) or Some(Some(false)) is the same size (one byte) as bool.
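A quick way to check both claims on your own target is to ask the compiler for the sizes. The helper name `niche_sizes` is made up; only the `Option<&T>` equality is a documented guarantee, while the one-byte `Option<Option<bool>>` is what current compilers happen to produce.

```rust
use std::mem::size_of;

/// Sizes of `&u64`, `Option<&u64>`, `bool` and `Option<Option<bool>>`
/// on the current target, in that order.
pub fn niche_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<&u64>(),
        size_of::<Option<&u64>>(), // guaranteed equal to the line above
        size_of::<bool>(),
        size_of::<Option<Option<bool>>>(), // not guaranteed, but typically 1
    )
}
```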

But hiding stuff in a pointer is applicable in places your Rust compiler won't try to take advantage unless you do something like this. A novel tiny String-like type I saw recently does this, https://crates.io/crates/cold-string ColdString is 8 bytes, if your text is 8 or fewer bytes of UTF-8 then you're done, that'll fit, but, if you have more text ColdString allocates on the heap to store not only your text but also its length and so it needs to actually "be" in some sense a raw pointer to that structure, but if the string is shorter that pointer is nonsense, we've hidden our text in the pointer itself.

Implementation requires knowing how pointers work, and how UTF-8 encoding works. I actually really like one of the other Rust tiny strings, CompactString but if you have a lot of very small strings (e.g. UK postcodes fit great) then ColdString might be as much as three times smaller than your existing Rust or C++ approach and it's really hard to beat that for such use cases.

Edited: To remove suggestion ColdString has a distinct storage capacity, this isn't intended as a conventional string buffer, it can't grow after creation
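To make the "hide the text in the pointer" idea concrete, here is a rough sketch. This is NOT ColdString's actual layout (it only fits 7 bytes inline, and the heap case is a lazy `Box<String>`); the name `TinyStr` and the tag scheme are my own assumptions, chosen to keep the example short.

```rust
/// Hypothetical 8-byte string. Bit 0 set means "text stored inline",
/// bit 0 clear means "this is really a pointer to a heap String".
pub struct TinyStr(u64);

impl TinyStr {
    const INLINE_MAX: usize = 7;

    pub fn new(s: &str) -> TinyStr {
        let bytes = s.as_bytes();
        if bytes.len() <= Self::INLINE_MAX {
            // Byte 0 holds (len << 1) | 1; bytes 1..=7 hold the text.
            let mut raw = [0u8; 8];
            raw[0] = ((bytes.len() as u8) << 1) | 1;
            raw[1..1 + bytes.len()].copy_from_slice(bytes);
            TinyStr(u64::from_le_bytes(raw))
        } else {
            // A Box<String> is at least 2-byte aligned, so bit 0 is 0.
            let ptr = Box::into_raw(Box::new(s.to_owned()));
            TinyStr(ptr as usize as u64)
        }
    }

    pub fn is_inline(&self) -> bool {
        self.0 & 1 == 1
    }

    /// Returns an owned copy of the text.
    pub fn text(&self) -> String {
        if self.is_inline() {
            let raw = self.0.to_le_bytes();
            let len = (raw[0] >> 1) as usize;
            std::str::from_utf8(&raw[1..1 + len]).unwrap().to_owned()
        } else {
            let ptr = self.0 as usize as *const String;
            // Safe because non-inline values only come from Box::into_raw.
            unsafe { (*ptr).clone() }
        }
    }
}

impl Drop for TinyStr {
    fn drop(&mut self) {
        if !self.is_inline() {
            // Reclaim the heap allocation made in `new`.
            drop(unsafe { Box::from_raw(self.0 as usize as *mut String) });
        }
    }
}
```

A real implementation would squeeze out the eighth byte by exploiting which byte values can never start valid UTF-8, which is where the "knowing how UTF-8 encoding works" part comes in.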

trws · 2026-03-12T15:23:53 1773329033

There’s a paper in flight to add a stdlib type to handle pointer tagging as well while preserving pointer provenance and so-forth. It’s currently best to use the intptr types, but the goal is to make it so that an implementation can provide specializations based on what bits of a pointer are insignificant, or even ignored, on a given target without user code having to be specialized. Not sure where it has landed since discussion in SG1 but seemed like a good idea.

tialaramex · 2026-03-12T15:51:38 1773330698

Given you aren't sure since SG1 this might be useless but... do you have a paper number? Or, more likely, know an author's name?

trws · 2026-03-12T16:23:46 1773332626

It’s Hana Dusikova’s paper IIRC.

legobmw99 · 2026-03-12T16:19:50 1773332390

Seems like it's p3125r0

tialaramex · 2026-03-12T16:57:35 1773334655

Thanks! https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p31...

(is the current version of that paper, the tracking ticket insisted there's a P3125R5 and that LEWG had seen it in 2025, but it isn't listed in a mailing so it might be a mirage)

You know it's a Hana paper because it wants this to be allowed at compile time (C++ constexpr) but joking aside this seems like a nice approach for C++ which stays agnostic about future implementation details.

trws · 2025-11-28T15:18:32 1764343112

I worked with @fzakaria on developing that idea. It actually worked surprisingly well. The benefits are mostly in the ability to analyze the binary afterward, though, rather than any measurable benefit in load time or anything like that. I don’t have the repo for the musl-based loader handy, but here’s the one for the virtual table plugin for SQLite to read from raw ELF files: https://github.com/fzakaria/sqlelf

trws · 2025-09-23T23:15:24 1758669324

I liked the article. I saw your PS that we added it to the working draft for c++26, we also made it part of OpenMP as of 5.0 I think. It’s sometimes a hardware atomic like on arm, but what made the case was that it’s common to implement it sub-optimally even on x86 or LL-SC architectures. Often the generic cas loop gets used, like in your lambda example, but it lacks an early cutout since you can ignore any input value that’s on the wrong side of the op by doing a cheap atomic read or just cutting out of the loop after the first failed CAS if the read back shows it can’t matter. Also can benefit from using slightly different memory orders than the default on architectures like ppc64. It’s a surprisingly useful op to support that way.

If this kind of thing floats your boat, you might be interested in the non-reading variants of these as well. Mostly for things like add, max, etc but some recent architectures actually offer alternate operations to skip the read-back. The paper calls them “atomic reduction operations” https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p31...

anematode · 2025-09-24T02:09:39 1758679779

Curious: even with hardware atomics, wouldn't it be a good idea to first perform a non-atomic load to check for whether the store might be necessary (which would require the cache line to be locked), then only run the atomic max if it might change the value?

adwn · 2025-09-24T06:19:36 1758694776

Yes, this can make sense if

- the value often doesn't require an update, and

- there's contention on the cache line, i.e., at least two cores frequently read or write that cache line.

But there are important details to consider:

1) The probing load must be atomic. Both the compiler and the processor in general are allowed to split non-atomic loads into two or more partial loads. Only atomic loads – even with relaxed ordering – are guaranteed to not return intermediate or mixed values from other atomic stores.

2) If the ordering on the read part of the atomic read-modify-write operation is not relaxed, the probing load must reflect this. For example, an acq-rel RMW op would require an acquire ordering on the probing read.
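In Rust terms, the relaxed case might look like the sketch below. `max_with_probe` is a made-up name; `load` and `fetch_max` are the standard `AtomicU64` methods.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Raises `cell` to at least `value`, skipping the read-modify-write
/// when a cheap probe shows it cannot change anything.
pub fn max_with_probe(cell: &AtomicU64, value: u64) {
    // The probe is itself an atomic (relaxed) load, per point 1.
    if cell.load(Ordering::Relaxed) >= value {
        return;
    }
    // Only now take the cache line exclusively for the RMW.
    cell.fetch_max(value, Ordering::Relaxed);
}
```

Per point 2, if the `fetch_max` used a stronger ordering such as `AcqRel`, the probing load would have to be at least `Acquire` to match.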

anematode · 2025-09-24T06:39:04 1758695944

Thanks for your insights. (2) makes sense to me, but for (1), on ARM64 can an aligned 64-bit store really tear in a 64-bit non-atomic load? The spec says "A write that is generated by a store instruction that stores a single general-purpose register and is aligned to the size of the write in the instruction is single-copy atomic" (B2.2.1)

adwn · 2025-09-24T09:18:41 1758705521

> […] on ARM64 […]

Well, if you target a specific architecture, then of course you can assume more guarantees than in general, portable code. And in general, a processor might distinguish between non-atomic and relaxed-atomic reads and writes – in theory.

But more important, and relevant in practice, is the behavior of the compiler. C, C++, and Rust compilers are allowed to assume that non-atomic reads aren't influenced by concurrent writes, so the compiler is allowed to split non-atomic reads into smaller reads (unlikely) or even optimize the reads away if it can prove that the memory location isn't written to by the local thread (more likely).

anematode · 2025-09-24T21:17:28 1758748648

Sure, no doubt a non-atomic load would be dangerous to write in C, C++, or Rust rather than in assembly

adgjlsfhk1 · 2025-09-24T04:14:20 1758687260

This depends heavily on what concurrency optimizations your processor implements (and unfortunately this is the sort of thing that doesn't get documented and is somewhat hard to test).

anematode · 2025-09-24T05:01:56 1758690116

I did a little unscientific test here on an Apple M4 Pro with n threads spamming atomic operations with pseudorandom values on one memory location (the worst case). Used inline asm to make sure there was no funny business going on.

  atomic adds
  n = 1 ->  333e6 adds/second
  n = 2 ->  174e6
  n = 4 ->   95e6
  n = 8 ->   63e6

  atomic maxs
  n = 1 ->  161e6 maxs/second
  n = 2 ->   59e6
  n = 4 ->   39e6
  n = 8 ->   27e6

  atomic maxs with preceding check
  n = 1 ->  929e6 maxs/second
  n = 2 -> 1541e6
  n = 4 -> 3494e6
  n = 8 -> 5985e6

So evidently the M4 doesn't do this optimization. Of course if your distribution is different you'd get different results, and this level of contention is unrealistic, but I don't see why you'd EVER not do a check before running atomic max. I also find it interesting that atomic max is significantly slower than atomic add

thequux · 2025-09-24T05:17:40 1758691060

I think that this can change the semantics though; with the preceding check you can miss the shared variable being decremented from another thread. In some cases, such as if the shared value is monotonic, this is fine, but not in the general case.

anematode · 2025-09-24T05:50:22 1758693022

With a relaxed ordering I'm not sure if that's right, since the ldumax would have no imposed ordering relation with the (atomic) decrement on another thread and so could very well have operated on the old value obtained by the non-atomic load

gpderetta · 2025-09-24T08:23:58 1758702238

All operations on a single memory location are always totally ordered in a CC system, no matter how relaxed the memory model is.

Also am I understanding it correctly that n is the number of threads in your example? Don't you find it suspicious that the number of operations goes up as the thread count goes up?

edit: ok, you are saying that under heavy contention the check avoids having to do the store at all. This is racy, and whether this is correct or not, would be very application specific.

edit2: I thought about this a bit, and I'm not sure i can come up with a scenario where the race matters...

edit3: ... as long as all threads are only doing atomic_max operations on the memory location, which an implementation can't assume.

Dylan16807 · 2025-09-24T10:20:58 1758709258

> as long as all threads are only doing atomic_max operations on the memory location, which an implementation can't assume.

What assumes that?

If your early read gives you a higher number, quitting out immediately is the same as doing the max that same nanosecond. You avoid setting a variable to the same value it already is. Doing or not doing that write shouldn't affect other atomics users, should it?

In general, I should be able to add or remove as many atomic(x=x) operations as I want without changing the result, right?

And if your early read is lower then you just do the max and having an extra read is harmless.

The only case I can think of that goes wrong is the read (and elided max) happening too early in relation to accesses to other variables, but we're assuming relaxed memory order here so that's explicitly acceptable.

gpderetta · 2025-09-24T10:45:11 1758710711

Yes, probably you are right: a load that finds a larger value is equivalent to a max. As the max wouldn't store any value in this case, also it wouldn't introduce any synchronization edge.

A load that finds a smaller value is trickier to analyze, but i think you are just free to ignore it and just proceed with the atomic max. An underlying LL/SC loop to implement a max operation might spuriously fail anyway.

edit: here is another argument in favour: if your only atomic RMW is a cas, to implement X.atomic_max(new) you would:

  1: expected <- X
  2: if new < expected: done
  3: else if X.cas(expected, new): done
     else goto 2 # expected implicitly refreshed

So a cas loop would naturally implement the same optimization (unless it starts with a random expected), so the race is benign.
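For reference, the same loop as a Rust sketch, assuming relaxed ordering throughout. `cas_max` is a made-up name; it returns the previously observed value, like `fetch_max` would.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Atomic max built from nothing but load and compare-and-swap.
pub fn cas_max(x: &AtomicU64, new: u64) -> u64 {
    // Step 1: read the current value.
    let mut expected = x.load(Ordering::Relaxed);
    loop {
        // Step 2: nothing to do if the stored value is already large enough.
        if new <= expected {
            return expected;
        }
        // Step 3: try to install `new`; a failure hands back the fresh value.
        match x.compare_exchange_weak(expected, new, Ordering::Relaxed, Ordering::Relaxed) {
            Ok(previous) => return previous,
            Err(actual) => expected = actual,
        }
    }
}
```

The early return in step 2 is exactly the "load first" fast path, so the CAS formulation gets it for free.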

ibraheemdev · 2025-09-24T07:49:21 1758700161

It does make a difference of course if you're running fetch_max from multiple threads, adding a load fast-path introduces a race condition.

masklinn · 2025-09-24T09:16:15 1758705375

Does it tho? Assuming no torn reads/writes at those sizes, given the location should be strictly increasing are there situations where you could read a higher-than-stored value which would cause skipping a necessary update?

Afaik on all of x86, arm, and riscv an atomic load of a word sized datum is just a regular load.

gpderetta · 2025-09-24T10:47:00 1758710820

It doesn't need to be strictly increasing; some other thread could be making other arbitrary operations. Still, even in that case, as Dylan16807 pointed out, it likely doesn't matter.

masklinn · 2025-09-24T14:17:22 1758723442

> It doesn't need to be strictly increasing some other thread could be making other arbitrary operations

We're talking about collating a maximum, by definition every write to that is an increase.

gpderetta · 2025-09-24T14:19:53 1758723593

If you are implementing a library function atomic<T>::fetch_max, you cannot assume that every other thread is also performing a fetch_max on that object. There might be little reason for it, but other operations are allowed so the sequence of modifications might not be strictly increasing (but then again, it doesn't matter for this specific optimization).

SkiFire13 · 2025-09-24T05:58:57 1758693537

> but it lacks an early cutout since you can ignore any input value that’s on the wrong side of the op by doing a cheap atomic read or just cutting out of the loop after the first failed CAS if the read back shows it can’t matter.

I believe this is a bit trickier than that, you would also need at least some kind of atomic barrier to preserve the ordering semantics of the successful update case.

trws · 2025-09-19T22:36:12 1758321372

The other place it comes up is launchers and resource managers. We actually have a series of old issues and implementation work on flux (large scale resource manager for clusters) working around fork becoming a significant bottleneck in parallel launch. IIRC it showed up when we had ~1gb of memory in use and needed to spawn between 64 and 192 processes per node. That said, we actually didn’t pivot to vfork, we pivoted to posix_spawn for all but the case where we have to change working directory (had to support old glibc without the attr for that in spawn). If you’re interested I think we did some benchmarking with public results I could dredge up.

Anyway, much as I have cases where it matters I guess what I’m saying is I think you’re right that vfork is rarely actually necessary, especially since you’d probably have a much easier time getting a faster and still deterministic spawn if it ever actually becomes a bottleneck for something you care about.

cryptonector · 2025-09-20T05:00:23 1758344423

> That said, we actually didn’t pivot to vfork, we pivoted to posix_spawn for all but the case where we have to change working directory (had to support old glibc without the attr for that in spawn).

You can always accomplish that sort of thing by using a helper program that ultimately execs the desired one -- just prefix it and its arguments to the intended argv.
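As a sketch of the argv-prefixing trick: the function name `with_chdir_helper` is made up, and it assumes a GNU coreutils `env` that supports `-C DIR` (`--chdir`) is on the PATH; any small helper that changes directory and then execs would do the same job.

```rust
/// Prefixes a helper program and its arguments to the intended argv,
/// so the helper changes directory and then execs the real program.
/// Assumption: GNU coreutils `env` with the `-C DIR` option.
pub fn with_chdir_helper(dir: &str, argv: &[&str]) -> Vec<String> {
    let mut full = vec!["env".to_string(), "-C".to_string(), dir.to_string()];
    full.extend(argv.iter().map(|arg| arg.to_string()));
    full
}
```

The resulting vector can be handed to posix_spawn unchanged, which keeps the spawn path free of any working-directory special case.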

trws · 2025-09-20T19:49:02 1758397742

Quite so. We would have too, but I left out the nasty bit that someone had at one point put a callback argument in an internal launching API that runs between fork and exec. Still working on squashing the last of those.

cryptonector · 2025-09-20T03:01:38 1758337298

> The other place it comes up is launchers and resource managers.

Yes, and typically for the same reasons as in JVMs etc: `vfork()` is just faster.

trws · 2025-08-26T21:57:06 1756245426

I largely agree, and use these patterns in C, but you’re neglecting the usual approach of having a default or stub implementation in the base for classic OOP. There’s also the option of using interfaces in more modern OOP or concept-style languages where you can cast to an interface type to only require the subset of the API you actually need to call. Go is a good example of this, in fact doing the lookup at runtime from effectively a table of function pointers like this.
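A small illustration of both points, sketched in Rust rather than Go or C, with made-up names (`Shape`, `Square`, `Circle`, `report`): the trait supplies a default implementation, and callers that take `&dyn Shape` only require that subset of the API.

```rust
pub trait Shape {
    fn area(&self) -> u32;

    /// Default ("stub") behaviour from the base; implementors may override it.
    fn describe(&self) -> String {
        format!("shape with area {}", self.area())
    }
}

pub struct Square(pub u32);
pub struct Circle(pub u32);

impl Shape for Square {
    // Only the required method; `describe` falls back to the default.
    fn area(&self) -> u32 {
        self.0 * self.0
    }
}

impl Shape for Circle {
    fn area(&self) -> u32 {
        3 * self.0 * self.0 // crude integer approximation, for the example only
    }

    fn describe(&self) -> String {
        format!("circle of radius {}", self.0)
    }
}

/// Dispatches at runtime through a table of function pointers.
pub fn report(shape: &dyn Shape) -> String {
    shape.describe()
}
```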

ryao · 2025-08-27T13:57:27 1756303047

My point is that this pattern is not object oriented programming. As for a default behavior with it, you usually would do that by either always adding the default pointer when creating the structure or calling the default whenever the pointer is NULL.

In the Linux VFS for example, there are optimized functions for reading and writing, but if those are not implemented, a fallback to unoptimized functions is done at the call sites. Both sets are function pointers and you only need to implement one if I recall correctly.
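A rough Rust rendering of that call-site fallback, not the kernel's actual code: `ReadOps`, `read_fast`, `read_basic` and `do_read` are hypothetical names, with `None` standing in for a NULL function pointer.

```rust
/// Signature shared by both hypothetical read hooks.
pub type ReadFn = fn(&[u8]) -> Vec<u8>;

/// Operations table; an implementation only has to fill in one hook.
pub struct ReadOps {
    pub read_fast: Option<ReadFn>,
    pub read_basic: Option<ReadFn>,
}

/// Call site: prefer the optimized hook, fall back to the basic one.
pub fn do_read(ops: &ReadOps, src: &[u8]) -> Result<Vec<u8>, &'static str> {
    if let Some(hook) = ops.read_fast {
        return Ok(hook(src));
    }
    if let Some(hook) = ops.read_basic {
        return Ok(hook(src));
    }
    Err("no read hook implemented")
}

/// Example hooks, only here to exercise the table.
pub fn copy_bulk(src: &[u8]) -> Vec<u8> {
    src.to_vec()
}

pub fn copy_bytewise(src: &[u8]) -> Vec<u8> {
    src.iter().copied().collect()
}
```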

f1shy · 2025-08-27T14:46:45 1756306005

To be fair, OOP is not 100% absolutely perfectly defined. Stroustrup swears C++ is OOP, Alan Kay, at least at some point, laughed at C++, and people using CLOS have yet another definition

pjmlp · 2025-08-28T12:25:42 1756383942

You forgot about people using BETA, or Self, or ......

naasking · 2025-08-27T15:00:19 1756306819

> My point is that this pattern is not object oriented programming.

I think the "is/is not" question is not so clear. If you think of "is" as whether there's a homomorphism, then it makes sense to say that it is OOP, but it can qualify as being something else too, i.e. it's not an exclusionary relationship.

ryao · 2025-08-27T18:47:40 1756320460

Object oriented programming implies certain contracts that the compiler enforces that are not enforced with data abstraction. Given that object oriented programming and data abstraction live side by side in C++, we can spot the differences between member functions that have contracts enforced, and member function pointers that do not. Member functions have an implicit this pointer, and in a derived class, can call the base class version via a shorthand notation to the compiler (BaseClass::func() or super()), unless that base class version is a pure virtual function. Member function pointers have no implicit this pointer unless one is explicitly passed. They have no ability to access a base class variant via some shorthand notation to the compiler because the compiler has no contract saying that OOP is being done and there is a base class version of this function. Finally, classes with unimplemented member functions may not be instantiated as objects, while classes with unimplemented member function pointers may.

If you think of the differences as being OOP implies contracts with the compiler and data abstraction does not (beyond a simple structure saying where the members are in memory), it becomes easier to see the two as different things.

1718627440 · 2025-08-27T19:07:04 1756321624

So you can opt out or in to syntactic sugar, that makes C++ an interesting and useful language, but how you implement OOP, doesn't really affect if it is OOP.

ryao · 2025-08-27T20:01:03 1756324863

By this logic, C is an object oriented language. It is widely held to not be. That is why there were two separate approaches to extend it to make it object oriented, C++ and Objective-C.

1718627440 · 2025-08-27T20:19:05 1756325945

You can implement OOP in C as you can in any language, the article is an example of this. C is not an OOP language in any way, it doesn't have any syntactic features for it and uses the term "object" for something different.

ryao · 2025-08-28T00:47:45 1756342065

The article mentions file_operations, but ignores that it has what would be a static member function in C++ in the form of ->check_flags(), which is never in a vtable. The article author is describing overlap between object oriented programming and something else, called data abstraction, which is what is really being done inside Linux, and calling it OOP.

You can implement OOP in C if you do vtables for inheritance hierarchies manually, among other things, but that is different than what Linux does.

1718627440 · 2025-08-28T16:16:31 1756397791

I honestly don't think how a C++ compiler chooses to implement an object method does matter here.

It's a function belonging to an object, to which is dynamically dispatched with something I would call a vtable. To me that sounds like a classic example of OOP.

Data abstraction is a core of OOP.

This pattern can be used to implement inheritance; when it isn't here, that doesn't mean it's not OOP.

ryao · 2025-08-28T23:39:10 1756424350

Data abstraction is a separate invention from OOP since it involves abstract data types. What is being used here is an abstract data type. It is not the pattern used in OOP languages and it is not OOP. It bears similarities and overlap with the vtables used to implement some OOP languages. It is like how thumbs bear similarities and overlap with index fingers, but the two are not the same.

1718627440 · 2025-08-29T11:12:55 1756465975

To cite Wikipedia:

> Object-oriented programming (OOP) is a programming paradigm based on the object – a software entity that encapsulates data and function(s). An OOP computer program consists of objects that interact with one another. A programming language that provides OOP features is classified as an OOP language [...]

You don't disagree, that this kernel pattern is about data abstraction. You probably don't disagree, that the kernel uses functions. The kernel uses "objects" (FS implementations) that follow a defined set of functions, sometimes called "class" (vtables/wtables/however you like to call them). Therefore I conclude what the kernel does here is a prime example of OOP.

ryao · 2025-08-29T13:57:25 1756475845

You can use similar logic to declare English to be an example of Chinese. They both have syllables. They both assemble syllables into words that convey meaning. They both use grammars to form relationships between those words. Thus, they must be the same. It is fallacious logic. Some similarities do not make things the same. Data abstraction is also its own topic that is able to stand independently from OOP. What the kernel does is data abstraction, not OOP. What you are seeing in the kernel are the abstract data types of data abstraction.

1718627440 · 2025-08-29T20:16:37 1756498597

Sorry no, those are very different, but of course that's not what you are arguing for.

I know ADT and OOP are different concepts, in another answer I wrote what I think the base differences are. But they are related, and in the definitions I am familiar with, ADTs are a base concept for OOP. And OOP can be an implementation for ADTs.

If you don't think the implementations I provide are enough to apply to the kernel, can you maybe provide your own definition according to which we can evaluate this, because I feel like we are beating around the bush. But please not something that says this can only be happening in an OOP language; with those I would plainly disagree, because OOP is a paradigm and not a property of a language.

teo_zero · 2025-08-27T21:58:13 1756331893

> Object oriented programming implies certain contracts that the compiler enforces

Sorry, but where did you get this definition from? I've always thought OOP as a way of organizing your data and your code, sometimes supported by language-specific constructs, but not necessarily.

Can you organize your data into lists, trees, and hashmaps even if your language does not have those as native types? So you can think in an OO way even if the language has no notion of objects, methods, etc.

ryao · 2025-08-28T01:18:41 1756343921

> Sorry, but where did you get this definition from?

It is from experience with object oriented languages (mainly C++ and Java). Technically, you can do everything manually, but that involves shoehorning things into the OO paradigm that do not naturally fit, like the article author did when he claimed struct file_operations was a vtable when it has ->check_flags(), which would be equivalent to a static member function in C++. That is never in a vtable.

If Al Viro were trying to restrict himself to object oriented programming, he would need to remove function pointers to what are effectively the equivalent of static member functions in C++ to turn it into a proper vtable, and handle accesses to that function through the “class”, rather than the “object”.

Of course, since he is not doing object oriented programming, placing pointers to what would be virtual member functions and static member functions into the same structure is fine. There will never be a use case where you want to inherit from a filesystem implementation’s struct file_operations, so there is no need for the decoupling that object oriented programming forces.
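To illustrate the shape being described, here is a hypothetical Rust rendering (not kernel code, and the `File`, `FileOps` and `EXAMPLE_OPS` definitions are my own simplifications): one table holds both an entry that receives the object and an entry that receives only flags.

```rust
pub struct File {
    pub pos: u64,
}

/// One table mixing both kinds of entry, as plain function pointers.
pub struct FileOps {
    /// Receives the object: the analogue of a virtual member function.
    pub llseek: fn(&mut File, u64) -> u64,
    /// Receives no object at all, only flags: closer to a static member function.
    pub check_flags: fn(u32) -> bool,
}

fn seek_to(file: &mut File, offset: u64) -> u64 {
    file.pos = offset;
    file.pos
}

fn low_byte_only(flags: u32) -> bool {
    flags & !0xff == 0
}

/// A single filesystem's table; nothing is expected to inherit from it.
pub static EXAMPLE_OPS: FileOps = FileOps {
    llseek: seek_to,
    check_flags: low_byte_only,
};
```

Calls go through the table either way, for example `(EXAMPLE_OPS.check_flags)(0x1)`; the only difference is whether an object is passed along.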

> I've always thought OOP as a way of organizing your data and your code, sometimes supported by language-specific constructs, but not necessarily.

It certainly can be, but it is not the only way.

> Can you organize your data into lists, trees, and hashmaps even if your language does not have those as native types?

This is an odd question. First, exactly what is a native type? If you mean primitive types, then yes. Even C++ does that. If you mean standard library compound types, again, yes. The C++ STL started as a third party library at SGI before becoming part of the C++ standard. If you mean types that you can define, then probably not without a bunch of pain, as then we are going back to the dark days of manually remembering offsets as people had to do in assembly language, although it is technically possible to do in both C and C++.

What you are asking seems to be exactly what data abstraction is, which involves making an interface that separates use and implementation, allowing different data structures to be used to organize data using the same interface. As per Wikipedia:

> For example, one could define an abstract data type called lookup table which uniquely associates keys with values, and in which values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as a hash table, a binary search tree, or even a simple linear list of (key:value) pairs. As far as client code is concerned, the abstract properties of the type are the same in each case.

https://en.wikipedia.org/wiki/Abstraction_(computer_science)...
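The lookup table example from that quote can be sketched in a few lines of Rust. All names here (`LookupTable`, `ListTable`, `TreeTable`, `count_words`) are made up for illustration; the point is that the client code cannot tell which data structure sits behind the interface.

```rust
use std::collections::BTreeMap;

/// The abstract data type: keys map to values, nothing more is promised.
pub trait LookupTable {
    fn put(&mut self, key: String, value: i32);
    fn get(&self, key: &str) -> Option<i32>;
}

/// Implementation 1: a simple linear list of (key, value) pairs.
#[derive(Default)]
pub struct ListTable(Vec<(String, i32)>);

/// Implementation 2: a search tree.
#[derive(Default)]
pub struct TreeTable(BTreeMap<String, i32>);

impl LookupTable for ListTable {
    fn put(&mut self, key: String, value: i32) {
        match self.0.iter_mut().find(|entry| entry.0 == key) {
            Some(entry) => entry.1 = value,
            None => self.0.push((key, value)),
        }
    }

    fn get(&self, key: &str) -> Option<i32> {
        self.0.iter().find(|entry| entry.0 == key).map(|entry| entry.1)
    }
}

impl LookupTable for TreeTable {
    fn put(&mut self, key: String, value: i32) {
        self.0.insert(key, value);
    }

    fn get(&self, key: &str) -> Option<i32> {
        self.0.get(key).copied()
    }
}

/// Client code: written once against the interface.
pub fn count_words(table: &mut dyn LookupTable, words: &[&str]) {
    for &word in words {
        let seen = table.get(word).unwrap_or(0);
        table.put(word.to_string(), seen + 1);
    }
}
```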

Getting back to doing data structures without object oriented programming, this is often done in C using a structure definition and the CPP (C PreProcessor) via intrusive data structures. Those break encapsulation, but are great for performance since they can coalesce memory allocations and reduce pointer indirections for objects indexed by multiple structures. They also are extremely beneficial for debugging, since you can see all data structures indexing the object. Here are some of the more common examples:

https://github.com/openbsd/src/blob/master/sys/sys/queue.h

https://github.com/openbsd/src/blob/master/sys/sys/tree.h

sys/queue.h is actually part of the POSIX standard, while sys/tree.h never achieved standardization. You will find a number of libraries that implement trees like libuutil on Solaris/Illumos, glib on GNU, sys/tree.h on BSD, and others. The implementations are portable to other platforms, so you can pick the one you want and use it.

As for “hash maps” or hash tables, those tend to be more purpose built in practice to fit the data from what I have seen. However, generic implementations exist:

https://stackoverflow.com/questions/6118539/why-are-there-no...

That said, anyone using hash tables at scale should pay very close attention to how their hash function distributes keys to ensure it is as close to uniformly random as possible, or you are going to have a bad time. Most other applications would be fine using binary search trees. It probably is not a good idea to use hash tables with user controlled keys from a security perspective, since then a guy named Bob can pick keys that cause collisions to slow everything down in a DoS attack. An upgrade from binary search trees that does not risk issues from hash function collisions would be B-trees.

By the way, B-trees are containers and cannot function as intrusive data structures, so you give up some convenience when debugging if you use B-Trees.

1718627440 · 2025-08-28T06:07:26 1756361246

> handle accesses to that function through the “class”, rather than the “object”

You don't need classes for OOP. C++ not putting methods that logically operate on an object, but don't need a pointer to it, into the automatically created vtable, is an optimization and an implementation detail. I don't know why you think that putting this function into a vtable precludes OOP.

Wait, how does inheritance work when the method is not in the vtable?

ryao · 2025-08-28T23:15:48 1756422948

The calling convention for C++ non-static member functions always includes a this pointer, even if the function does not use it. Removing it on member functions that do not use it would pose a problem if another class inherited from this class and overrode the function definition with one that did use it. Maybe in very special cases whole program optimization could safely remove the this pointer, but it is questionable whether any compiler author would go through the trouble given that the exception handling would need to know about the change. Outside of whole program optimization, it is unlikely removing this from member functions that do not use it would ever happen because it would break ABI stability.

As for how inheritance works when the member function is not in the vtable, that depends on what kind of member function it is. All C++ functions are given a mangled name that is stuffed into C’s infrastructure for linking symbols. For static member functions, inheritance is irrelevant since they are tied to the class. Calls to static member functions go directly to the mangled function with no indirections, just as if it had been a global function. For non-static virtual member functions, you use the vtable pointer to find it. For non-virtual member functions, the call goes straight to the function as if a global function had been called (and the this pointer is still passed, even if the function does not use it), since the compiler knows the type and thus can tell the linker to have calls there go to the function through the appropriately mangled name. It is just like calling a global function.

1718627440 · 2025-08-29T10:48:57 1756464537

> The calling convention for C++ non-static member functions always includes a this pointer, even if the function does not use it.

Yes. Since we are not in C++ we can choose to get rid of this useless pointer.

> Removing it on member functions that do not use it would pose a problem if another class inherited from this class and overrode the function definition with one that did use it.

That problem has nothing to do with the this pointer specifically. When you change the method signature of an inherited method you always have this problem. This simply means that the superclass prescribes limits to subclasses, which is why it's possible to use a subclass in place of a superclass.

> Maybe in very special cases whole program optimization could safely remove the this pointer, but it is questionable whether any compiler author

Yes, that's why it's not done in C++, but we can do it, if we handroll it.

> it would break ABI stability

It does not if it has always been like this.

> For static member functions, inheritance is irrelevant since they are tied to the class. Calls to static member functions go directly to the mangled function with no indirections

In other words, ->check_flags() can't be implemented as a static member function in C++. It would simply have a this pointer that it just wouldn't use, since C++ has no way to express non-static member functions that just don't take a this pointer.

> thus can tell the linker to have calls there go to the function

In our case the linker can only resolve the call to the appropriate vtable, since the type isn't known until runtime.

ryao · 2025-08-29T11:04:26 1756465466

> Yes. Since we are not in C++ we can choose to get rid of this useless pointer.

If you were trying to implement OOP in the kernel in C and implemented a vtable, you cannot get rid of the this pointer in vtable entries since a child class might want to use it in the overriding definition. It is one of the same reasons why you cannot remove it in C++. The entire point of a vtable is to enable inheritance. If OOP really were being done, an out-of-tree module could make a class that inherits from this one without needing any code changes and use the this pointer, but you cannot do that if you drop the this pointer. I already explained this.
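A rough sketch of the argument, with made-up names (`shape`, `shape_ops`, `rect`) and nothing taken from the kernel: because the table entry takes the object pointer, a later "subclass" can use it even though the original implementation ignores it.

```cpp
// Hypothetical names; a sketch of a hand-rolled vtable, not kernel code.
struct shape;

struct shape_ops {
    // Every entry takes the object pointer, so any "subclass" may use it.
    int (*area)(const shape* self);
};

struct shape {
    const shape_ops* ops;  // plays the role of the vtable pointer
};

// "Subclass": embeds the base as its first member and adds state.
struct rect {
    shape base;
    int w, h;
};

static int unit_area(const shape*) { return 1; }  // ignores self

static int rect_area(const shape* self) {
    // The base is the first member of a standard-layout struct,
    // so the pointer can be converted back to the containing object.
    const rect* r = reinterpret_cast<const rect*>(self);
    return r->w * r->h;
}

static const shape_ops unit_ops = { unit_area };
static const shape_ops rect_ops = { rect_area };

// Callers only know about the base type.
int shape_area(const shape* s) { return s->ops->area(s); }
```

If `area` had been declared without the `self` parameter, `rect_area` could not have been written without changing the table's signature.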

1718627440 · 2025-08-29T11:51:24 1756468284

This is one interpretation. The other is that the interface of check_flags() specifies that any implementation of it is only allowed to differ based on the type of the object and not on any other property.

You already prescribe with the chosen arguments in the superclass on which things the child implementation can depend. Why not also do this with the first argument?
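A minimal sketch of that reading, with invented names (`stream`, `stream_ops`, `lenient_ops`, `strict_ops`) and no claim to match the kernel's actual definitions: the object selects the table, but the entry itself receives no object pointer, so implementations can only vary per type.

```cpp
// Hypothetical names; loosely modeled on the idea above, not kernel code.
struct stream;

struct stream_ops {
    long (*read)(stream* self, char* buf, long len);  // may depend on the object
    int  (*check_flags)(int flags);                   // may depend only on the type
};

struct stream {
    const stream_ops* ops;
    long pos;
};

static long zero_read(stream* self, char* buf, long len) {
    for (long i = 0; i < len; ++i) buf[i] = 0;  // fill the buffer with zeros
    self->pos += len;                           // per-object state is updated
    return len;
}

static int accept_any(int) { return 0; }
static int reject_bit0(int flags) { return (flags & 1) ? -1 : 0; }

static const stream_ops lenient_ops = { zero_read, accept_any };
static const stream_ops strict_ops  = { zero_read, reject_bit0 };

// The object is used to pick the table, but is not passed on.
int stream_check_flags(const stream* s, int flags) {
    return s->ops->check_flags(flags);
}
```

The dispatch is still dynamic, since which `check_flags` runs is only known once the object's `ops` pointer is read at run time.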

ryao · 2025-08-29T13:51:06 1756475466

You would typically put the this pointer into the first argument when doing OOP in C. You can put the this pointer in the last argument to have it work too. However, you cannot omit it entirely. That is something that is not OOP. It is an ADT.

1718627440 · 2025-08-29T20:01:07 1756497667

So suppose you have it, but never use it. Then why have it? You can just remove the first parameter. You can have object methods in C++ too that don't use the this pointer.

Also, why do you care exactly about the order of arguments? The nature of the function doesn't change; it's entirely arbitrary and orthogonal to the paradigm the function implements. Another example is the implementation of the equality operator between objects. In languages with syntactic sugar you typically have (self, other), but if it's the true equality operator then the order doesn't matter.

1718627440 · 2025-08-27T17:56:24 1756317384

> My point is that this pattern is not object oriented programming.

Isn't this exactly how most (every?) OOP language implements it? You would say a C++ virtual method isn't OOP?

ryao · 2025-08-27T18:33:23 1756319603

Member function pointers and member functions in C++ are two different things. Member function pointers are not OOP. They are data abstraction.

The entire point of OOP is to make contracts with the compiler that forcibly tie certain things together that are not tied together with data abstraction. Member functions are subject to inheritance and polymorphism. Member function pointers are not. Changing the type of your class will never magically change the contents of a member function pointer, but it will change the constants of a non-virtual member function. A member function will have a this pointer to refer to the class. A member function pointer does not unless you explicitly add one (named something other than this in C++).

1718627440 · 2025-08-27T19:01:19 1756321279

Yeah, but the compiler implements these by adding vtables, propagating vtable values across inheritance hierarchies, and adding another parameter.

You claim when the compiler does this, it's OOP, but when I do it, it's not?

dragonwriter · 2025-08-27T20:33:33 1756326813

If you do it, it can still be OOP, it's just not in an OO language. People have trouble separating using a paradigm and using a language focused on the paradigm, for some reason.

ryao · 2025-08-27T20:15:30 1756325730

The entire point of OOP in every OOP language that I have ever used has been to have the language try to constrain what you can do by pushing restrictions on syntactic sugar involving objects, inheritance and encapsulation, so I would say yes. The marketing claims that people will be more productive at programming by using these.

1718627440 · 2025-08-27T20:22:37 1756326157

Yes, you need to have that to have an OOP language. OOP is object-oriented _Programming_, it's about how you program, not what features the language has.

ryao · 2025-08-28T00:44:42 1756341882

In hindsight, I had your remark confused with another remark insisting that struct inode_operations is a vtable, despite it having what would be static member functions in C++, which are never in vtables, and there being no inheritance hierarchy. If you are disciplined enough to do what you said, then I could see that as being OOP, but the context here is of something that is not OOP and only happens to overlap with it. The article mentions file_operations, but ignores that it has what would be a static member function in C++ in the form of ->check_flags(), which is never in a vtable.

1718627440 · 2025-08-28T05:44:16 1756359856

I'm also thinking that these kinds of vtables in the Linux kernel are what would be implemented by the compiler in C++. But because it's self-written, you can be much more creative and do other things that wouldn't be possible if this were created by a compiler.

Of course you could implement the same in C++, and then it can't be the same as the vtable introduced by the compiler, so you would just end up with two vtables: your own and the one introduced by the compiler.
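As a small illustration of the "two tables" point (names are made up, and the size relation is what common ABIs do rather than anything the C++ standard promises): a class that has both a virtual function and a hand-written ops pointer carries two table pointers per object.

```cpp
// Hypothetical names, for illustration only.
struct device_ops {
    int (*probe)(int id);
};

struct plain_device {
    const device_ops* ops;  // hand-written table only
};

struct virtual_device {
    // A virtual member makes typical compilers add their own hidden table pointer.
    virtual ~virtual_device() = default;
    const device_ops* ops;  // hand-written table on top of that
};

// On common ABIs the second struct is larger because of the hidden pointer.
bool has_extra_pointer() {
    return sizeof(virtual_device) > sizeof(plain_device);
}
```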

ryao · 2025-08-28T23:23:19 1756423399

If the kernel were written in C++, it would still be done the way it is done now. C++ does not allow unimplemented member functions and the ADTs currently used do. You can emulate that with multiple inheritance, but it is an inferior way of doing this.
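A sketch of what "unimplemented" looks like in a function-pointer table, with invented names (`file_like`, `file_like_ops`) and assuming the convention that a null entry means "not provided": the caller checks for the missing entry and falls back to a default.

```cpp
// Hypothetical names; a sketch of optional operations, not kernel code.
struct file_like;

struct file_like_ops {
    long (*size)(const file_like* self);  // required
    int  (*flush)(file_like* self);       // optional, may be null
};

struct file_like {
    const file_like_ops* ops;
    long bytes;
    int flushed;
};

static long generic_size(const file_like* f) { return f->bytes; }
static int  counting_flush(file_like* f) { ++f->flushed; return 0; }

static const file_like_ops readonly_ops = { generic_size, nullptr };
static const file_like_ops buffered_ops = { generic_size, counting_flush };

// The caller checks for a missing entry and picks a default behaviour.
int file_like_flush(file_like* f) {
    if (!f->ops->flush)
        return 0;  // nothing to do for types that leave it unimplemented
    return f->ops->flush(f);
}
```

With C++ virtual functions the closest equivalent is a default implementation in the base class, which the caller cannot distinguish from a real one without extra machinery.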

As I said, these are NOT vtables. The fact that you and some others keep thinking of them as vtables misleads you into thinking that this can be done using the object oriented tools of C++. It cannot without major hacks and the result would be slower, harder to read and only something that a bureaucrat could like.

1718627440 · 2025-08-29T10:56:49 1756465009

If the kernel were written in C++, it would simply have an incentive to be less creative. Since it isn't, it can be. It's just a restriction imposed by C++, not a restriction in the loosely defined paradigm of OOP.

> As I said, these are NOT vtables

Ok, you just define vtables differently than me. To me a vtable is a table of virtual functions that are used to implement polymorphic behaviour of objects. This applies to their usage in the kernel and the article. Feel free to introduce a new term for this. If your only distinction is whether these are created by a compiler, this is just a distinction I don't care about.

ryao · 2025-08-29T14:14:56 1756476896

The article author is wrong. It happens. Draw a Venn diagram with two partially overlapping circles. You and the author are looking at the overlap and concluding the two are the same. They are not, given the stuff outside the overlap.

As for the one distinction you recognize and think is invalid, that distinction is given by the definition you found. You refuse to obey the definition you yourself quoted to settle the matter elsewhere in the thread.

1718627440 · 2025-08-29T20:59:26 1756501166

> As for the one distinction you recognize

It's not the only distinction I recognize, it's the distinction you think matters here and that seems to be the basis for our disagreement.

This is the term (vtable, VMT) I was taught in lectures to describe this pattern. You have not yet pointed me to a different term that you would recognize for this, so for lack of a better term I will continue to use it.

As to why I think this distinction does not matter here: I perceive the compiler to be a tool that generates code and is controlled by the programmer. Thus the programmer in both cases creates code with the same paradigm; they only differ in the tools used. We generally don't name things differently depending on which tools are used in the creation, except if they are created with a different intention.

trws · 2025-08-26T21:49:37 1756244977

You have this largely right, but I need to defend the Radeon driver a bit here. The driver that caused all the problems was the proprietary fglrx driver, not the open source Radeon driver. The issue with the Radeon driver wasn’t stability, it was that it was 2d acceleration only.

tremon · 2025-08-26T22:31:59 1756247519

> it was 2d acceleration only

Not completely true either, it eventually supported most of the normal 3d primitives but gaming performance was never a priority because there were few developers and they weren't employed by AMD/ATI -- which also meant that some cards would only reach full feature support after their EOL, sadly.

The amdgpu driver also benefits from a lot of the groundwork that has been done since. The radeon driver is older than kernel features like KMS (kernel modesetting) and GEM (graphics execution manager), and the LLVM-based shader compiler in mesa (userspace). I'd say that the radeon driver was actually the proving ground for many of these features, because it was the most capable open source 3d driver: The Intel 845/915 hardware barely supported 3d operations, and the only 3d-capable open source driver for Nvidia was the reverse-engineered nouveau driver.

Luckily, many people working on the amdgpu driver are actually on AMD's payroll these days.

account42 · 2025-08-27T10:41:04 1756291264

AMD had developers working on radeon (the older open source kernel driver) and radeonsi (the open source user-space OpenGL driver backend for newer cards in Mesa that now sits on top of amdgpu) before the switch to amdgpu (the newer open source kernel driver). While the kernel driver isn't irrelevant for performance, it depends more on the user space portion (radeonsi and r600 before that) which was kept with the amdgpu switch. What the amdgpu driver brought is more sharing of display code with their Windows drivers. The main difference in performance is between r600 (mostly developed without financial support from AMD) and radeonsi (mostly developed by AMD). Of course these days the most relevant user-space portion is radv (open source Vulkan driver in Mesa) which is NOT developed by AMD but rather funded by Valve (and at least initially Red Hat). There is also the open source amdvlk Vulkan user-space driver developed by AMD which is the same as their proprietary Vulkan driver except with the proprietary shader compiler swapped out for the same LLVM backend that radeonsi uses. And if all this wasn't confusing enough, AMD also calls the full driver package with the proprietary Vulkan driver and some snapshot of the open source OpenGL Mesa drivers (radeonsi) "amdgpu-pro".

chao- · 2025-08-26T22:54:07 1756248847

I remember! I stand corrected on the name and the issues!

I forgot that name "fglrx", probably a mental self-defense mechanism. Those were some bad times, trying to get different display outputs to work at the same time, guessing and testing values in xorg.conf, so on. There was some community utility someone wrote to try and help with installation, reinstallation, configuration and reconfiguration, but the name eludes me now.

I would edit my post to correct it, but it seems the edit window has passed.

trws · 2025-06-22T08:41:31 1750581691

I just started giving it a try again about a week ago, and I second this. A year ago it was nearly unusable for any extension outside their preferred list, now it’s largely a pleasant experience.

trws · 2025-06-12T05:57:00 1749707820

I’m rather hoping there’s something better, but various CAD formats support specifying assemblies of objects, and joints between those objects that can represent properties like that. Often this comes with at least some level of simulation or, if not simulation, imposed constraints like in the FreeCAD assembly workbench, allowing you to move connected parts in the assembly but only through the range permitted by the “joint”. I put that in quotes because it includes things like meshed gears, linear slides, ball joints, and all kinds of similar things, some of which I would not call joints as such.

imtringued · 2025-06-12T09:56:22 1749722182

Well, the problem is that FreeCAD is in the wrong here, but you are making mistakes as well.

* The correct term for "slider joint" is "prismatic joint".

* "ball joint" should be "spherical joint" (nit picking, but still)

* "Revolute joint" and "cylindrical joint" are correct

Now comes the list of things which aren't joints and should be called constraints instead:

* Distance Joint

* Parallel Joint

* Perpendicular Joint

* Angle Joint

* Rack and Pinion Joint

* Screw Joint

* Gear Joint

* Belt Joint

Now to your mistakes. There is absolutely nothing wrong with calling revolute, prismatic and spherical joints joints. They are joints, they do what joints do, hence the name joint. The physical interface is your responsibility as the designer.

trws · 2025-06-10T09:51:41 1749549101

The short answer is yes, Linux can be informed to some extent but often you still want a memory balloon driver so that the host can “allocate” memory out of the VM so the host OS can reclaim that memory. It’s not entirely trivial but the tools exist, and it’s usually not too bad on vz these days when properly configured.