> One place where useful optimizations could be made are inline functions which take block parameters, since the optimizer is able to improve the inlined code based on the calling code. However, as far as I know, no current blocks-capable compilers perform any of these optimizations, although I haven't investigated it thoroughly.
> [...]
> for_each is a template function, which means that it gets specialized for this particular type. This makes it an excellent candidate for inlining, and it's likely that an optimizing compiler will end up generating code for the above which would be just as good as the equivalent for loop.
This is somewhat unfair: it seems to reflect the common misconception that C++ std::sort is faster than C's qsort because it uses templates, when in fact qsort would be just as fast if its implementation were in the header, as std::sort's is. Compilers are perfectly capable of inlining calls to function pointers.
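Here is a minimal sketch of that point in C. It assumes the helper's body is visible at the call site, as it would be if it were defined in a header. count_if, is_even, count_evens and count_evens_loop are hypothetical names for illustration. Nothing guarantees that a given compiler performs the inlining, but nothing in the language prevents it either.

```c
#include <stddef.h>

/* Generic helper with its body visible to callers, as if it lived in a
   header (hypothetical name, for illustration only). */
static inline size_t count_if(const int *a, size_t n, int (*pred)(int))
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (pred(a[i]))            /* indirect call through a function pointer */
            count++;
    return count;
}

static int is_even(int x)
{
    return x % 2 == 0;
}

/* Once count_if is inlined here, pred is the constant is_even, so an
   optimizer may replace the indirect call with the body of is_even. */
size_t count_evens(const int *a, size_t n)
{
    return count_if(a, n, is_even);
}

/* The hand-written loop that count_evens can be reduced to. */
size_t count_evens_loop(const int *a, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] % 2 == 0)
            count++;
    return count;
}
```

One way to see what your own compiler does is to build with -O2 -S and check whether count_evens still contains a call to is_even. The answer can vary by compiler and version.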
Calls to blocks should be inlinable too, but I guess blocks are still a new feature; from a quick test, it seems that gcc can't inline them, while llvm-gcc and clang can.
In this case, the article suggests:
```objc
[array do:^(id obj) {
    NSLog(@"Obj is %@", obj);
}];
```
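Objective-C message sends are never inlined, since they are dispatched dynamically, but if you write something like:
[...]
llvm-gcc and clang are able to generate code equivalent to doing the for loop directly.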
> Compilers are perfectly capable of inlining calls to function pointers.
Does this happen even if the function that calls the callable (in this case sort) is not inlined? That would mean the compiler generates a specialized function, say sort_{somerandomnumber}, with the user-defined callable inlined into it, which seems kind of unlikely to me (whereas with templates the compiler is forced to do so).
In both of your examples, the body of the caller (array do: and for_each) is so small that I assume it gets inlined.
No, but you could get the same effect by writing a short wrapper function and having sort inlined into that. (That might be awkward in some cases, but I could imagine a "qsort_efficient" declared with __attribute__((always_inline)).)
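A minimal sketch of that wrapper idea, assuming GCC or Clang (both accept __attribute__((always_inline))). qsort_efficient is the hypothetical name from the comment above; insertion sort stands in for a real qsort implementation to keep it short, and sort_ints is a hypothetical wrapper playing the role of the specialized sort_{somerandomnumber}. Because qsort_efficient is forced inline into sort_ints with a constant comparator, the compiler is free to inline cmp_int too, while sort_ints itself stays an ordinary out-of-line function.

```c
#include <stddef.h>

/* Byte-wise swap so the sort works for any element size, like qsort. */
static inline void swap_bytes(unsigned char *x, unsigned char *y, size_t size)
{
    while (size--) {
        unsigned char t = *x;
        *x++ = *y;
        *y++ = t;
    }
}

/* Hypothetical "qsort_efficient": qsort-compatible signature, forced
   inline via a GCC/Clang extension. Insertion sort keeps the sketch short. */
static inline __attribute__((always_inline))
void qsort_efficient(void *base, size_t n, size_t size,
                     int (*cmp)(const void *, const void *))
{
    unsigned char *a = base;
    for (size_t i = 1; i < n; i++)
        for (size_t j = i; j > 0 && cmp(a + (j - 1) * size, a + j * size) > 0; j--)
            swap_bytes(a + (j - 1) * size, a + j * size, size);
}

static int cmp_int(const void *p, const void *q)
{
    int x = *(const int *)p, y = *(const int *)q;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* The short wrapper: qsort_efficient is inlined here with cmp_int as a
   constant, yielding a sort specialized for int. */
void sort_ints(int *a, size_t n)
{
    qsort_efficient(a, n, sizeof *a, cmp_int);
}
```

The generic code is written once, and each wrapper like sort_ints becomes its own specialized copy, rather than relying on the compiler to clone the sort on its own initiative.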