Hacker News

It's funny to see all these trends come and go for such a fundamentally simple problem. I'm still of the opinion that strcpy() with a good calculation of lengths is the simplest approach on the occasions when you truly do need to copy a string, which IMHO is done far more often than really necessary. The main point is that the length calculation and check should have been done before the copy; if you only find out during the copy that the string is too long, it's already too late.
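A minimal sketch of "calculate and check first, then strcpy()", using a hypothetical `join_path()` helper (not from any library) that builds `dir/file` into a caller-supplied buffer. Every length is known and checked before the first byte is written, so the plain strcpy() calls afterwards cannot overflow.

```c
#include <string.h>

/* Hypothetical helper: build "dir/file" in dst.
 * Returns 0 on success, -1 if the result would not fit (dst untouched). */
int join_path(char *dst, size_t dst_size, const char *dir, const char *file)
{
    size_t dlen = strlen(dir);
    size_t flen = strlen(file);

    /* Need dlen + 1 ('/') + flen + 1 (NUL) <= dst_size.
     * Written this way so the arithmetic cannot wrap around. */
    if (dlen >= dst_size || flen >= dst_size - dlen - 1)
        return -1;

    /* All checks are done; these copies cannot overflow dst. */
    strcpy(dst, dir);
    dst[dlen] = '/';
    strcpy(dst + dlen + 1, file);
    return 0;
}
```

Writing the check as `flen >= dst_size - dlen - 1` rather than `dlen + flen + 2 > dst_size` keeps it correct even for absurdly large inputs.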

An "anti-pattern" I seem to be seeing increasingly often in newer code is allocating a string (or worse, a dynamically expanding buffer), copying several other strings into it, calling another function just once with that concatenated copy, and then freeing it, when the other function could simply have been called several times in succession with the individual parts.
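A sketch of the alternative, assuming the consumer can accept a string in pieces. `struct sink` and `sink_put()` are hypothetical stand-ins for something like fputs() on a stream or write() on a socket; the point is that `send_greeting()` passes each part in turn instead of allocating, concatenating, calling once, and freeing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical consumer that accepts text in pieces; it stands in for
 * fputs(), write(), a log sink, etc. Here it appends to a fixed buffer. */
struct sink {
    char buf[64];
    size_t len;
};

void sink_put(struct sink *s, const char *part)
{
    size_t room = sizeof s->buf - 1 - s->len;   /* space left before the NUL */
    size_t n = strlen(part);

    if (n > room)
        n = room;                               /* truncate rather than overflow */
    memcpy(s->buf + s->len, part, n);
    s->len += n;
    s->buf[s->len] = '\0';
}

/* Instead of malloc + three copies + one call + free, hand over each part. */
void send_greeting(struct sink *s, const char *greeting, const char *name)
{
    sink_put(s, greeting);
    sink_put(s, ", ");
    sink_put(s, name);
}
```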



strlcpy's flaw is its return value. It returns strlen(src), so in order to compute it the function has to walk the entire source string, even past the point where it stopped copying. If you are copying 20-character strings out of a 20TB mmapped file, it will be outrageously slow, and that's an unnecessary footgun.
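To make the cost concrete, here is an illustrative re-implementation of the documented BSD strlcpy() contract (copy at most size-1 bytes, NUL-terminate when size > 0, return strlen(src)). It is named `sketch_strlcpy` to avoid clashing with a libc that already provides strlcpy(); real implementations interleave the scan and the copy, but they still have to reach the end of src to produce the return value.

```c
#include <string.h>

/* Illustrative only: mirrors the documented strlcpy() semantics. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);   /* walks ALL of src, however long it is */

    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;                 /* >= size means the copy was truncated */
}
```

Truncation is detected with `if (sketch_strlcpy(dst, src, n) >= n)`, and that comparison is exactly what forces the full strlen(src).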

It should simply have returned the number of bytes copied to dst if the string was copied in full, and -1 with errno set to E2BIG if dst was too small for src. It would still do the copy and the termination, of course, and in that case the programmer already knows the length because they passed it in the call. This is essentially what strscpy does (the Linux kernel version returns -E2BIG directly rather than setting errno).
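A minimal sketch of the API proposed here, under a hypothetical name `copy_str()` (it is not the kernel's strscpy(), which returns -E2BIG instead of touching errno). It never reads past `src[size-1]`, so its cost is bounded by the destination size rather than the source length. Treating `size == 0` as EINVAL, alongside the NULL check suggested later in the thread, is an assumption.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Hypothetical: returns bytes copied (excluding the NUL), or -1 with
 * errno = E2BIG if src did not fit (dst still gets a truncated,
 * NUL-terminated copy), or -1 with errno = EINVAL for bad arguments. */
ssize_t copy_str(char *dst, const char *src, size_t size)
{
    size_t i;

    if (dst == NULL || src == NULL || size == 0) {
        errno = EINVAL;
        return -1;
    }
    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    if (src[i] != '\0') {   /* stopped because dst was full, not at src's end */
        errno = E2BIG;
        return -1;
    }
    return (ssize_t)i;
}
```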

If you aren't sure about the length of the src buffer, strlcpy can be a ticking time bomb. If you aren't sure whether the src string is NUL-terminated, that's also a problem, which is especially bad since one of the big reasons to use strlcpy is to avoid buffer overruns. GTA Online suffered from outrageous load times due to the same kind of quirk: sscanf() calling strlen() on the whole remaining JSON buffer every time it parsed a value. The proposed API would also make it easy for the function to return -1 and set errno to EINVAL if you pass NULL for src or dst.
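When src may not be NUL-terminated at all (a field inside an mmapped file, say), one option is to have the caller also state how many bytes of src are readable and to bound the search with memchr(). `copy_field()` is a hypothetical helper sketching that idea, including the EINVAL-on-NULL behaviour; it checks the length before copying anything.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical: src_avail is how many bytes of src may be read.
 * The string ends at the first NUL within src_avail, or at src_avail.
 * Returns 0 on success, -1 with errno set on failure (dst untouched). */
int copy_field(char *dst, size_t dst_size, const char *src, size_t src_avail)
{
    const char *nul;
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0) {
        errno = EINVAL;
        return -1;
    }
    /* Unlike strlen(), memchr() never looks beyond src_avail bytes. */
    nul = memchr(src, '\0', src_avail);
    len = nul ? (size_t)(nul - src) : src_avail;

    if (len >= dst_size) {      /* check before writing anything */
        errno = E2BIG;
        return -1;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```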


I don't understand. If you're doing length checks before strcpy(), then you can just as easily call memcpy() instead. And memcpy() is faster because it can copy a word at a time without checking each byte for the terminator. So why would anyone ever need strcpy()?


With memcpy() you need to know the exact length, i.e. it has to be kept around and passed to the point of the copy. With strcpy(), once you have determined that a string will always be below a maximum length, you no longer need to retain its exact length; it is found implicitly if and when a copy actually needs it.
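A sketch of that pattern, with a hypothetical `struct user` and `NAME_CAP` limit: the bound is validated once at the boundary (where the exact length is in hand, so memcpy() is the natural fit), and the struct stores no length. Later copies rely only on the invariant, so strcpy() needs nothing passed alongside the string.

```c
#include <string.h>

#define NAME_CAP 32  /* hypothetical maximum, including the NUL */

/* Invariant: name is NUL-terminated and strlen(name) < NAME_CAP.
 * No length field is stored. */
struct user {
    char name[NAME_CAP];
};

/* At the boundary the exact length is known, so memcpy() is natural. */
int user_set_name(struct user *u, const char *src)
{
    size_t len = strlen(src);

    if (len >= NAME_CAP)
        return -1;
    memcpy(u->name, src, len + 1);   /* includes the terminating NUL */
    return 0;
}

/* Later, far from where the length was known: the invariant guarantees
 * the copy fits any NAME_CAP-sized buffer, so no length is carried. */
void user_copy_name(char dst[NAME_CAP], const struct user *u)
{
    strcpy(dst, u->name);
}
```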


In 2022, having 4 bytes to store the length of the string (or even 8 if you want to be fancy) is not onerous.

As with all things, never say "for all" or "always", but almost all strings can be stored as buffers packed together with their size in a struct, like in the top comment.
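The top comment isn't included here, so the layout below is an assumption: a fixed-capacity buffer paired with an explicit 32-bit length. Because the length is known, appends never rescan the existing contents; a trailing NUL is kept only for interop with C string functions.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: explicit length plus a fixed-capacity buffer. */
struct str {
    uint32_t len;
    char data[60];
};

/* Append n bytes without scanning s->data: its length is already known.
 * Returns 0 on success, -1 if it would not fit (s is left unchanged). */
int str_append(struct str *s, const char *src, uint32_t n)
{
    if (n >= sizeof s->data - s->len)   /* keep one byte for the NUL */
        return -1;
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}
```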


> never say "for all" or "always"

The 4KB of RAM in my microcontroller would appreciate those extra 4 bytes (or 3) used for every string.


If you use a microcontroller with 4KB of RAM, you should know the length of every buffer you use, so you don't need to store it. For NES programming, which has 2KB, you don't program in C at all, much less waste cycles on things like strlen. It's tedious, but ROM is bigger and you can "store" the length of things in the instructions themselves (i.e., hardcoded lengths), whereas RAM is saved for more important things like sprite positions or whatever.
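In C terms, "hardcoded lengths" can be had with sizeof on a const array: it is a compile-time constant, so it typically ends up as an immediate operand in the code rather than as a length field in RAM or a strlen() at run time. Whether const data actually stays in flash/ROM is toolchain-dependent (on AVR, for instance, it needs PROGMEM). `send_bytes()` is a hypothetical pointer-plus-length output routine standing in for a real driver.

```c
#include <stddef.h>

/* const data can be placed in flash/ROM (toolchain-dependent). */
static const char msg_ready[] = "READY";

/* Compile-time length of a char array/literal, excluding the NUL. */
#define LIT_LEN(a) (sizeof(a) - 1)

/* Hypothetical output routine: pointer + length, never scans for a NUL. */
size_t send_bytes(const char *p, size_t n)
{
    (void)p;  /* a real driver would push p[0..n-1] to the hardware */
    return n;
}

size_t send_ready(void)
{
    return send_bytes(msg_ready, LIT_LEN(msg_ready));
}
```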

I really hope people aren't seriously doing string parsing on microcontrollers.


But I thought we were talking about compilers here, and defining what a compiler should do when it comes across a string literal, and not about what people should or shouldn’t do with strings on 4KB RAM MCUs.

String parsing is sometimes necessary, and it could be as simple as formatting log lines and sending them through a UART.
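A sketch of that UART case without any heap use: format one line into a small fixed buffer with snprintf() and return its length, so the line can be handed to a driver as pointer plus length (the driver call is not shown). `format_log_line()` and its line format are hypothetical. Note that snprintf() reports the length it would have written, the same convention as strlcpy()'s return value.

```c
#include <stdio.h>

/* Hypothetical: returns the line length, or -1 on error or truncation. */
int format_log_line(char *buf, size_t size, const char *tag, int value)
{
    int n = snprintf(buf, size, "[%s] value=%d\r\n", tag, value);

    /* n is the length snprintf WANTED to write; n >= size means the line
     * was truncated (buf is still NUL-terminated when size > 0). */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```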


Why would you need 4 bytes of length when you have 4KB of RAM? 1 byte should be enough for you.
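A minimal sketch of a 1-byte length prefix (a "Pascal-style" string) with a hypothetical capacity of 31 characters. A `uint8_t` length caps strings at 255 characters and costs the same single byte as a C string's NUL terminator, but the length is available without a scan.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 1-byte length + 31 bytes of data, no NUL stored. */
struct pstr8 {
    uint8_t len;     /* 1 byte: enough for strings up to 255 chars */
    char data[31];   /* capacity chosen for a small MCU */
};

/* Copy n bytes into p. Returns 0 on success, -1 if n exceeds capacity. */
int pstr8_set(struct pstr8 *p, const char *src, size_t n)
{
    if (n > sizeof p->data)   /* no NUL stored, so the whole array is usable */
        return -1;
    memcpy(p->data, src, n);
    p->len = (uint8_t)n;
    return 0;
}
```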


If the 4 bytes are reserved by the C compiler, then there isn't much you can do (unless there is a flag to control how many bytes are used for string lengths).

One byte, the NUL terminator, is the actual overhead for C strings.


4KB of RAM doesn't mean only 4KB of addressable space. (Even if it did, you'd expect a 16-bit size_t.)



