I am recreating the entire standard C library and I'm working on an implementation of strlen that I would like to be the basis of all my other str functions.
My current implementation is as follows:
int ft_strlen(char const *str)
{
    int length;

    length = 0;
    while (str[length] != '\0' || str[length + 1] == '\0')
        length++;
    return length;
}
My question is this: when I pass a str like:
char str[6] = "hi!";
As expected, the memory reads:
['h']['i']['!']['\0']['\0']['\0']
If you look at my implementation, you would expect a return of 6 - as opposed to 3 (my previous approach) - so that my length check can include extra allocated memory past the first terminator.
The catch here is that I have to read 1 byte outside of initialized memory so that the loop condition finally fails after the last null terminator - which is the behavior I WANT. However, this is generally considered bad practice, and by some an automatic error.
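(For reference, my previous approach - the one that stops at the first '\0' and would return 3 here - was essentially the textbook loop, something like this; the name ft_strlen_old and the exact style are only for illustration:)

int ft_strlen_old(char const *str)
{
    /* illustrative sketch only: a conventional strlen-style loop that
       stops at the first '\0' and never reads past it */
    int length;

    length = 0;
    while (str[length] != '\0')
        length++;
    return length;
}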
Is reading outside of your initialized values a bad idea even when you are very specifically intending to read into a junk value (to ensure it DOES NOT contain '\0')?
If so, why?
I understand that:
"buffer overruns are a favorite avenue for attacking secure programs"
Still, I can't see the problem if I'm simply trying to ensure I've hit the end of initialized values...
Also, I realize this problem can be avoided - I have already sidestepped it (with a value set to 1 and then only reading initialized values) - but that's not the point; this is more of a fundamental question about C, runtime behavior and best practices ;)
[EDIT:]
My comment in reply to a previous answer:
OK. Fair enough - but as to the question "Is it always a bad idea (danger from intentional manipulation or runtime stability) to read past initialized values" - do you have an answer? Please read the accepted answer for an example of the nature of the question. I really don't need this code fixed, nor do I need a better understanding of data types, POSIX specs or common standards. My question is about WHY such standards exist: why might it be important to never read past initialized memory (if such reasons exist)? What is the potential fallout of reading past initialized values IN GENERAL?
Please all - I'm trying to better understand aspects of how systems operate and I have a VERY SPECIFIC question.
To answer the question directly: reading uninitialized memory can return data previously stored there. If your program processes sensitive data (such as passwords or cryptographic keys) and you disclose that uninitialized data to some party (expecting it to be valid), you might reveal confidential information.
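As an illustration only - this snippet itself reads uninitialized memory, which is undefined behavior, and whether the old bytes are still visible depends entirely on the allocator - something like the following is how stale secrets can leak:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *secret = malloc(16);
    if (secret == NULL)
        return 1;
    strcpy(secret, "hunter2");   /* pretend this is a password */
    free(secret);                /* free() does not erase the bytes */

    char *reused = malloc(16);   /* the allocator may hand back the same block */
    if (reused == NULL)
        return 1;
    /* reused is uninitialized: reading it is undefined behavior, but on
       many implementations the old contents are still sitting there */
    printf("leaked? %s\n", reused);

    free(reused);
    return 0;
}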
Furthermore, if you read beyond the end of an array, the memory might not be mapped, and you will get a segmentation fault and a crash. Even a one-byte over-read like the one in your loop can do this if the array happens to end right at the edge of a mapped page.
The compiler is also allowed to assume that your code is correct and never reads out of bounds or uninitialized memory, and it makes optimization decisions based on that assumption - so even a read you think of as harmless can have arbitrary effects on the generated code.
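For example, consider this classic off-by-one (the function contains() here is just an illustration, not taken from your code): because reading table[4] is undefined behavior, an optimizer is allowed to assume the loop always finds v in table[0..3], and some compilers will reduce the whole function to "return 1".

int table[4];

int contains(int v)
{
    /* off-by-one bug: i <= 4 also reads table[4], one element past the end */
    for (int i = 0; i <= 4; i++)
        if (table[i] == v)
            return 1;
    return 0;
}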