Search code examples
cmallocrealloccs50

Mystery of the mysterious P


Background:

I'm trying to create a program that takes a user name(assuming that input is clean), and prints out the initials of the name.

Objective:

  • Trying my hand out at C programming with CS50
  • Getting myself familiar with malloc & realloc

Code:

#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>

string prompt(void);
char *getInitials(string input);
char *appendArray(char *output,char c,int count);

//Tracks # of initials
int counter = 0;

int main(void){

    string input = prompt();
    char *output = getInitials(input);
    for(int i = 0; i < counter ; i++){

        printf("%c",toupper(output[i]));
    }


}

string prompt(void){

    string input;

    do{
        printf("Please enter your name: ");
        input = get_string();
    }while(input == NULL);

    return input;

}

char *getInitials(string input){

    bool initials = true;
    char *output;
    output = malloc(sizeof(char) * counter);
    for(int i = 0, n = strlen(input); i < n ; i++){
        //32 -> ASCII code for spacebar
        //9  -> ASCII code for tab
        if(input[i] == 32 || input[i] == 9 ){
            //Next char after spaces/tab will be initial
            initials = true;


        }else{//Not space/tab
            if(initials == true){
                counter++;
                output = appendArray(output,input[i],counter);

                initials = false;


            }        

        }
        // eprintf("Input[i] is : %c\n",input[i]);
        // eprintf("Counter is : %i\n",counter);
        // eprintf("i is : %i\n",i);
        // eprintf("n is : %i\n",n);



    }

    return output;
}

char *appendArray(char *output,char c,int count){

    // allocate an array of some initial (fairly small) size;
    // read into this array, keeping track of how many elements you've read;
    // once the array is full, reallocate it, doubling the size and preserving (i.e. copying) the contents;
    // repeat until done.


    //pointer to memory
    char *data = malloc(0);
    //Increase array size by 1
    data = realloc(output,sizeof(char) * count);
    //append the latest initial
    strcat(data,&c);
    printf("Value of c is :%c\n",c);
    printf("Value of &c is :%s\n",&c);
    for(int i = 0; i< count ; i++){
        printf("Output: %c\n",data[i]);
    }
    return data;

}

Problem:

The output is not what i expected as there is a mysterious P appearing in the output.

E.g When i enter the name Barack Obama, instead of getting the result:BO, i get the result BP and the same happens for whatever name i choose to enter, with the last initial always being P.

Output:

Please enter your name: Barack Obama
Value of c is :B
Value of &c is :BP
Output: B
Value of c is :O
Value of &c is :OP
Output: B
Output: P
BP

What i've done:

I've traced the problem to the appendArray function, and more specifically to the value of &c (Address of c) though i have no idea what's causing the P to appear,what it means, why it appears and how i can get rid of it.

The value of P shows up no matter when i input.

Insights as to why it's happening and what i can do to solve it will be much appreciated.

Thanks!


Solution

  • Several issues, in decreasing order of importance...

    First issue - c in appendArray is not a string - it is not a sequence of character values terminated by a 0. c is a single char object, storing a single char value.

    When you try to print c as a string, as in

    printf("Value of &c is :%s\n",&c);
    

    printf writes out the sequence of character values starting at the address of c until it sees a 0-valued byte. For whatever reason, the byte immediately following c contains the value 80, which is the ASCII (or UTF-8) code for the character 'P'. The next byte contains a 0 (or there's a sequence of bytes containing non-printable characters, followed by a 0-valued byte).

    Similarly, using &c as the argument to strcat is inappropriate, since c is not a string. Instead, you should do something like

    data[count-1] = c;
    

    Secondly, if you want to treat the data array as a string, you must make sure to size it at least 1 more than the number of initials and write a 0 to the final element:

    data[count-1] = 0; // after all initials have been stored to data
    

    Third,

    char *data = malloc(0);
    

    serves no purpose, the behavior is implementation-defined, and you immediately overwrite the result of malloc(0) with a call to realloc:

    data = realloc(output,sizeof(char) * count);
    

    So, get rid of the malloc(0) call altogether; either just initialize data to NULL, or initialize it with the realloc call:

    char *data = realloc( output, sizeof(char) * count );
    

    Fourth, avoid using "magic numbers" - numeric constants with meaning beyond their immediate, literal value. When you want to compare against character values, use character constants. IOW, change

    if(input[i] == 32 || input[i] == 9 ){
    

    to

    if ( input[i] == ' ' || input[i] == '\t' )
    

    That way you don't have to worry about whether the character encoding is ASCII, UTF-8, EBCDIC, or some other system. ' ' means space everywhere, '\t' means tab everywhere.

    Finally...

    I know part of your motivation for this exercise is to get familiar with malloc and realloc, but I want to caution you about some things:

    realloc is potentially an expensive operation, it may move data to a new location, and it may fail. You really don't want to realloc a buffer a byte at a time. Instead, it's better to realloc in chunks. A typical strategy is to multiply the current buffer size by some factor > 1 (typically doubling):

    char *tmp = realloc( data, current_size * 2 );
    if ( tmp )
    {
      current_size *= 2;
      data = tmp;
    }
    

    You should always check the result of a malloc, calloc, or realloc call to make sure it succeeded before attempting to access that memory.

    Minor stylistic notes:

    Avoid global variables where you can. There's no reason counter should be global, especially since you pass it as an argument to appendArray. Declare it local to main and pass it as an argument (by reference) to getInput:

    int main( void )
    {
      int counter = 0;
      ...
      char *output = getInitials( input, &counter );
      for(int i = 0; i < counter ; i++)
      {
        printf("%c",toupper(output[i]));
      } 
      ...
    }
    
    /**
     * The "string" typedef is an abomination that *will* lead you astray,
     * and I want to have words with whoever created the CS50 header.
     *
     * They're trying to abstract away the concept of a "string" in C, but 
     * they've done it in such a way that the abstraction is "leaky" - 
     * in order to use and access the input object correctly, you *need to know*
     * the representation behind the typedef, which in this case is `char *`.
     *
     * Secondly, not every `char *` object points to the beginning of a 
     * *string*.    
     *
     * Hiding pointer types behind typedefs is almost always bad juju.  
     */
    char *getInitials( const char *input, int *counter ) 
    {
       ...
       (*counter)++;                                   // parens are necessary here
       output = appendArray(output,input[i],*counter); // need leading * here
       ...
    }