Search code examples
arrayscstringpointersfunction-pointers

How is filter function working for different types of pointers?


Writing a filter function that can handle different types of arrays in C!

I tried to write a filter function that takes an integer array and a callback function that evaluates to true or false. Based on the result of the callback function, an element is added to the filtered array, and the function returns the filtered array. Can we generalize it? Like, can we make it work for any type of array? I am not able to figure out how to make it work for strings (char *'s).

Here's is the code snippet, could anyone please explain how's working? (flow and what's being passed for both integer array and char * arrays). What's memcpy copying in case of char * arrays?

#include <stdio.h>
#include <conio.h>
#include <string.h>
#include <stdlib.h>

void *filter(void *array, int size, int elementSize, int (*fn)(void *), int *filteredSize);
int isEven(void *);
int startsWith(void *);

void main()
{
    int arr[8] = {1,2,3,4,5,6,7,8}, *evens;
    char *strs[5] = {"Boy", "Apple", "Cat", "Zebra", "Aeroplane"};
    char **fStrs;
    int n, i;
    evens = (int *)filter(arr, 8, sizeof(int), isEven, &n);
    puts("Evens");
    for (i = 0; i < n; i++)
    {
        printf("%d\t", evens[i]);
    }
    printf("\n");
    free(evens);

    fStrs = (char **)filter(strs, 5, sizeof(char *), startsWith, &n);
    for (i = 0; i < n; i++)
        printf("%s  ", fStrs[i]);
    printf("\n");
    free(fStrs);
}

void *filter(void *array, int size, int elementSize, int (*fn)(void *), int *filteredSize)
{
    int i;
    void *filteredArray;
    *filteredSize = 0;
    filteredArray = malloc(size * elementSize);
    if (filteredArray == NULL)
        return NULL;
    for (i = 0; i < size; i++)
    {
        void *currentElementLoc = (char *)array + (i * elementSize);
        if ((*fn)(currentElementLoc))
        {
            void *filteredElementLoc = (char *)filteredArray + (*filteredSize * elementSize);
            memcpy(filteredElementLoc, currentElementLoc, elementSize);
            (*filteredSize)++;

        }
    }
    filteredArray = realloc(filteredArray, *filteredSize * elementSize);
    if (filteredArray == NULL)
    {
        free(filteredArray);
        return NULL;
    }

    return filteredArray;
}

int isEven(void *p)
{
    int num = *((int *)p);
    return !(num & 1);
}

int startsWith(void *ptrToStr)
{
    static int hasRun = 0;
    static char ch;
    char *str = *((char **)ptrToStr);
    if (!hasRun)
    {
        puts("Starts with");
        scanf(" %c", &ch);
        hasRun = 1;
    }
    return *str == ch;
}

Solution

  • Do you have any experience using qsort? The callback function for qsort has the signature int (*comp)(const void *, const void *). Almost all of the same principles of qsort apply to your filter function.

    In both cases, the callback function is being provided pointers to elements of the array. These elements may themselves be pointers.

    Remember: pointers are values. There is no fundamental difference between copying the value of one int to another int, and copying the value of one (e.g.) float * to another float *.

    int a = 5;
    int b = a;
    
    float some_data = 1.23f;
    float *a_pointer = &some_data;
    float *b_pointer = a_pointer;
    

    Both the assignments b = a and a_pointer = b_pointer perform a byte-wise copy of the value of the right to the object on the left.

    By extension, memcpy works the same way, but through one level of indirection.

    int a = 5;
    int b;
    
    memcpy(&b, &a, sizeof (int));
    
    float some_data = 1.23f;
    float *a_pointer = &some_data;
    float *b_pointer;
    
    memcpy(&b_pointer, &a_pointer, sizeof (float *));
    

    In both examples, the third argument to memcpy is the size of the object being copied. In the first example, an int, in the second example, a float *.

    memcpy requires you specify where in memory these objects exist, hence it accepts two pointers, but it does not care what the objects are - just how much memory they occupy.

    With

    char *first_array[2] = {
        "Hello",         
        "World" 
    };
    char *second_array[2];   
                             
    memcpy(&second_array[0], &first_array[1], sizeof (char *));
    

    memcpy copies the value (just happens to be a pointer value, a char *) of the second element of the first array to the first element of the second array.


    So back to filter. There is one caveat shared with qsort, and one problem unique to your function.

    Like qsort, you can operate on arrays of arrays. In these instances, care must be taken by the callback function to correctly specify the type passed via void *.

    So if you have an array of arrays like

    char data[][32] = {
        "foo",
        "bar"
    };
    

    then the callback is actually receiving a char (*)[32] - a pointer to an array. This means a callback function cannot be made to work "generically" on all string containers. char ** and char (*)[N] are not equivalent. An "array of strings" is not always an array of char *.

    A unique problem with your filter function is that creating a new array that potentially shares data with the previous array opens up a concern of ownership, and raises a lot of questions:

    • If both arrays contain pointer values returned by malloc (or similar) - which array owner is responsible for passing those pointer values to free?

      • If its the new array owner, how does the the owner of old array know which values not to pass to free?
    • Should the new array be considered a view into the old array?

      • How does the owner of the old array know the new array is finished being used?

    I cannot answer these questions, as it is a point of design. Filtering the array in-place, and providing a facility to handle rejected data addresses some problems, but it is a compromise.

    Here is a cursory example:

    #define _POSIX_C_SOURCE 200809L
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    static size_t filter(
        void *base,
        size_t nel,
        size_t width,
        int (*fn)(void *),
        void (*on_reject)(void *))
    {
        size_t len = 0;
    
        for (size_t i = 0; i < nel; i++) {
            void *element = (char *) base + i * width;
    
            if (fn(element))
                memcpy(((char *) base + len++ * width), element, width);
            else if (on_reject)
                on_reject(element);
        }
    
        return len;
    }
    
    static int is_even(void *el)
    {
        int *val = el;
    
        return *val % 2 == 0;
    }
    
    static int is_uppercase(void *el)
    {
        const char *val = *((char (*)[32]) el);
    
        if (!*val)
            return 0;
    
        while (*val)
            if (!isupper((unsigned char) *val++))
                return 0;
    
        return 1;
    }
    
    static int starts_with_ba(void *el)
    {
        return 0 == strncmp(*(char **) el, "ba", 2);
    }
    
    static void reject_ba(void *el)
    {
        free(*(char **) el);
    }
    
    int main(void)
    {
        int nums[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        size_t nums_l = filter(nums, sizeof nums / sizeof *nums, sizeof *nums,
                is_even, NULL);
    
        for (size_t i = 0; i < nums_l; i++)
            printf("%d\n", nums[i]);
    
        char strs[][32] = {
            "hello",
            "World",
            "UPPER",
            "CASE",
            "no",
            "lower",
            "OKAY"
        };
    
        size_t strs_l = filter(strs, sizeof strs / sizeof *strs, sizeof *strs,
                is_uppercase, NULL);
    
        for (size_t i = 0; i < strs_l; i++)
            printf("%s\n", strs[i]);
    
        char *data[] = {
            strdup("foo"),
            strdup("bar"),
            strdup("baz"),
            strdup("qux")
        };
    
        size_t data_l = filter(data, sizeof data / sizeof *data, sizeof *data,
                starts_with_ba, reject_ba);
    
        for (size_t i = 0; i < data_l; i++) {
            puts(data[i]);
            free(data[i]);
        }
    }
    

    The output of this program is:

    2
    4
    6
    8
    UPPER
    CASE
    OKAY
    bar
    baz