Tags: c, file, embedded-linux, glibc

Got different types of error from a single C program in Linux


I am working on an embedded Linux system (kernel 5.10.24) and running a C program that stress-tests file copying. The code uses stdio to read and write the files, as follows:

#include <stdio.h>
#include <stdlib.h> // For exit()
#include <unistd.h>
#include <string.h>

#ifndef BUF_SIZE        /* Allow "cc -D" to override definition */
#define BUF_SIZE 1024
#endif

static unsigned char buf[BUF_SIZE];
static char *infname = "/tmp/src_file.bin";
static char *ofname  = "/tmp/dst_file.bin";

static int create_dest_file(const char *fname)
{
    FILE *fp = fopen(fname, "w");
    if (fp == NULL) {
        printf("Failed to create/truncate %s\n", fname);
        return 1;
    }
    fclose(fp);
    return 0;
}

static int copy_file(const char *src, const char *dest)
{
    FILE *fp, *fp2;
    int rlen = 0, wlen = 0, rc = 0;

    // Open one file for reading
    fp = fopen(src, "r");
    if (fp == NULL)
    {
        printf("Cannot open file %s\n", src);
        return 1;
    }

    fp2 = fopen(dest, "ab");
    if (fp2 == NULL) {
        fclose(fp);
        return 1;
    }

    while (1) {
        rlen = fread(buf, 1, sizeof(buf), fp);
        if (rlen > 0) {
            wlen = fwrite(buf, 1, rlen, fp2);
            if (wlen != rlen) {
                printf("Wrote len: %d, read len: %d\n", wlen, rlen);
                rc = 1;
                break;
            }
        } else {
            break;
        }
    }
    fclose(fp);
    fclose(fp2);
    return rc;
}

int main(int argc, char **argv)
{
    int i = 0, rc = 0;
    int us = 500000;

    if (argc != 4) {
        printf("Usage: %s srcfile dstfile delay_in_us\n", argv[0]);
        return 1;
    }

    infname = argv[1];
    ofname  = argv[2];
    us = atoi(argv[3]);

    printf("Copying %s to %s\n", infname, ofname);
    for (i = 0; i < 1000; i++) {
        create_dest_file(ofname);
        rc = copy_file(infname, ofname);
        printf("XXXXXXXXXXX %d, rc: %d\n", i, rc);
        usleep(us);
    }

    return 0;
}

After compiling it and running it with ./filecopy /root/16MB_src.bin /root/dest.bin 250000, I get several different types of errors, such as Segmentation fault, Bus error, Illegal instruction, and so on.

I installed GDB, ran filecopy under it, and got the following error:

XXXXXXXXXXX 45, rc: 0
XXXXXXXXXXX 46, rc: 0
Fatal error: glibc detected an invalid stdio handle

Program received signal SIGABRT, Aborted.
0x77cdfd44 in ?? () from /lib/libc.so.6
(gdb) bt
#0  0x77cdfd44 in ?? () from /lib/libc.so.6
#1  0x77c964ac in raise () from /lib/libc.so.6
#2  0x77c97ae4 in abort () from /lib/libc.so.6
warning: GDB can't find the start of the function at 0x77cd0c97.
#3  0x77cd0c98 in ?? () from /lib/libc.so.6
(gdb)

I checked the code and asked colleagues to review it, but no error was found :-(.

Judging by the error types, the code triggers some random failures, but I cannot find the root cause.

The system has 64MB of RAM, the source file is about 18MB, and the libc is glibc 2.38.

After many tests, I found that if the source file is about 1MB, the program runs well and hits no errors.

If the source file is about 8MB or 18MB, the program hits errors.
If the file (18MB) is read from NAND and written to RAM, it runs well. If the file (18MB) is read from RAM and written to NAND, it hits errors.

The output of free -k, run several times, showed:

# free -k
              total        used        free      shared  buff/cache   available
Mem:          54580       13228        9404         344       31948       38260
Swap:             0           0           0

# free -k
              total        used        free      shared  buff/cache   available
Mem:          54580       13252       17480         344       23848       38240
Swap:             0           0           0

# free -k
              total        used        free      shared  buff/cache   available
Mem:          54580       13260       11936         344       29384       38232
Swap:             0           0           0

# free -k
              total        used        free      shared  buff/cache   available
Mem:          54580       13252        6656         344       34672       38240
Swap:             0           0           0

# free -k
              total        used        free      shared  buff/cache   available
Mem:          54580       13244       19304         344       22032       38224

The memory is NOT used up.


Solution

  • There doesn't seem to be anything wrong with your program; your problem likely lies elsewhere.

    I get several different types of errors, such as Segmentation fault, Bus error, Illegal instruction, and so on.

    All of these could be the result of bad memory, that is, RAM that returns values other than those previously written to it.

    I suggest running a memory checker program to confirm or rule out this possibility.

    You could also run other memory-intensive, known-stable applications (such as gcc itself). If you see crashes in them as well, "bad memory" is a very likely root cause.
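To illustrate what a memory checker does, here is a minimal user-space sketch in C. It is not a substitute for a dedicated tool such as memtester: it only exercises the pages the kernel happens to hand to the process, and CPU caches can hide faults in small buffers. The function names pattern_pass and check_region, the seed value, and the multiplier used to vary the pattern are all hypothetical choices made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: fill the buffer with a position-dependent pattern,
 * then read it back. Returns the number of words that did not match. */
static size_t pattern_pass(volatile uint32_t *words, size_t count, uint32_t seed)
{
    size_t i, bad = 0;

    for (i = 0; i < count; i++)
        words[i] = seed ^ (uint32_t)(i * 2654435761u);

    for (i = 0; i < count; i++)
        if (words[i] != (seed ^ (uint32_t)(i * 2654435761u)))
            bad++;

    return bad;
}

/* Allocate `bytes` of heap memory and run `passes` rounds over it, each
 * round using one pattern and its bitwise complement.
 * Returns the total number of mismatching words, or (size_t)-1 if the
 * allocation failed. */
size_t check_region(size_t bytes, unsigned passes)
{
    size_t count = bytes / sizeof(uint32_t);
    size_t bad = 0;
    unsigned p;
    uint32_t *mem;

    if (count == 0)
        return 0;

    mem = malloc(count * sizeof(uint32_t));
    if (mem == NULL)
        return (size_t)-1;

    for (p = 0; p < passes; p++) {
        uint32_t seed = 0xA5A5A5A5u + p;

        bad += pattern_pass(mem, count, seed);
        bad += pattern_pass(mem, count, ~seed);
    }

    free(mem);
    return bad;
}
```

On a system like the one described, calling check_region(16u * 1024 * 1024, 10) from main while the copy test runs in parallel would be one way to use it. Any non-zero result other than (size_t)-1 would point toward a hardware or memory-configuration problem rather than a bug in the copy program. A zero result does not prove the RAM is good.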