cygwin, gnu-make

Is there any way to tell make to print better diagnostic information when a failure occurs?


GNU make is generating errors whose root cause is eluding me.

Some brief background: the project is embedded firmware. We cross-compile on a Windows host, and the build runs in the Cygwin environment. I have been building the project for some time without any problems. This is a key point: the project was building fine, and I have made no changes to the project source or the Makefiles.

A few days ago I installed python3 in my Cygwin environment. It had been a while since I last ran the Cygwin setup, and the setup recommended that I update a dozen or so other packages. I accepted the update suggestions without much thought. I did not capture the list of packages or the old versions. I mean, what could go wrong?

Here's what went wrong: the next time I built this project, make failed. The error is cryptic and the root cause is eluding me. I downgraded GNU make from 4.3-1 (the new version) back to 4.2.1-2, but the project still does not build properly.

Here is the error:

$ make -j8
compile message for file1.c
compile message for file2.c
...
compile message for fileN.c
make: *** [../../build/rules.mak:407: ../obj/fileN.o] Error 127

Error 127 is not listed here in the docs: https://www.gnu.org/software/make/manual/html_node/Error-Messages.html
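
For context, the number in make's "Error NN" message is the exit status of the recipe command that failed, not a make-specific code, which is why 127 has no entry of its own on that page. POSIX shells use exit status 127 when a command cannot be found. A minimal check, assuming a POSIX sh is available (no_such_command_xyz is a made-up name used only for illustration):

```shell
# Ask a child shell to run a command that should not exist anywhere on PATH.
# no_such_command_xyz is a hypothetical name chosen for this demonstration.
status=0
sh -c 'no_such_command_xyz' 2>/dev/null || status=$?

# POSIX shells report 127 for "command not found".
echo "exit status: $status"   # prints "exit status: 127"
```

This only shows where the number comes from; it does not explain why a command that was found a moment earlier would go missing.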

Searching for Error 127, I found this issue on SO: Make Error 127 when running trying to compile code. On the surface the issues look similar and the answer helped me dig into the problem. The root cause identified in that answer was the execution of a command that mismatches the host architecture. But (1) I do not think the root cause is the same -- see details below, and (2) if the root cause is the same, none of the suggested solutions are viable options for this project.

Here is why I say the root cause is elusive:

  1. If I run make -j8 again, make will build fileN.c without issue, but fail later on file X. I can repeat this process several times until everything builds, but that is not a valid answer, neither for a human nor for an automated build.

  2. If I run make clean and then start over again, make will fail with different files, e.g. file N will build OK but make will fail on file M or P.

  3. If I run make -j1 with only one job, the build usually succeeds without errors, but not always.

  4. The line number called out by make -- initially -- was a make macro with several commands, so it wasn't clear which command might be the problem. So I added some diagnostics to the recipe: which dos2unix, file -L dos2unix, dos2unix --version, those sorts of things. Now, with those diagnostics added, when make fails, it sometimes generates the 127 error on a line with a simple command like echo "dos2unix version:". How can echo cause an error 127? And why not the previous two-dozen times this exact same echo command was executed?

I want to draw attention to this last point, because make is not providing any useful diagnostic information here at all.
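
The kind of PATH check described in item 4 can be collected into one small helper. This is a sketch only, not the exact lines that were added to the recipe: check_tools is a hypothetical name, and the tool list is simply the set of commands used by the GENERATE_DEPENDENCY macro in rules.mak.

```shell
# check_tools: hypothetical helper that reports where each named tool
# resolves on PATH and returns the number of tools that were not found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if resolved=$(command -v "$tool"); then
            printf 'ok       %s -> %s\n' "$tool" "$resolved"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Tools used by the dependency-generation macro.
check_tools dos2unix sed cp rm || echo "some tools did not resolve on PATH"
```

Run at the top of a recipe, a check like this would distinguish a tool that is genuinely absent from one that is present but intermittently fails to start.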

It's pretty clear that the error is not the execution of a command that mismatches the host architecture. Make executes the same commands over and over without errors.

It's pretty clear the issue is not about any particular source file, because re-running make produces a different success/fail profile.

Here is one example of a recipe that fails. Keep in mind that the same recipe succeeds many dozens of times before failing. And if I run make clean and then make, the failure occurs when processing different source files.

357 define GENERATE_DEPENDENCY
358   @dos2unix -q $(@:.o=.d)
359   @sed -e '/[a-zA-Z]:/ {' \
360       -e 's/\([a-zA-Z]\):[\\\/]/\/cygdrive\/\L\1\//g' \
361       -e 's/\(.\)\\\(.\)/\1\/\2/g' \
362       -e '}' < $(@:.o=.d)  > $(@:.o=.dep)
363   @cp $(@:.o=.dep) $(@:.o=.dp2)
364   @sed -e 's/#.*//' -e 's/^[^:]*: *//' -e 's/ *\\$$//' \
365       -e '/^$$/ d' -e 's/$$/ :/' < $(@:.o=.dp2) >> $(@:.o=.d)
366   @rm $(@:.o=.dp2)
367
368 endef
...
404 $(COBJS) : $(OBJOUTPUTDIR)/%.o : %.c $(EVERYTHING_DEPENDS_ON) | $(OBJOUTPUTDIR)
405     $(COMPILEHOOK)
406     $(CC) $(LISTINGFLAG)=$(@:.o=.lst)  -c $(CPPINC) $(CPPFLAGS)  -MD $(call fixpath,$<) -o $@
407     $(GENERATE_DEPENDENCY)

It doesn't seem like the problem is in the project at all. I have gone so far as to delete the project and pull a fresh copy of the project that built correctly before my cygwin upgrade. But that did not solve the problem.

I wish I could go back to the earlier install of cygwin, but I have no backup. I don't even know which packages were upgraded.

If there is any command-line option that will tell make to print better diagnostic information for a failure, I would appreciate hearing about it. I tried the "all debug info" option:

$ make -j8 -d > make-output 2>&1

Make generated 4.8MB (86K lines) of output and the build succeeded. But this is about as slow as building with one job, which is highly undesirable. Using make -d would also complicate locating real build issues when they happen.

Alternately, if the root cause somehow really is an architecture mismatch, is there some way to get make to identify the command it was trying to execute when the error occurred? As it stands now, I do not know which command might be causing the problem, or if indeed the problem really is an architecture mismatch.


Solution

  • Error 127 indicates that make could not find a command named in the makefile.
    For some reason the command is not being found either in the Cygwin bin directory or anywhere else on the PATH.
    The Cygwin update may have removed an important utility, or installed a new version whose own dependencies are missing.
    Since the failure shows up during parallel processing, a good first test is to pass the
    -O option, which keeps the output of parallel jobs from interleaving. This option instructs make to save the output from each command it invokes and print it all once that command has completed.

    make -j8 -O
    

    Reference link : https://www.gnu.org/software/make/manual/html_node/Parallel-Output.html

    If you suspect that echo itself is the problem, try a small test makefile such as the one below, which compiles files in parallel. It passes ${NUMBER_OF_PROCESSORS}, the number of processors available on the system, as the job count; you can change that to a specific value if you wish.
    I have also added .SILENT so that the recipe lines do not need an @ prefix. To see the commands while debugging, comment it out: #.SILENT:

    .SILENT:
    .PHONY: all objs compile link test
    TARGET = program.exe
    CC=gcc
    INC = ./inc
    SOURCES = file_1.c file_2.c file_3.c file_4.c file_5.c
    OBJ_FILES:= $(SOURCES:.c=.o)
    
    
    objs: $(OBJ_FILES)
    
    %.o: %.c
        $(CC) $(CFLAGS) -c $< -o $@
    
    all: test 
    
    # Enable parallel compilation
    compile:
        $(MAKE) -j ${NUMBER_OF_PROCESSORS} -O objs
    
    link : compile $(TARGET)
    
    $(TARGET): $(OBJ_FILES)
        $(CC) $(CFLAGS) $(OBJ_FILES) -o $@
    
    test: link 
        # check echo command
        echo "Executing test script"
    

    Run it with: make test
    This should show whether parallel execution or the echo command is really at fault.

    EDIT:
    It would also be worth testing once with the antivirus disabled, as @matzeri suggested.
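
On the question of getting make to name the command that failed: one possible approach is to point make's SHELL variable at a small wrapper that logs any recipe line exiting with a non-zero status. What follows is a sketch under stated assumptions, not a tested fix for this build. The script name logsh is hypothetical, and the wrapper assumes make invokes its shell with the default -c flag followed by a single recipe line.

```shell
# Write a hypothetical logging wrapper named ./logsh (the name is an assumption).
cat > ./logsh <<'EOF'
#!/bin/sh
# Assumed calling convention: make runs  $SHELL -c 'recipe line'
if [ "$1" = "-c" ]; then shift; fi
/bin/sh -c "$1"
status=$?
if [ "$status" -ne 0 ]; then
    # Report the exact recipe line and its exit status on stderr.
    printf 'logsh: exit %s from: %s\n' "$status" "$1" >&2
fi
exit "$status"
EOF
chmod +x ./logsh

# Try it outside make: one command that works, one that cannot be found.
./logsh -c 'echo hello'
status=0
./logsh -c 'no_such_command_xyz' 2>./logsh.err || status=$?
echo "wrapper returned $status"
cat ./logsh.err
```

Under those assumptions it could be tried as make -j8 SHELL=/full/path/to/logsh, where the path is a placeholder. The wrapper stays silent unless a recipe line fails, so the log remains far smaller than the output of make -d, at the cost of one extra process per recipe line. GNU make 4.0 and later also accept --trace, which prints each recipe before it runs.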