makefile, gnu-make

How do I write a Makefile for a build process whose target filenames aren't known in advance, only after the first step has run?


Context: I have a build process whose first step generates ~20k files with unpredictable, but deterministic filenames. The remaining steps operate per file, and generate new files with the same name but different extensions.

A simplified shell version of the build process could look like:

./generate_testcases spec.json # Generates many .txt files
find -type f -name '*.txt' -exec ./run_test_case {} \; # Generates a .log file for each .txt file
find -type f -name '*.log' -exec ./grab_test_data {} \; # Generates a .csv file for each .log file

I'd like to do this in make for several reasons, but particularly because the last two steps are prone to failure, and it would be nice to just run make again and have it rerun only the steps for the files that actually need them.

The steps themselves are fairly trivial with pattern rules:

%.txt : spec.json
	./generate_testcases spec.json
%.log : %.txt
	./run_test_case $<
%.csv : %.log
	./grab_test_data $<

The problem is that I have no way of defining the build target (e.g. all) until the first step has run. I do not control the first step, and cannot get the list of filenames in advance.


I first tried Secondary Expansion with a $$(shell find ...) for the prerequisites, but even that is run too early, since make has to do the second expansion to build the dependency tree before running any recipes.
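
A minimal sketch of what that Secondary Expansion attempt might look like. This is a hypothetical reconstruction, not the exact makefile that was tried; the layout of the all rule and the explicit "." passed to find are assumptions.

```makefile
# Hypothetical reconstruction of the Secondary Expansion attempt.
# Recipe lines must begin with a tab character.
.SECONDEXPANSION:

.PHONY: all
# $$(shell ...) is deferred to the second expansion, but that expansion
# still happens before any recipe has run. On a clean tree find returns
# nothing, so all ends up with no .txt prerequisites at all.
all : $$(shell find . -type f -name '*.txt')

%.txt : spec.json
	./generate_testcases spec.json
```

On a clean checkout this leaves make with nothing to do for all, because the prerequisite list was computed before generate_testcases had a chance to create any files.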

I then thought: why not use a marker file, file_list, to trigger the first step when no files are present at all, and read the list of .txt files from that file on later runs:

text_files := $(shell cat file_list)

.PHONY: all
all : file_list $(text_files)

file_list : spec.json
	./generate_testcases spec.json
	find -type f -name '*.txt' > file_list

%.txt : spec.json
	./generate_testcases spec.json
	find -type f -name '*.txt' > file_list

There are some problems here, including the fact that file_list can contain stale files that a new spec.json no longer generates, but I'm willing to live with that for now. This idea also doesn't help with the last two steps: it gives me no way to add the .log and .csv files to the all target.


My next thought was, "well, I can just run make twice", and I can do that programmatically in the Makefile itself:

text_files := $(shell cat file_list 2> /dev/null || true)
log_files := $(patsubst %.txt,%.log,$(text_files))
csv_files := $(patsubst %.txt,%.csv,$(text_files))

.PHONY: all
all : file_list $(text_files) $(log_files) $(csv_files)

file_list : spec.json
	./generate_testcases spec.json
	find -type f -name '*.txt' > file_list
	$(MAKE)

%.txt : spec.json
	./generate_testcases spec.json
	find -type f -name '*.txt' > file_list
	$(MAKE)
%.log : %.txt
	./run_test_case $<
%.csv : %.log
	./grab_test_data $<

When nothing exists, file_list will still be created, and make will be called again. The child make will have a file_list so text_files and the other dependent variables will be populated, and the rest of the build will run as desired. If file_list is up to date, but other pieces are missing, then the first make will take care of the other steps, without running the child make.

The problem arises when file_list is outdated and there are also missing pieces relative to that outdated file_list: file_list gets updated, the child make is invoked and builds everything, but the parent make then still runs the outdated build steps it thinks it needs, because it read the old file_list.

I can probably live with these issues, but I'd like to resolve them if there is an easy way. Is there a better way of doing this?


Solution

  • Your idea of invoking a sub-make is definitely the simplest and clearest way to handle this.

    To avoid things getting built multiple times, make sure the top-level make does not depend on any of the final outputs, and have it always invoke the sub-make. It would be something like this:

    text_files := $(shell cat file_list 2> /dev/null)
    log_files := $(patsubst %.txt,%.log,$(text_files))
    csv_files := $(patsubst %.txt,%.csv,$(text_files))
    
    .PHONY: all
    all : file_list
    	$(MAKE) _build_inner

    file_list : spec.json
    	./generate_testcases $<
    	find -type f -name '*.txt' > $@
    
    .PHONY: _build_inner
    _build_inner: $(log_files) $(csv_files)
    
      ...
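
For illustration, here is one way the skeleton above could be combined with the pattern rules from the question into a single makefile. This is a sketch under assumptions, not the answer's own complete file: what the elided part of the answer contains is not shown here, and the explicit "." passed to find and the substitution-reference form of the variable definitions are choices made for this example.

```makefile
# Sketch: the outer make only refreshes file_list; the inner make,
# started afterwards, re-reads file_list and builds the per-file targets.
# Recipe lines must begin with a tab character.

# Empty on a clean tree; populated once file_list exists.
text_files := $(shell cat file_list 2> /dev/null)
log_files  := $(text_files:.txt=.log)
csv_files  := $(text_files:.txt=.csv)

.PHONY: all
# The outer goal depends only on file_list, never on the final outputs,
# so a stale list read by the outer make cannot trigger outdated steps.
all : file_list
	$(MAKE) _build_inner

file_list : spec.json
	./generate_testcases $<
	find . -type f -name '*.txt' > $@

.PHONY: _build_inner
_build_inner : $(log_files) $(csv_files)

# Per-file steps, taken from the question.
%.log : %.txt
	./run_test_case $<

%.csv : %.log
	./grab_test_data $<
```

Running make with no arguments builds all, which brings file_list up to date and then always starts the sub-make, so the per-file targets are computed from the fresh list. This sketch deliberately has no %.txt rule: the .txt files appear as a side effect of the file_list recipe, so if one is deleted by hand the inner make will report that it has no rule to make it until file_list is regenerated.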