Context: I have a build process whose first step generates ~20k files with unpredictable, but deterministic filenames. The remaining steps operate per file, and generate new files with the same name but different extensions.
A simplified shell version of the build process could look like:
./generate_testcases spec.json # Generates many .txt files
find -type f -name '*.txt' -exec ./run_test_case {} \; # Generates a .log file for each .txt file
find -type f -name '*.log' -exec ./grab_test_data {} \; # Generates a .csv file for each .log file
I'd like to do this in make for several reasons, but particularly because the last two steps are prone to failures, and it would be nice to just run make another time and have it only run the steps on the files that actually need it.
The steps themselves are fairly trivial with pattern rules:
%.txt : spec.json
	./generate_testcases spec.json
%.log : %.txt
	./run_test_case $<
%.csv : %.log
	./grab_test_data $<
The problem is that I have no way of defining the build target (e.g. all) until the first step has run. I do not have control over the first step, and cannot get the list of filenames in advance.
I first tried Secondary Expansion with a $$(shell find ...) for the prerequisites, but even that is run too early, since make has to do the second expansion to build the dependency tree before running any recipes.
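A minimal sketch of what such an attempt could look like. This is a reconstruction for illustration, not the original Makefile, and the generate target is a hypothetical name:

```make
# Reconstruction only; recipe lines must start with a tab.
.SECONDEXPANSION:

.PHONY: all generate

# The whole prerequisite list of "all" is expanded before any of its
# prerequisites are updated, so on a clean tree the find runs before
# generate's recipe and returns nothing.
all : generate $$(shell find . -type f -name '*.txt')

# Hypothetical target wrapping the first build step.
generate : spec.json
	./generate_testcases spec.json
```

On a second invocation the find would see the files left by the first run, which is why the approach looks like it works until the tree is cleaned.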
I then thought: why not use a marker file, file_list, to get the first step to run if there are no files present at all, and grab the list of .txt files from that file for future runs:
text_files := $(shell cat file_list)
.PHONY: all
all : file_list $(text_files)
file_list : spec.json
	./generate_testcases spec.json
	find -type f -name '*.txt' > file_list
%.txt : spec.json
	./generate_testcases spec.json
	find -type f -name '*.txt' > file_list
There are some problems here, including the fact that file_list can potentially include old files that are no longer generated by a new spec.json, but I'm willing to live with that, for now. This idea also doesn't help with the last two steps; it doesn't give me a way to add the .log and .csv files to the all target.
My next thought was, "well, I can just run make twice", and I can do that programmatically in the Makefile itself:
text_files := $(shell cat file_list 2> /dev/null || true)
log_files := $(patsubst %.txt,%.log,$(text_files))
csv_files := $(patsubst %.txt,%.csv,$(text_files))
.PHONY: all
all : file_list $(text_files) $(log_files) $(csv_files)
file_list : spec.json
	./generate_testcases spec.json
	find -type f -name '*.txt' > file_list
	$(MAKE)
%.txt : spec.json
	./generate_testcases spec.json
	find -type f -name '*.txt' > file_list
	$(MAKE)
%.log : %.txt
	./run_test_case $<
%.csv : %.log
	./grab_test_data $<
When nothing exists, file_list will still be created, and make will be called again. The child make will have a file_list, so text_files and the other dependent variables will be populated, and the rest of the build will run as desired. If file_list is up to date, but other pieces are missing, then the first make will take care of the other steps, without running the child make.
The problem arises when file_list is outdated and there are also pieces missing with respect to the outdated file_list: file_list will get updated, the child make will be called and run everything, but then the parent make will still run the outdated build steps it thinks it needs because of the old file_list.
I can probably live with these issues, but I'd like to resolve them if there is an easy way to do so. Is there a better way of doing this?
Your idea of invoking a sub-make is definitely the simplest and clearest way to handle this.
To avoid building things more than once, the top-level make should not depend on any of the final outputs; it should only bring file_list up to date and then always invoke the sub-make. It would be something like this:
text_files := $(shell cat file_list 2> /dev/null)
log_files := $(patsubst %.txt,%.log,$(text_files))
csv_files := $(patsubst %.txt,%.csv,$(text_files))
.PHONY: all
all : file_list
	$(MAKE) _build_inner
file_list : spec.json
	./generate_testcases $<
	find -type f -name '*.txt' > $@
.PHONY: _build_inner
_build_inner: $(log_files) $(csv_files)
...
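The elided part is not shown, so the following is only a sketch of how a complete Makefile along these lines might look. It assumes GNU make and assumes that the remaining rules are simply the .log and .csv pattern rules from the question:

```make
# Sketch under the assumptions stated above; recipe lines start with a tab.

# Empty in the outer make on a clean tree (no file_list yet); the sub-make
# re-reads this Makefile after file_list has been brought up to date.
text_files := $(shell cat file_list 2> /dev/null)
log_files  := $(patsubst %.txt,%.log,$(text_files))
csv_files  := $(patsubst %.txt,%.csv,$(text_files))

.PHONY: all _build_inner

# The outer make never depends on the generated files themselves. It only
# refreshes file_list when spec.json is newer, then always hands off.
all : file_list
	$(MAKE) _build_inner

file_list : spec.json
	./generate_testcases $<
	find -type f -name '*.txt' > $@

# Only the sub-make sees a populated, current list of outputs.
_build_inner : $(log_files) $(csv_files)

# Assumed to be the pattern rules from the question.
%.log : %.txt
	./run_test_case $<

%.csv : %.log
	./grab_test_data $<
```

Because the per-file steps are prone to failure, running make -k may be convenient here: the flag is passed to the sub-make through MAKEFLAGS, so the build continues past individual failures, and a later plain make retries only the .log and .csv files that are missing or stale, provided the scripts do not leave partial outputs behind.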