How can I build custom rules using the output of workspace_status_command?

The bazel build flag --workspace_status_command supports calling a script to retrieve e.g. repository metadata. This is also known as build stamping and is available in rules like java_binary.

I'd like to create a custom rule using this metadata. I want to use this for a common support function. It should receive the git version and some other attributes and create a version.go output file usable as a dependency.

So I started a journey looking at rules in various bazel repositories.

Rules like rules_docker support stamping with stamp in container_image and let you reference the status output in attributes.

rules_go supports it in the x_defs attribute of go_binary.

This would be ideal for my purpose and I dug in...

It looks like I can get what I want with ctx.actions.expand_template, using the entries in ctx.info_file or ctx.version_file as a dictionary for substitutions. But I couldn't figure out how to get a dictionary out of those files. And those two files seem to be "unofficial": they are not part of the ctx documentation.

Building on what I found out already: How do I get a dict based on the status command output?

If that's not possible, what is the shortest/simplest way to access workspace_status_command output from custom rules?

Solution

I've been exactly where you are, and I ended up following the path you've started exploring. I generate a JSON description, which also includes information collected from git, to package with the result. I ended up doing something like this:

def _build_mft_impl(ctx):
    args = ctx.actions.args()
    args.add('-f')
    args.add(ctx.info_file)
    args.add('-i')
    args.add(ctx.file.src)
    args.add('-o')
    args.add(ctx.outputs.out)
    ctx.actions.run(
        outputs = [ctx.outputs.out],
        inputs = ctx.files.src + [ctx.info_file],
        arguments = [args],
        progress_message = "Generating manifest: " + ctx.label.name,
        executable = ctx.executable._expand_template,
    )

def _get_mft_outputs(src):
    return {"out": src.name[:-len(".tmpl")]}

build_manifest = rule(
        implementation = _build_mft_impl,
        attrs = {
            "src": attr.label(mandatory=True,
                              allow_single_file=[".json.tmpl", ".json_tmpl"]),
            "_expand_template": attr.label(default=Label("//:expand_template"),
                                           executable=True,
                                           cfg="host"),
        },
        outputs = _get_mft_outputs,
    )

//:expand_template is a label that, in my case, points to a py_binary performing the transformation itself. I'd be happy to learn about a better (more native, fewer hops) way of doing this, but for now I went with: it works. A few comments on the approach and your concerns:

AFAIK you cannot read the file and perform operations on its contents in Skylark itself...
...speaking of which, it's probably not a bad thing to keep the transformation (tool) and the build description (bazel) separate anyway.
It could be debated what constitutes the official documentation, but while ctx.info_file may not appear in the reference manual, it is documented in the source tree. :) That is the case for other areas as well (and I hope it is not because those interfaces are not yet considered committed).

For the sake of completeness, in src/main/java/com/google/devtools/build/lib/skylarkbuildapi/SkylarkRuleContextApi.java there is:

@SkylarkCallable(
  name = "info_file",
  structField = true,
  documented = false,
  doc =
  "Returns the file that is used to hold the non-volatile workspace status for the "
      + "current build request."
)
public FileApi getStableWorkspaceStatus() throws InterruptedException, EvalException;

EDIT: A few extra details, as asked in the comment.

In my workspace_status.sh I would have for instance the following line:

echo STABLE_GIT_REF $(git log -1 --pretty=format:%H)
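The STABLE_ prefix matters here: as far as I understand Bazel's documented behaviour, keys starting with STABLE_ end up in the stable status file (the one behind ctx.info_file), while all other keys go to the volatile file (ctx.version_file). The sketch below only illustrates that split on sample lines; it is not Bazel's implementation, and partition_status and BUILD_NOTE are hypothetical names made up for the example.

```python
def partition_status(lines):
    """Split `KEY value` lines into (stable, volatile) dicts by key prefix."""
    stable, volatile = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Everything after the first space belongs to the value.
        key, _, value = line.partition(" ")
        target = stable if key.startswith("STABLE_") else volatile
        target[key] = value
    return stable, volatile


if __name__ == "__main__":
    # Sample status output; BUILD_NOTE is a made-up, non-stable key.
    sample = ["STABLE_GIT_REF 0123abc\n", "BUILD_NOTE built on a laptop\n"]
    stable, volatile = partition_status(sample)
    print(stable)
    print(volatile)
```

So a key that a rule should read via ctx.info_file needs the STABLE_ prefix, which is why the line above uses STABLE_GIT_REF.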

In my .json.tmpl file I would then have:

"ref": "${STABLE_GIT_REF}",

I've opted for a shell-like notation for the text to be replaced, since it's intuitive for many users as well as easy to match.

As for the replacement, the relevant portion of the actual code (CLI handling left out) would be:

import re


def get_map(val_file):
    """
    Return dictionary of key/value pairs from ``val_file``.
    """
    value_map = {}

    for line in val_file:
        # Each status line is "KEY value"; the value may contain spaces.
        (key, value) = line.split(' ', 1)
        value_map[key] = value.rstrip('\n')
    return value_map


def expand_template(val_file, in_file, out_file):
    """
    Read each line from ``in_file`` and write it to ``out_file`` replacing all
    ${KEY} references with values from ``val_file``.
    """
    def _substitute_variable(mobj):
        # Raises KeyError if the template references an unknown key.
        return value_map[mobj.group('var')]
    re_pat = re.compile(r'\$\{(?P<var>[^} ]+)\}')
    value_map = get_map(val_file)
    for line in in_file:
        out_file.write(re_pat.sub(_substitute_variable, line))
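For readers who want something runnable end to end, here is a minimal sketch of what the omitted CLI part could look like. It is not my actual script: the -f, -i and -o flags are taken from the arguments the rule above passes, but the argparse wiring, the main function and the compact re-statement of the two helpers are assumptions made so that the block is self-contained.

```python
import argparse
import re
import sys

# Matches ${KEY}; KEY may not contain '}' or spaces.
_VAR = re.compile(r"\$\{(?P<var>[^} ]+)\}")


def get_map(val_file):
    """Return a dict built from `KEY value` lines."""
    value_map = {}
    for line in val_file:
        key, _, value = line.rstrip("\n").partition(" ")
        if key:
            value_map[key] = value
    return value_map


def expand_template(val_file, in_file, out_file):
    """Copy in_file to out_file, replacing ${KEY} with values from val_file."""
    value_map = get_map(val_file)
    for line in in_file:
        # An unknown key raises KeyError, which fails the build action.
        out_file.write(_VAR.sub(lambda m: value_map[m.group("var")], line))


def main(argv=None):
    # Hypothetical CLI: flag names mirror what the rule passes (-f, -i, -o).
    parser = argparse.ArgumentParser(description="Expand status keys in a template.")
    parser.add_argument("-f", dest="val_file", required=True, help="status file")
    parser.add_argument("-i", dest="in_file", required=True, help="template file")
    parser.add_argument("-o", dest="out_file", required=True, help="output file")
    opts = parser.parse_args(argv)
    with open(opts.val_file) as vals, open(opts.in_file) as src, \
            open(opts.out_file, "w") as dst:
        expand_template(vals, src, dst)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Letting a missing key raise, rather than silently leaving ${KEY} in the output, is a deliberate choice in this sketch: a typo in the template then shows up as a failed action instead of a broken manifest.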

EDIT2: This is how I expose the Python script to the rest of bazel:

py_binary(
    name = "expand_template",
    main = "expand_template.py",
    srcs = ["expand_template.py"],
    visibility = ["//visibility:public"],
)