Why doesn't the Files directive work in Apache's httpd.conf?


I had to noindex PDF files. I have done this many times, so this time too I used a Files directive to add a noindex header with X-Robots-Tag, as Google recommends:

<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>

Every time I used this before, it worked like a charm. This time, however, neither the X-Robots-Tag header itself nor its value (noindex, nofollow) showed up in the response headers, even though mod_headers was enabled.

I tried

<FilesMatch ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

with no luck.

After more trial and error, I got it working with

<LocationMatch ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</LocationMatch>

But I don't understand why the rule I had used for years stopped working, while a rule I tried blindly suddenly works.

Could somebody explain it to me?


Solution

  • The Apache documentation states that <FilesMatch> takes the regular expression directly (<FilesMatch regex>, with no ~) and that it is preferred over <Files ~ "regex">:

    The <FilesMatch> directive limits the scope of the enclosed directives by filename, just as the <Files> directive does. However, it accepts a regular expression.

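    A plain <Files> section compares the whole filename against a literal name or a wildcard string (? and *). A regular expression, by contrast, matches anywhere in the filename unless you anchor it with ^ or $; the $ in your pattern anchors it to the end of the name.

    Applied to your rule, that means dropping the ~ and passing the pattern straight to <FilesMatch>. Writing .+ in front of the extension additionally requires at least one character before .pdf: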

    <FilesMatch ".+\.pdf$">
      Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>
    

    If you also need to cover a file named just .pdf, replace the + in that expression with *. This comes down to how the regex quantifiers work:

    • . matches any single character.
    • + the preceding token or group must occur one or more times.
    • * the preceding token or group may occur zero or more times.

    This means .+\.pdf$ matches every filename with at least one character before .pdf, while .*\.pdf$ matches every filename ending in .pdf, including .pdf itself.

    As for why your Files directive doesn't work: a <Files> section can be overridden by another <Files> or <FilesMatch> section appearing later in the configuration, or by one in a .htaccess file in the directory that holds the PDF files. On top of that, Apache merges configuration sections in a fixed order, and each step can override the previous ones: <Directory> together with .htaccess, then <DirectoryMatch>, then <Files> and <FilesMatch>, then <Location> and <LocationMatch>, and finally <If>. Since <LocationMatch> is merged after <Files>, that would explain why your <LocationMatch> version takes effect. So most probably another part of your configuration is overriding the Files directive.
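To see how the patterns discussed above differ, here is a small Python sketch. It uses Python's `re` module as a stand-in for the PCRE library Apache uses (for patterns this simple they behave the same) and `fnmatch.fnmatchcase` to approximate the whole-name wildcard matching of a plain <Files> section. The filenames and helper names are made up for illustration, and the sketch does not model Apache's request processing or section merging.

```python
import fnmatch
import re

# Hypothetical filenames, chosen only to show where the patterns differ.
FILENAMES = ["report.pdf", ".pdf", "report.pdf.bak", "pdf"]

# The regex patterns from the answer, compiled with Python's re module.
PATTERNS = {
    r"\.pdf$": re.compile(r"\.pdf$"),
    r".+\.pdf$": re.compile(r".+\.pdf$"),
    r".*\.pdf$": re.compile(r".*\.pdf$"),
}


def regex_matches(pattern: re.Pattern, name: str) -> bool:
    """Unanchored regex test (search, not fullmatch), approximating <FilesMatch>."""
    return pattern.search(name) is not None


def wildcard_matches(wildcard: str, name: str) -> bool:
    """Whole-name shell-style wildcard test, approximating plain <Files>."""
    return fnmatch.fnmatchcase(name, wildcard)


if __name__ == "__main__":
    for name in FILENAMES:
        results = {src: regex_matches(rx, name) for src, rx in PATTERNS.items()}
        results["<Files *.pdf>"] = wildcard_matches("*.pdf", name)
        print(f"{name!r:18} {results}")

    # An unanchored regex matches a substring; a wildcard must match the whole name.
    print(regex_matches(re.compile("pdf"), "report.pdf"))  # True
    print(wildcard_matches("pdf", "report.pdf"))  # False
```

In this sketch, .pdf is matched by \.pdf$ and .*\.pdf$ but not by .+\.pdf$, and report.pdf.bak is matched by none of the patterns because $ anchors them to the end of the name.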