javascript, php, apache, .htaccess, error-handling

Get incoming URL (referrer) when handling error with .htaccess & PHP


I would like to grab the incoming URL when an error is thrown. For example, if a user requests https://example.ca/missing-file123.php and the .htaccess file redirects the user to https://example.ca/error.php?error=404 because missing-file123.php does not exist, I would like to know that the user came to error.php?error=404 from missing-file123.php (or whatever the non-existent URL may be). I've tried using JavaScript's document.referrer, but it returns an empty value (I think because .htaccess redirects before the client actually visits the missing page?). I am looking for a solution using JavaScript, PHP or .htaccess to get or push the referrer when an error is being handled. Thanks! (Below are my .htaccess file and error page.)

ErrorDocuments:

ErrorDocument 401 https://example.ca/error.php?error=401
ErrorDocument 403 https://example.ca/error.php?error=403
ErrorDocument 404 https://example.ca/error.php?error=404
ErrorDocument 500 https://example.ca/error.php?error=500

Note: I am using full URLs for the error documents because they are held in the main directory and I want subdirectories to redirect to them as well. Because of this I cannot use $_SERVER["REQUEST_URI"].

error.php:

$error = "";
$errmsg = "";
if (isset($_GET['error'])) {
    if (in_array($_GET['error'], ['401', '403', '404', '500'])) {
        $error = $_GET['error'];
    } else {
        header("location:/");
        exit;
    }
} else {
    header("location:/");
    exit;
}

Here is the rest of my .htaccess file in case there is a conflict with a solution:

RewriteEngine on 
RewriteCond %{HTTP_HOST} ^(www\.)?example\.ca$ [NC]
RewriteRule ^/?$ homedir/index.php [L]
RewriteCond %{HTTP_HOST} ^(www\.)?example\.ca$ [NC]
RewriteRule ^(?!homedir/)(.+)$ homedir/$1 [L,NC]

Options All -Indexes
IndexIgnore * 
<Files .htaccess>
Order Allow,Deny
Deny from all
</Files>

Options +FollowSymLinks
RewriteEngine On
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
RewriteRule ^(.*)$ index.php [F,L]

ErrorDocument 401 https://example.ca/error.php?error=401
ErrorDocument 403 https://example.ca/error.php?error=403
ErrorDocument 404 https://example.ca/error.php?error=404
ErrorDocument 500 https://example.ca/error.php?error=500

DirectoryIndex index.php index.html 

SetEnv TZ America/New_York

Solution

  • ErrorDocument 401 https://example.ca/error.php?error=401
    ErrorDocument 403 https://example.ca/error.php?error=403
    ErrorDocument 404 https://example.ca/error.php?error=404
    ErrorDocument 500 https://example.ca/error.php?error=500
    

    You've set up your ErrorDocument directives "incorrectly". When you specify an absolute URL (with scheme and hostname), Apache triggers an external 302 redirect to the error document (exposing the error document to the user), and all information about the URL that triggered the error state is lost. The client sees a 302, not the intended error response code. (In fact, the 401 would fail completely, since the browser needs the 401 response to determine whether to display the password dialog.)

    You should be using root-relative URLs instead. For example:

    ErrorDocument 401 /error.php?error=401
    ErrorDocument 403 /error.php?error=403
    ErrorDocument 404 /error.php?error=404
    ErrorDocument 500 /error.php?error=500
    

    This now triggers an internal subrequest for the error document (the URL that triggered the error response remains in the browser's address bar). In fact, there is no need to pass the error code in the query string, since it is available in the PHP superglobal as $_SERVER['REDIRECT_STATUS'].

    PHP then provides a number of variables relating to the URL that triggered the error response. For example, use $_SERVER['REDIRECT_URL'] (or $_SERVER['REQUEST_URI']) to get the URL that triggered the error/response state. REQUEST_URI also includes the query string (if any), but REDIRECT_URL contains the URL-path only.
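    As a rough illustration of the variables described above, here is a minimal sketch of how error.php might read them. The function name describeError() and the shape of the returned array are hypothetical, not part of the original code, and the sketch assumes the script is reached through a root-relative ErrorDocument so that the REDIRECT_* variables are populated.

```php
<?php
// Sketch only: describeError() is a hypothetical helper, not an Apache or PHP API.
// It takes a $_SERVER-style array so it can be exercised without a web server.
function describeError(array $server): array
{
    // REDIRECT_STATUS carries the status of the request that failed (e.g. "404").
    $status = isset($server['REDIRECT_STATUS']) ? (int) $server['REDIRECT_STATUS'] : 0;

    // Accept only the codes configured via ErrorDocument; anything else is
    // treated as direct access to error.php.
    if (!in_array($status, [401, 403, 404, 500], true)) {
        return ['status' => null, 'path' => null, 'uri' => null];
    }

    return [
        'status' => $status,
        'path'   => $server['REDIRECT_URL'] ?? null, // URL-path only
        'uri'    => $server['REQUEST_URI'] ?? null,  // URL-path plus query string
    ];
}
```

    In error.php you could then call describeError($_SERVER) and fall back to header("location:/") followed by exit when the returned status is null, which mirrors the direct-access check in the original script.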

    The HTTP Referer is not the value you need here. In both cases (internal subrequest and external redirect) the referrer is the URL (if any) from which the URL that triggered the error state was requested. In other words, the URL before the URL that triggered the error state.

    Note: I am using full URLs for the error documents because they are held in the main directory and I want subdirectories to redirect to them as well. Because of this I cannot use $_SERVER["REQUEST_URI"].

    I'm not sure what "problem" you are trying to resolve here, but using a root-relative URL (starting with a slash) for the error document does not cause an issue and allows any URL (subdirectory, sub-subdirectory, etc.) to trigger the error document. (You cannot use a relative URL-path in the ErrorDocument directive.)

    NB: Since the URL path-depth of the URL that triggers the error state can vary, you must use root-relative URL-paths to your static assets (images, CSS, JS, etc.) in the error document itself, otherwise they will not be found for some requests.
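    To make the point about root-relative asset paths concrete, here is a small sketch of error page markup generated from PHP. The helper name renderErrorPage() and the asset paths /css/error.css and /images/error.png are assumptions chosen for illustration; substitute whatever your site actually uses.

```php
<?php
// Sketch only: renderErrorPage() and the asset paths are hypothetical.
function renderErrorPage(int $status, string $requestedUri): string
{
    // Escape the requested URL before echoing it back into HTML.
    $safeUri = htmlspecialchars($requestedUri, ENT_QUOTES, 'UTF-8');

    // Asset URLs start with a slash, so they resolve from the document root
    // regardless of the path depth of the URL that triggered the error.
    return <<<HTML
<!DOCTYPE html>
<html>
<head>
  <title>Error {$status}</title>
  <link rel="stylesheet" href="/css/error.css">
</head>
<body>
  <h1>Error {$status}</h1>
  <p>The requested URL {$safeUri} could not be served.</p>
  <img src="/images/error.png" alt="">
</body>
</html>
HTML;
}
```

    A relative reference such as href="css/error.css" would instead be resolved by the browser against the URL in the address bar, so it would break for requests like /a/b/missing.php.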

    Aside:

    RewriteEngine On
    RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
    RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
    RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
    RewriteRule ^(.*)$ index.php [F,L]
    

    The RewriteRule directive here should simply read:

    RewriteRule ^ - [F]
    

    You don't need to match everything, you just need the pattern to succeed for all requests. The substitution string (i.e. index.php in your original rule) is simply dropped when using the F flag, so you should use a single hyphen instead to indicate no substitution. And when using the F flag, the L flag is implied, so it can be omitted.

    And there's no need to repeat the RewriteEngine On directive.