php, regex, wordpress, url-rewriting, wpml

404 status header when a rewrite URL variable contains %2F (escaped slash)


I've created a WordPress website that dynamically displays one of around 400 thousand products by passing two query vars to a WP page that uses my custom PHP template. I'm using WPML to translate my pages, and I've enabled the option to keep the URL vars _brand and _sku. Everything works fine with this setup (the pages display and return a 200 OK status header). The language switcher also keeps my query vars.

The URL looks like: domain.com/en/products/?_brand=JCB&_sku=01%2F117903

However, since I want the URL to look like domain.com/en/products/a-brand/a-sku, I've added the following rewrite rules to my child theme's functions.php:

function rewrite_products()
{
    //en
    add_rewrite_rule('^products/([^&]+)/([^&]+)', 'index.php?page_id=93837&_brand=$matches[1]&_sku=$matches[2]', 'top');
    //nl
    add_rewrite_rule('^producten/([^&]+)/([^&]+)', 'index.php?page_id=93871&_brand=$matches[1]&_sku=$matches[2]', 'top');
    //fr
    add_rewrite_rule('^produits/([^&]+)/([^&]+)', 'index.php?page_id=93875&_brand=$matches[1]&_sku=$matches[2]', 'top');
    //de
    add_rewrite_rule('^produkte/([^&]+)/([^&]+)', 'index.php?page_id=93876&_brand=$matches[1]&_sku=$matches[2]', 'top');
    //es
    add_rewrite_rule('^productos/([^&]+)/([^&]+)', 'index.php?page_id=93877&_brand=$matches[1]&_sku=$matches[2]', 'top');
}

add_action('init', 'rewrite_products');

function rewrite_product_tags()
{
    add_rewrite_tag('%_brand%', '([^&]+)');
    add_rewrite_tag('%_sku%', '([^&]+)');
}
add_action('init', 'rewrite_product_tags', 10, 0);

I can now browse to the desired URL and get the same page with the correct product; however, the response now has a 404 status header.

WPML's language switcher also discards my query vars and just redirects /products/a-brand/a-sku to, for example, /producten (the translation of /products).

But most importantly, the 404 status header prevents my pages from being indexed. I have listed them in a sitemap (around 2M URLs), but Google treats the URLs as 404s and doesn't index them. Unlike the language switcher issue, this is a huge problem.

I've boiled it down to the following problem:

  • I'm using rawurlencode, which changes characters like / in the SKUs to %2F
  • this causes the header to be a 404
  • JCB/005549646Z/ works perfectly (displays fine and returns a good header)
  • JCB/01%2F117903/ displays fine but has a 404 header
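
To make the first bullet concrete, here is a minimal sketch of what rawurlencode does to the two SKUs from the URLs above. It runs as plain PHP outside WordPress; the variable names are only illustrative.

```php
<?php
// SKUs taken from the example URLs in the question.
$plain_sku = '005549646Z';
$slash_sku = '01/117903';

// rawurlencode() leaves letters and digits untouched...
$plain_encoded = rawurlencode($plain_sku);
// ...but percent-encodes the slash as %2F.
$slash_encoded = rawurlencode($slash_sku);

echo $plain_encoded, "\n"; // 005549646Z
echo $slash_encoded, "\n"; // 01%2F117903
```

So only SKUs that contain a reserved character such as / end up with a percent-escape in the path.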

Setting the header in the template doesn't work. Is my regex wrong? Any help is appreciated!
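
On the regex question, one way to check is to run the pattern from the "products" rule through plain preg_match, outside WordPress. The sketch below assumes # as the delimiter and uses a hypothetical helper, capture_brand_and_sku; it shows only what the pattern itself captures, not how WordPress or the web server handles the request.

```php
<?php
// Same pattern as the 'products' rule in the question, wrapped in # delimiters.
$pattern = '#^products/([^&]+)/([^&]+)#';

// Hypothetical helper: returns [brand, sku] as captured, or null if no match.
function capture_brand_and_sku(string $pattern, string $path): ?array
{
    if (preg_match($pattern, $path, $matches) !== 1) {
        return null;
    }
    return [$matches[1], $matches[2]];
}

// Path with the slash still percent-encoded.
$encoded = capture_brand_and_sku($pattern, 'products/JCB/01%2F117903');
// Path where the slash in the SKU is a literal slash.
$decoded = capture_brand_and_sku($pattern, 'products/JCB/01/117903');

print_r($encoded); // brand "JCB",    sku "01%2F117903"
print_r($decoded); // brand "JCB/01", sku "117903"
```

With the encoded slash, the pattern captures the brand and SKU as intended, so the pattern by itself does not obviously explain the 404. Because [^&]+ also matches /, a literal slash in the path would be split differently, which is one reason to keep the slash escaped in some form.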


Solution

  • I'm guessing the error was related to WordPress core or to something in my parent theme. Anyway, I never found a solution, only a workaround: replace %2F with !2F when encoding the string.

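    // Custom encoding and decoding to stop %2F 404 headers
    function url_decode($encoded)
    {
        // Turn the !2F marker back into an encoded slash first
        $encoded = str_replace('!2F', '%2F', $encoded);
        // rawurldecode() is the counterpart of rawurlencode() used below
        $part_url = rawurldecode($encoded);

        return $part_url;
    }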
    
    function url_encode($part_url)
    {
        $encoded = rawurlencode($part_url);
        $encoded = str_replace('%2F', '!2F', $encoded);
        return $encoded;
    }
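
As a quick check of the workaround, the following sketch round-trips the SKU from the question. The function names sku_to_slug and slug_to_sku are hypothetical, chosen so they do not clash with the functions above, and the code runs as plain PHP outside WordPress.

```php
<?php
// Hypothetical names; same idea as url_encode() / url_decode() above.
function sku_to_slug(string $sku): string
{
    // Percent-encode, then hide the encoded slash behind the "!2F" marker.
    return str_replace('%2F', '!2F', rawurlencode($sku));
}

function slug_to_sku(string $slug): string
{
    // Restore the encoded slash, then decode with the matching rawurldecode().
    return rawurldecode(str_replace('!2F', '%2F', $slug));
}

$slug = sku_to_slug('01/117903');
echo $slug, "\n";              // 01!2F117903
echo slug_to_sku($slug), "\n"; // 01/117903
```

Since rawurlencode turns a literal ! into %21, the !2F marker cannot already be present in its output, so the replacement stays reversible as long as the value reaches the decoder unchanged.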