Tags: php, arrays, constants, find-replace

Fix unquoted PHP array keys


Or rather "Fix unquoted strings used as PHP array keys", but that was a bit long for a title.

I have inherited quite a large codebase where arrays are written like this:

$array[id] = 0;
$array[value] = "test";

While this code works, it throws a lot of "Use of undefined constant" notices (warnings since PHP 7.2; PHP 8.0 throws an Error and the code stops working), so those lines really need to become:

$array['id'] = 0;
$array['value'] = "test";

We are talking about hundreds of thousands of lines of code spread across a couple of thousand files.

There are also cases like this:

$_SESSION[user_information][access_bit][ACCESS_NULL] = 1;

Here, user_information and access_bit are meant to be strings, while ACCESS_NULL is a defined constant. Fortunately, constants are only ever defined in uppercase.

To make things more interesting, we also have JavaScript in the same PHP files, where code like array[id] = 0; is perfectly fine.

I want to clean up this mess efficiently and wrap all those undefined constants in single quotes, but I'm not sure a simple find/replace (even with a regular expression) will do it. Any thoughts?


Solution

  • It turned out to be easier than I thought.

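    function fix_unquoted_array_keys($filename){
        if(!is_file($filename)){
            return "File not found!";
        }
        $content = file_get_contents($filename);

        // First key after a variable: $var[key] -> $var['key']. Only lowercase
        // keys are quoted (uppercase ones are most probably defined constants),
        // and true/false/null are left alone because they are real constants.
        $content = preg_replace(
            '/(\$[a-zA-Z_][a-zA-Z0-9_]*)\[(?!(?:true|false|null)\])([a-z_][a-z0-9_]*)\]/',
            '$1[\'$2\']',
            $content
        );
        // Every later key: [key] right after "]". The lookbehind does not consume
        // the "]", so chains such as $a['b'][c][d] are quoted all the way along.
        $content = preg_replace(
            '/(?<=\])\[(?!(?:true|false|null)\])([a-z_][a-z0-9_]*)\]/',
            '[\'$1\']',
            $content
        );
        file_put_contents($filename,$content);

        // Check the file just in case we break something. php -l exits with a
        // non-zero status on a syntax error; the filename is shell-escaped.
        exec("php -l ".escapeshellarg($filename)." 2>&1", $syntax, $status);
        if($status !== 0){
            return implode("\n", $syntax);
        }

        return "OK";
    }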
    

    The first preg_replace turns every simple array access like $user[id] into $user['id']. For multidimensional arrays it only quotes the first key, so $user[data][id] becomes $user['data'][id]. I'm deliberately not matching uppercase keys, because they are most probably defined constants.

    The second preg_replace handles all subsequent keys by looking for the closing bracket of the previous key.
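To try the two passes on snippets before touching any files, here is a hypothetical string-in/string-out variant, quote_keys_in_string(). It is a sketch, not the answer's exact code: the first group captures the $ together with the variable name, digits are allowed after the first character, lowercase true/false/null are skipped, and the second pass uses a lookbehind so that every key in a chain like $a[b][c][d] gets quoted.

```php
<?php
// Hypothetical helper: the same two preg_replace passes, applied to a string
// instead of a file, so the patterns can be tested on small snippets.
function quote_keys_in_string(string $code): string
{
    // Pass 1: first key after a variable, $var[key] -> $var['key'].
    // Group 1 captures "$" plus the variable name; group 2 the lowercase key.
    $code = preg_replace(
        '/(\$[a-zA-Z_][a-zA-Z0-9_]*)\[(?!(?:true|false|null)\])([a-z_][a-z0-9_]*)\]/',
        '$1[\'$2\']',
        $code
    );

    // Pass 2: every later key, [key] directly after "]". The lookbehind checks
    // for the "]" without consuming it, so [c] and [d] in a chain both match.
    return preg_replace(
        '/(?<=\])\[(?!(?:true|false|null)\])([a-z_][a-z0-9_]*)\]/',
        '[\'$1\']',
        $code
    );
}

echo quote_keys_in_string('$_SESSION[user_information][access_bit][ACCESS_NULL] = 1;'), "\n";
// $_SESSION['user_information']['access_bit'][ACCESS_NULL] = 1;
echo quote_keys_in_string('$a[b][c][d] = $a[true];'), "\n";
// $a['b']['c']['d'] = $a[true];
```

Two caveats apply to any regex approach. The second pass is not anchored to a $ variable, so JavaScript such as grid[0][col] in the same file would also be rewritten. And inside a double-quoted string, "$a[id]" is valid PHP whereas "$a['id']" is a parse error. php -l catches the second case but not the first, so the diffs still need reviewing.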

    This may not be the most elegant solution, but it seems to have done the job. I have been checking diffs for the last hour and can't find a single place where it failed.
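Running a per-file fixer over a couple of thousand files needs a small driver. The sketch below uses PHP's SPL directory iterators; the name fix_tree is hypothetical, and it takes the fixer as a callable so it works with any function that, like fix_unquoted_array_keys(), returns "OK" on success and an error message otherwise.

```php
<?php
// Hypothetical driver: apply $fix to every .php file under $dir and collect
// every result that is not "OK" (e.g. php -l output), keyed by file path.
function fix_tree(string $dir, callable $fix): array
{
    $problems = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Only touch PHP files; .js, .css etc. are skipped entirely.
        if (!$file->isFile() || strtolower($file->getExtension()) !== 'php') {
            continue;
        }
        $result = $fix($file->getPathname());
        if ($result !== 'OK') {
            $problems[$file->getPathname()] = $result;
        }
    }
    return $problems;
}

// Usage, assuming fix_unquoted_array_keys() from the answer is loaded:
// print_r(fix_tree('/path/to/project', 'fix_unquoted_array_keys'));
```

Running it on a copy of the project (or a clean version-control checkout) keeps the originals available for diffing.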

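    P.S. The PHP tokenizer didn't seem suitable for this task, because it appears to treat the undefined constants as strings: they are tokenized as T_STRING. (T_STRING is actually the token name for identifiers such as constant, function and class names; quoted string literals are T_CONSTANT_ENCAPSED_STRING, so a lowercase T_STRING between [ and ] does identify a bare key.)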
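Bare keys do show up as T_STRING tokens sitting directly between '[' and ']', which is enough to find them with token_get_all(). A minimal sketch, with a hypothetical function name, assuming the tokenizer extension is available (it is enabled by default). Unlike the regexes, JavaScript outside the <?php ... ?> tags comes back as T_INLINE_HTML and is never touched, and $obj->prop[key] is handled too. As a design choice it quotes any key containing a lowercase letter (so userId as well as user_id), relying on the uppercase-only convention for constants.

```php
<?php
// Hypothetical helper (not part of the answer above): quote bare array keys
// by inspecting PHP's own tokens instead of using regexes.
function quote_bare_keys_tokenized(string $code): string
{
    $tokens = token_get_all($code);
    $count = count($tokens);
    $inString = false; // inside an interpolated "...", `...` or heredoc
    $out = '';

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];

        // Single-character tokens such as '[', ']' and '"' are plain strings.
        if (is_string($token)) {
            if ($token === '"' || $token === '`') {
                $inString = !$inString;
            }
            $out .= $token;
            continue;
        }

        [$id, $text] = $token;
        if ($id === T_START_HEREDOC) {
            $inString = true;
        } elseif ($id === T_END_HEREDOC) {
            $inString = false;
        }

        // A bare key is an identifier directly between '[' and ']'
        // (keys written with spaces inside the brackets are not handled).
        $isBareKey = $id === T_STRING
            && !$inString // "$a[id]" is valid in strings; "$a['id']" is not
            && ($tokens[$i - 1] ?? null) === '['
            && ($tokens[$i + 1] ?? null) === ']'
            && strtoupper($text) !== $text // has a lowercase letter: not a constant
            && !in_array(strtolower($text), ['true', 'false', 'null'], true);

        $out .= $isBareKey ? "'" . $text . "'" : $text;
    }

    return $out;
}
```

A file can then be rewritten with file_put_contents($path, quote_bare_keys_tokenized(file_get_contents($path))), followed by the same php -l check. Keys inside interpolated strings are skipped entirely, including the {$a[key]} form where quoting would be valid, so those still need fixing by hand.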