Tags: php, regex, concatenation, tokenize, text-parsing

Regex to identify variables which are missing their leading $


I'm trying to match variables in PHP code that are missing their leading dollar sign, so that I can repair the code.

Sample input:

foo = "bar"
$bar = foo
foo()
$foo = bar;
bar = foo() {}
$foo = array();

Expected matches:

foo = "bar" -> match foo not bar
$bar = foo -> match foo not bar
foo() -> no match
$foo = bar; -> match bar not foo
bar = foo() {} -> match bar not foo
$foo = array(); -> no match

It should match every word ([A-Za-z0-9_]+) that is not quoted, is not preceded by a $, and is not followed by a (.

edit:

A little example to explain better what I'm trying to achieve:

<?php
/**
 * little script to explain better what I'm trying to achieve
 */
echo "\nSay Hi :P\n=========\n\n";

$reply = null;

while ("exit" != $reply) {

  // command
  echo "> ";

  // get input
  $reply = trim( fgets(STDIN) );

  // last char
  $last = substr( $reply, -1 );

  // add semicolon if missing
  if ( $last != ";" && $last != "}" ) {
    $reply .= ";";
  }

  /*
   * awesome regex that should add $ chars to words
   * to make using this more comfortable!
   */

  // output buffer
  ob_start();
  eval( $reply );
  echo $out = ob_get_clean();

  // add break
  if ( strlen( $out ) > 0 ) {
    echo "\n";
  }
}

echo "\n\nBye Bye! :D\n\n";
?>

Solution

  • Parsing a programming language with a regex is really hard. Once the expressions get more complicated (keywords, numbers, quoted strings that contain spaces), a regex becomes inadequate.

    Nonetheless, here is a regex that matches all your examples:

    (?<![^\s])\w+(?![^;\s])
    

    You may be able to expand that to suit your needs.
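
As a rough sketch of how the pattern above could be plugged into the script from the question, the snippet below prefixes each match with a $. The function name addMissingDollars and the keyword skip list are assumptions made for illustration, not part of the original answer. Numbers and a few keywords are skipped because the pattern alone would also match them.

```php
<?php
// Hypothetical helper, not part of the original answer.
function addMissingDollars(string $code): string
{
    // The pattern from the answer: a run of word characters that is
    //  - preceded by whitespace or the start of the input, and
    //  - followed by whitespace, ";" or the end of the input.
    $pattern = '/(?<![^\s])\w+(?![^;\s])/';

    // Assumed skip list; extend it with the keywords you actually type.
    $keywords = ['echo', 'print', 'return', 'true', 'false', 'null'];

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($keywords): string {
            $word = $m[0];
            // Leave numbers and known keywords untouched.
            if (is_numeric($word) || in_array(strtolower($word), $keywords, true)) {
                return $word;
            }
            return '$' . $word;
        },
        $code
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $code;
}

// Run the sample lines from the question through the helper.
$samples = ['foo = "bar"', '$bar = foo', 'foo()', '$foo = bar;', 'bar = foo() {}', '$foo = array();'];
foreach ($samples as $line) {
    echo addMissingDollars($line), "\n";
}
```

In the question's script, a call like $reply = addMissingDollars($reply); could go where the "awesome regex" comment sits, just before eval(). The pattern still cannot tell code from the inside of a quoted string: in "a b c" the b would get a $, so treat this as a convenience for simple input only.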