Tags: java, regex, escaping, slash

Forward slashes in a Java regex


General Situation: I am having trouble matching a "/" with String.matches(regex); I assume I am missing a level of escaping.

matches("bar" + "[\\W\\w]*"); //works.
matches("bar/" + "[\\W\\w]*"); //does not work.
matches("bar\/" + "[\\W\\w]*"); //compile error: illegal escape character
matches("bar\\/" + "[\\W\\w]*"); //does not work
matches("bar\\\\/" + "[\\W\\w]*"); //does not work

What am I missing?
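On its own, the slash is not special: "/" is not a metacharacter in Java regular expressions. The sketch below assumes a hypothetical input "bar/baz" (not from the question) that contains no other special characters. It prints the regex each literal from the question produces after Java has processed the string escapes, and whether that regex matches. The `"bar\/"` variant is left out because it does not compile.

```java
public class SlashEscapeCheck {
    public static void main(String[] args) {
        // Hypothetical input: a slash, but no other regex metacharacters.
        String input = "bar/baz";

        // Each element is the regex the engine actually receives.
        String[] regexes = {
            "bar" + "[\\W\\w]*",      // regex: bar[\W\w]*
            "bar/" + "[\\W\\w]*",     // regex: bar/[\W\w]*   ("/" is literal)
            "bar\\/" + "[\\W\\w]*",   // regex: bar\/[\W\w]*  (escaping a non-letter is allowed; still matches "/")
            "bar\\\\/" + "[\\W\\w]*"  // regex: bar\\/[\W\w]* (requires a literal backslash before "/")
        };
        for (String regex : regexes) {
            System.out.println(regex + " -> " + input.matches(regex));
        }
    }
}
```

The first three patterns match "bar/baz"; only the last one fails, because it asks for a backslash that the input does not contain.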

Exact Situation:

private static final String CHECK_TABLE_PRE = "<table class=\"data playerStats\">\n <thead>\n  <tr>\n   <th colspan=\"1\" rowspan=\"1\">&nbsp;</th>\n   <th colspan=\"1\" rowspan=\"1\">G</th>\n   <th colspan=\"1\" rowspan=\"1\">A</th>\n   <th colspan=\"1\" rowspan=\"1\">P</th>\n   <th colspan=\"1\" rowspan=\"1\">+/-</th>\n   <th colspan=\"1\" rowspan=\"1\">PIM</th>\n   <th colspan=\"1\" rowspan=\"1\">PPG</th>\n   <th colspan=\"1\" rowspan=\"1\">SHG</th>\n   <th colspan=\"1\" rowspan=\"1\">S</th>\n   <th colspan=\"1\" rowspan=\"1\">S%</th>\n   <th colspan=\"1\" rowspan=\"1\">Shifts</th>\n   <th colspan=\"1\" rowspan=\"1\">TOI</th>\n   <th colspan=\"1\" rowspan=\"1\">FO%</th>\n  </tr>\n</thead>\n<tbody>\n  <tr>";  //want to use this, but the "/" makes it fail
private static final String CHECK_TABLE_POST = "  </tr>\n </tbody>\n</table>"; //Both copied and pasted from the console output.
System.out.println(table.outerHtml().matches("<table class=\"data playerStats\">\n <thead>\n  <tr>\n   <th colspan=\"1\" rowspan=\"1\">&nbsp;</th>\n   <th colspan=\"1\" rowspan=\"1\">G</th>\n   <th colspan=\"1\" rowspan=\"1\">A</th>\n   <th colspan=\"1\" rowspan=\"1\">+" + "[\\W\\w]*" + CHECK_TABLE_POST));
//This works, but I cannot add the "/" without it failing.

//Where table = Jsoup.connect("http://www.nhl.com/ice/player.htm?view=log&id=8470598").get().select("table.data.playerStats").get(0);
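A prefix-plus-suffix check like this one can be built from the constants without escaping anything by hand, by wrapping each literal part in `Pattern.quote` from `java.util.regex`. The sketch below is not the asker's code: `PRE` and `POST` are hypothetical, shortened stand-ins for `CHECK_TABLE_PRE` and `CHECK_TABLE_POST`, the method name `looksLikeStatsTable` is made up for illustration, and no Jsoup call is involved.

```java
import java.util.regex.Pattern;

public class TableCheck {
    // Hypothetical, shortened stand-ins for CHECK_TABLE_PRE / CHECK_TABLE_POST.
    // Note the "+/-" header, which is what breaks an unquoted regex.
    static final String PRE = "<table>\n <thead>\n  <tr>\n   <th>+/-</th>\n  </tr>\n </thead>\n <tbody>\n  <tr>";
    static final String POST = "  </tr>\n </tbody>\n</table>";

    // Literal prefix, then anything (including newlines), then literal suffix.
    static boolean looksLikeStatsTable(String html) {
        String regex = Pattern.quote(PRE) + "[\\W\\w]*" + Pattern.quote(POST);
        return html.matches(regex);
    }

    public static void main(String[] args) {
        String html = PRE + "\n   <td>0</td>\n" + POST;
        System.out.println(looksLikeStatsTable(html));
    }
}
```

For a fixed prefix and suffix, `html.startsWith(PRE) && html.endsWith(POST)` would do essentially the same job without a regex at all.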

OK, here is an even smaller, more self-contained example:

String test = "<table class=\"data playerStats\">\n <thead>\n  <tr>\n   <th colspan=\"1\" rowspan=\"1\">&nbsp;</th>\n   <th colspan=\"1\" rowspan=\"1\">G</th>\n   <th colspan=\"1\" rowspan=\"1\">A</th>\n   <th colspan=\"1\" rowspan=\"1\">P</th>\n   <th colspan=\"1\" rowspan=\"1\">+/-</th>\n   <th colspan=\"1\" rowspan=\"1\">PIM</th>\n   <th colspan=\"1\" rowspan=\"1\">PPG</th>\n   <th colspan=\"1\" rowspan=\"1\">SHG</th>\n   <th colspan=\"1\" rowspan=\"1\">S</th>\n   <th colspan=\"1\" rowspan=\"1\">S%</th>\n   <th colspan=\"1\" rowspan=\"1\">Shifts</th>\n   <th colspan=\"1\" rowspan=\"1\">TOI</th>\n   <th colspan=\"1\" rowspan=\"1\">FO%</th>\n  </tr>\n </thead>\n <tbody>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020168\"> Feb 10 '13 </a> BOS @ BUF</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">-1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>0.0</span></td>\n   <td colspan=\"1\" rowspan=\"1\">22</td>\n   <td colspan=\"1\" rowspan=\"1\">15:23</td>\n   <td colspan=\"1\" rowspan=\"1\"></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020161\"> Feb 9 '13 </a> BUF @ NYI</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">3</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>33.3</span></td>\n   <td colspan=\"1\" rowspan=\"1\">26</td>\n   <td colspan=\"1\" rowspan=\"1\">21:47</td>\n   <td " +
"colspan=\"1\" rowspan=\"1\"><span>100.00</span></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020141\"> Feb 7 '13 </a> MTL @ BUF</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">-1</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">8</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>25.0</span></td>\n   <td colspan=\"1\" rowspan=\"1\">25</td>\n   <td colspan=\"1\" rowspan=\"1\">23:56</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>100.00</span></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020132\"> Feb 5 '13 </a> BUF @ OTT</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">-1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>0.0</span></td>\n   <td colspan=\"1\" rowspan=\"1\">23</td>\n   <td colspan=\"1\" rowspan=\"1\">20:15</td>\n   <td colspan=\"1\" rowspan=\"1\"></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020121\"> Feb 3 '13 </a> FLA @ BUF</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">3</td>\n   <td colspan=\"1\" rowspan=\"1\">3</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>100.0</span></td>\n   <td colspan=\"1\" " +
"rowspan=\"1\">24</td>\n   <td colspan=\"1\" rowspan=\"1\">19:15</td>\n   <td colspan=\"1\" rowspan=\"1\"></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020109\"> Feb 2 '13 </a> BUF @ MTL</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">4</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>25.0</span></td>\n   <td colspan=\"1\" rowspan=\"1\">19</td>\n   <td colspan=\"1\" rowspan=\"1\">17:20</td>\n   <td colspan=\"1\" rowspan=\"1\"></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020092\"> Jan 31 '13 </a> BUF @ BOS</td>\n   <td colspan=\"1\" rowspan=\"1\">3</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">5</td>\n   <td colspan=\"1\" rowspan=\"1\">4</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">4</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>75.0</span></td>\n   <td colspan=\"1\" rowspan=\"1\">21</td>\n   <td colspan=\"1\" rowspan=\"1\">19:21</td>\n   <td colspan=\"1\" rowspan=\"1\"></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020079\"> Jan 29 '13 </a> TOR @ BUF</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">-1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">4</td>\n   <td colspan=\"1\" " +
"rowspan=\"1\"><span>0.0</span></td>\n   <td colspan=\"1\" rowspan=\"1\">27</td>\n   <td colspan=\"1\" rowspan=\"1\">23:01</td>\n   <td colspan=\"1\" rowspan=\"1\"></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020051\"> Jan 25 '13 </a> CAR @ BUF</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">-1</td>\n   <td colspan=\"1\" rowspan=\"1\">6</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">6</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>16.7</span></td>\n   <td colspan=\"1\" rowspan=\"1\">22</td>\n   <td colspan=\"1\" rowspan=\"1\">17:55</td>\n   <td colspan=\"1\" rowspan=\"1\"></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020043\"> Jan 24 '13 </a> BUF @ CAR</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">-1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>0.0</span></td>\n   <td colspan=\"1\" rowspan=\"1\">21</td>\n   <td colspan=\"1\" rowspan=\"1\">18:45</td>\n   <td colspan=\"1\" rowspan=\"1\"></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020023\"> Jan 21 '13 </a> BUF @ TOR</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" " +
"rowspan=\"1\">3</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>0.0</span></td>\n   <td colspan=\"1\" rowspan=\"1\">23</td>\n   <td colspan=\"1\" rowspan=\"1\">18:52</td>\n   <td colspan=\"1\" rowspan=\"1\"></td>\n  </tr>\n  <tr>\n   <td colspan=\"1\" rowspan=\"1\"><a class=\"undMe\" href=\"/ice/recap.htm?id=2012020014\"> Jan 20 '13 </a> PHI @ BUF</td>\n   <td colspan=\"1\" rowspan=\"1\">2</td>\n   <td colspan=\"1\" rowspan=\"1\">3</td>\n   <td colspan=\"1\" rowspan=\"1\">5</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">1</td>\n   <td colspan=\"1\" rowspan=\"1\">0</td>\n   <td colspan=\"1\" rowspan=\"1\">9</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>22.2</span></td>\n   <td colspan=\"1\" rowspan=\"1\">26</td>\n   <td colspan=\"1\" rowspan=\"1\">19:17</td>\n   <td colspan=\"1\" rowspan=\"1\"><span>100.00</span></td>\n  </tr>\n </tbody>\n</table>";
System.out.println(test.matches("<table class=\"data playerStats\">\n <thead>\n  <tr>\n   <th colspan=\"1\" rowspan=\"1\">&nbsp;</th>\n   [\\w\\W]*"));
System.out.println(test.matches("<table class=\"data playerStats\">\n <thead>\n  <tr>\n   <th colspan=\"1\" rowspan=\"1\">&nbsp;</th>\n   <th colspan=\"1\" rowspan=\"1\">G</th>\n   <th colspan=\"1\" rowspan=\"1\">A</th>\n   <th colspan=\"1\" rowspan=\"1\">P</th>\n   <th colspan=\"1\" rowspan=\"1\">+/-[\\w\\W]*"));
System.out.println(test.matches(test));

//Console prints "true", "false", "false", one per line.
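Besides the "+" in "+/-", the test string contains other characters that are special in a regex, for example the "." and "?" in every `href="/ice/recap.htm?id=..."`. That alone is enough to make `test.matches(test)` return false. The sketch below uses one of those href values from the test string; the second input, `"/ice/recapXhtmid=2012020168"`, is a made-up string chosen to show what the unquoted pattern really accepts.

```java
import java.util.regex.Pattern;

public class OtherMetacharacters {
    public static void main(String[] args) {
        // One href value taken from the test string.
        String href = "/ice/recap.htm?id=2012020168";

        // As a regex, "." means "any character" and "m?" means "an optional m",
        // so the string does not match itself.
        System.out.println(href.matches(href));

        // ...but it does match text nobody intended (any char for ".", no "?").
        System.out.println("/ice/recapXhtmid=2012020168".matches(href));

        // Quoting the whole string makes every character literal.
        System.out.println(href.matches(Pattern.quote(href)));
    }
}
```

This prints false, true, true.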

OK, bear with me here. Shorter example:

String test = "foo+/-bar";
System.out.println(test.matches("foo+[\\w\\W]*"));
System.out.println(test.matches("foo+/[\\w\\W]*"));
System.out.println(test.matches("foo+/-bar[\\w\\W]*"));
System.out.println(test.matches(test));
//true false false false
//But if I leave out the "+" and "-", so that String test = "foo/bar"; (and change the rest of the example accordingly), the whole example works (every call returns true).
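What the pattern `"foo+/-bar"` actually matches explains the `false` results: the "+" quantifies the preceding "o", while "/" and "-" (outside a character class) are ordinary literals. The inputs below are hypothetical, picked to show that behavior.

```java
public class PlusQuantifier {
    public static void main(String[] args) {
        // "o+" means "one or more o"; '/' and '-' are matched literally here.
        String regex = "foo+/-bar";

        String[] inputs = {"foo/-bar", "foooo/-bar", "foo+/-bar", "fo/-bar"};
        for (String input : inputs) {
            System.out.println(input + " -> " + input.matches(regex));
        }
    }
}
```

Only "foo/-bar" and "foooo/-bar" match; the literal string "foo+/-bar" does not, because the regex has no slot for a "+" character. That is also why dropping the "+" made every call in the example succeed.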

SO, there is something weird with "+/". Maybe I have to escape the plus.


Solution

  • SO, there is something weird with "+/". Maybe I have to escape the plus.

    Yes, you do. + in a regular expression means "one or more of the preceding element", so to match a literal plus sign you need \+ in the regular expression, which means \\+ in the Java string literal. The slash itself needs no escaping: / has no special meaning in a Java regular expression.

    test.matches("foo\\+/-bar[\\w\\W]*");
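Both ways of getting a literal "+" can be checked against the short example. Escaping by hand is what the answer does; `Pattern.quote` (a standard `java.util.regex.Pattern` method that wraps its argument in `\Q...\E`) is shown as an alternative that saves escaping every special character in longer strings such as the HTML above. This is a sketch, not part of the original answer.

```java
import java.util.regex.Pattern;

public class EscapePlus {
    public static void main(String[] args) {
        String test = "foo+/-bar";

        // Hand-escaped: the regex is foo\+/-bar[\w\W]*, so '+' is a literal plus.
        System.out.println(test.matches("foo\\+/-bar[\\w\\W]*"));

        // Quoted literal prefix followed by "anything, including newlines".
        System.out.println(test.matches(Pattern.quote("foo+/-bar") + "[\\w\\W]*"));

        // The quoted string now matches itself.
        System.out.println(test.matches(Pattern.quote(test)));
    }
}
```

All three calls print true.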