
Any-character notation for PHP regular expressions


In my regex, I want to say that within the sample text any character is allowed: letters a-z in upper and lower case, digits, and special characters.

For example, my regular expression might check that a document is HTML:

"/\n<html>[]+</html>\n/"

I have tried `[]+`, but PHP does not seem to like it. Why?


Solution

  • [XXX]+ means any of the characters between [ and ], one or more times.

    Here, you didn't put any character between [ and ], hence the problem.
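As a minimal sketch (the pattern and sample strings are made up for illustration), here is a non-empty class at work. It uses `#` as the delimiter, an arbitrary choice here; any non-alphanumeric delimiter works. Note also that in PCRE a `]` placed right after `[` is taken as a literal character, so `[]+` does not mean "anything": it opens a class that contains `]`.

```php
<?php
// A character class lists the characters allowed at one position;
// the + after it means "one or more of them, in a row".
$pattern = '#^[abc]+$#';

// preg_match() returns 1 on a match and 0 when there is none.
$ok  = preg_match($pattern, 'abcab'); // only a, b and c: matches
$bad = preg_match($pattern, 'abd');   // 'd' is not in the class: no match

var_dump($ok, $bad); // int(1), then int(0)
```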


    If you want to say "any possible character", you can use `.`.
    Note: by default, `.` does not match newlines; you'll need the `s` modifier (see **Pattern Modifiers**) if you want it to.
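Applied to the question's HTML check, a sketch could look like this (the sample document is invented). Two changes are assumed: `.` replaces `[]`, and the delimiter changes from `/` to `#`, because the `/` inside `</html>` would otherwise end the pattern early and PHP would report an unknown modifier.

```php
<?php
$html = "<html>\n<p>Hello</p>\n</html>";

// '#' is the delimiter so the '/' in '</html>' needs no escaping;
// with '/' as the delimiter you would have to write '<\/html>' instead.
$withoutS = preg_match('#<html>.+</html>#', $html);  // '.' stops at "\n": no match
$withS    = preg_match('#<html>.+</html>#s', $html); // 's' lets '.' match "\n" too

var_dump($withoutS, $withS); // int(0), then int(1)
```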

    If you want to say "any letter", you can use:

    • for lowercase: [a-z]
    • for uppercase: [A-Z]
    • for both: [a-zA-Z]

    And, for digits:

    • [0-9]: any digit
    • [a-zA-Z0-9]: any lowercase or uppercase letter, or any digit
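A small illustration of those classes (the sample string is made up): `preg_match_all()` collects every run of matching characters, and with its default ordering the full matches end up in index `0` of the result array.

```php
<?php
$text = 'abc DEF 123 !?';

// Each class picks out a different kind of run from the same string.
preg_match_all('#[a-z]+#', $text, $lower);       // lowercase letters only
preg_match_all('#[0-9]+#', $text, $digits);      // digits only
preg_match_all('#[a-zA-Z0-9]+#', $text, $alnum); // letters and/or digits

echo implode(', ', $lower[0]), "\n";  // abc
echo implode(', ', $digits[0]), "\n"; // 123
echo implode(', ', $alnum[0]), "\n";  // abc, DEF, 123
```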

    At that point, you will probably want to take a look at:
    • The Backslash section of the PCRE manual
    • And, especially, the \w meta-character, which means "any word character" (by default, a letter, a digit or the underscore)
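A quick sketch of how `\w` differs from `[a-zA-Z0-9]` (the test string is invented): the underscore is a word character, so only `\w` accepts it. This assumes the default behaviour, without the `u` modifier or locale changes.

```php
<?php
// Anchored with ^ and $ so the whole string must consist of allowed characters.
$word  = preg_match('#^\w+$#', 'foo_bar1');         // \w includes '_'
$alnum = preg_match('#^[a-zA-Z0-9]+$#', 'foo_bar1'); // '_' is not in this class

var_dump($word, $alnum); // int(1), then int(0)
```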

    Later, when you start using a regex such as
    /.+/s

    which should match:

    • Any possible character
      • Including newlines
    • One or more times

    You'll see that it doesn't "stop" where you expect it to: that's because matching is greedy by default. You'll have to put a ? after the +, or use the U modifier; see the Repetition section for more information.
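The three behaviours can be compared side by side in a minimal sketch (the sample text is made up, and `#` is used as the delimiter so `</b>` needs no escaping). A greedy `.+` runs to the last `</b>`, while `.+?` or the `U` modifier stops at the first one.

```php
<?php
$text = '<b>one</b> and <b>two</b>';

// Greedy: .+ grabs as much as it can, so the match runs to the LAST </b>.
preg_match('#<b>.+</b>#s', $text, $greedy);
// Lazy: .+? takes as little as possible, stopping at the FIRST </b>.
preg_match('#<b>.+?</b>#s', $text, $lazy);
// The U modifier inverts greediness, so a plain .+ behaves lazily.
preg_match('#<b>.+</b>#sU', $text, $ungreedy);

echo $greedy[0], "\n";   // <b>one</b> and <b>two</b>
echo $lazy[0], "\n";     // <b>one</b>
echo $ungreedy[0], "\n"; // <b>one</b>
```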


    Well, actually, the best thing to do would be to *invest* some time carefully reading everything in the **PCRE Patterns** section of the manual, if you want to start working with regexes ;-)
    Oh, and BTW: **using regex to *parse* HTML is a bad idea...**

    It's generally much better to use a DOM parser, such as: