Using POSIX Regex in C


I'm currently building my own server text user interface (to manage FTP, SSH connections, a task manager, etc.). My problem is with the task manager.

To save my tasks, I've decided to write all of them to a file. I want each line (one per task) to look like this:

Year Month Day Week-Day Hour Min Second ; Command

To keep things simple, I used the same convention as cron, where * means "any value" for the corresponding field:

* * * * 00 00 00 ; reboot //runs reboot every day at midnight

To do so, I've decided to use POSIX regex. I want each field to follow this format:

YEAR [0-9]{1,9}
MONTH [0-9]{2}
DAY [0-9]{2}
WEEK-DAY [A-Za-z]{3}
HOUR [0-9]{2}
MINUTE [0-9]{2}
SECOND [0-9]{2}

COMMAND can be any printable character

This is where I run into an issue. I was able to write this regex:

char *regexString = "^(\\*|([[:digit:]]){1,9})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:alpha:]]){3})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]];[[:blank:]]([[:print:]])*";

It seemed to work, but when I used the code found here to see how I could extract each component, I got:

Output :
Match 0, Group 0: [ 0-25]: * * * * 00 00 00 ; reboot
Match 0, Group 1: [ 0- 1]: *

Can you help me understand? Thanks (:

PS: Here are some examples:

* * * * * * * ; command //Match
0 00 00 Mon 00 00 00 ; command //Match
123456789 00 00 Mon 00 00 00 ; command //Match

01234556789 00 00 Mon 00 00 00 ; command //Don't Match
0 00 00 0 00 00 00 ; command //Don't Match
0 0 0 Mon 0 0 0 ; command //Don't Match
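
The examples above can also be checked mechanically. Here is a minimal sketch, assuming a hypothetical helper named task_line_is_valid (not part of the original code). It compiles the pattern from the question with REG_NOSUB, since only a yes/no answer is needed. The pattern is split across adjacent string literals for readability only; the resulting string is unchanged.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Pattern from the question, one field per line. */
const char TASK_LINE_PATTERN[] =
    "^(\\*|([[:digit:]]){1,9})[[:blank:]]"  /* year     */
    "(\\*|([[:digit:]]){2})[[:blank:]]"     /* month    */
    "(\\*|([[:digit:]]){2})[[:blank:]]"     /* day      */
    "(\\*|([[:alpha:]]){3})[[:blank:]]"     /* week-day */
    "(\\*|([[:digit:]]){2})[[:blank:]]"     /* hour     */
    "(\\*|([[:digit:]]){2})[[:blank:]]"     /* minute   */
    "(\\*|([[:digit:]]){2})[[:blank:]]"     /* second   */
    ";[[:blank:]]([[:print:]])*";           /* command  */

/* Hypothetical helper: true if `line` matches the task-line pattern. */
bool task_line_is_valid(const char *line)
{
    regex_t re;

    /* REG_NOSUB: no group offsets are requested, only match / no match. */
    if (regcomp(&re, TASK_LINE_PATTERN, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    int rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Passing each example line (without the trailing // comment) to task_line_is_valid should accept the first three and reject the last three. Note that the pattern has no $ anchor, so it only constrains the start of the line.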

EDIT: Here is the code I use:

#include <stdio.h>
#include <string.h>
#include <regex.h>

int main ()
{
    char * source = "* * * * 00 00 00 ; reboot";
    char *regexString = "^(\\*|([[:digit:]]){1,9})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:alpha:]]){3})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]](\\*|([[:digit:]]){2})[[:blank:]];[[:blank:]]([[:print:]])*";
    size_t maxMatches = 3; // I've tried several values (2, 3, ...), same output
    size_t maxGroups = 3; // I've tried several values (2, 3, ...), same output

    regex_t regexCompiled;
    regmatch_t groupArray[maxGroups];
    unsigned int m;
    char * cursor;

    if (regcomp(&regexCompiled, regexString, REG_EXTENDED))
    {
        printf("Could not compile regular expression.\n");
        return 1;
    }

    m = 0;
    cursor = source;
    for (m = 0; m < maxMatches; m ++)
    {
        if (regexec(&regexCompiled, cursor, maxGroups, groupArray, 0))
            break;  // No more matches

        unsigned int g = 0;
        unsigned int offset = 0;
        for (g = 0; g < maxGroups; g++)
        {
            if (groupArray[g].rm_so == -1)  // regoff_t is a signed type
                break;  // No more groups

            if (g == 0)
                offset = groupArray[g].rm_eo;

            char cursorCopy[strlen(cursor) + 1];
            strcpy(cursorCopy, cursor);
            cursorCopy[groupArray[g].rm_eo] = 0;
            printf("Match %u, Group %u: [%2d-%2d]: %s\n",
                   m, g, (int)groupArray[g].rm_so, (int)groupArray[g].rm_eo,
                   cursorCopy + groupArray[g].rm_so);
        }
        cursor += offset;
    }

    regfree(&regexCompiled);

    return 0;
}

Example outputs:

//Case of a match :
Output :
Match 0, Group 0: [ 0-25]: * * * * 00 00 00 ; reboot
Match 0, Group 1: [ 0- 1]: * // YEAR
Match 0, Group 2: [ 2- 3]: * // MONTH
Match 0, Group 3: [ 4- 5]: * // DAY
Match 0, Group 4: [ 6- 7]: * // WEEK-DAY
Match 0, Group 5: [ 8- 10]: 00 //HOUR
Match 0, Group 6: [ 11- 13]: 00 //MINUTE
Match 0, Group 7: [ 14- 16]: 00 // SECOND
Match 0, Group 8: [ 19- 25]: reboot //COMMAND
$> echo $?
0

//Case of a match :
Output :
Match 0, Group 0: [ 0-38]: 123456789 00 00 Mon 00 00 00 ; Command
Match 0, Group 1: [ 0- 9]: 123456789 //YEAR
Match 0, Group 2: [ 10- 12]: 00 //MONTH
Match 0, Group 3: [ 13- 15]: 00 //DAY 
Match 0, Group 4: [ 16- 19]: Mon //WEEK-DAY
Match 0, Group 5: [ 20- 22]: 00 //HOUR
Match 0, Group 6: [ 23- 25]: 00 //MINUTE
Match 0, Group 7: [ 26- 28]: 00 //SECOND
Match 0, Group 8: [ 31- 38]: Command //COMMAND
$> echo $?
0

//Case of Not Match
$> echo $?
0

Solution

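  • Be careful when setting the maxGroups variable: it must equal the number of capturing groups in the pattern plus 1, because the first array item holds the whole match. Your pattern contains 15 capturing groups, and with maxGroups set to 3 only the whole match and Groups 1 and 2 are reported. Group 2 is the nested ([[:digit:]]) in the year field, which does not take part in the match when the year is *, so its rm_so is -1 and your loop stops after Group 1.

    Remove the redundant capturing groups and use: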

    char *regexString = "^(\\*|[[:digit:]]{1,9})[[:blank:]](\\*|[[:digit:]]{2})[[:blank:]](\\*|[[:digit:]]{2})[[:blank:]](\\*|[[:alpha:]]{3})[[:blank:]](\\*|[[:digit:]]{2})[[:blank:]](\\*|[[:digit:]]{2})[[:blank:]](\\*|[[:digit:]]{2})[[:blank:]];[[:blank:]]([[:print:]]*)";
    

    The regex (see its demo) now has 8 capturing groups, so set maxGroups to 9:

     size_t maxGroups = 9; // 8 groups + 1 for whole match
    

    With these changes your code should work (see the online demo).

    It may also help to set maxMatches to a value close to, or slightly above, the number of matches you expect.
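
To tie the points above together, here is a minimal sketch of extracting the eight fields with the simplified pattern. The helper name parse_task_line and the constants TASK_FIELDS and FIELD_MAX are assumptions made for illustration, not part of the original code. Instead of hardcoding the group count, the sketch checks it against re_nsub, which regcomp fills in with the number of capturing groups.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define TASK_FIELDS 8    /* year .. second, plus the command */
#define FIELD_MAX   256  /* assumed upper bound on one field's length */

/* Simplified pattern from the answer: exactly one group per field. */
const char TASK_CAPTURE_PATTERN[] =
    "^(\\*|[[:digit:]]{1,9})[[:blank:]]"  /* 1: year     */
    "(\\*|[[:digit:]]{2})[[:blank:]]"     /* 2: month    */
    "(\\*|[[:digit:]]{2})[[:blank:]]"     /* 3: day      */
    "(\\*|[[:alpha:]]{3})[[:blank:]]"     /* 4: week-day */
    "(\\*|[[:digit:]]{2})[[:blank:]]"     /* 5: hour     */
    "(\\*|[[:digit:]]{2})[[:blank:]]"     /* 6: minute   */
    "(\\*|[[:digit:]]{2})[[:blank:]]"     /* 7: second   */
    ";[[:blank:]]([[:print:]]*)";         /* 8: command  */

/* Hypothetical helper: copies the 8 fields of `line` into `fields`.
 * Returns 0 on success, -1 if compilation fails, the group count is
 * unexpected, or the line does not match. */
int parse_task_line(const char *line, char fields[TASK_FIELDS][FIELD_MAX])
{
    regex_t re;
    regmatch_t groups[TASK_FIELDS + 1];  /* slot 0 is the whole match */
    int status = -1;

    if (regcomp(&re, TASK_CAPTURE_PATTERN, REG_EXTENDED) != 0)
        return -1;

    /* Every group in this pattern is mandatory, so after a successful
     * match none of the offsets is -1. */
    if (re.re_nsub == TASK_FIELDS &&
        regexec(&re, line, TASK_FIELDS + 1, groups, 0) == 0) {
        status = 0;
        for (size_t g = 1; g <= TASK_FIELDS; g++) {
            size_t len = (size_t)(groups[g].rm_eo - groups[g].rm_so);
            if (len >= FIELD_MAX)
                len = FIELD_MAX - 1;     /* truncate over-long fields */
            memcpy(fields[g - 1], line + groups[g].rm_so, len);
            fields[g - 1][len] = '\0';
        }
    }

    regfree(&re);
    return status;
}
```

Called on "* * * * 00 00 00 ; reboot", this should leave "*" in fields[0], "00" in fields[4] and "reboot" in fields[7]. Compiling the pattern on every call keeps the sketch short; a real task manager would more likely compile it once and reuse the regex_t.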