I need to extract only the second path segment of a URI. For example, given the following URI:
/first/second/third/fourth/...
the regex should extract the second
string from the URI. An explanation of the solution regex would be greatly appreciated.
I am using a POSIX-compliant regex library.
EDIT: The solution given by Gumbo works at REtester
But, it doesn't seem to work with the code below:
#include <stdio.h>
#include <stdlib.h>
#include "regex.h"

/* Returns a malloc'd copy of the matched text, or NULL if there is no match. */
char *regexp (const char *string, const char *patrn, int *begin, int *end){
    int i, w=0, len;
    char *word = NULL;
    regex_t rgT;
    regmatch_t match;
    wsregcomp(&rgT,patrn,REG_EXTENDED);
    if ((wsregexec(&rgT,string,1,&match,0)) == 0) {
        /* match holds the offsets of the whole match */
        *begin = (int)match.rm_so;
        *end = (int)match.rm_eo;
        len = *end-*begin;
        word = (char*) malloc(len+1);
        for (i=*begin; i<*end; i++) {
            word[w] = string[i];
            w++;
        }
        word[w]=0;
    }
    wsregfree(&rgT);
    return word;
}

int main(){
    int begin = 0;
    int end = 0;
    char *word = regexp("/first/second/third","^/[^/]+/([^/]*)",&begin,&end);
    printf("ENV %s\n",word);
}
The above prints /first/second
instead of only second
EDIT2:
Same result with java.util.regex
as well.
If you just have an absolute URI path, then this regular expression should do it:
^/[^/]+/([^/]*)
An explanation:
^/
matches the start of the string followed by a literal /
[^/]+/
matches one or more characters except /
, followed by a literal /
([^/]*)
matches zero or more characters except /.
The second path segment is then captured by the first group. I used +
for the first segment and *
for the second because if the first segment could also be empty, the path would begin with // and would no longer be an absolute path but a scheme-less URI.
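On the EDIT in the question: the pattern itself matches as intended, but the posted code asks regexec for a single regmatch_t and copies that element, which always spans the whole match (/first/second). The captured group is element 1 of the regmatch_t array, so regexec has to be given room for at least two elements. Below is a minimal sketch of that idea. It assumes the ws-prefixed functions in the question behave like the standard POSIX regcomp, regexec and regfree (the sketch uses the standard names), and extract_group is a hypothetical helper name, not part of any library:

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GROUPS 10   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: returns a malloc'd copy of capture group `group`
 * from the first match of `pattern` in `subject`, or NULL if the pattern
 * does not compile, does not match, or the group did not participate.
 * The caller is responsible for calling free() on the result. */
char *extract_group(const char *subject, const char *pattern, size_t group)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];   /* m[0] = whole match, m[1..] = groups */
    char *out = NULL;

    if (group >= MAX_GROUPS)
        return NULL;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    /* Ask for group + 1 elements so that m[group] gets filled in. */
    if (regexec(&re, subject, group + 1, m, 0) == 0 && m[group].rm_so != -1) {
        size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
        out = malloc(len + 1);
        if (out != NULL) {
            memcpy(out, subject + m[group].rm_so, len);
            out[len] = '\0';
        }
    }

    regfree(&re);
    return out;
}
```

With this sketch, extract_group("/first/second/third", "^/[^/]+/([^/]*)", 1) yields the string second, while passing 0 as the group yields /first/second, which is what the code in the question prints. The same distinction applies in java.util.regex: Matcher.group(0) is the whole match and Matcher.group(1) is the first capturing group.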