I'm trying to identify list data the code is:
my $listdata = '
List Items:
(1)LIST 1 data
(a)sub data
(b)sub data
(c)sub data
(d)sub data
(i)sub-sub data
(ii)sub-sub data
(A)sub-sub-sub data
(B)sub-sub-sub data
(iii)sub-sub data
(e)sub data
(2)LIST 2 data
(3)LIST 3 data
';
#print "\n\n\n$listdata\n\n";
###Array of multi-level patterns
my @level_check =('\(\d+\)','(?<!\()\d+\)','\([a-h]\)','(?<!\()[a-h]\)','\([A-H]\)','(?<!\()[A-H]\)','\d+\.',
'\([IVX]+\)','(?<!\()[IVX]+\)','\([ivx]+\)','(?<!\()[ivx]+\)','\-');
###pattern for each levels
my ($first_level,$second_level,$third_level,$fourth_level);
###First from each pattern
my ($first_occur,$second_occur,$third_occur,$fourth_occur);
#++++++++++++++++++++++++Pattern for multilevel list+++++++++++++++++++++++#
my $pattern = '((?:[IVX\-\(\)\d\.\-][a-z]?\)?)+)';
$listdata =~ s{$pattern}{
my ($leveltemp) = ($1);
$first_occur = $leveltemp if !$first_occur;
#print "$data";
#print "all_level: $leveltemp##\n";
#########First Level Start
for($i=0; $i<scalar(@level_check);$i++){
if($first_occur =~ /^$level_check[$i]$/){
$first_level = $level_check[$i] if !$first_level;
#print "$level_check[$i] =>is Ist: $first_level\n";
}
}
for($i=0; $i<scalar(@level_check);$i++){
if($leveltemp =~ /^$first_level$/){
$leveltemp =~ s{$pattern}{<<LIST1>>$2$3};
#print"**$data level matched: $leveltemp => $first_level\n";
############First Level End
}
else
{
######Second level Start
if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([a-h]{3,})/i){
$second_occur = $leveltemp if !$second_occur;
#print "$leveltemp :$second_occur\n";
for($i=0; $i<scalar(@level_check);$i++){
if($second_occur =~ /^$level_check[$i]$/){
$second_level = $level_check[$i] if !$second_level;
#print "$leveltemp =>is IInd: $second_level\n";
}
}
if($leveltemp =~ /^$second_level/){
$leveltemp =~ s{$pattern}{<<LIST2>>$2$3};
#print"**level matched: $leveltemp => $seconf_level\n";
######Second level End
}
else
{
########Third Level Start
if($leveltemp !~ /^(?:<<LIST\d+>>|\d{3,}\,?|\([A-h]{3,})/i){
$third_occur = $leveltemp if !$third_occur;
for($i=0; $i<scalar(@level_check);$i++){
if($third_occur =~ /^$level_check[$i]$/){
$third_level = $level_check[$i] if !$third_level;
#print "$leveltemp =>is IIIrd: $third_level\n";
}
}
if($leveltemp =~ /^$third_level/){
$leveltemp =~ s{$pattern}{<<LIST3>>$2$3};
#print"**level matched: $leveltemp => $third_level\n";
#########Third Level End
}
else
{
########Fourth Level Start
if($leveltemp !~ /^(?:<<LIST+>>|\d{3,}\,?|\([A-z]{3,})/i){
$fourth_occur = $leveltemp if !$fourth_occur;
#print "$leveltemp :$fourth_occur\n";
for($i=0; $i<scalar(@level_check);$i++){
if($fourth_occur =~ /^$level_check[$i]$/){
$fourth_level = $level_check[$i] if !$fourth_level;
#print "$leveltemp =>is IVrth: $fourth_level\n";
}
}
if($leveltemp =~ /^$fourth_level/){
$leveltemp =~ s{$pattern}{<<LIST4>>$2$3};
#print"**$fourth_occur level matched: $leveltemp => $fourth_level\n";
#########Fourth Level End
}
#######Add Next Levels Here If Any in else loop
}
}#IV lvl else loop end
}
}#III lvl else loop end
}
}#IInd lvl else loop end
}#Ist lvl for loop end
"$leveltemp"
}gsixe;
print "$listdata\n";
The Output Required:
<<LIST1>>(1)LIST 1 data
<<LIST2>>(a)sub data
<<LIST2>>(b)sub data
<<LIST2>>(c)sub data
<<LIST2>>(d)sub data
<<LIST3>>(i)sub-sub data
<<LIST3>>(ii)sub-sub data
<<LIST4>>(A)sub-sub-sub data
<<LIST4>>(B)sub-sub-sub data
<<LIST3>>(iii)sub-sub data
<<LIST2>>(e)sub data
<<LIST1>>(2)LIST 2 data
<<LIST1>>(3)LIST 3 data
Problem is that I have to enter code for each level. I coded upto four levels here. But this is not the solution(List may have any number of sub levels). Is there any other way to write short code for this which covers all possible sub levels of the list. Again the list is Dynamic. List can be start in any of the follwing format= A) (A) 1. 1) (1) a) (a) i) (i).
Use a stack to keep track of "open" styles in order to determine if a new style is a child or a parent.
use strict;
use warnings;
my @styles = (
'\(\d+\)', '\d+\)', '\d+\.',
'\([a-h]\)', '[a-h]\)', '\([A-H]\)', '[A-H]\)',
'\([IVX]+\)', '[IVX]+\)', '\([ivx]+\)', '[ivx]+\)',
'-',
);
my @stack;
while (<>) {
for my $i (reverse 0..$#stack) {
if (/$stack[$i]/) {
splice(@stack, $i+1);
goto DONE_LINE;
}
}
for my $style (@styles) {
if (my ($spaces) = /^( *)$style/) {
push @stack, qr/^$spaces$style/;
goto DONE_LINE;
}
}
die "Unrecognized format at line $. - $_";
DONE_LINE:
s/^ *//;
printf("<<LIST%d>>%s", 0+@stack, $_);
}
To avoid recompiling the same regex patterns over and over again, add
my %re_cache = map { $_ => qr/^( *)$_/ } @styles;
and change
/^( *)$style/
to
/$re_cache{$style}/