
Match with two or more options in a Perl regex


I have two file formats produced by QIIME analyses: one generated with the Silva database and the other with GreenGenes. The difference between them is that Silva files use a progressive D_number prefix for each taxonomic rank (kingdom = D_0__, phylum = D_1__, class = D_2__, and so on), whereas GreenGenes files use a letter for each rank (kingdom = k__, phylum = p__, class = c__, and so on).

file_1 (Silva format)
D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__ASC21;D_5__uncultured euryarchaeote



file_2 (GreenGenes format)
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Streptomycetaceae;g__Streptomyces

So I wrote two Perl scripts (one for Silva and one for GreenGenes) to extract each taxon into a separate file.

I'm trying to make the match section handle both formats in a single piece of code. That is, on line 16 of my script I want two options, something like:

my @kingd=($taxon_value[0]=~m/D_0__(.*);D_1/g | m/k__(.*);p/g);

Well, I know that this doesn't work.

So how can I put two or more options in the same line for a regex match?

This is part of the script (it has 6 options; I have only included the kingdom option here):

while (<INPUTFILE>){
    $line=$_;
    chomp($line);
    if ($line=~ m/^#/g){
        next;
    }
    elsif ($line=~ m/^[Uu]nassigned/g){
        next;
    }
    elsif ($line){
        my @full_line = $_;
        foreach (@full_line){
            my (@taxon_value)= split (/\t/, $_);
            foreach ($taxon_value[0]){
                if ($kingdom){
                    my @kingd=($taxon_value[0]=~m/D_0__(.*);D_1/g); # just for silva
                    foreach (@kingd){
                        if ($_=~/^$/){
                            next;
                        }
                        elsif ($_=~ m/^[Uu]nknown/g){
                            next;
                        }
                        elsif ($_=~ m/^[Uu]ncultured$/g){
                            next;
                        }
                        elsif ($_=~ m/^[Uu]nidentified$/g){
                            next;
                        }
                        else {
                            push @taxon_list, $_;
                        }
                    }
                }
            }
        }
    }
}

thanks


Solution

  • You need to do the "or" inside your pattern. You do that with a pipe |, which you already had, but it needs to go inside the pattern. There is no need for two match operators.

    my @kingd = $taxon_value[0] =~ m/D_0__(.*);D_1|k__(.*);p/g;
    

    It will now match either one or the other. See perlre and perlretut for more information. You should also read the regex tag wiki here on SO, as it links to many useful tools.

    What didn't work in your code is that you put Perl's | operator between two match operators. Outside a pattern, | is the bitwise OR operator, not regex alternation.
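
One thing to watch with the pattern above: it has two capture groups, so in list context each match returns a value for both groups, and the group belonging to the alternative that did not match comes back as undef. A minimal sketch of one way around that, assuming Perl 5.10 or later, is a branch reset group (?|...), which numbers the capture in every alternative as $1. The helper name kingdom_of and the use of [^;]* instead of (.*) are my own choices for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Sample lineage strings copied from the question (Silva, then GreenGenes).
my @lineages = (
    'D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__ASC21;D_5__uncultured euryarchaeote',
    'k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Streptomycetaceae;g__Streptomyces',
);

# The pattern from the answer: two capture groups, so every match
# yields two values, one of which is undef.
my @two_groups = $lineages[0] =~ m/D_0__(.*);D_1|k__(.*);p/g;

# Hypothetical helper: returns the kingdom name, or undef if neither prefix is found.
sub kingdom_of {
    my ($lineage) = @_;
    # (?|...) is a branch reset: the group in each alternative is $1.
    # [^;]* captures everything up to the next semicolon.
    my ($kingdom) = $lineage =~ m/(?|D_0__([^;]*)|k__([^;]*))/;
    return $kingdom;
}

my @kingdoms = map { kingdom_of($_) } @lineages;
print "$_\n" for @kingdoms;
```

For the two sample lines this prints Archaea and then Bacteria. With the two-group pattern, @two_groups holds 'Archaea' followed by an undef; the existing `if ($_=~/^$/)` check in the loop would skip that undef, but only with an "uninitialized value" warning when warnings are enabled.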