Dollar symbols should surround entities and values - Perl


My code is supposed to remove the dollar signs between the numbers of a multi-value expression and put a single pair of dollar signs around the whole expression, but I can't get it to work.

For example, 10$x$10$x$10$x$10 should become $10x10x10x10$. The number of values is arbitrary.

My code:

use strict;
use warnings;

my $tmp = do { local $/; $_ = <DATA>; };
my @allines = split /\n/, $tmp;
for(@allines)
{
    my $lines = $_;

    my ($pre,$matches,$posts) = "";

    $lines=~s/(\d+)(\$*)\\times\$(\d+)/$1$2\\times$3\$/g;

    print $lines;
}

Input:

__DATA__
where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A 10$\times$10$\times$10 supercell was first built, based on the unit cell model Sample paragraph testing 10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues.... obtained.


Required Output:

where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A $10\times10\times10$ supercell was first built, based on the unit cell model Sample paragraph testing $10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues.... obtained.

Solution

  • If you simply want to blindly rewrite sequences like 10$x$10$x$10$x$10 without taking the surrounding text into account, a first step is to move the dollar that follows each number in front of it:

    $lines=~s/(\d+)\$/\$$1/g;
    

    If your requirements are more complex than that, you need to update the question with the details.

    [UPDATE]

    Just looking again at the input and expected output, I see there is a complication -- some of the input looks like this times$10$ with the expected output times$10. That means we have an optional leading $ that needs to be taken into account.

    To deal with that we can add \$? to the start of the regex to match the optional $, like this

    $lines=~s/\$?(\d+)\$/\$$1/g;
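
To make the effect of the optional leading `\$?` concrete, here is a small sketch that runs both substitutions on one of the sample sequences. The helper names `without_optional` and `with_optional` are made up for this illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# First attempt: move the dollar that follows each number in front of it.
sub without_optional {
    my ($s) = @_;
    $s =~ s/(\d+)\$/\$$1/g;
    return $s;
}

# Updated attempt: also swallow an optional dollar before the number.
sub with_optional {
    my ($s) = @_;
    $s =~ s/\$?(\d+)\$/\$$1/g;
    return $s;
}

my $sample = '10$\times$10$\times$10';

print without_optional($sample), "\n";   # $10\times$$10\times$10
print with_optional($sample), "\n";      # $10\times$10\times$10
```

Without `\$?`, the dollar that already precedes the middle number is left in place, so that number ends up with two dollars in front of it.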
    

    Below is a rewrite of your code that also drops the unnecessary slurp-and-split and reads DATA line by line:

    use strict;
    use warnings;
    
    while (<DATA>)
    {
        s/\$?(\d+)\$/\$$1/g;
    
        print ;
    }
    
    __DATA__
    Sample paragraph testing 10$\times$10$\times$10 text continues....
    Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues....
    Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues....
    

    Output is

    Sample paragraph testing $10\times$10\times$10 text continues....
    Sample paragraph testing $10\times$10\times$10\times$10 text continues....
    Sample paragraph testing $10\times$10\times$10\times$10\times$10\times$10 text continues....
    

    [UPDATE 2]

    Assuming the actual requirements are:

    1. Change the first occurrence of, say, 123$ into $123.
    2. Change the last occurrence of $123 into 123$.
    3. Remove the dollars from the intermediate digit-dollar sequences.

    This can be done with three substitutions:

    use strict;
    use warnings;
    
    while (<DATA>)
    {
        # replace the first occurrence only
        s/\$?(\d+)\$/\$$1/;
    
        # remove $ from all but the last digit-dollar
        # uses lookahead to prevent matching the last digit-dollar
        s/times\$?(\d+)\$?(?=\\t)/times$1/g;
    
        # rework the last occurrence of digit-dollar
        s/times\$(\d+)/times$1\$/;
    
        print ;
    }
    
    
    Input:
    
    __DATA__
    Sample paragraph testing 10$\times$10$\times$10 text continues....
    Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues....
    Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues....
    

    Output is

    Sample paragraph testing $10\times10\times10$ text continues....
    Sample paragraph testing $10\times10\times10\times10$ text continues....
    Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues....
    

    [UPDATE 3]

    New requirement -- there can be multiple digit-dollar sequences in a single line.

    This complicates the code a bit, but not much.

    use strict;
    use warnings;
    
    while (<DATA>)
    {
        # walk the string looking for strings of the form "10$\times$10$\times$10$\times$10"
    
        while (s/(.*?)((\$?\d+\$?\\times)+\$?\d+\$?)//)
        {
            # output any data that preceded the digit-dollar sequence
            print $1;
    
            my $block = $2;
    
            # Remove all dollars
            $block =~ s/\$+//g;
    
            # put back the initial dollar
            $block =~ s/^(\d+)/\$$1/;
    
            # and the terminating dollar
            $block =~ s/$/\$/;
    
            # output the modified digit-dollar sequence
            print $block;
        }
    
        # output trailing text
        print;
    
    }
    
    
    Input:
    
    __DATA__
    where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A 10$\times$10$\times$10 supercell was first built, based on the unit cell model Sample paragraph testing 10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues.... obtained.
    
    Sample paragraph testing 10$\times$10$\times$10 text continues....
    Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues....
    Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues....
    

    Output is

    where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A $10\times10\times10$ supercell was first built, based on the unit cell model Sample paragraph testing $10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues.... obtained.
    
    Sample paragraph testing $10\times10\times10$ text continues....
    Sample paragraph testing $10\times10\times10\times10$ text continues....
    Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues....
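
The same idea can probably be written as a single global substitution with the `/e` modifier, so that each matched run is rewritten in place instead of being consumed and printed piece by piece. This is only a sketch under the same assumptions as above (numbers joined by `\times`, with optional dollars around each number). The names `wrap_run` and `fix_times_runs` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: strip every dollar from one matched run,
# then wrap the whole run in a single pair of dollars.
sub wrap_run {
    my ($run) = @_;
    $run =~ s/\$//g;
    return '$' . $run . '$';
}

# Hypothetical helper: rewrite every "10$\times$10..." run in a string.
sub fix_times_runs {
    my ($text) = @_;
    # /g handles several runs per line, /e evaluates the replacement as code
    $text =~ s/((?:\$?\d+\$?\\times)+\$?\d+\$?)/wrap_run($1)/ge;
    return $text;
}

print fix_times_runs('A 10$\times$10 cell and 2$\times$3$\times$4 grid'), "\n";
# prints: A $10\times10$ cell and $2\times3\times4$ grid
```

Inside the `while (<DATA>)` loop this would reduce the body to `print fix_times_runs($_);`. Text outside the matched runs, such as `$Q=k-k^{\prime}$`, is left untouched because it contains no `\times` between numbers.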