Replace html tag data in bash scripting


I want to highlight every row of an HTML table so that all rows with the same date get the same background color. The date is in the first column of the table. I tried the script below, but it doesn't work, and I'm not sure how to switch the color when the date changes.

Code

tdDate="2020-08-24 2020-08-25 2020-08-26 2020-08-27"
tdSet=0
endTrTag="</tr>"
colors="grey blue"
for x in $tdDate
do
awk '{if (($0 ~ /$x/) & ($tdSet -eq 0)) {
sed -i 's@<td@<td bgcolor="grey"@g' 
$tdSet=1
}
elsif (($0 ~ /$endTrTag/) & ($tdSer -eq 1) {
$tdSet=0}
else {
sed -i 's@<td@<td bgcolor="grey"@g'
}}'

file
done

Sample html file


    <html>
    <table>
    <tr>
    <td>2020-08-24</td>
    <td>NYC</td>
    <td>75</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-24</td>
    <td>Seattle</td>
    <td>55</td>
    <td>Rainy</td>
    </tr>
    <tr>
    <td>2020-08-24</td>
    <td>Austin</td>
    <td>85</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-25</td>
    <td>Seattle</td>
    <td>70</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-25</td>
    <td>Austin</td>
    <td>95</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-26</td>
    <td>NYC</td>
    <td>68</td>
    <td>Rainy</td>
    </tr>
    <tr>
    <td>2020-08-26</td>
    <td>Austin</td>
    <td>95</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-26</td>
    <td>San Jose</td>
    <td>85</td>
    <td>Sunny</td>
    </tr>
    </table>
    </html>

Desired output


    <html>
    <table>
    <tr>
    <td bgcolor="grey">2020-08-24</td>
    <td bgcolor="grey"> NYC</td>
    <td bgcolor="grey"> 75</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-24</td>
    <td bgcolor="grey"> Seattle</td>
    <td bgcolor="grey"> 55</td>
    <td bgcolor="grey"> Rainy</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-24</td>
    <td bgcolor="grey"> Austin</td>
    <td bgcolor="grey"> 85</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="blue">2020-08-25</td>
    <td bgcolor="blue"> Seattle</td>
    <td bgcolor="blue"> 70</td>
    <td bgcolor="blue"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="blue"> 2020-08-25</td>
    <td bgcolor="blue"> Austin</td>
    <td bgcolor="blue"> 95</td>
    <td bgcolor="blue"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="grey">2020-08-26</td>
    <td bgcolor="grey"> NYC</td>
    <td bgcolor="grey"> 68</td>
    <td bgcolor="grey"> Rainy</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-26</td>
    <td bgcolor="grey"> Austin</td>
    <td bgcolor="grey"> 95</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-26</td>
    <td bgcolor="grey"> San Jose</td>
    <td bgcolor="grey"> 85</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    </table>
    </html>

Solution

  • Assuming what you really want is for each date to get a different color, then with input this simple and regular I'd just do:

    $ cat tst.awk
    BEGIN {
        # See https://www.w3schools.com/colors/colors_names.asp
        # for all portable HTML color names, we are just using 4 here.
        numColorsAvail = split("red green blue yellow",colors)
    }
    /<tr>/ { tdNr=0 }
    /<td>/ {
        if ( ++tdNr == 1 ) {
            date = $0
            sub(/[^>]+>[[:space:]]*/,"",date)
            sub(/[[:space:]]*<[^<]+$/,"",date)
            if ( !(date in date2color) ) {
                date2color[date] = colors[++numColorsUsed]
            }
            color = date2color[date]
        }
        sub(/>/," bgcolor=\""color"\">")
    }
    { print }
    


    $ awk -f tst.awk file
        <html>
        <table>
        <tr>
        <td bgcolor="red">2020-08-24</td>
        <td bgcolor="red">NYC</td>
        <td bgcolor="red">75</td>
        <td bgcolor="red">Sunny</td>
        </tr>
        <tr>
        <td bgcolor="red">2020-08-24</td>
        <td bgcolor="red">Seattle</td>
        <td bgcolor="red">55</td>
        <td bgcolor="red">Rainy</td>
        </tr>
        <tr>
        <td bgcolor="red">2020-08-24</td>
        <td bgcolor="red">Austin</td>
        <td bgcolor="red">85</td>
        <td bgcolor="red">Sunny</td>
        </tr>
        <tr>
        <td bgcolor="green">2020-08-25</td>
        <td bgcolor="green">Seattle</td>
        <td bgcolor="green">70</td>
        <td bgcolor="green">Sunny</td>
        </tr>
        <tr>
        <td bgcolor="green">2020-08-25</td>
        <td bgcolor="green">Austin</td>
        <td bgcolor="green">95</td>
        <td bgcolor="green">Sunny</td>
        </tr>
        <tr>
        <td bgcolor="blue">2020-08-26</td>
        <td bgcolor="blue">NYC</td>
        <td bgcolor="blue">68</td>
        <td bgcolor="blue">Rainy</td>
        </tr>
        <tr>
        <td bgcolor="blue">2020-08-26</td>
        <td bgcolor="blue">Austin</td>
        <td bgcolor="blue">95</td>
        <td bgcolor="blue">Sunny</td>
        </tr>
        <tr>
        <td bgcolor="blue">2020-08-26</td>
        <td bgcolor="blue">San Jose</td>
        <td bgcolor="blue">85</td>
        <td bgcolor="blue">Sunny</td>
        </tr>
        </table>
        </html>
    

    If you like, add handling for numColorsUsed exceeding numColorsAvail: issue a warning, set the color to "grey", or reset numColorsUsed so it starts at the first color again. Any of these is a trivial change.
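One way to handle running out of colors is to cycle back to the first one with a modulus. The following is a minimal sketch rather than part of the script above: the function name `color_rows` and the variable `colorList` are made up for illustration, and the default list `grey blue` is taken from the question. With only two colors, cycling gives the grey/blue/grey pattern shown in the question's desired output.

```shell
# Hypothetical wrapper: $1 is the HTML file, $2 an optional
# space-separated color list (defaults to the two colors from the question).
color_rows() {
    awk -v colorList="${2:-grey blue}" '
    BEGIN {
        numColorsAvail = split(colorList, colors)
    }
    /<tr>/ { tdNr = 0 }
    /<td>/ {
        if ( ++tdNr == 1 ) {
            # first cell of the row: strip the tags to get the date
            date = $0
            sub(/[^>]+>[[:space:]]*/, "", date)
            sub(/[[:space:]]*<[^<]+$/, "", date)
            if ( !(date in date2color) ) {
                # wrap around: 1, 2, ..., numColorsAvail, 1, 2, ...
                date2color[date] = colors[(numColorsUsed++ % numColorsAvail) + 1]
            }
            color = date2color[date]
        }
        sub(/>/, " bgcolor=\"" color "\">")
    }
    { print }
    ' "$1"
}
```

Calling `color_rows file` uses the default two colors, and `color_rows file "red green blue yellow"` cycles through four. Note that once the list wraps, two non-adjacent dates can end up with the same color, which may or may not be acceptable for your table.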

    Here are all the HTML color names and how to retrieve them yourself in case you want to build it into a script:

    $ curl -s https://www.w3schools.com/colors/colors_names.asp | grep -o "colARR.push('[^']*')" | cut -d\' -f2
    AliceBlue
    AntiqueWhite
    Aqua
    Aquamarine
    Azure
    Beige
    Bisque
    Black
    BlanchedAlmond
    Blue
    BlueViolet
    Brown
    BurlyWood
    CadetBlue
    Chartreuse
    Chocolate
    Coral
    CornflowerBlue
    Cornsilk
    Crimson
    Cyan
    DarkBlue
    DarkCyan
    DarkGoldenRod
    DarkGray
    DarkGrey
    DarkGreen
    DarkKhaki
    DarkMagenta
    DarkOliveGreen
    DarkOrange
    DarkOrchid
    DarkRed
    DarkSalmon
    DarkSeaGreen
    DarkSlateBlue
    DarkSlateGray
    DarkSlateGrey
    DarkTurquoise
    DarkViolet
    DeepPink
    DeepSkyBlue
    DimGray
    DimGrey
    DodgerBlue
    FireBrick
    FloralWhite
    ForestGreen
    Fuchsia
    Gainsboro
    GhostWhite
    Gold
    GoldenRod
    Gray
    Grey
    Green
    GreenYellow
    HoneyDew
    HotPink
    IndianRed
    Indigo
    Ivory
    Khaki
    Lavender
    LavenderBlush
    LawnGreen
    LemonChiffon
    LightBlue
    LightCoral
    LightCyan
    LightGoldenRodYellow
    LightGray
    LightGrey
    LightGreen
    LightPink
    LightSalmon
    LightSeaGreen
    LightSkyBlue
    LightSlateGray
    LightSlateGrey
    LightSteelBlue
    LightYellow
    Lime
    LimeGreen
    Linen
    Magenta
    Maroon
    MediumAquaMarine
    MediumBlue
    MediumOrchid
    MediumPurple
    MediumSeaGreen
    MediumSlateBlue
    MediumSpringGreen
    MediumTurquoise
    MediumVioletRed
    MidnightBlue
    MintCream
    MistyRose
    Moccasin
    NavajoWhite
    Navy
    OldLace
    Olive
    OliveDrab
    Orange
    OrangeRed
    Orchid
    PaleGoldenRod
    PaleGreen
    PaleTurquoise
    PaleVioletRed
    PapayaWhip
    PeachPuff
    Peru
    Pink
    Plum
    PowderBlue
    Purple
    RebeccaPurple
    Red
    RosyBrown
    RoyalBlue
    SaddleBrown
    Salmon
    SandyBrown
    SeaGreen
    SeaShell
    Sienna
    Silver
    SkyBlue
    SlateBlue
    SlateGray
    SlateGrey
    Snow
    SpringGreen
    SteelBlue
    Tan
    Teal
    Thistle
    Tomato
    Turquoise
    Violet
    Wheat
    White
    WhiteSmoke
    Yellow
    YellowGreen
    

    so for example to have your script automatically use all of the portable HTML color names you could do:

    awk -v htmlColors="$(curl -s https://www.w3schools.com/colors/colors_names.asp | grep -o "colARR.push('[^']*')" | cut -d\' -f2)" '
    BEGIN {
       numColorsAvail = split(htmlColors,colors)
    }
    ... rest of the script as above ...
    '
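If a long, multi-line `-v` value causes trouble in your awk, an alternative is to save the color names to a file and read that file as the first input. The sketch below assumes this approach; the function name `color_rows_from_list` and the file name `colors.txt` are hypothetical, and the row-coloring logic is the same as in tst.awk above.

```shell
# Hypothetical helper: $1 is a file of color names (one per line),
# $2 is the HTML file to color.
color_rows_from_list() {
    awk '
    NR == FNR {                       # still reading the color list
        if ( NF ) colors[++numColorsAvail] = $1
        next
    }
    /<tr>/ { tdNr = 0 }
    /<td>/ {
        if ( ++tdNr == 1 ) {
            date = $0
            sub(/[^>]+>[[:space:]]*/, "", date)
            sub(/[[:space:]]*<[^<]+$/, "", date)
            if ( !(date in date2color) ) {
                date2color[date] = colors[++numColorsUsed]
            }
            color = date2color[date]
        }
        sub(/>/, " bgcolor=\"" color "\">")
    }
    { print }
    ' "$1" "$2"
}

# Example use, with colors.txt as an assumed file name:
#   curl -s https://www.w3schools.com/colors/colors_names.asp |
#       grep -o "colARR.push('[^']*')" | cut -d\' -f2 > colors.txt
#   color_rows_from_list colors.txt file
```

Saving the list once also avoids fetching the page on every run. The `NR == FNR` test relies on the color file being non-empty; with an empty list file the HTML would be mistaken for the color list.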