Tags: bash, cron, explode, ifs

bash IFS behaves differently between terminal and cron execution


I have a file sortedurls.txt, the result of spidering a domain, with one URL per line. sortedurls.txt looks like this:

https://example.com/page1.php
https://example.com/page2.php
https://example.com/page-more.php

I loop through sortedurls.txt line by line (URL by URL) and collect the img tags from each page with wget and hxselect. For verification I save the result to a file testtagstring.txt, which then looks like this:

<img alt="…" src="/assets/…/image1.jpg">§<img alt="…" src="/assets/…/image11.jpg">
<img alt="…" src="/assets/…/image2.jpg">§

and so on

Each line is split at the delimiter § into the array tags. I count the array elements and append the result to a file for verification.

Problem: Executed in a terminal, the script works correctly and the output shows the right number of entries (6, 1, 1, 9 …). Executed from a cronjob, the split doubles every count to 12, 2, 2, 18 ….

Any idea why the behaviour changes just by running the script via cron?

#!/bin/bash

# Set this script dir path
scriptdirpath=/usr/local/www/apache24/data/mydomain.com/testdir

# Some config variables
useragent=googlebot
searchtag=img
delimiter=§

# Change to the script dir
cd "$scriptdirpath" || exit 1


# Create/empty the output files
: > testtagstring.txt
: > testimages.txt

# Loop through the sortedurls.txt
# Loop through sortedurls.txt; -r keeps backslashes, IFS= keeps whitespace
while IFS= read -r p; do

    tagString=$(wget -qO - --user-agent="$useragent" "$p" | hxnormalize -x | hxselect -s "$delimiter" "$searchtag")

    echo "$tagString" >> testtagstring.txt

    IFS="$delimiter" read -r -a tags <<<"$tagString"

    echo "Amount of img tags: ${#tags[@]}" >> "$scriptdirpath"/testimages.txt

done < "$scriptdirpath"/sortedurls.txt
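Update: the doubling can be reproduced without wget or hxselect at all, assuming only that the script file itself is saved as UTF-8. § is a two-byte UTF-8 character (0xC2 0xA7); in a single-byte locale such as the C/ASCII default many crons run with, bash treats each byte as a separate IFS delimiter, and the empty field between the two bytes doubles the count (the assumption of an installed C.UTF-8 locale below holds on most current Linux systems):

```shell
#!/bin/bash
# Two "tags" separated/terminated by § -- mirrors hxselect's -s output
line='one§two§'

# In a UTF-8 locale, § is a single IFS character: expect 2 fields
# (falls back to the C behaviour if no C.UTF-8 locale is installed)
utf8_count=$(LC_ALL=C.UTF-8 line="$line" bash -c \
    'IFS=§ read -r -a f <<<"$line"; echo "${#f[@]}"' 2>/dev/null)

# In the C/ASCII locale, § decays into its two bytes 0xC2 0xA7; both
# become IFS delimiters and each § yields an extra empty field: 4 fields
c_count=$(LC_ALL=C line="$line" bash -c \
    'IFS=§ read -r -a f <<<"$line"; echo "${#f[@]}"')

echo "UTF-8 locale: $utf8_count fields / C locale: $c_count fields"
```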

Solution

  • My scripts are UTF-8 encoded, but cron was configured to use an ASCII (single-byte) locale, so bash split at each of the two bytes of § instead of at the single character, doubling the field count. Adding the following at the top of my bash script solves the problem without any change to the cron config.

    LC_ALL_SAVED="$LC_ALL"
    export LC_ALL=de_DE.UTF-8
    

    Everything now runs fine from the CLI and from cron. Thanks for the help.
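An alternative that needs no change to the script is to set the locale at the top of the crontab itself. Vixie/ISC-style crons (including FreeBSD's, which the /usr/local/www/apache24 path suggests) accept plain NAME=value lines there, though not every cron implementation does; the schedule and script name below are placeholders:

    # crontab -e
    LANG=de_DE.UTF-8
    LC_ALL=de_DE.UTF-8
    */15 * * * * /usr/local/www/apache24/data/mydomain.com/testdir/myscript.sh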