bash, shell, curl, grep, gnu-parallel

Run curl in parallel on URLs from a file using GNU parallel and save each output into a separate file named after the running job number


I'm trying to run curl in parallel on URLs that are defined in one text file, each URL on a separate line. I need to run the following grep command (grep -Ev 'Server:\|Date:\|Content') on every curl output, and then save each output into a separate file named after the number of the job that is currently running.

I'm using GNU parallel:

parallel --results output/{#}.txt -j+0 -k --eta curl -XGET -I -s --max-time 5 < mytxt.txt

input:

mytxt.txt:

    url1
    url2
    url3
    url4
    url5

output: every text file will contain the curl output with the grep-ed information

    1.txt
    2.txt
    3.txt
    4.txt
    5.txt

Questions:

  1. --results output/{#}.txt also generates *.err and *.seq files, which I don't need. How could I generate only files like 1.txt, 2.txt, 3.txt..., where the number is the job number?
  2. I don't know how to use the grep command (grep -Ev 'Server:\|Date:\|Content') in combination with the parallel command so that it greps that information from every curl output.

Thanks for any answer.


Solution

  • I am happy you gave --results a try. --results is built for the more advanced case where you want to keep both standard output (STDOUT) and standard error (STDERR). In your case you can simply use normal redirection with '>' (see the one-liner sketch at the end of this answer).

    If the command template is a composed command, I prefer using a bash function. To me it makes it easier to get the quoting right. It has two additional benefits:

    • I can easily test the function on a single value before giving it to GNU Parallel (see the example after the code below).
    • It is easier to document each step of the function.
    doit() {
      url="$1"
      output="$2"
      curl -XGET -I -s --max-time 5 "$url" |
        # We do not care about Server, Date and Content
        # (note: with grep -E the alternation is written |, not \|)
        grep -Ev 'Server:|Date:|Content' > "$output"
    }
    export -f doit
    
    parallel --eta doit {} {#}.txt < mytxt.txt 
    

    (-j+0 is the default, so it is not needed; --results is also dropped here, because the function redirects its own output with '>'.)
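
    To illustrate the first benefit, you can run the function by hand on a single URL before handing the whole file to GNU Parallel. A minimal smoke-test sketch; the URL and the file name here are just placeholders, not taken from your input:

    # hypothetical single-value test of the doit function
    doit 'https://example.com/' test.txt
    cat test.txt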
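
    If you would rather skip the function entirely, the same pipeline can be written inline. This is only a sketch of the "normal redirection" idea above: the template is quoted as one string so that the pipe and the '>' are executed by the shell that runs each job, with {} replaced by the URL and {#} by the job number:

    parallel --eta "curl -XGET -I -s --max-time 5 {} | grep -Ev 'Server:|Date:|Content' > {#}.txt" < mytxt.txt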