Tags: string, bash, split, extract, cut

extract a specific word between two values


I fetch an HTML page with curl and store the output in a variable. Then I try to extract the word between two values, but it doesn't work.

 </tr>
 <tr>
   <td><a href="https://test/one/AAA">AAA</a></td>
   <td>Thu Aug 30 09:59:36 UTC 2018</td>
   <td align="right"> 2247366 </td>
   <td></td>
 </tr>
 <tr>
   <td><a href="https://test/one/1.1.22">1.1.22</a></td>
   <td>Thu Aug 30 09:59:36 UTC 2018</td>
   <td align="right"> 5 </td>
   <td></td>
 </tr>
 </table>
 </body>
 </html>

 content=$(curl -s https://test/one/)
 echo $content | sed -E 's_.*one/([^"]+).*_\1_'

I want to capture the value after one/ and before the next ", so that I extract AAA, 1.1.22, ...
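A minimal sketch of why the attempt above prints at most one value, using a hypothetical two-line sample in place of the real curl output. Left unquoted, $content is word-split and echo joins the pieces with single spaces, so sed receives one long line and the greedy .* runs all the way to the last one/. Quoting the variable keeps the newlines, so the substitution is applied to each line separately.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for content=$(curl -s https://test/one/):
# two links, each on its own line.
content='<td><a href="https://test/one/AAA">AAA</a></td>
<td><a href="https://test/one/1.1.22">1.1.22</a></td>'

# Unquoted: the newline is lost, sed sees ONE line, and the greedy .*
# swallows everything up to the last "one/", so only 1.1.22 is printed.
echo $content | sed -E 's_.*one/([^"]+).*_\1_'

# Quoted: newlines survive, sed handles each line on its own,
# printing AAA and then 1.1.22.
echo "$content" | sed -E 's_.*one/([^"]+).*_\1_'
```

On the real page, lines that do not contain one/ would also pass through the quoted version unchanged, which is why the solution's updated script adds -n and the p flag.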


Solution

  • $ ... | sed -E 's_.*one/([^"]+).*_\1_'
    
    AAA
    BBB
    

    Since your content contains slashes, it is better to choose a different delimiter for the s command; here I used _.

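    UPDATE: Since you changed the input format dramatically, here is the updated script. Quote the variable so its newlines are preserved and sed can work line by line; -n turns off automatic printing, the /one/ address limits the substitution to lines containing one, and the p flag prints a line only when the substitution succeeded: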

    $ echo "$content" | sed -nE '/one/s_.*one/([^"]+).*_\1_p'
    AAA
    1.1.22
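If the page layout changes again, one sketch that does not depend on where the line breaks fall is to let grep -o print every match on its own line and then cut off the leading one/. This assumes a grep that supports -o and -E (GNU and BSD grep both do; -o is not part of POSIX) and, like the sed answer, assumes each value ends at the next double quote. The sample string is hypothetical and deliberately puts both links on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the curl output, with both links on ONE line.
content='<tr><td><a href="https://test/one/AAA">AAA</a></td></tr><tr><td><a href="https://test/one/1.1.22">1.1.22</a></td></tr>'

# grep -oE prints each match ("one/AAA", "one/1.1.22") on its own line;
# cut -d/ -f2- drops the first "/"-separated field ("one") and keeps the rest.
printf '%s\n' "$content" | grep -oE 'one/[^"]+' | cut -d/ -f2-
```

Using -f2- rather than -f2 keeps any deeper path segments, so a hypothetical one/foo/bar would print foo/bar instead of just foo. The pattern one/ would also match inside something like done/, so tighten it to '/one/[^"]+' (with cut -d/ -f3-) if the page contains such paths.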