
How can I get letters from an expression in Python?


I have this expression:

<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>

And I need to get the 10 characters (letters and digits) that follow "/dp/" (B01J5FGW66).

How can I write a function that does this?


Solution

  • Using regex:

    import re
    s = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
    print(re.search(r"dp\/([A-Za-z0-9]{10})\/", s)[1])
    

    Output: B01J5FGW66

    Explanation:

    Begin at "dp/":

    dp\/ 
    

    A capture group, delimited by (), matching exactly 10 (via {10}) lowercase letters (a-z), uppercase letters (A-Z), or digits (0-9):

    ([A-Za-z0-9]{10})
    

    End at "/":

    \/
    

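    Using re.search, we search your string for that pattern and access the text captured by the first group with [1]. Indexing a match object this way needs Python 3.6 or later; m.group(1) does the same and also works in older versions.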
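To see how the three pieces fit together, here is a small sketch that joins them into one pattern and runs it on a shortened copy of the URL from the question. It compares the first capture group with the whole match (group 0). The backslash before "/" is optional here, since "/" has no special meaning in Python's re module.

```python
import re

# The three pieces from the explanation: "dp/", the 10-character group, and "/".
pattern = r"dp/" + r"([A-Za-z0-9]{10})" + r"/"

# Shortened copy of the URL from the question
url = "https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94"

m = re.search(pattern, url)
print(m[1])        # B01J5FGW66      (first capture group)
print(m.group(1))  # B01J5FGW66      (same thing, older spelling)
print(m.group(0))  # dp/B01J5FGW66/  (the whole match, delimiters included)
```

Only the text inside the parentheses ends up in group 1, which is why "dp/" and the trailing "/" are not part of the result.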

    Note that re.search returns None when nothing matches, so indexing the result directly would raise a TypeError. You might want to handle that case:

    m = re.search(r"dp\/([A-Za-z0-9]{10})\/", s)
    if m is not None:
        print(m[1])
    else:
        # re.search returned None: nothing matched
        print("No match")
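Since the question asks for a function, one way to wrap the answer's pattern together with the no-match check is sketched below. The names `DP_CODE` and `get_dp_code` are hypothetical, and compiling the pattern once with `re.compile` is just a choice for reuse, not a requirement.

```python
import re
from typing import Optional

# Same pattern as in the answer; "/" needs no backslash in Python regexes.
DP_CODE = re.compile(r"dp/([A-Za-z0-9]{10})/")


def get_dp_code(text: str) -> Optional[str]:
    """Return the 10 letters/digits between "dp/" and the next "/", or None."""
    m = DP_CODE.search(text)
    # re.search returns None when nothing matches, so check before indexing
    return m[1] if m is not None else None


s = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94" id="dealImage"></a>'
print(get_dp_code(s))                  # B01J5FGW66
print(get_dp_code("no product here"))  # None
```

Like the original pattern, this requires a "/" right after the 10 characters, so a URL such as .../dp/B01J5FGW66?tag=x would not match. If such URLs can occur, relax the trailing part of the pattern.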