python performance session python-requests kivy

How can I speed up a GET request, or is there a faster method?


I have some code inside an app that is slowing me down way too much, and it's a simple 'get' call. This portion of the code just finds the location of a PDF on the internet and then extracts it. I thought the extraction was what took so long, but after some testing I believe it's the 'get' request. I am passing a variable into the URL because there are many different PDFs that the user can indirectly select. I have tried to use Kivy's UrlRequest, but I honestly can't get my head around getting a result from it, although I have heard it is faster. I have another two 'post' requests in different functions that run about 10 times faster than this one, so I'm not sure what the issue is.

The rest of my program is working just fine; it's just this part, which sometimes adds upwards of 20-25 seconds to load times (which is unreasonable).
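One way to confirm which step is responsible for the delay is to time each step separately. The following is only a minimal sketch: the `timed` helper and the `timings` dictionary are hypothetical names, and the two timed blocks are stand-ins for the real `s.get(...)` and `html2text(...)` calls.

```python
import time
from contextlib import contextmanager

timings = {}  # step name -> elapsed seconds


@contextmanager
def timed(label):
    """Record how long the wrapped block takes under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Stand-ins for the real steps; in the app these would wrap
# s.get(...) and html2text(page.text) respectively.
with timed("download"):
    time.sleep(0.05)
with timed("convert"):
    sum(range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

Wrapping the table-of-contents download, the `html2text` conversion, and the PDF download in separate `timed` blocks would show whether the time goes to the network or to local processing.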

I will include a working extract of the problem below for you to try. I have found that the first attempt at an "airport_loc" is the slowest. Please try swapping out the airport_loc variable with some of these examples: "YPAD", "YMLT", "YPPH".

What can I do differently here to speed it up, or simply make it more efficient?

import requests
from html2text import html2text
import re

s = requests.session()
page = s.get('https://www.airservicesaustralia.com/aip/pending/dap/AeroProcChartsTOC.htm')
text = html2text(page.text)
airport_loc = "YSSY"

finding_airport = (re.search(r'.%s.' % re.escape(airport_loc), text)).group()
ap_id_loc = int(text.index(finding_airport)) + 6
ap_id_onward = text[ap_id_loc:]
next_loc = re.search(r'[(]Y\w\w\w[)]', ap_id_onward)
next_loc_stop = text.index(next_loc.group())
ap_id_to_nxt_ap = text[ap_id_loc:next_loc_stop]
needed_text = (html2text(ap_id_to_nxt_ap))
airport_id_less_Y = airport_loc[1:]

app_1 = re.search(r'%sGN.*' % re.escape(airport_id_less_Y), needed_text)
app_2 = re.search(r'%sII.*' % re.escape(airport_id_less_Y), needed_text)

# Prefer the "II" chart line and fall back to the "GN" one.
# re.search returns None when nothing matches, so test the match objects directly.
chart_match = app_2 or app_1
line_of_chart = chart_match.group()
chart_title = (re.search(r'\w\w\w\w\w\d\d[-]\d*[_][\d\w]*[.]pdf', line_of_chart)).group()
# getting exact pdf now
chart_PDF = ('https://www.airservicesaustralia.com/aip/pending/dap/' + chart_title)
retrieve = s.get(chart_PDF)
content = retrieve.content
print(content)
# from here on is working fine.

I haven't included the code that follows this because I don't think it's relevant.

Please help me speed this thing up :(
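Since the table-of-contents page is the same for every airport, one option worth trying is to download and convert it only once and reuse the text for later lookups. This is a sketch under assumptions, not a measured fix: `make_toc_loader` and `fake_fetch` are hypothetical names, and the stand-in fetcher replaces the real network call so the example runs offline.

```python
from functools import lru_cache

TOC_URL = "https://www.airservicesaustralia.com/aip/pending/dap/AeroProcChartsTOC.htm"


def make_toc_loader(fetch_text):
    """Return a zero-argument function that calls fetch_text(TOC_URL) once
    and returns the remembered result on every later call."""
    @lru_cache(maxsize=1)
    def load():
        return fetch_text(TOC_URL)
    return load


# Stand-in fetcher (an assumption) so the sketch runs without network access.
calls = []


def fake_fetch(url):
    calls.append(url)
    return "placeholder TOC text"


load_toc = make_toc_loader(fake_fetch)
first = load_toc()
second = load_toc()
print(len(calls))  # prints 1: the fetcher ran only once
```

In the app, the fetcher could be something like `lambda url: html2text(s.get(url).text)`, so that choosing a second airport skips both the download and the conversion.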


Solution

  • It still takes about 3 seconds for me with just your code, so the latency may come from the server.

    To make the request a little faster, I tried editing the HTTP adapter like this:

    # chart_PDF is an https URL, so the adapter is mounted on the 'https://' prefix
    s.mount('https://', requests.adapters.HTTPAdapter(max_retries=0))
    retrieve = s.get(chart_PDF)
    

    It shows a small improvement (3 sec -> 2 sec).

    But it carries a risk of failure, since failed connections are not retried.

    Using asyncio or another async HTTP library would be a better approach.
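As a rough illustration of the asyncio suggestion, the sketch below runs a blocking fetch function in worker threads with `asyncio.to_thread` (Python 3.9+) so that several independent downloads overlap. It does not make a single request faster. `fetch_blocking` is a stand-in for a call such as `s.get(url).content`, and the chart file names are hypothetical placeholders.

```python
import asyncio
import time

BASE_URL = "https://www.airservicesaustralia.com/aip/pending/dap/"


def fetch_blocking(url):
    """Stand-in for a blocking call such as s.get(url).content."""
    time.sleep(0.2)  # simulated network latency
    return f"bytes of {url}".encode()


async def fetch_all(urls, fetch=fetch_blocking):
    """Run the blocking fetch for every URL in worker threads, concurrently.
    Results come back in the same order as `urls`."""
    tasks = [asyncio.to_thread(fetch, url) for url in urls]
    return await asyncio.gather(*tasks)


# Hypothetical chart names, used only to have several URLs to fetch.
urls = [BASE_URL + name for name in ("chart_a.pdf", "chart_b.pdf", "chart_c.pdf")]

start = time.perf_counter()
results = asyncio.run(fetch_all(urls))
elapsed = time.perf_counter() - start
print(f"fetched {len(results)} items in {elapsed:.2f}s")
```

In a Kivy app, the main benefit of moving the download off the main thread is that the interface stays responsive while the request is in flight.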