Team,
I'm having difficulty getting the output I need when I scrape this web page:
This is what I have:
import urllib2
from html2text import html2text
for line in html2text(urllib2.urlopen("http://www.forexfactory.com/explorerapi.php?content=positions&do=positions_graph_data&limit=&interval=M5&currency=AUDUSD").read()).split(','):
    if "traders_short" in line:
        print "Traders Short AUDUSD: ", line.split(":")[1].strip(' " ')
    if "traders_long" in line:
        print "Traders Long AUDUSD: ", line.split(":")[1].strip(' " ')
This is my output:
Traders Short AUDUSD:
"114
Traders Long AUDUSD: 88
Traders Long AUDUSD: 88
This is what I would like:
Traders Short AUDUSD: number
Traders Long AUDUSD: number
So the problems are:
A) The output repeats. I only want it to tell me how many traders are short or long ONCE.
B) I can't get rid of the ' " ' in the second line of the output, and I want the number to sit next to the ' : ' as it does on the next line.
Now here is some more info. This is what the page looks like once it's been tidied up with html2text:
{"total":"355468"
"positions":[{"timeframe":"M5"
"dateline":79500
"currency_co
de":"AUDUSD"
"short_lots":"22.405234"
"long_lots":"5.1432014"
"traders_short":
"113"
"traders_long":"88"
"weekend":false
"hidden":false
"pos":1
"datetime":{"
year":"1970"
"month":0
"date":"01"
"hour":"22"
"minute":"05"
"estOffset":5}
"l
ots_ratio":18.669667897002
"traders_ratio":43.781094527363
"dummy_lots":-81.33
0332102998
"dummy_traders":-56.218905472637}
{"timeframe":"M5"
"dateline":7980
0
"currency_code":"AUDUSD"
"short_lots":"22.405234"
"long_lots":"5.1432014"
"t
raders_short":"113"
"traders_long":"88"
"weekend":false
"hidden":true
"pos":2
"datetime":1
"lots_ratio":18.669667897002
"traders_ratio":43.781094527363
"dum
my_lots":-81.330332102998
"dummy_traders":-56.218905472637}]
"data_count":1
"h
as_more":true
"interval":"M5"
"currency":"AUDUSD"
"limit":0}
Now obviously 'traders short / long' appears more than once, which is why it's printing twice. But I need it to print only once.
Any help from the expertise available at this forum would be great!
Thanks.
I'd use requests because it's so convenient, e.g. it has a built-in json() method. You can also easily unpack that long URL into a more readable query dict, and pass that in with the basic URL.
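As a rough illustration of that unpacking step, the standard library can split a URL into its base and a query dict. This is only a sketch: it assumes Python 3 and assumes the last parameter in the question's URL is `currency`. The variable names are just for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# The URL from the question, assuming the final parameter is 'currency'
url = ("http://www.forexfactory.com/explorerapi.php"
       "?content=positions&do=positions_graph_data"
       "&limit=&interval=M5&currency=AUDUSD")

parts = urlsplit(url)

# Everything before the '?' is the base URL
base_url = "{0}://{1}{2}".format(parts.scheme, parts.netloc, parts.path)

# keep_blank_values=True preserves the empty 'limit' parameter
query = dict(parse_qsl(parts.query, keep_blank_values=True))

print(base_url)
print(query)
```

The resulting `base_url` and `query` can then be passed to `requests.get` in place of the long string.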
Here's how I would do this:
import requests
base_url = "http://www.forexfactory.com/explorerapi.php"
query = {'content': 'positions',
         'do': 'positions_graph_data',
         'limit': '',
         'interval': 'M5',
         'currency': 'AUDUSD'}
r = requests.get(base_url, query)
template = "Traders Short {currency_code}: {traders_short}\n"
template += "Traders Long {currency_code}: {traders_long}\n"
for position in r.json()['positions']:
    if not position['hidden']:
        print(template.format(**position))
Importantly, r.json() is just a dictionary. I chose to hide the 'hidden' results, which seem to be duplicates, but of course you can do any processing you like at this point. The result of this is:
Traders Short AUDUSD: 116
Traders Long AUDUSD: 88
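If you want to be sure of printing only once even when several positions are visible, one option is to stop at the first visible entry. Here is a minimal offline sketch using the standard `json` module on a trimmed sample modelled on the response shown in the question. The `sample` string and the `first_visible` helper are made up for illustration, and the live response may well differ.

```python
import json

# Trimmed sample modelled on the response in the question (not live data)
sample = """
{"total": "355468",
 "positions": [
   {"currency_code": "AUDUSD", "traders_short": "113",
    "traders_long": "88", "hidden": false},
   {"currency_code": "AUDUSD", "traders_short": "113",
    "traders_long": "88", "hidden": true}
 ]}
"""

def first_visible(payload):
    """Return the first position whose 'hidden' flag is false, or None."""
    return next((p for p in payload["positions"] if not p["hidden"]), None)

position = first_visible(json.loads(sample))
if position is not None:
    # Extra keys in the dict (such as 'hidden') are ignored by format()
    print("Traders Short {currency_code}: {traders_short}".format(**position))
    print("Traders Long {currency_code}: {traders_long}".format(**position))
```

Because the payload is parsed as JSON rather than split on commas, the values come back as clean strings with no stray quotes or line breaks to strip.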