Tags: python, python-requests, token, nested-table

Can't get the right tokens for the nested tables


I'm trying to get the contents of some tables as JSON, but I can't seem to get the right tables. The page works with two JSON responses. The first one contains only the non-nested (top-level) rows, and I can fetch that one just fine. The problem is the nested tables.

[Screenshot: the tables I need to get]

I need the content of all those tables in a JSON file, but I can't seem to get the right tokens for them. The requests always return the login page, as if the session had expired.

Here is the code I am using to scrape the tables:

    #does a json post for the last 100 elements
    url = "https://Awebsite.com/virtualaccount/entries"

    querystring = {"userId":"userid","moffset":"0"}

    payload = "sEcho=1&iColumns=4&sColumns=DateCtz%2CReason%2CDescription%2CFormattedAmount&iDisplayStart=0&iDisplayLength=100&mDataProp_0=DateCtz&mDataProp_1=Reason&mDataProp_2=Description&mDataProp_3=FormattedAmount&sSearch=&bRegex=false&sSearch_0=&bRegex_0=false&bSearchable_0=true&sSearch_1=&bRegex_1=false&bSearchable_1=false&sSearch_2=&bRegex_2=false&bSearchable_2=false&sSearch_3=&bRegex_3=false&bSearchable_3=false&sSortCol%5B0%5D=DateCtz&bSortDir%5B0%5D=false&iSortingCols=1&bSortable_0=true&bSortable_1=false&bSortable_2=false&bSortable_3=false"


    pgtos = session.post(url,params=querystring,data=payload,headers=headers)



    #gets the texts and converts it in python json
    json_data =  pgtos.text
    json1_data = json.loads(json_data)

    cooki = session.cookies.get_dict()

    coooookie = pgtos.cookies.get_dict()

    # show the cookies held by the session and those set by the last response
    print('cookies')
    print(cooki)
    print(coooookie)


    # Here is the part I exported from Postman. The problem is that I can't get the __RequestVerificationToken in the payload right. If I use the one from Postman, it works for a while before it expires.

    url = "https://Awebsite.com/virtualaccount/transactions"

    querystring = {"entryId":"<each one of the tables has a different entryId>"}

    payload = "__RequestVerificationToken=QMA2UREXdlRfwIagBWIjekZG4D1ykXrFXxtWnzWV3kc55529C26MyKbL4pHNbaiTjBBAvrrbIsZEroUBJPfc0zWam4nig9oOZxQOKJ2khnZlp2YqOgFgNAj8bYxMIiDGtc9sYBIZS6M_1o6jRAl8gQ2"

    # The Postman headers. If I use the headers exactly as Postman gives them, it works fine too, but I'm trying to automate this, and when I use the cookies I get from the site instead, it simply won't work.
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0",
        'Accept': "*/*",
        'Referer': "https://Awebsite.com/virtualaccount/view",
        'Content-Type': "application/x-www-form-urlencoded; charset=UTF-8",
        'X-Requested-With': "XMLHttpRequest",
        'Cookie': "_ga=GA1.0.000000000.0000000000; _gid=GA1.0.000000000.0000000000; __RequestVerificationToken="+cooki['__RequestVerificationToken']+"; e5ps_sid="+cooki['e5ps_sid']+"; .ASPXAUTH="+cooki['.ASPXAUTH']+"; _gat=1",
        'Connection': "keep-alive",
        'Cache-Control': "no-cache",
        'Postman-Token': "..."
        }

    response = requests.request("POST", url, data=payload, headers=headers, params=querystring)

    print(response.text)  # end of the Postman test; with all the Postman tokens and cookies it works for a while



    # walk every entry of the top-level JSON response
    for entry in json1_data['VirtualAccountEntries']:

        # the slicing assumes DateCtz starts with dd/mm/yyyy
        data = entry['DateCtz']
        _date = datetime.date(int(data[6:10]), int(data[3:5]), int(data[0:2]))
        if current_day == _date or previous_day == _date:

            url = "https://Awebsite.com/virtualaccount/transactions"
            querystring = {"entryId": entry['Id']}

            payload = "__RequestVerificationToken=<the requestverification token i can't seem to get right>"

            payment = requests.request("POST", url, data=payload, headers=headers, params=querystring)

            print(payment.text)

            # save the decoded JSON of this nested table
            with open("C:\\... +".json" , "w") as fp:
                json.dump(payment.json(), fp)
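
The per-entry part of the loop above can be isolated from the network calls. The following is only a sketch under assumptions: `parse_entry_date` and `build_transaction_requests` are hypothetical helper names, the sample entries are made up, and the date format is assumed to start with dd/mm/yyyy because that is how the original code slices `DateCtz`. It filters the entries by day and builds the query parameters and form body for each nested-table request, letting `urlencode` escape the token instead of concatenating strings by hand.

```python
import datetime
from urllib.parse import urlencode


def parse_entry_date(date_ctz):
    """Parse the leading 'dd/mm/yyyy' of a DateCtz string into a date."""
    return datetime.datetime.strptime(date_ctz[:10], "%d/%m/%Y").date()


def build_transaction_requests(entries, wanted_days, token):
    """Yield (params, body) for every entry dated on one of wanted_days."""
    for entry in entries:
        if parse_entry_date(entry["DateCtz"]) in wanted_days:
            params = {"entryId": entry["Id"]}
            # urlencode percent-escapes characters such as '+' or '/' in the token
            body = urlencode({"__RequestVerificationToken": token})
            yield params, body


# Made-up sample shaped like json1_data['VirtualAccountEntries'] (assumption).
sample_entries = [
    {"Id": "111", "DateCtz": "08/08/2018 19:02:23"},
    {"Id": "222", "DateCtz": "31/07/2018 21:00:00"},
]
wanted = {datetime.date(2018, 8, 8), datetime.date(2018, 8, 7)}

for params, body in build_transaction_requests(sample_entries, wanted, "PLACEHOLDER-TOKEN"):
    print(params, body)
```

Each `(params, body)` pair would then be passed to the same session that performed the login, so the session cookies travel with the request without a hand-built `Cookie` header.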

And here is the HTML with all the nested tables expanded. Each time I open a nested table, a new JSON POST request is executed, so I need to fetch all of them by making several POST requests in the code, changing the entryId of each request so I get the right nested table.

    <div class="table-responsive">
        <div id="DataTables_Table_1_wrapper" class="dataTables_wrapper" role="grid"><div class="row dt-rt"><div class="col-sm-6"><div id="DataTables_Table_1_length" class="dataTables_length"><label><select size="1" name="DataTables_Table_1_length" aria-controls="DataTables_Table_1"><option value="10" selected="selected">10</option><option value="25">25</option><option value="50">50</option><option value="100">100</option></select> Registry</label></div></div><div class="col-sm-6"><div class="dataTables_filter" id="DataTables_Table_1_filter"><label>Filter: <input aria-controls="DataTables_Table_1" type="text"></label></div></div></div><table class="table table-responsive server-table dataTable" data-source="/virtualaccount/entries?userId=<user id>;moffset=0" data-source-property="VirtualAccountEntries" data-source-callback="bindActions" id="DataTables_Table_1">
            <thead>
                <tr role="row"><th data-sortable="true" data-prop="DateCtz" data-render="renderDate" class="sorting_asc" role="columnheader" tabindex="0" aria-controls="DataTables_Table_1" rowspan="1" colspan="1" style="width: 121px;" aria-label="
                        Data
                    : activate to sort column ascending">
                        Data
                    </th><th data-prop="Reason" class="sorting_disabled" role="columnheader" rowspan="1" colspan="1" style="width: 112px;" aria-label="
                        Type
                    ">
                        Type
                    </th><th data-prop="Description" class="sorting_disabled" role="columnheader" rowspan="1" colspan="1" style="width: 218px;" aria-label="
                        Description
                    ">
                        Description
                    </th><th data-prop="FormattedAmount" class="total sorting_disabled" role="columnheader" rowspan="1" colspan="1" style="width: 133px;" aria-label="
                        Value
                    ">
                        Value
                    </th></tr>
            </thead>
        <tbody role="alert" aria-live="polite" aria-relevant="all"><tr class="odd"><td class=" sorting_1">31/07/2018 21:00:00</td><td class="">Saldo</td><td class="">Saldo</td><td class="">-106,15</td></tr><tr class="even"><td class=" sorting_1">01/08/2018 21:00:00</td><td class="">Saldo</td><td class="">Saldo</td><td class="">-106,15</td></tr><tr class="odd" id="row-<sale id>"><td class=" sorting_1"><img id="loader-<sale id>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<sale id>" id="explosion-<sale id>" href="www.Awebsite.com/transactions?entryId=<sale id>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">3,80</td></tr><tr class="subdata"><td>06/08/2018 17:34:43</td><td>Transaction</td><td>Debit transaction with authorization NUMBER.</td><td>$ 3,80</td></tr><tr class="even" id="row-<sale id>"><td class=" sorting_1"><img id="loader-<sale id>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<sale id>" id="explosion-<sale id>" href="www.Awebsite.com/transactions?entryId=<sale id>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">43,74</td></tr><tr class="subdata"><td>06/08/2018 15:36:01</td><td>Transaction</td><td>Debit transaction with authorization NUMBER.</td><td>$ 43,74</td></tr><tr class="odd" id="row-<Sale ID>"><td class=" sorting_1"><img id="loader-<Sale ID>" class="row-loader" 
src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<Sale ID>" id="explosion-<Sale ID>" href="www.Awebsite.com/transactions?entryId=<Sale ID>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">71,07</td></tr><tr class="subdata"><td>06/08/2018 16:12:38</td><td>Transaction</td><td>Credit transaction with authorization NUMBER.</td><td>$ 59,98</td></tr><tr class="subdata"><td>06/08/2018 15:51:33</td><td>Transaction</td><td>Credit transaction with authorization 029530.</td><td>$ 11,09</td></tr><tr class="even" id="row-<another sale ID>"><td class=" sorting_1"><img id="loader-<another sale ID>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<another sale ID>" id="explosion-<another sale ID>" href="www.Awebsite.com/transactions?entryId=<another sale ID>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">51,52</td></tr><tr class="subdata"><td>06/08/2018 16:02:19</td><td>Transaction</td><td>Credit transaction with authorization NUMBER.</td><td>$ 51,52</td></tr><tr class="odd" id="row-<yet another id for another sale>"><td class=" sorting_1"><img id="loader-<yet another id for another sale>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<yet another id for another sale>" id="explosion-<yet another id for another sale>" href="www.Awebsite.com/transactions?entryId=<yet another id for another sale>" 
class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">75,11</td></tr><tr class="subdata"><td>06/08/2018 09:33:31</td><td>Transaction</td><td>Credit transaction with authorization NUMBER.</td><td>$ 75,11</td></tr><tr class="even" id="row-<Sale ID>"><td class=" sorting_1"><img id="loader-<Sale ID>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<Sale ID>" id="explosion-<Sale ID>" href="www.Awebsite.com/transactions?entryId=<Sale ID>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">22,21</td></tr><tr class="subdata"><td>06/08/2018 14:31:36</td><td>Transaction</td><td>Debit transaction with authorization NUMBER.</td><td>$ 22,21</td></tr><tr class="odd" id="row-<sale id>"><td class=" sorting_1"><img id="loader-<sale id>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<sale id>" id="explosion-<sale id>" href="www.Awebsite.com/transactions?entryId=<sale id>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>08/08/2018 19:02:23</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">122,12</td></tr><tr class="subdata"><td>06/08/2018 14:39:01</td><td>Transaction</td><td>Credit transaction with 
authorization NUMBER.</td><td>$ 122,12</td></tr><tr class="even" id="row-<sale id>"><td class=" sorting_1"><img id="loader-<sale id>" class="row-loader" src="www.Awebsite.com/source.gif" style="display: none;"><a data-id="<sale id>" id="explosion-<sale id>" href="www.Awebsite.com/transactions?entryId=<sale id>" class="explosion" style="display: none;"><input name="__RequestVerificationToken" value="<Really long token here>" type="hidden"><i class="show-tooltip fa-explosion-row fa fa-plus-square-o" title="" data-original-title="Visualizar"></i></a>09/08/2018 16:22:56</td><td class="">Transactions</td><td class="">Transaction payment</td><td class="">48,51</td></tr><tr class="subdata"><td>07/08/2018 13:35:55</td><td>Transaction</td><td>Debit transaction with authorization NUMBER.</td><td>$ 48,51</td></tr></tbody></table><div class="row dt-rb"><div class="col-sm-4"></div><div class="col-sm-8"><div class="dataTables_paginate paging_bootstrap"><ul class="pagination"><li class="prev disabled"><a href="#">← Previous</a></li><li class="active"><a href="#">1</a></li><li><a href="#">2</a></li><li><a href="#">3</a></li><li><a href="#">4</a></li><li><a href="#">5</a></li><li class="next"><a href="#">Next → </a></li></ul></div></div></div></div>
    </div>
    </div>
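
In the markup above, every expandable row carries an `<a class="explosion">` link with a `data-id` and a hidden `__RequestVerificationToken` input inside it. As a rough sketch, those pairs can be collected with the standard library's `html.parser`. The class name `ExplosionLinkParser` and the sample rows are made up for illustration, and it is an assumption that the HTML received by the script contains these rows at all: DataTables may build them in the browser only.

```python
from html.parser import HTMLParser


class ExplosionLinkParser(HTMLParser):
    """Collect {entry id: token} from <a class="explosion"> links."""

    def __init__(self):
        super().__init__()
        self.tokens = {}
        self._current_id = None  # data-id of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "explosion" in (attrs.get("class") or "").split():
            self._current_id = attrs.get("data-id")
        elif (tag == "input" and self._current_id is not None
              and attrs.get("name") == "__RequestVerificationToken"):
            self.tokens[self._current_id] = attrs.get("value")

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_id = None


# Shortened, made-up rows in the same shape as the table above.
sample_html = """
<tr id="row-101"><td><a data-id="101" id="explosion-101" class="explosion">
<input name="__RequestVerificationToken" value="token-for-101" type="hidden">
<i class="fa"></i></a>08/08/2018 19:02:23</td></tr>
<tr id="row-102"><td><a data-id="102" id="explosion-102" class="explosion">
<input name="__RequestVerificationToken" value="token-for-102" type="hidden">
<i class="fa"></i></a>08/08/2018 19:02:23</td></tr>
"""

parser = ExplosionLinkParser()
parser.feed(sample_html)
print(parser.tokens)
```

The resulting dictionary maps each `entryId` to the token found next to it, which is the pair each nested-table request needs.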

Every time I try to automate the process, whether I change the tokens or anything else, the only thing that comes back is the login page. However, if I use the tokens and cookies the Postman app gives me, I can download the data just fine.
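
Since a rejected request comes back as the HTML login page rather than JSON, it can help to check the body before treating it as data. This is a minimal sketch; `parse_json_or_none` is a hypothetical helper name, and it only tells JSON from non-JSON, it does not prove that a non-JSON body is the login page.

```python
import json


def parse_json_or_none(text):
    """Return the decoded JSON, or None when the body is not JSON
    (for example an HTML login page)."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


print(parse_json_or_none('{"VirtualAccountEntries": []}'))
print(parse_json_or_none("<html><body>Login</body></html>"))
```

In the scraping loop, a `None` result would be the signal to stop and fetch a fresh token instead of writing the login page to disk.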

EDIT 1:

I can find the XHR JSON request among the requests the site makes, but I can't get the right tokens and cookies for it. The code needs to return a JSON file with a bunch of data I need, but without the Postman values I can only retrieve the HTML, without the tables I actually need.

[Screenshot: my requests]

The tokens and cookies aren't the same as the ones I can get with session.cookies.get_dict(), or by any other means in the current session.

Can someone please help me?

Thanks!!


Solution

  • I managed to get it right. Deep in the HTML there is a script that contains the __RequestVerificationToken value. Once I found that, it was just a matter of using BeautifulSoup to parse the scripts and find the text with the token, then passing the token in the request headers so the session doesn't expire.
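
The approach described above could look roughly like the sketch below. To stay free of third-party dependencies it uses the standard library's `html.parser` in place of BeautifulSoup (with BeautifulSoup, the script elements would come from `soup.find_all("script")` instead). The names `ScriptCollector`, `TOKEN_RE` and `find_token`, the sample page, and above all the shape of the script text are assumptions: the original post does not show how the token appears inside the script, so the regular expression would need adjusting to the real page.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Gather the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


# Assumed pattern: the token name followed, inside the same tag, by value="...".
TOKEN_RE = re.compile(r'__RequestVerificationToken[^>]*?value="([^"]+)"')


def find_token(html):
    """Return the first token found inside a script, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    for script in collector.scripts:
        match = TOKEN_RE.search(script)
        if match:
            return match.group(1)
    return None


# Made-up page for illustration only.
sample_page = """
<html><body>
<script>var unrelated = 1;</script>
<script>
  var antiForgery = '<input name="__RequestVerificationToken" type="hidden" value="abc123-_XYZ" />';
</script>
</body></html>
"""

print(find_token(sample_page))
```

The page would be fetched with the logged-in session, and the token returned by `find_token` would be sent with each nested-table request, as the answer describes.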