Tags: python-3.x, dataframe, web-scraping, quandl

When I download data and convert it into a DataFrame, I lose the first column with the dates


I use quandl to download stock prices. I have a list of company names, and I download all the information for each of them. After that, I convert it into a DataFrame. When I do it for only one company everything works well, but when I try to do it for all of them at the same time, something goes wrong: the first column, which holds the dates, is replaced by an integer index (0, 1, 2) instead of the dates.

My code looks like this:

import quandl
import pandas as pd

names_of_company = ['11BIT', 'ABCDATA', 'ALCHEMIA']

results = pd.DataFrame()
for names in names_of_company:
    x = quandl.get('WSE/%s' % names, start_date='2018-11-29',
                   end_date='2018-11-29',
                   paginate=True)
    x['company'] = names
    results = results.append(x).reset_index(drop=True)

The actual result looks like this:

Index   Open   High    Low  Close  %Change   Volume  # of Trades  Turnover (1000)   company
    0  204.5  208.5  204.5  206.0     0.73   3461.0        105.0           717.31     11BIT
    1  205.0  215.0  202.5  214.0     3.88  10812.0        392.0          2254.83   ABCDATA
    2  215.0  215.0  203.5  213.0    -0.47  12651.0        401.0          2656.15  ALCHEMIA
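The symptom can be reproduced without a Quandl API key. This is a sketch under assumptions: `fake_get` is a hypothetical stand-in for `quandl.get('WSE/<name>', ...)` that returns a one-row frame indexed by a `DatetimeIndex` named `Date` (the answer below says the date is the index), and only the `Open` values from the table above are kept. It also collects the frames and calls `pd.concat` once instead of `DataFrame.append`, which was removed in pandas 2.0; dropping the index at the end has the same effect as dropping it on every iteration.

```python
import pandas as pd

# Open prices copied from the table above; everything else is omitted
OPENS = {'11BIT': 204.5, 'ABCDATA': 205.0, 'ALCHEMIA': 215.0}

def fake_get(name):
    # Hypothetical stand-in for quandl.get('WSE/<name>', ...):
    # one row indexed by a DatetimeIndex named 'Date'
    idx = pd.DatetimeIndex(['2018-11-29'], name='Date')
    return pd.DataFrame({'Open': [OPENS[name]]}, index=idx)

frames = []
for name in ['11BIT', 'ABCDATA', 'ALCHEMIA']:
    x = fake_get(name)
    x['company'] = name
    frames.append(x)

# Same effect as the question's reset_index(drop=True): the dates are discarded
# and replaced by a 0, 1, 2 integer index
results = pd.concat(frames).reset_index(drop=True)
print(results)
```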

But I expected:

Date         Open   High    Low  Close  %Change   Volume  # of Trades  Turnover (1000)   company
2018-11-29  204.5  208.5  204.5  206.0     0.73   3461.0        105.0           717.31     11BIT
2018-11-29  205.0  215.0  202.5  214.0     3.88  10812.0        392.0          2254.83   ABCDATA
2018-11-29  215.0  215.0  203.5  213.0    -0.47  12651.0        401.0          2656.15  ALCHEMIA

As you can see, the dates are not carried over correctly. But, as I said, if I do it for only one company, it works. Here is that code:

x = quandl.get('WSE/11BIT', start_date='2019-01-01', end_date='2019-01-03')

df = pd.DataFrame(x) 
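With a single company nothing calls `reset_index`, so the dates stay in the index. A minimal sketch, assuming `quandl.get` returns a frame indexed by `Date`; the frame `x` here is a hypothetical stand-in with placeholder values, not real 11BIT prices.

```python
import pandas as pd

# Hypothetical stand-in for quandl.get('WSE/11BIT', ...); values are placeholders
idx = pd.DatetimeIndex(['2019-01-02', '2019-01-03'], name='Date')
x = pd.DataFrame({'Open': [1.0, 2.0], 'Close': [1.5, 2.5]}, index=idx)

# Wrapping an existing DataFrame in pd.DataFrame keeps its index unchanged
df = pd.DataFrame(x)
print(df.index.name)
```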

I will be very grateful for any help! Thanks, all.


Solution

  • When you store it in a DataFrame, the date is your index. You lose it because .reset_index() overwrites the old index (the date), and instead of the date being moved into a column, .reset_index(drop=True) tells pandas to throw it away.

    So I'd append inside the loop, and only once the whole results DataFrame is populated, reset the index without dropping it, either with results = results.reset_index(drop=False) or simply results = results.reset_index(), since drop defaults to False.

    import quandl
    import pandas as pd
    
    names_of_company = ['11BIT', 'ABCDATA', 'ALCHEMIA']
    
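    frames = []
    for names in names_of_company:
        # each quandl.get() call returns a DataFrame indexed by Date
        x = quandl.get('WSE/%s' % names, start_date='2018-11-29',
                       end_date='2018-11-29',
                       paginate=True)
        x['company'] = names
        frames.append(x)
    
    # DataFrame.append was removed in pandas 2.0, so combine the frames with pd.concat
    results = pd.concat(frames)
    # reset_index() moves the Date index into a column; drop=False is the default
    results = results.reset_index(drop=False)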
    

    Output:

    print(results)
            Date    Open    High    ...     # of Trades  Turnover (1000)   company
    0 2018-11-29  269.50  271.00    ...           280.0          1822.02     11BIT
    1 2018-11-29    0.82    0.92    ...           309.0          1027.14   ABCDATA
    2 2018-11-29    4.55    4.55    ...             1.0             0.11  ALCHEMIA
    
    [3 rows x 10 columns]
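Both layouts can be checked offline. This sketch assumes, as above, that each `quandl.get` call returns a frame indexed by `Date`; `fake_get` is a hypothetical stand-in and the `Open` values are copied from the question's table. `reset_index()` (default `drop=False`) turns the dates into a `Date` column, as in the answer. Skipping `reset_index` entirely keeps `Date` as a repeated index, which is closer to the layout the question expected.

```python
import pandas as pd

# Open prices copied from the question's table; everything else is omitted
OPENS = {'11BIT': 204.5, 'ABCDATA': 205.0, 'ALCHEMIA': 215.0}

def fake_get(name):
    # Hypothetical stand-in for quandl.get('WSE/<name>', ...):
    # one row indexed by a DatetimeIndex named 'Date'
    idx = pd.DatetimeIndex(['2018-11-29'], name='Date')
    return pd.DataFrame({'Open': [OPENS[name]]}, index=idx)

frames = []
for name in ['11BIT', 'ABCDATA', 'ALCHEMIA']:
    x = fake_get(name)
    x['company'] = name
    frames.append(x)

# Option 1 (the answer's approach): the dates become a regular 'Date' column
with_column = pd.concat(frames).reset_index()

# Option 2: no reset at all, so 'Date' stays the (repeated) index
with_index = pd.concat(frames)

print(with_column)
print(with_index)
```

Option 2 leaves duplicate index labels, which is fine for display but means label lookups such as `with_index.loc['2018-11-29']` return all three rows.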