
Wrong parsing when importing a CSV file in Python


I am trying to import a CSV file that contains tick trading data. The file looks like this:

0,2017-09-18 02:00:06,12568.00,1,201,12567.00,12568.00,5462,0,0,C,
0,2017-09-18 02:00:06,12568.50,2,203,12567.00,12568.00,5463,0,0,C,
0,2017-09-18 02:00:06,12569.00,1,204,12567.00,12569.00,5468,0,0,C,
0,2017-09-18 02:00:06,12569.00,1,205,12567.00,12569.00,5470,0,0,C,
0,2017-09-18 02:00:06,12569.50,3,208,12567.00,12569.00,5471,0,0,C,

I am using this Python code:

import pandas as pd
df = pd.read_csv("XG#/20170918.txt", names=['empty', 'date time', 'last', 'last size', 'bid', 'ask'])
print(df.head(1))

My output is this:

                                                     empty  date time  last  \
0 2017-09-18 02:00:06 12567.0 200.0 200.0 12567.0  12567.0     5430.0   0.0

                                                   last size bid  ask
0 2017-09-18 02:00:06 12567.0 200.0 200.0 12567.0        0.0   C  NaN

Process finished with exit code 0

My questions are:

  1. Why do my "names" (headers) not start at the first column?
  2. How do I make the 2nd column a date-time and use it as the index?
  3. How do I widen the output so that I can see all the data on one line (I am using PyCharm)? Since I need the date-time as the index, I also need to remove column 0, but when I use df.drop(df.index[0]) nothing happens.

Any help is welcome!


Solution

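  • Each row in your sample has 12 comma-separated fields (the trailing comma adds an empty twelfth one), but you supply names for only 6. Judging by your output, pandas then uses the extra leading columns as the index and assigns the names to the last six columns, which is why the headers look shifted. Tell pandas which columns the names belong to with usecols:

    df = pd.read_csv('lol.csv', usecols=list(range(0, 6)), names=['empty', 'date_time', 'last', 'last_size', 'bid', 'ask'])
    

    I used the first 6 columns here; adapt the example below to pick and name the columns you actually want.

    usecols takes a list of the positions of the columns you want to keep, and names supplies one name for each of them.

    For example, if you want columns 1, 3 and 4 to be named name, gender and address, the code looks like this:

    pd.read_csv('lol.csv', usecols=[1, 3, 4], names=['name', 'gender', 'address'])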
    

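    For the second question (using the date-time column as the index):

    df = pd.read_csv('lol.csv', usecols=list(range(0, 6)), names=['empty', 'date_time', 'last', 'last_size', 'bid', 'ask'], index_col='date_time')
    

    The index_col parameter tells pandas which column to use as the index. To have that column parsed as datetime values rather than kept as strings, also pass parse_dates=['date_time'].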

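    To drop a column after loading the CSV into a DataFrame (for example df), use the following code:

    df.drop('empty', axis=1, inplace=True)

    Your df.drop(df.index[0]) call appears to do nothing because it targets the first row rather than a column, and it returns a new DataFrame instead of modifying df.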
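
Putting the steps above together, a minimal sketch might look like the following. It reads two rows copied from the question through io.StringIO instead of the real file, so it can be run as-is. The column names are the ones used in this answer; whether they match the real meaning of each field in the feed is not confirmed. The parse_dates argument and the two display options are assumptions added here to cover the "date-time" and "see all the data on one line" parts of the question, and the width of 200 is an arbitrary choice.

```python
import io

import pandas as pd

# Two rows copied from the question; io.StringIO stands in for the real file.
sample = (
    "0,2017-09-18 02:00:06,12568.00,1,201,12567.00,12568.00,5462,0,0,C,\n"
    "0,2017-09-18 02:00:06,12568.50,2,203,12567.00,12568.00,5463,0,0,C,\n"
)

names = ['empty', 'date_time', 'last', 'last_size', 'bid', 'ask']

df = pd.read_csv(
    io.StringIO(sample),
    header=None,                # the file has no header row
    usecols=list(range(6)),     # keep only the first six fields
    names=names,                # one name per kept field
    index_col='date_time',      # use the timestamp column as the index
    parse_dates=['date_time'],  # assumption: parse it as datetime values
)

# drop returns a new DataFrame, so assign the result back.
df = df.drop(columns='empty')

# Assumption: a wider display avoids the wrapped output seen in the question.
pd.set_option('display.width', 200)
pd.set_option('display.max_columns', None)

print(df)
```

Assigning the result of drop back to df is an alternative to inplace=True; both leave df without the 'empty' column.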