Search code examples
pythonpandassqlitenumeric

Having issues trying to make my dataframe numeric


So I have a sqlite local database, I read it into my program as a pandas dataframe using

""" Seperating hitters and pitchers """
pitchers = pd.read_sql_query("SELECT * FROM ALL_NORTHWOODS_DATA WHERE BF_y >= 20 AND BF_x >= 20", northwoods_db)
hitters = pd.read_sql_query("SELECT * FROM ALL_NORTHWOODS_DATA WHERE PA_y >= 25 AND PA_x >= 25", northwoods_db)

But when I do this, some of the numbers are not numeric. Here is a head of one of the dataframes:

index   Year    Age_x   AgeDif_x    Tm_x    Lg_x    Lev_x   Aff_x   G_x PA_x    ... ER_y    BK_y    WP_y    BF_y    WHIP_y  H9_y    HR9_y   BB9_y   SO9_y   SO/W_y
0   84  2020    21  -0.3    Hillsdale   GMAC    NCAA    None    5   None    ... 4.0 None    3.0 71.0    1.132   5.6 0.0 4.6 8.7 1.89
1   264 2018    --  None    Duke    ACC NCAA    None    15  None    ... 13  0   1   88  2.111   10.0    0.5 9.0 8.0 0.89
2   298 2019    21  0.1 Wisconsin-Milwaukee Horz    NCAA    None    8   None    ... 1.0 0.0 2.0 21.0    2.25    9.0 0.0 11.3    11.3    1.0
3   357 2017    22  1.0 Nova Southeastern   SSC NCAA    None    15.0    None    ... 20.0    0.0 3.0 206.0   1.489   9.7 0.4 3.7 8.5 2.32
4   418 2021    21  -0.4    Creighton   BigE    NCAA    None    4   None    ... 26.0    1.0 6.0 226.0   1.625   8.6 0.9 6.0 7.5 1.25

When I try to make the dataframe numeric, I used this line of code:

hitters = hitters.apply(pd.to_numeric, errors='coerce')
pitchers = pitchers.apply(pd.to_numeric, errors='coerce')

But when I did that, the new head of the dataframes is full of NaN's, it seems like it got rid of all of the string values but I want to keep those.

    index   Year    Age_x   AgeDif_x    Tm_x    Lg_x    Lev_x   Aff_x   G_x PA_x    ... ER_y    BK_y    WP_y    BF_y    WHIP_y  H9_y    HR9_y   BB9_y   SO9_y   SO/W_y
0   84  2020    21.0    -0.3    NaN NaN NaN NaN 5.0 NaN ... 4.0 NaN 3.0 71.0    1.132   5.6 0.0 4.6 8.7 1.89
1   264 2018    NaN NaN NaN NaN NaN NaN 15.0    NaN ... 13.0    0.0 1.0 88.0    2.111   10.0    0.5 9.0 8.0 0.89
2   298 2019    21.0    0.1 NaN NaN NaN NaN 8.0 NaN ... 1.0 0.0 2.0 21.0    2.250   9.0 0.0 11.3    11.3    1.00
3   357 2017    22.0    1.0 NaN NaN NaN NaN 15.0    NaN ... 20.0    0.0 3.0 206.0   1.489   9.7 0.4 3.7 8.5 2.32
4   418 2021    21.0    -0.4    NaN NaN NaN NaN 4.0 NaN ... 26.0    1.0 6.0 226.0   1.625   8.6 0.9 6.0 7.5 1.25

Is there a better way to makethe number values numeric and keep all my string columns? Maybe there is an sqlite function that can do it better? I am not sure, any help is appriciated.


Solution

  • Maybe you can use combine_first:

    hitters_new = hitters.apply(pd.to_numeric, errors='coerce').combine_first(hitters)
    pitchers_new = pitchers.apply(pd.to_numeric, errors='coerce').combine_first(pitchers)