Search code examples
pythonpandasnumpydata-cleaningnba-api

How to combine every 2 rows into 1 row


I am trying to fix this NBA game detail table. Every two rows are the same game(GAME_ID) in the table. I want the two rows to be in one row so that the home team and the away team's data will be on the same row.

NBA game details


Solution

  • You can use set_index and unstack, then flatten headers.

    Here is some code I did for baseball games, I had to create my own Game No:

    # source datafrome
    year = '2020'
    df = pd.read_excel('https://www.sportsbookreviewsonline.com/scoresoddsarchives/mlb/mlb%20odds%202011.xlsx')
    df = df.dropna(how="all").copy()
    
    df["Date"] = df["Date"].astype(int)  # To handle excel float type in some years
    df["Date"] = pd.to_datetime(
        f"{year}" + df["Date"].astype(str).str.rjust(4, "0"), format="%Y%m%d"
    )
    
    cols = ["Date", "VH", "Team", "Final", "Close"]
    df = df[cols]
    
    df = df[df["VH"].isin(["V", "H"])]
    
    # Flatten teams lines in to games
    df["Game No"] = (df["VH"] == "V").cumsum()
    df = df.set_index(["Game No", "Date", "VH"]).unstack()
    df.columns = df.columns.map("_".join)
    
    df = df.reset_index()
    

    Output:

          Game No       Date Team_H Team_V  Final_H  Final_V  Close_H  Close_V
    0           1 2020-03-31    WAS    ATL        0        2      119     -139
    1           2 2020-03-31    CIN    MIL        7        6     -110     -110
    2           3 2020-03-31    STL    SDG        3        5     -178      158
    3           4 2020-03-31    LOS    SFO        2        1     -114     -106
    4           5 2020-03-31    NYY    DET        6        3     -152      132
    ...       ...        ...    ...    ...      ...      ...      ...      ...
    2462     2463 2020-10-22    TEX    STL        7       16     -183      163
    2463     2464 2020-10-23    TEX    STL        4        0     -171      151
    2464     2465 2020-10-24    TEX    STL        4        2     -130      110
    2465     2466 2020-10-27    STL    TEX       10        9     -114     -106
    2466     2467 2020-10-28    STL    TEX        6        2     -130      110
    
    [2467 rows x 8 columns]