Search code examples
pythondataframetranspose

Transposing two columns into single row (ideally with new headers)


I'm new into python. Doing some basic webscraping of football scores and got stuck.

I have webscraped using betsapi.com, got Premierleague teams, then in single code got from each team > all games, then in new one got from single game a stats. It will still need to get looped for all matches by all teams, but starting from the bottom:

I got this:

    0           1               2
0   Brighton    NaN             Man City
1   1           Goals           1
2   5           Corners         1
3   3           Corners (Half)  0
4   2           Yellow Card     1
5   0           Yellow/Red Card 0
6   0           Red Card        0
7   6           Throw-ins       11
8   2           Offsides        0
9   11          Free kicks      18
10  7           Goal kicks      5

What I want to achieve is that all Rows >0 will be transposed to Row=0

So the results would look like

0   HomeTeam    AwayTeam HGAG...
0   Brighton    Man City 1 1 5 1 3 0 2 1 0 0 0 0 6 11

So the format is HomeTeam Away Team then each stats respectively for Home/Away

I have tried transposing with melt, var, but not solved it myself as I'm really beginner.

Header I would have probably to rename manually since there are 29 rows total (pasted only 10 to make it short) so I will get x2 columns from that in theory.

df = df.set_index(['HomeTeam','AwayTeam','HG','AG',....]) should work.

But How to transpose each 2 columns from row N into row 0 one after another? EDIT:

Thanks Timeless your code works. However looks my code wasn't proper for dataframe as it was showing 'axis = 1' not existing.

I have simplified code to:

import requests
liga_url = "https://betsapi.com/r/6557350/Brighton-vs-Man-City"
data = requests.get(liga_url)

from bs4 import BeautifulSoup
import pandas as pd

matches = pd.read_html(data.text)

matches = matches[0] # have chosen to use only first part since in all other df info is repeated or not important, I just miss referee name which is available when you click the link, but somehow read_html not getting it
matches

but with this code i'm probably missing sth in terms of str/int as getting such errors.

Maybe you can inspect further THanks,


Solution

  • You can try :

    out = (
        pd.DataFrame(df.T.drop("1").to_numpy().reshape(1, -1, order="F"))
            # .set_axis(["HomeTeam","AwayTeam", "HG", "AG", ...], axis=1) # to complete
    )
    

    Output :

    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
    Brighton Man City 1 1 5 1 3 0 2 1 0 0 0 0 6 11 2 0 11 18 7 5

    Input used :

    df = pd.DataFrame({
        '0': ['Brighton', '1', '5', '3', '2', '0', '0', '6', '2', '11', '7'],
        '1': [None, 'Goals', 'Corners', 'Corners (Half)', 'Yellow Card',
              'Yellow/Red Card', 'Red Card', 'Throw-ins', 'Offsides',
              'Free kicks', 'Goal kicks'],
        '2': ['Man City', '1', '1', '0', '1', '0', '0', '11', '0', '18', '5']}
    )