Search code examples
pythonpandasdataframedatatables

Create pandas DataFrame with iterative values


I want to create a pandas DataFrame with five columns: date, house type, neighborhood, realtor and zip code.

  • A date range of August 1, 2023 to September 1, 2023
  • There are 4 house types
  • 6 realtors
  • Only one neighborhood
  • Only one zipcode So each the date needs to be repeated 16 times for each house type, in this only neighborhood, for each realtor and for one unique zip code. So there would be 16 rows per day.

I can add the consecutive dates to the date column, however, I have trouble in adding the rest of the data, which I think should be nested in for loops. So far, I have these lines:

import pandas as pd
from datetime import date,timedelta
delta = timedelta(days=1)
start = date(2023,8,1)
end=date(2023,9,1)
loop_date = start

df = pd.DataFrame(columns={"Date", "House type", "Neighborhood", "Realtor", "Zipcode"})
while loop_date<=end:
    df = df.append({"Date":loop_date},ignore_index=True)
    loop_date+=delta

The intended output for each day would be the following:

Date House type Neighborhood Realtor Zipcode
8/1/23 1 a Alex 10018
8/1/23 2 a Alex 10018
8/1/23 3 a Alex 10018
8/1/23 4 a Alex 10018
8/1/23 1 a Maggie 10018
8/1/23 2 a Maggie 10018
8/1/23 3 a Maggie 10018
8/1/23 4 a Maggie 10018
8/1/23 1 a Ginger 10018
8/1/23 2 a Ginger 10018
8/1/23 3 a Ginger 10018
8/1/23 4 a Ginger 10018
8/1/23 1 a Pat 10018
8/1/23 2 a Pat 10018
8/1/23 3 a Pat 10018
8/1/23 4 a Pat 10018

Any help would be much appreciated, thank you!


Solution

  • try this:

    import pandas as pd
    from itertools import product
    
    
    house_types = [1,2,3,4]
    realtors = [*'ABCDEF']
    neighborhood = ['a']
    zip_code = ['10018']
    daily_data = product(house_types, realtors, neighborhood, zip_code)
    dates = pd.date_range('2023-8-1', '2023-9-1')
    data = product(dates, daily_data)
    idx, data = zip(*data)
    cols = ["House_type", "Realtor", "Neighborhood", "Zipcode"]
    df = pd.DataFrame(data, index=idx, columns=cols).rename_axis('Date')
    print(df)
    >>>
               House_type   Realtor    Neighborhood  Zipcode
    Date                
    2023-08-01  1           A               a        10018
    2023-08-01  1           B               a        10018
    2023-08-01  1           C               a        10018
    2023-08-01  1           D               a        10018
    2023-08-01  1           E               a        10018
       ...     ...         ...             ...        ...
    2023-09-01  4           B               a        10018
    2023-09-01  4           C               a        10018
    2023-09-01  4           D               a        10018
    2023-09-01  4           E               a        10018
    2023-09-01  4           F               a        10018