Tags: python, arrays, python-3.x, pandas, nested-loops

Python: How to separate array into chunks


I am still very new to Python programming.

I have an array that I am trying to break down into chunks. My array seems to have multiple arrays within it (I think).

The output looks something like this:

[array([None, '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
       '0', '0', '0', '0', '0', '0', '0', '0', None, None, None],
      dtype=object)
 array([None, None, '0', '0', '0', '1', '0', '0', '0', '0', None, None,
       None, None, None, None, None, None, None, None, None, None, None,
       None], dtype=object)
 array([None, None, '0', '0', '0', '0', '0', '0', None, None, None, None,
       None, None, None, None, None, None, None, None, None, None, None,
       None], dtype=object)

This is a snippet of the printed output. Is there any way to display this output in one array with 24 columns?

I created the array from a dataframe with 24 columns. I wanted to populate those columns using a for loop. The loop works, but it only populates the array, not the dataframe.

Here is some sample output from my dataframe. I have 24 "status" columns and a column named "AccountOpenedDate".

This is the output of one of the status columns:

0       1
1       0
2       P
3       0
4    None
Name: status6, dtype: object 

The idea is to copy the values of all 24 status columns into new columns named "stat1" to "stat24", in reverse order: the values of status24 would populate stat1, status23 would populate stat2, and so on.

I saw this example of how to break a list into chunks, but I couldn't get the output I wanted: https://www.geeksforgeeks.org/break-list-chunks-size-n-python/

from datetime import date
import pandas as pd

df = pd.read_sql(sql,cnxn)

#add stat1-24 into the data frame
df = df.join(pd.DataFrame({
        'stat1':'','stat2':'','stat3':'','stat4':'',
        'stat5':'','stat6':'','stat7':'','stat8':'',
        'stat9':'','stat10':'','stat11':'','stat12':'',
        'stat13':'','stat14':'','stat15':'','stat16':'',
        'stat17':'','stat18':'','stat19':'','stat20':'',
        'stat21':'','stat22':'','stat23':'','stat24':'',},index=df.index))

#call status1-24 from the data frame and store the columns in an array
status = df.as_matrix(columns=df.columns[6:30])

#call stat1-24 from the data frame and store the columns in an array
stat = df.as_matrix(columns=df.columns[31:55])

l = len(df)

#calculate difference in months between startDate and AccountOpenedDate
def monthly_diff(d2,startDate):
    return(d2.year - startDate.year) * 12 + d2.month - startDate.month

startDate = date(year=2017, month = 7, day = 1)

df['Difference_IN_Months'] = df['AccountOpenedDate']


for x in range(l):
    d2_1=df['AccountOpenedDate'][x]
    d2=d2_1.date()
    df['Difference_IN_Months'][x]= monthly_diff(d2,startDate)
    for i in range(0,23):
        if 3 <= 24 - monthly_diff(d2,startDate) - i + 1 <=24:    
            stat[x,i] = status[24 - monthly_diff(d2,startDate) - i + 1] 
        else: stat[x,i]=''


print(stat[1,:])

I hope my code isn't too confusing. Everything works fine except the part where my array "stat" should populate my dataframe columns (stat1-stat24) with the relevant data.


Solution

  • This is the best I can understand from your code and question.

    import pandas as pd
    import numpy as np
    
    
    
    # Sample arrays from the question, collected in a list
    l=[np.array([None, '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
           '0', '0', '0', '0', '0', '0', '0', '0', None, None, None],
          dtype=object),
     np.array([None, None, '0', '0', '0', '1', '0', '0', '0', '0', None, None,
           None, None, None, None, None, None, None, None, None, None, None,
           None], dtype=object),
     np.array([None, None, '0', '0', '0', '0', '0', '0', None, None, None, None,
           None, None, None, None, None, None, None, None, None, None, None,
           None], dtype=object)]
    
    # One empty-string entry per column, stat1 through stat24
    d = {f'stat{n}': '' for n in range(1, 25)}
    df = pd.DataFrame(d,index=[0])
    
    print(df)
    # Append each array as a new row; len(df) is the next unused integer label.
    for i in l:
        df.loc[len(df)] = i
    print(df)
    

    output:

      stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9  ... stat16 stat17 stat18 stat19 stat20 stat21 stat22 stat23 stat24
    0                                                        ...
    
    [1 rows x 24 columns]
    
    
      stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9  ... stat16 stat17 stat18 stat19 stat20 stat21 stat22 stat23 stat24
    0                                                        ...
    1  None     0     0     0     0     0     0     0     0  ...      0      0      0      0      0      0   None   None   None
    2  None  None     0     0     0     1     0     0     0  ...   None   None   None   None   None   None   None   None   None
    3  None  None     0     0     0     0     0     0  None  ...   None   None   None   None   None   None   None   None   None
    
    [4 rows x 24 columns]
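
As an alternative to appending row by row, the list of arrays could probably be stacked into one 2-D array and attached to the existing dataframe in a single step. The following is only a minimal sketch under assumptions: the sample rows are shortened to 4 columns for readability, and the `account` column and the names `rows`, `stacked`, `stat_cols` and `result` are hypothetical, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three 1-D object arrays, shortened to 4 values each.
rows = [
    np.array([None, '0', '0', None], dtype=object),
    np.array([None, None, '1', '0'], dtype=object),
    np.array(['0', '0', None, None], dtype=object),
]

# Stack the 1-D arrays into one 2-D array: one row per input array.
stacked = np.vstack(rows)

# Column names stat1..statN, where N is the number of columns in the stacked array.
stat_cols = [f"stat{n}" for n in range(1, stacked.shape[1] + 1)]

# Hypothetical existing dataframe with one row per array.
df = pd.DataFrame({"account": ["a", "b", "c"]})

# Wrap the 2-D array in a dataframe that shares df's index, then join it on.
# Arrays taken out of a dataframe are separate objects, so values changed in
# such an array have to be written back explicitly like this.
stat_df = pd.DataFrame(stacked, columns=stat_cols, index=df.index)
result = df.join(stat_df)

# Reversing the column order (last column first) is a plain slice.
reversed_cols = stacked[:, ::-1]

print(result)
```

With the real data, `stacked` would have 24 columns, so `stat_cols` would run from stat1 to stat24. If the status values need to land in reverse order, `reversed_cols` could be passed to `pd.DataFrame` instead of `stacked`.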