Tags: python, arrays, pandas, dataframe, medical

How to transform a Dataframe of xyz coordinates into a binary array of shape (272, 512, 512)


I have a DataFrame that describes a 3D centerline (x, y, z). I want to turn it into a binary array with shape (272, 512, 512). The z values in the DataFrame range from about 40 to 160 and correspond to the first axis of the array; the x and y values correspond to the second and third axes, respectively. Any (x, y, z) point not in the DataFrame should be 0 in the array, and any point that is present should be 1. How can I do this, given that each plane/slice may contain more than one 1?

I was able to do this when I limited the DataFrame to one row per unique z value (one point per slice), but the real data has multiple rows per unique z value.
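NumPy accepts integer arrays as indices, so every point can be set with a single assignment, no matter how many points share a slice. The sketch below assumes columns named X, Y and Z (as in the code below) that hold whole-number voxel coordinates. The helper name coords_to_binary and the three sample points are hypothetical:

```python
import numpy as np
import pandas as pd

def coords_to_binary(df, shape=(272, 512, 512)):
    """Return a uint8 array of `shape` with 1 at every (Z, X, Y) listed in df."""
    volume = np.zeros(shape, dtype=np.uint8)
    # Integer-array indexing: element i of z, x and y together form one
    # (z, x, y) index, so several points on the same slice are all set at once.
    z = df['Z'].to_numpy(dtype=np.int64)
    x = df['X'].to_numpy(dtype=np.int64)
    y = df['Y'].to_numpy(dtype=np.int64)
    volume[z, x, y] = 1
    return volume

# Hypothetical example: three points, two of them on slice z = 40
df = pd.DataFrame({'X': [100, 101, 250], 'Y': [200, 202, 300], 'Z': [40, 40, 41]})
volume = coords_to_binary(df)
print(volume.shape, volume.sum())  # (272, 512, 512) 3
print(volume[40].sum())            # 2
```

The uint8 dtype replaces the question's int64 only to keep the volume at about 71 MB instead of about 570 MB. Switch it back if downstream code expects int64.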

Here is what the head of the DataFrame looks like: [image not available; the columns used below are X, Y and Z]

This is the code that works for the downsampled DataFrame (only one row per unique z value):

import numpy as np

def dataframe_to_binary_array(df):
    '''
    THIS FUNCTION TAKES IN A DOWNSAMPLED DATAFRAME AND CONVERTS IT TO A 3D
    BINARY ARRAY THAT IS THE SAME SHAPE AS THE ORIGINAL DICOM STACK
    '''
    empty_array = np.zeros([272, 512, 512], dtype='int64')
    z_column = df['Z']

    for z in z_column:
        # Rows belonging to this slice (exactly one in the downsampled data)
        z_df = df[z_column == z]
        x = z_df['X']
        y = z_df['Y']
        empty_array[z, x, y] = 1

    return empty_array

Here is my attempt for the full DataFrame (multiple rows per z value):

def dataframe_to_binary_array_new(df):
    '''
    THIS FUNCTION TAKES IN A DOWNSAMPLED DATAFRAME AND CONVERTS IT TO A 3D
    BINARY ARRAY THAT IS THE SAME SHAPE AS THE ORIGINAL DICOM STACK
    '''
    empty_array = np.zeros([272, 512, 512], dtype='int64')
    z_column = df['Z']

    for i in range(0,272):
        z_df = df[z_column == i]

        for row in z_df:
            x_col = z_df['X'].to_numpy()
            y_col = z_df['Y'].to_numpy()

            for x_element in x_col:
                x = int(x_element)

            for y_element in y_col:
                y = int(y_element)
                empty_array[i,x,y] = 1


    return empty_array

The error message I get is "IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices"
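The sketch below uses made-up data and points to a few things in the attempt that are worth checking. The post alone doesn't show which of them produced the error. First, NumPy raises this same IndexError whenever an index is a float, which is easy to hit when coordinates come from a float64 column. Second, iterating directly over a DataFrame yields its column labels rather than its rows. Third, the x loop finishes before the y loop starts, so each slice only ever uses its last x:

```python
import numpy as np
import pandas as pd

# 1) Float indices trigger the IndexError quoted above
volume = np.zeros((4, 4, 4), dtype=np.uint8)
z, x, y = 1.0, 2.0, 3.0  # e.g. values read from a float64 column
raised = False
try:
    volume[z, x, y] = 1
except IndexError as err:
    raised = True
    print('IndexError:', err)
volume[int(z), int(x), int(y)] = 1  # integer indices are accepted

# 2) Iterating a DataFrame yields column labels, not rows
z_df = pd.DataFrame({'X': [10, 20], 'Y': [30, 40], 'Z': [5, 5]})
labels = [row for row in z_df]
print(labels)  # ['X', 'Y', 'Z']

# 3) Separate x and y loops don't pair points: x keeps only its last value
for x_element in z_df['X'].to_numpy():
    x_last = int(x_element)
marked = [(x_last, int(y_element)) for y_element in z_df['Y'].to_numpy()]
print(marked)  # [(20, 30), (20, 40)]; (10, 30) is never set

# Pairing the columns element-wise gives the intended points
pairs = list(zip(z_df['X'].tolist(), z_df['Y'].tolist()))
print(pairs)  # [(10, 30), (20, 40)]
```

Casting to integers and pairing X with Y row by row, as the answer below does, addresses all three.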


Solution

  • I'd come at this a different way: iterate over the rows of the original DataFrame and use the coordinates from each row to set the corresponding element of empty_array to 1.

    Below is some example code, with empty_array renamed to binary_array. You may need to convert your coordinates from floats to integers before you can use them as indices into binary_array.

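    import numpy as np
    import pandas as pd

    # x, y, z are integers from [0, 10)
    n = 10

    binary_array = np.zeros([n] * 3, dtype='int64')

    # Build 10 example coordinates
    df = pd.DataFrame(np.random.randint(n, size=(10, 3)), columns=list('XYZ'))

    for idx, coord in df.iterrows():
        # int() handles float-valued coordinates; index as [z, x, y] to match
        # the question's layout (z is the first axis)
        x, y, z = int(coord['X']), int(coord['Y']), int(coord['Z'])
        binary_array[z, x, y] = 1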
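If the coordinates are floats, how they are converted to integers matters. int() and astype(int) truncate toward zero, so 99.7 becomes 99, whereas rounding first gives 100. The sketch below assumes the centerline coordinates are already in voxel units (not millimetres) and that the nearest voxel is wanted. The helper name to_voxel_indices and the sample values are hypothetical:

```python
import numpy as np
import pandas as pd

def to_voxel_indices(df, shape=(272, 512, 512)):
    """Round Z, X, Y to the nearest voxel and drop points outside the volume."""
    # Columns ordered to match the array axes (z, x, y);
    # np.rint rounds .5 to the nearest even integer
    idx = np.rint(df[['Z', 'X', 'Y']].to_numpy(dtype=float)).astype(np.int64)
    # Keep only rows whose every index lies in [0, size) for its axis
    inside = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
    return idx[inside]

df = pd.DataFrame({'X': [99.7, 10.2, 600.0], 'Y': [5.4, 7.4, 1.0], 'Z': [40.2, 41.8, 42.0]})
idx = to_voxel_indices(df)
print(idx.tolist())  # [[40, 100, 5], [42, 10, 7]]

volume = np.zeros((272, 512, 512), dtype=np.uint8)
volume[tuple(idx.T)] = 1  # one (z, x, y) triple per row of idx
```

Out-of-range points are dropped rather than clipped so they don't pile up on the volume's border. If such points indicate a bug upstream, raising an error instead may be preferable.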