I have a Dataframe that corresponds to a 3D centerline (x, y, z). I want to turn the Dataframe into a binary array with shape (272, 512, 512). The z values in the Dataframe range from about 40 to 160 and correspond to the first axis of the array; the x and y values correspond to the second and third axes, respectively. Any (x, y, z) coordinate not in the Dataframe should be a 0 in the array, and any coordinate that is present should be a 1. Any ideas on how to do this, given that each plane/slice may contain multiple 1s?
I was able to accomplish this when I limited the Dataframe to one row per unique z value (one point per slice), but the real data has multiple rows per unique z value.
Here is what the header of the Dataframe looks like (image not available; its coordinate columns are `X`, `Y` and `Z`).
This is the code that works for the downsampled Dataframe (only one row per unique z value):
```python
import numpy as np

def dataframe_to_binary_array(df):
    '''
    THIS FUNCTION TAKES IN A DOWNSAMPLED DATAFRAME AND CONVERTS IT TO A 3D
    BINARY ARRAY THAT IS THE SAME SHAPE AS THE ORIGINAL DICOM STACK
    '''
    empty_array = np.zeros([272, 512, 512], dtype='int64')
    z_column = df['Z']
    for z in z_column:
        # Rows belonging to slice z (exactly one row in the downsampled data)
        z_df = df[z_column == z]
        for k in range(0, 272):
            x = z_df['X']
            y = z_df['Y']
            empty_array[z, x, y] = 1
    return empty_array
```
Here is my attempt at code for the true Dataframe:
```python
def dataframe_to_binary_array_new(df):
    '''
    THIS FUNCTION TAKES IN A DOWNSAMPLED DATAFRAME AND CONVERTS IT TO A 3D
    BINARY ARRAY THAT IS THE SAME SHAPE AS THE ORIGINAL DICOM STACK
    '''
    empty_array = np.zeros([272, 512, 512], dtype='int64')
    z_column = df['Z']
    for i in range(0, 272):
        z_df = df[z_column == i]
        for row in z_df:
            x_col = z_df['X'].to_numpy()
            y_col = z_df['Y'].to_numpy()
            for x_element in x_col:
                x = int(x_element)
                for y_element in y_col:
                    y = int(y_element)
                    empty_array[i, x, y] = 1
    return empty_array
```
The error message I get is "IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices".
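This is the message NumPy raises when an array is indexed with something other than an integer, slice, ellipsis, `None`, or an integer/boolean array. A float is a common culprit, for example a coordinate read from a float column. A minimal sketch, with a made-up array and coordinate, that reproduces the error and shows an explicit `int()` cast avoiding it:

```python
import numpy as np

arr = np.zeros((3, 3, 3), dtype=np.uint8)
z = 1.0  # a float coordinate, as it might come out of a float DataFrame column

message = ""
try:
    arr[z, 0, 0] = 1  # floats are not valid indices
except IndexError as err:
    message = str(err)
    print("IndexError:", message)

arr[int(z), 0, 0] = 1  # casting to int makes it a valid index
```

Which line of the original code triggers the error depends on the real data; this only illustrates one likely cause.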
I'd come at this a different way: iterate over the rows of the original dataframe and use the coordinate from each row to set the corresponding element of `empty_array` to 1.
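Pairing the coordinates row by row matters here: the nested `for x_element` / `for y_element` loops in the attempt mark every combination of an X value and a Y value within a slice, not only the (X, Y) pairs that actually occur together in a row. A small sketch with made-up coordinates for a single slice contrasts the two:

```python
import numpy as np

# Hypothetical coordinates for one z slice: two points, (1, 3) and (2, 4)
x_col = np.array([1, 2])
y_col = np.array([3, 4])

# Nested loops: marks every (x, y) combination -> 4 cells
nested = np.zeros((5, 5), dtype=np.uint8)
for x in x_col:
    for y in y_col:
        nested[x, y] = 1

# zip: marks only the pairs that come from the same row -> 2 cells
paired = np.zeros((5, 5), dtype=np.uint8)
for x, y in zip(x_col, y_col):
    paired[x, y] = 1

print(int(nested.sum()), int(paired.sum()))  # 4 2
```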
Below is some example code, with `empty_array` renamed to `binary_array`. You may need to convert your coordinates from floats to integers to be able to use them as indices into `binary_array`.
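```python
import numpy as np
import pandas as pd

# x, y, z are integers from [0, 10)
n = 10
binary_array = np.zeros([n] * 3, dtype=np.uint8)
# Builds 10 example coordinates
df = pd.DataFrame(np.random.randint(n, size=(10, 3)), columns=list('XYZ'))
for _, row in df.iterrows():
    # Cast to int in case the coordinates are stored as floats;
    # index as [z, x, y] to match the layout in the question
    x, y, z = int(row['X']), int(row['Y']), int(row['Z'])
    binary_array[z, x, y] = 1
```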
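The loop can also be replaced by a single NumPy fancy-indexing assignment. Because each row contributes one (z, x, y) triple, any number of points per z slice is handled. A sketch, assuming columns `X`, `Y`, `Z` whose values fall inside the (272, 512, 512) shape from the question; the helper name `dataframe_to_binary_array_vectorized` and the `uint8` dtype are my own choices. Note that `astype(int)` on its own truncates toward zero, so the sketch rounds first:

```python
import numpy as np
import pandas as pd

def dataframe_to_binary_array_vectorized(df, shape=(272, 512, 512)):
    """Hypothetical helper: set array[z, x, y] = 1 for every row of df."""
    binary_array = np.zeros(shape, dtype=np.uint8)
    # Round, then cast, so the coordinates are valid integer indices
    z = df['Z'].round().astype(int).to_numpy()
    x = df['X'].round().astype(int).to_numpy()
    y = df['Y'].round().astype(int).to_numpy()
    # One fancy-indexing assignment; duplicate points just set the same cell again
    binary_array[z, x, y] = 1
    return binary_array

# Usage: three points, two of them in the same z slice
df = pd.DataFrame({'X': [100, 101, 250], 'Y': [200, 205, 300], 'Z': [40, 40, 160]})
arr = dataframe_to_binary_array_vectorized(df)
print(arr.shape, int(arr.sum()))  # (272, 512, 512) 3
print(int(arr[40].sum()))         # 2
```

Coordinates outside the array bounds would raise an `IndexError` here too, so data that can fall outside the volume may need filtering first.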