I have a CSV (tab-separated) file that is 15 GB on disk according to du -sh filename.txt. However, when I load the file with dask, the resulting dask array is almost four times larger, at about 55 GB. Is this normal? Here is how I am loading the file:
from dask import delayed
from dask.distributed import Client, LocalCluster
import dask.dataframe as dd

cluster = LocalCluster()  # Launches a scheduler and workers locally
client = Client(cluster)  # Connect to distributed cluster and override default

input_file_name = 'filename.txt'

@delayed
def load_file(fname, dtypes=dtypes):
    ddf = dd.read_csv(fname, sep='\t', dtype=dtypes)  # dtypes is a dict of {colname: bool}
    arr = ddf.to_dask_array(lengths=True)
    return arr

result = load_file(input_file_name)
arr = result.compute()
arr
|  | Array | Chunk |
|---|---|---|
| Bytes | 54.58 GiB | 245.18 MiB |
| Shape | (1787307, 4099) | (7840, 4099) |
| Count | 456 Tasks | 228 Chunks |
| Type | object | numpy.ndarray |
I wasn't expecting the dask array to be so much larger than the input file. The file contains binary values, so I tried passing a bool dtype to see if it would shrink, but I see no difference.
Found the answer: it was right there in the output of arr, where the Type says 'object'. It turns out that converting the dask dataframe to an array with arr = ddf.to_dask_array(lengths=True) does not preserve the bool dtype. I was able to reduce the memory footprint significantly by explicitly casting the array back to bool.
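For scale, an object array stores an 8-byte pointer per element (on top of the Python objects those pointers reference), while a bool array stores a single byte per element, which lines up with the drop from ~55 GiB to ~7 GiB. A minimal numpy sketch, using made-up shapes rather than the real data, just to illustrate the per-element cost:

import numpy as np

# Illustrative shapes only: compare per-element memory of object vs bool dtype
obj_arr = np.zeros((1000, 4099), dtype=object)  # 8-byte pointers per element on a 64-bit build
bool_arr = np.zeros((1000, 4099), dtype=bool)   # 1 byte per element
print(obj_arr.nbytes)   # 32792000 bytes (pointers only, excludes the pointed-to Python objects)
print(bool_arr.nbytes)  # 4099000 bytes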
# load the dask dataframe and convert to an array
@delayed
def load_to_arr(fname, dt):
    ddf = dd.read_csv(fname, sep='\t', dtype=dt)
    arr = ddf.to_dask_array(lengths=True)
    arr = arr.astype(bool, copy=False)  # recast as a bool array
    return arr

result = load_to_arr(input_file_name, dt=dtypes)
arr = result.compute()
arr
This now gives an array with dtype bool and a size of about 6.8 GiB, which is much more reasonable.
|  | Array | Chunk |
|---|---|---|
| Bytes | 6.82 GiB | 30.65 MiB |
| Shape | (1787307, 4099) | (7840, 4099) |
| Count | 456 Tasks | 228 Chunks |
| Type | bool | numpy.ndarray |
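As a side note, dd.read_csv is already lazy, so the @delayed wrapper isn't strictly needed; a minimal sketch of the same load without it (assuming the same dtypes dict of {colname: bool}) could look like this:

import dask.dataframe as dd

# sketch only: dask dataframes are lazy, so no @delayed wrapper is required
ddf = dd.read_csv(input_file_name, sep='\t', dtype=dtypes)
arr = ddf.to_dask_array(lengths=True).astype(bool, copy=False)  # recast as bool
arr  # the repr should now report dtype bool and roughly 6.8 GiB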