I am using Python 3 and pandas for a data science project. However, I am having some problems with pandas' syntax.
The code below does something close to what I want:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('breast-cancer-wisconsin.data.txt')
print(df.groupby('class').describe())
I got the breast cancer data from the link. The specific data file I am using is breast-cancer-wisconsin.data.
It returns:
bland_chrom \
count mean std min 25% 50% 75% max
class
2 458.0 2.100437 1.080339 1.0 1.0 2.0 3.0 7.0
4 241.0 5.979253 2.273852 1.0 4.0 7.0 7.0 10.0
clump_thickness ... unif_cel_shape unif_cel_size \
count mean ... 75% max count
class ...
2 458.0 2.956332 ... 1.0 8.0 458.0
4 241.0 7.195021 ... 9.0 10.0 241.0
mean std min 25% 50% 75% max
class
2 1.325328 0.907694 1.0 1.0 1.0 1.0 9.0
4 6.572614 2.719512 1.0 4.0 6.0 10.0 10.0
[2 rows x 72 columns]
However, this is not the full output. The ellipsis (...) indicates that some columns are hidden because the output is truncated.
How can I get the full result?
Thanks.
I am not sure this is the best way to solve the problem, but to avoid truncation I did the following:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('breast-cancer-wisconsin.data.txt')
# Raise the column limit so that describe() output is not truncated with "..."
pd.options.display.max_columns = 999
print(df.groupby('class').describe())
This returns the full output:
bland_chrom \
count mean std min 25% 50% 75% max
class
2 458.0 2.100437 1.080339 1.0 1.0 2.0 3.0 7.0
4 241.0 5.979253 2.273852 1.0 4.0 7.0 7.0 10.0
clump_thickness id \
count mean std min 25% 50% 75% max count
class
2 458.0 2.956332 1.674318 1.0 1.0 3.0 4.0 8.0 458.0
4 241.0 7.195021 2.428849 1.0 5.0 8.0 10.0 10.0 241.0
\
mean std min 25% 50% 75%
class
2 1.107591e+06 723431.757966 61634.0 1002614.25 1180170.5 1256870.5
4 1.003505e+06 322232.308608 63375.0 832226.00 1126417.0 1221863.0
marg_adhesion \
max count mean std min 25% 50% 75% max
class
2 13454352.0 458.0 1.364629 0.996830 1.0 1.0 1.0 1.0 10.0
4 1371026.0 241.0 5.547718 3.210465 1.0 3.0 5.0 8.0 10.0
mitoses norm_nucleoli \
count mean std min 25% 50% 75% max count
class
2 458.0 1.063319 0.501995 1.0 1.0 1.0 1.0 8.0 458.0
4 241.0 2.589212 2.557939 1.0 1.0 1.0 3.0 10.0 241.0
single_epith_cell_size \
mean std min 25% 50% 75% max count
class
2 1.290393 1.058856 1.0 1.0 1.0 1.0 9.0 458.0
4 5.863071 3.350672 1.0 3.0 6.0 10.0 10.0 241.0
unif_cel_shape \
mean std min 25% 50% 75% max count mean
class
2 2.120087 0.917130 1.0 2.0 2.0 2.0 10.0 458.0 1.443231
4 5.298755 2.451606 1.0 3.0 5.0 6.0 10.0 241.0 6.560166
unif_cel_size \
std min 25% 50% 75% max count mean std
class
2 0.997836 1.0 1.0 1.0 1.0 8.0 458.0 1.325328 0.907694
4 2.562045 1.0 4.0 6.0 9.0 10.0 241.0 6.572614 2.719512
min 25% 50% 75% max
class
2 1.0 1.0 1.0 1.0 9.0
4 1.0 4.0 6.0 10.0 10.0
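Setting `pd.options.display.max_columns` works; passing `None` instead of 999 removes the limit entirely. The output is still split into stacked blocks (a trailing `\` marks a continuation) because pandas wraps wide frames at `display.width`. The sketch below shows three alternatives: scoping the option with `pd.option_context`, printing with `DataFrame.to_string()`, and transposing the summary. It assumes a small synthetic DataFrame in place of the real file; the `feat_*` column names and the random values are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the breast-cancer data (random integers, NOT the real dataset):
# 10 numeric feature columns plus a 'class' column with values 2 and 4
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 11, size=(20, 10)),
                  columns=[f"feat_{i}" for i in range(10)])
df["class"] = np.where(rng.random(20) < 0.5, 2, 4)

summary = df.groupby("class").describe()  # 10 features x 8 statistics = 80 columns

# 1) Lift the column limit only inside this block; the previous setting is restored on exit.
#    None means "no limit", so no column is replaced by "...", but the repr still wraps
#    at display.width, producing stacked blocks like the output above.
with pd.option_context("display.max_columns", None):
    full = repr(summary)

# 2) to_string() ignores display.max_columns/max_rows and, with the default line_width=None,
#    does not wrap, so each row of the summary is printed on a single (very wide) line.
wide = summary.to_string()

# 3) Transposing gives one row per (feature, statistic) pair and one column per class,
#    which is usually easier to read; to_string() again bypasses display.max_rows.
tall = summary.T.to_string()
print(tall)
```

In the real script, the synthetic `df` would be replaced by the one returned by `pd.read_csv` above.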