Tags: python, pandas, pandas-groupby, truncation

Pandas output for "describe" using "group by" is incomplete due to truncation


I am using Python 3 and pandas for a data science project. However, I am having some problems with pandas syntax.

The code below does something close to what I want:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('breast-cancer-wisconsin.data.txt')

print(df.groupby('class').describe())

I got the data on Breast Cancer from the link. The specific file with the data that I am using is breast-cancer-wisconsin.data.

It returns:

      bland_chrom                                                \
            count      mean       std  min  25%  50%  75%   max   
class                                                             
2           458.0  2.100437  1.080339  1.0  1.0  2.0  3.0   7.0   
4           241.0  5.979253  2.273852  1.0  4.0  7.0  7.0  10.0   

      clump_thickness            ...  unif_cel_shape       unif_cel_size  \
                count      mean  ...             75%   max         count   
class                            ...                                       
2               458.0  2.956332  ...             1.0   8.0         458.0   
4               241.0  7.195021  ...             9.0  10.0         241.0   


           mean       std  min  25%  50%   75%   max  
class                                                 
2      1.325328  0.907694  1.0  1.0  1.0   1.0   9.0  
4      6.572614  2.719512  1.0  4.0  6.0  10.0  10.0  

[2 rows x 72 columns]

However, this is not the full output. The three consecutive dots (...) indicate that some columns are hidden due to truncation.

How can I get the full result?

Thanks.


Solution

  • I am not sure this is the best way to solve the problem, but raising the display.max_columns option avoided the truncation:

    import pandas as pd
    import matplotlib.pyplot as plt
    
    df = pd.read_csv('breast-cancer-wisconsin.data.txt')
    
    # Raise the column limit so that no columns are elided with "..."
    pd.options.display.max_columns = 999
    
    print(df.groupby('class').describe())
    

    This returns the full output:

          bland_chrom                                                \
                count      mean       std  min  25%  50%  75%   max   
    class                                                             
    2           458.0  2.100437  1.080339  1.0  1.0  2.0  3.0   7.0   
    4           241.0  5.979253  2.273852  1.0  4.0  7.0  7.0  10.0   
    
          clump_thickness                                                    id  \
                    count      mean       std  min  25%  50%   75%   max  count   
    class                                                                         
    2               458.0  2.956332  1.674318  1.0  1.0  3.0   4.0   8.0  458.0   
    4               241.0  7.195021  2.428849  1.0  5.0  8.0  10.0  10.0  241.0   
    
                                                                                   \
                   mean            std      min         25%        50%        75%   
    class                                                                           
    2      1.107591e+06  723431.757966  61634.0  1002614.25  1180170.5  1256870.5   
    4      1.003505e+06  322232.308608  63375.0   832226.00  1126417.0  1221863.0   
    
                      marg_adhesion                                                \
                  max         count      mean       std  min  25%  50%  75%   max   
    class                                                                           
    2      13454352.0         458.0  1.364629  0.996830  1.0  1.0  1.0  1.0  10.0   
    4       1371026.0         241.0  5.547718  3.210465  1.0  3.0  5.0  8.0  10.0   
    
          mitoses                                               norm_nucleoli  \
            count      mean       std  min  25%  50%  75%   max         count   
    class                                                                       
    2       458.0  1.063319  0.501995  1.0  1.0  1.0  1.0   8.0         458.0   
    4       241.0  2.589212  2.557939  1.0  1.0  1.0  3.0  10.0         241.0   
    
                                                         single_epith_cell_size  \
               mean       std  min  25%  50%   75%   max                  count   
    class                                                                         
    2      1.290393  1.058856  1.0  1.0  1.0   1.0   9.0                  458.0   
    4      5.863071  3.350672  1.0  3.0  6.0  10.0  10.0                  241.0   
    
                                                        unif_cel_shape            \
               mean       std  min  25%  50%  75%   max          count      mean   
    class                                                                          
    2      2.120087  0.917130  1.0  2.0  2.0  2.0  10.0          458.0  1.443231   
    4      5.298755  2.451606  1.0  3.0  5.0  6.0  10.0          241.0  6.560166   
    
                                              unif_cel_size                      \
                std  min  25%  50%  75%   max         count      mean       std   
    class                                                                         
    2      0.997836  1.0  1.0  1.0  1.0   8.0         458.0  1.325328  0.907694   
    4      2.562045  1.0  4.0  6.0  9.0  10.0         241.0  6.572614  2.719512   
    
    
           min  25%  50%   75%   max  
    class                             
    2      1.0  1.0  1.0   1.0   9.0  
    4      1.0  4.0  6.0  10.0  10.0
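
As a possible variation on the approach above, the column limit can be lifted only for a single print instead of for the whole session. The sketch below uses pd.option_context for that. The DataFrame is randomly generated stand-in data, not the breast cancer file, and the feature_0 ... feature_11 column names are hypothetical; only the 'class' column name is taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 12 numeric columns plus a 'class' column
# holding the two labels used in the question (2 and 4).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(1, 11, size=(20, 12)),
    columns=[f"feature_{i}" for i in range(12)],
)
df["class"] = [2, 4] * 10

# 12 features x 8 statistics = 96 columns in the summary.
summary = df.groupby("class").describe()

# With a small column limit, the middle columns are replaced by "...".
with pd.option_context("display.max_columns", 10):
    truncated_text = repr(summary)

# With the limit set to None, every column is rendered; the option
# reverts to its previous value when the block exits.
with pd.option_context("display.max_columns", None):
    full_text = repr(summary)

print(full_text)
```

Setting the option to None removes the limit entirely, so there is no need to guess a number such as 999. If the goal is only to print one frame in full, summary.to_string() should also work, since it does not apply a column limit by default.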