
UCI Machine Learning Repository datasets


I am new to the UCI Machine Learning Repository datasets.

I have tried to download the data into R, but I cannot get it to work.

Could someone please help with this?

Note: I am using a MacBook Pro.

[Image: data capture]

This is the data I want to use


Solution

  • You need to look at the data first to understand its arrangement and whether there is any metadata like a header. Your browser should be sufficient for this. The first two lines of the ionosphere.data file are:

    1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300,g
    1,0,1,-0.18829,0.93035,-0.36156,-0.10868,-0.93597,1,-0.04549,0.50874,-0.67743,0.34432,-0.69707,-0.51685,-0.97515,0.05499,-0.62237,0.33109,-1,-0.13151,-0.45300,-0.18056,-0.35734,-0.20332,-0.26569,-0.20468,-0.18401,-0.19040,-0.11593,-0.16626,-0.06288,-0.13738,-0.02447,b
    

    So there is no header, but it is a CSV file. You can use either read.table with sep="," or read.csv with header=FALSE. You might assume (incorrectly, as I did) that the column names are in the other file, but this is a machine learning dataset whose columns are not named, so the read.* functions will assign generic names (V1, V2, ...) to the columns of the data frame they create.
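
    As a rough illustration of the two options just mentioned, the sketch below reads the two sample lines quoted above from an inline character vector instead of from the URL, so it runs without a network connection. The object names sample_lines, via_csv and via_table are made up for this example.

    ```r
    # First two lines of ionosphere.data, as quoted above, used as inline input
    sample_lines <- c(
      "1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300,g",
      "1,0,1,-0.18829,0.93035,-0.36156,-0.10868,-0.93597,1,-0.04549,0.50874,-0.67743,0.34432,-0.69707,-0.51685,-0.97515,0.05499,-0.62237,0.33109,-1,-0.13151,-0.45300,-0.18056,-0.35734,-0.20332,-0.26569,-0.20468,-0.18401,-0.19040,-0.11593,-0.16626,-0.06288,-0.13738,-0.02447,b"
    )

    # read.csv() defaults to header = TRUE, so that must be switched off here
    via_csv <- read.csv(text = sample_lines, header = FALSE)

    # read.table() defaults to header = FALSE but splits on whitespace,
    # so the comma separator has to be given explicitly
    via_table <- read.table(text = sample_lines, sep = ",")

    dim(via_csv)          # number of rows and columns read
    names(via_csv)[1:3]   # generic column names assigned by R
    ```

    Both calls should produce the same 2-row, 35-column data frame; replacing the text argument with the quoted URL reads the full file in the same way.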

    Copy the link address of the data file from your browser, paste it into read.table in quotes, and add the separator argument, since read.table's default separator (whitespace) does not include commas:

    ionosphere <- read.table( "https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/ionosphere.data",
                               sep=",")  # header=FALSE is default for read.table
    
    > str(ionosphere)
    'data.frame':   351 obs. of  35 variables:
     $ V1 : int  1 1 1 1 1 1 1 0 1 1 ...
     $ V2 : int  0 0 0 0 0 0 0 0 0 0 ...
     $ V3 : num  0.995 1 1 1 1 ...
     $ V4 : num  -0.0589 -0.1883 -0.0336 -0.4516 -0.024 ...
     $ V5 : num  0.852 0.93 1 1 0.941 ...
     $ V6 : num  0.02306 -0.36156 0.00485 1 0.06531 ...
     $ V7 : num  0.834 -0.109 1 0.712 0.921 ...
     $ V8 : num  -0.377 -0.936 -0.121 -1 -0.233 ...
     $ V9 : num  1 1 0.89 0 0.772 ...
     $ V10: num  0.0376 -0.0455 0.012 0 -0.164 ...
     $ V11: num  0.852 0.509 0.731 0 0.528 ...
     $ V12: num  -0.1776 -0.6774 0.0535 0 -0.2028 ...
     $ V13: num  0.598 0.344 0.854 0 0.564 ...
     $ V14: num  -0.44945 -0.69707 0.00827 0 -0.00712 ...
     $ V15: num  0.605 -0.517 0.546 -1 0.344 ...
     $ V16: num  -0.38223 -0.97515 0.00299 0.14516 -0.27457 ...
     $ V17: num  0.844 0.055 0.838 0.541 0.529 ...
     $ V18: num  -0.385 -0.622 -0.136 -0.393 -0.218 ...
     $ V19: num  0.582 0.331 0.755 -1 0.451 ...
     $ V20: num  -0.3219 -1 -0.0854 -0.5447 -0.1781 ...
     $ V21: num  0.5697 -0.1315 0.7089 -0.6997 0.0598 ...
     $ V22: num  -0.297 -0.453 -0.275 1 -0.356 ...
     $ V23: num  0.3695 -0.1806 0.4339 0 0.0231 ...
     $ V24: num  -0.474 -0.357 -0.121 0 -0.529 ...
     $ V25: num  0.5681 -0.2033 0.5753 1 0.0329 ...
     $ V26: num  -0.512 -0.266 -0.402 0.907 -0.652 ...
     $ V27: num  0.411 -0.205 0.59 0.516 0.133 ...
     $ V28: num  -0.462 -0.184 -0.221 1 -0.532 ...
     $ V29: num  0.2127 -0.1904 0.431 1 0.0243 ...
     $ V30: num  -0.341 -0.116 -0.174 -0.201 -0.622 ...
     $ V31: num  0.4227 -0.1663 0.6044 0.2568 -0.0571 ...
     $ V32: num  -0.5449 -0.0629 -0.2418 1 -0.5957 ...
     $ V33: num  0.1864 -0.1374 0.5605 -0.3238 -0.0461 ...
     $ V34: num  -0.453 -0.0245 -0.3824 1 -0.657 ...
     $ V35: Factor w/ 2 levels "b","g": 2 1 2 1 2 1 2 1 2 1 ...
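
    One caveat about the str() output above: V35 appears as a Factor because it was produced with a version of R older than 4.0.0, where stringsAsFactors defaulted to TRUE. From R 4.0.0 onward the same call returns V35 as a character column. The sketch below, again using the two sample lines as inline input, requests the factor explicitly; the column name "class" and the object name ionosphere_sample are only illustrative choices, not something the dataset defines.

    ```r
    # Same two sample lines from ionosphere.data as quoted above
    sample_lines <- c(
      "1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300,g",
      "1,0,1,-0.18829,0.93035,-0.36156,-0.10868,-0.93597,1,-0.04549,0.50874,-0.67743,0.34432,-0.69707,-0.51685,-0.97515,0.05499,-0.62237,0.33109,-1,-0.13151,-0.45300,-0.18056,-0.35734,-0.20332,-0.26569,-0.20468,-0.18401,-0.19040,-0.11593,-0.16626,-0.06288,-0.13738,-0.02447,b"
    )

    # Ask for factors explicitly so the result does not depend on the R version
    ionosphere_sample <- read.table(text = sample_lines, sep = ",",
                                    stringsAsFactors = TRUE)

    # Give the last column a descriptive name ("class" is a hypothetical choice)
    names(ionosphere_sample)[35] <- "class"

    is.factor(ionosphere_sample$class)   # TRUE on any R version
    levels(ionosphere_sample$class)      # levels are sorted alphabetically
    table(ionosphere_sample$class)       # counts per class label
    ```

    The same stringsAsFactors = TRUE argument can be added to the read.table call on the URL if a factor is wanted for the last column.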