I am new to the UCI Machine Learning Repository datasets.
I have tried to download the data into R, but I cannot do it.
Could someone please help with this?
Note: I am using a MacBook Pro.
You need to look at the data first to understand its arrangement and whether there is any metadata like a header. Your browser should be sufficient for this. The first two lines of the ionosphere.data file are:
1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300,g
1,0,1,-0.18829,0.93035,-0.36156,-0.10868,-0.93597,1,-0.04549,0.50874,-0.67743,0.34432,-0.69707,-0.51685,-0.97515,0.05499,-0.62237,0.33109,-1,-0.13151,-0.45300,-0.18056,-0.35734,-0.20332,-0.26569,-0.20468,-0.18401,-0.19040,-0.11593,-0.16626,-0.06288,-0.13738,-0.02447,b
So there is no header, but it is a CSV file. You can use either read.table with sep="," or read.csv with header=FALSE. You might assume (incorrectly, as I did) that the column names are in the other file, but this is a machine learning dataset that comes without column names, so the read.* functions will assign generic names (V1, V2, ...) to the columns of the data frame they create.
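To see that the two calls are interchangeable without touching the network, here is a minimal sketch that reads a tiny inline sample through the text= argument. The sample is an assumption for illustration only: the two rows shown above, cut down to four numeric fields plus the class letter.

```r
# Illustrative sample only: the first two rows of the file, shortened
# to four numeric fields plus the class letter.
sample_text <- "1,0,0.99539,-0.05889,g\n1,0,1,-0.18829,b"

# read.table splits on whitespace by default, so the comma must be given.
# header = FALSE is already its default.
a <- read.table(text = sample_text, sep = ",")

# read.csv already splits on commas, but its default is header = TRUE,
# which would swallow the first data row as column names.
b <- read.csv(text = sample_text, header = FALSE)

names(a)   # generic names: V1 V2 V3 V4 V5
names(b)   # same generic names
str(a)
```

Forgetting header = FALSE with read.csv is the easy mistake here: the call succeeds, but the data frame ends up one row short with odd column names.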
Copy the link address of the data file from your browser, paste it into read.table in quotes, and add the separator argument, since read.table's default separator (whitespace) does not include commas:
ionosphere <- read.table( "https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/ionosphere.data",
sep=",") # header=FALSE is default for read.table
> str(ionosphere)
'data.frame': 351 obs. of 35 variables:
$ V1 : int 1 1 1 1 1 1 1 0 1 1 ...
$ V2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ V3 : num 0.995 1 1 1 1 ...
$ V4 : num -0.0589 -0.1883 -0.0336 -0.4516 -0.024 ...
$ V5 : num 0.852 0.93 1 1 0.941 ...
$ V6 : num 0.02306 -0.36156 0.00485 1 0.06531 ...
$ V7 : num 0.834 -0.109 1 0.712 0.921 ...
$ V8 : num -0.377 -0.936 -0.121 -1 -0.233 ...
$ V9 : num 1 1 0.89 0 0.772 ...
$ V10: num 0.0376 -0.0455 0.012 0 -0.164 ...
$ V11: num 0.852 0.509 0.731 0 0.528 ...
$ V12: num -0.1776 -0.6774 0.0535 0 -0.2028 ...
$ V13: num 0.598 0.344 0.854 0 0.564 ...
$ V14: num -0.44945 -0.69707 0.00827 0 -0.00712 ...
$ V15: num 0.605 -0.517 0.546 -1 0.344 ...
$ V16: num -0.38223 -0.97515 0.00299 0.14516 -0.27457 ...
$ V17: num 0.844 0.055 0.838 0.541 0.529 ...
$ V18: num -0.385 -0.622 -0.136 -0.393 -0.218 ...
$ V19: num 0.582 0.331 0.755 -1 0.451 ...
$ V20: num -0.3219 -1 -0.0854 -0.5447 -0.1781 ...
$ V21: num 0.5697 -0.1315 0.7089 -0.6997 0.0598 ...
$ V22: num -0.297 -0.453 -0.275 1 -0.356 ...
$ V23: num 0.3695 -0.1806 0.4339 0 0.0231 ...
$ V24: num -0.474 -0.357 -0.121 0 -0.529 ...
$ V25: num 0.5681 -0.2033 0.5753 1 0.0329 ...
$ V26: num -0.512 -0.266 -0.402 0.907 -0.652 ...
$ V27: num 0.411 -0.205 0.59 0.516 0.133 ...
$ V28: num -0.462 -0.184 -0.221 1 -0.532 ...
$ V29: num 0.2127 -0.1904 0.431 1 0.0243 ...
$ V30: num -0.341 -0.116 -0.174 -0.201 -0.622 ...
$ V31: num 0.4227 -0.1663 0.6044 0.2568 -0.0571 ...
$ V32: num -0.5449 -0.0629 -0.2418 1 -0.5957 ...
$ V33: num 0.1864 -0.1374 0.5605 -0.3238 -0.0461 ...
$ V34: num -0.453 -0.0245 -0.3824 1 -0.657 ...
$ V35: Factor w/ 2 levels "b","g": 2 1 2 1 2 1 2 1 2 1 ...
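One caveat about the last line of that output: V35 shows up as a Factor because older versions of R converted strings to factors by default. From R 4.0.0 onward the default is stringsAsFactors = FALSE, so the same call would report V35 as chr. A minimal sketch of how to get the factor back and give the class column a readable name follows. It uses a shortened inline sample rather than the real file, and the column name "class" is my own choice, not something the dataset provides.

```r
# Illustrative sample only: two shortened rows in the file's layout.
sample_text <- "1,0,0.99539,-0.05889,g\n1,0,1,-0.18829,b"

# Ask for factors explicitly so the result does not depend on the R version.
ionosphere_small <- read.table(text = sample_text, sep = ",",
                               stringsAsFactors = TRUE)

# Rename the last column; "class" is a hypothetical name chosen here.
names(ionosphere_small)[ncol(ionosphere_small)] <- "class"

levels(ionosphere_small$class)   # "b" "g"
table(ionosphere_small$class)    # one row of each class in this sample
```

The same stringsAsFactors = TRUE argument can be added to the read.table call on the full URL if you want the factor column shown in the str output.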