
Data importation in R with lodown


options( survey.lonely.psu = "adjust" )

library(survey)

library(lodown)

# retrieve a listing of all available extracts for the youth risk behavioral 
# surveillance system
yrbss_cat <- get_catalog( "yrbss" , output_dir = file.path( path.expand( "~" ) , "YRBSS" ) )

# limit the catalog to only years 2005-2015
yrbss_cat <- subset( yrbss_cat , year %in% seq( 2005 , 2015 , 2 ) )

# download the yrbss microdata
lodown( "yrbss" , yrbss_cat )

This code is supposed to download the YRBSS dataset and convert it to R data files, but it is not working. Can someone help?

The console output and the error are shown below.


> options( survey.lonely.psu = "adjust" )
> library(survey)
Loading required package: grid
Loading required package: Matrix
Loading required package: survival

Attaching package: ‘survey’

The following object is masked from ‘package:graphics’:

    dotchart

Warning message:
package ‘survey’ was built under R version 3.4.4

> library(lodown)

> # retrieve a listing of all available extracts for the youth risk behavioral
> # surveillance system
> yrbss_cat <- get_catalog( "yrbss" , output_dir = file.path( path.expand( "~" ) , "YRBSS" ) )
building catalog for yrbss

> # retrieve a listing of all available extracts for the youth risk behavioral
> # surveillance system
> yrbss_cat <- get_catalog( "yrbss" , output_dir = file.path( path.expand( "U:/" ) , "YRBSS" ) )
building catalog for yrbss

> yrbss_cat
    directory  year  dat_url
1   1991  1991  https://ftp.cdc.gov/pub/data/yrbs/1991/yrbs1991.dat
2   1993  1993  https://ftp.cdc.gov/pub/data/yrbs/1993/yrbs1993.dat
3   1995  1995  https://ftp.cdc.gov/pub/data/yrbs/1995/nchrbs1995.dat
4   1995  1995  https://ftp.cdc.gov/pub/data/yrbs/1995/yrbs1995.dat
5   1997  1997  https://ftp.cdc.gov/pub/data/yrbs/1997/yrbs1997.dat
6   1998  1998  https://ftp.cdc.gov/pub/data/yrbs/1998/ayrbs1998.dat
7   1999  1999  https://ftp.cdc.gov/pub/data/yrbs/1999/yrbs1999.dat
8   2001  2001  https://ftp.cdc.gov/pub/data/yrbs/2001/yrbs2001.dat
9   2003  2003  https://ftp.cdc.gov/pub/data/yrbs/2003/yrbs2003.dat
10  2005  2005  https://ftp.cdc.gov/pub/data/yrbs/2005/yrbs2005.dat
11  2007  2007  https://ftp.cdc.gov/pub/data/yrbs/2007/yrbs2007.dat
12  2009  2009  https://ftp.cdc.gov/pub/data/yrbs/2009/yrbs2009.dat
13  2011  2011  https://ftp.cdc.gov/pub/data/yrbs/2011/yrbs2011.dat
14  2013  2013  https://ftp.cdc.gov/pub/data/yrbs/2013/yrbs2013.dat
15  2015  2015  https://www.cdc.gov/healthyyouth/data/yrbs/files/yrbs2015.dat
16  2017  2017  https://ftp.cdc.gov/pub/data/yrbs/sadc_2017/sadc_2017_district.dat
17  2017  2017  https://ftp.cdc.gov/pub/data/yrbs/sadc_2017/sadc_2017_national.dat
18  2017  2017  https://ftp.cdc.gov/pub/data/yrbs/sadc_2017/sadc_2017_state_a_m.dat
19  2017  2017  https://ftp.cdc.gov/pub/data/yrbs/sadc_2017/sadc_2017_state_n_z.dat
    sas_url  output_filename
1   https://ftp.cdc.gov/pub/data/yrbs/1991/YRBS_1991_SAS_Input_Program.sas  U://YRBSS/1991 main.rds
2   https://ftp.cdc.gov/pub/data/yrbs/1993/YRBS_1993_SAS_Input_Program.sas  U://YRBSS/1993 main.rds
3   https://ftp.cdc.gov/pub/data/yrbs/1995/NCHRBS_1995_SAS_Input_Program.sas  U://YRBSS/1995 main.rds
4   https://ftp.cdc.gov/pub/data/yrbs/1995/YRBS_1995_SAS_Input_Program.sas  U://YRBSS/1995 main.rds
5   https://ftp.cdc.gov/pub/data/yrbs/1997/YRBS_1997_SAS_Input_Program.sas  U://YRBSS/1997 main.rds
6   https://ftp.cdc.gov/pub/data/yrbs/1998/AYRBS_1998_SAS_Input_Program.sas  U://YRBSS/1998 main.rds
7   https://ftp.cdc.gov/pub/data/yrbs/1999/YRBS_1999_SAS_Input_Program.sas  U://YRBSS/1999 main.rds
8   https://ftp.cdc.gov/pub/data/yrbs/2001/YRBS_2001_SAS_Input_Program.sas  U://YRBSS/2001 main.rds
9   https://ftp.cdc.gov/pub/data/yrbs/2003/YRBS_2003_SAS_Input_Program.sas  U://YRBSS/2003 main.rds
10  https://ftp.cdc.gov/pub/data/yrbs/2005/YRBS_2005_SAS_Input_Program.sas  U://YRBSS/2005 main.rds
11  https://ftp.cdc.gov/pub/data/yrbs/2007/YRBS_2007_SAS_Input_Program.sas  U://YRBSS/2007 main.rds
12  https://ftp.cdc.gov/pub/data/yrbs/2009/YRBS_2009_SAS_Input_Program.sas  U://YRBSS/2009 main.rds
13  https://ftp.cdc.gov/pub/data/yrbs/2011/YRBS_2011_SAS_Input_Program.sas  U://YRBSS/2011 main.rds
14  https://ftp.cdc.gov/pub/data/yrbs/2013/YRBS_2013_SAS_Input_Program.sas  U://YRBSS/2013 main.rds
15  https://ftp.cdc.gov/pub/data/yrbs/2015smy/YRBS_2015_SAS_Input_Program.sas  U://YRBSS/2015 main.rds
16  https://ftp.cdc.gov/pub/data/yrbs/sadc_2017/2017_sadc_national_sas_input_program.sas  U://YRBSS/2017 main.rds
17  https://ftp.cdc.gov/pub/data/yrbs/sadc_2017/2017_sadc_sas_input_program.sas  U://YRBSS/2017 main.rds
18  https://ftp.cdc.gov/pub/data/yrbs/sadc_2017/2017_sadc_states_a-m_sas_input_program.sas  U://YRBSS/2017 main.rds
19  https://ftp.cdc.gov/pub/data/yrbs/sadc_2017/2017_sadc_states_n-z_sas_input_program.sas  U://YRBSS/2017 main.rds

> # limit the catalog to only years 2005-2015
> yrbss_cat <- subset( yrbss_cat , year %in% seq( 2005 , 2015 , 2 ) )
> yrbss_cat
    directory  year  dat_url
10  2005  2005  https://ftp.cdc.gov/pub/data/yrbs/2005/yrbs2005.dat
11  2007  2007  https://ftp.cdc.gov/pub/data/yrbs/2007/yrbs2007.dat
12  2009  2009  https://ftp.cdc.gov/pub/data/yrbs/2009/yrbs2009.dat
13  2011  2011  https://ftp.cdc.gov/pub/data/yrbs/2011/yrbs2011.dat
14  2013  2013  https://ftp.cdc.gov/pub/data/yrbs/2013/yrbs2013.dat
15  2015  2015  https://www.cdc.gov/healthyyouth/data/yrbs/files/yrbs2015.dat
    sas_url  output_filename
10  https://ftp.cdc.gov/pub/data/yrbs/2005/YRBS_2005_SAS_Input_Program.sas  U://YRBSS/2005 main.rds
11  https://ftp.cdc.gov/pub/data/yrbs/2007/YRBS_2007_SAS_Input_Program.sas  U://YRBSS/2007 main.rds
12  https://ftp.cdc.gov/pub/data/yrbs/2009/YRBS_2009_SAS_Input_Program.sas  U://YRBSS/2009 main.rds
13  https://ftp.cdc.gov/pub/data/yrbs/2011/YRBS_2011_SAS_Input_Program.sas  U://YRBSS/2011 main.rds
14  https://ftp.cdc.gov/pub/data/yrbs/2013/YRBS_2013_SAS_Input_Program.sas  U://YRBSS/2013 main.rds
15  https://ftp.cdc.gov/pub/data/yrbs/2015smy/YRBS_2015_SAS_Input_Program.sas  U://YRBSS/2015 main.rds

> # download the yrbss microdata
> lodown( "yrbss" , yrbss_cat )
locally downloading yrbss

downloading from URL 'https://ftp.cdc.gov/pub/data/yrbs/2005/yrbs2005.dat' to file 'C:\Users\JAIMIN~1\AppData\Local\Temp\Rtmpemmaly\file684c13a73873'

download issue with 'https://ftp.cdc.gov/pub/data/yrbs/2005/yrbs2005.dat'
download issue with 'https://ftp.cdc.gov/pub/data/yrbs/2005/yrbs2005.dat'
download issue with 'https://ftp.cdc.gov/pub/data/yrbs/2005/yrbs2005.dat'

R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  LC_TIME=English_United States.1252

attached base packages:
[1] grid  stats  graphics  grDevices  utils  datasets  methods  base

other attached packages:
[1] lodown_0.1.0  survey_3.33-2  survival_2.41-3  Matrix_1.2-9

loaded via a namespace (and not attached):
[1] httr_1.3.1  compiler_3.4.0  R6_2.2.2  tools_3.4.0  RCurl_1.95-4.11  curl_2.6  yaml_2.1.14  splines_3.4.0
[9] digest_0.6.16  bitops_1.0-6  lattice_0.20-35

lodown is now exiting unexpectedly. websites that host publicly-downloadable microdata change often and sometimes those changes cause this software to break. if the error call stack below appears to be a hiccup in your internet connection, then please verify your connectivity and retry the download. otherwise, please open a new issue at https://github.com/ajdamico/asdfree/issues with the contents of this error call stack and also the output of your sessionInfo().

[[1]]
lodown("yrbss", yrbss_cat)

[[2]]
withCallingHandlers(catalog <- load_fun(data_name = data_name, catalog, ...), error = function(e) {
    print(sessionInfo())
    if (grepl("cannot allocate vector of size", e))
        message(memory_note)
    else if (grepl("parameter must be specified", e))
        message(parameter_note)
    else if (grepl("to install", e))
        message(installation_note)
    else {
        message(unknown_error_note)
        print(sys.calls())
    }
})

[[3]]
load_fun(data_name = data_name, catalog, ...)

[[4]]
cachaca(catalog[i, "dat_url"], tf_fn, mode = "wb")

[[5]]
httr_filesize(this_url, attempts, sleepsec)

[[6]]
stop(paste0("httr::HEAD( '", url, "' )\nfailed after ", initial.attempts, " attempts"))

[[7]]
.handleSimpleError(function (e) {
    print(sessionInfo())
    if (grepl("cannot allocate vector of size", e))
        message(memory_note)
    else if (grepl("parameter must be specified", e))
        message(parameter_note)
    else if (grepl("to install", e))
        message(installation_note)
    else {
        message(unknown_error_note)
        print(sys.calls())
    }
}, "httr::HEAD( 'https://ftp.cdc.gov/pub/data/yrbs/2005/yrbs2005.dat' )\nfailed after 3 attempts", quote(httr_filesize(this_url, attempts, sleepsec)))

[[8]]
h(simpleError(msg, call))

Error in httr_filesize(this_url, attempts, sleepsec) :
  httr::HEAD( 'https://ftp.cdc.gov/pub/data/yrbs/2005/yrbs2005.dat' )
failed after 3 attempts
    directory  year  dat_url
10  2005  2005  https://ftp.cdc.gov/pub/data/yrbs/2005/yrbs2005.dat
11  2007  2007  https://ftp.cdc.gov/pub/data/yrbs/2007/yrbs2007.dat
12  2009  2009  https://ftp.cdc.gov/pub/data/yrbs/2009/yrbs2009.dat
13  2011  2011  https://ftp.cdc.gov/pub/data/yrbs/2011/yrbs2011.dat
14  2013  2013  https://ftp.cdc.gov/pub/data/yrbs/2013/yrbs2013.dat
15  2015  2015  https://www.cdc.gov/healthyyouth/data/yrbs/files/yrbs2015.dat
    sas_url  output_filename  case_count
10  https://ftp.cdc.gov/pub/data/yrbs/2005/YRBS_2005_SAS_Input_Program.sas  U://YRBSS/2005 main.rds  NA
11  https://ftp.cdc.gov/pub/data/yrbs/2007/YRBS_2007_SAS_Input_Program.sas  U://YRBSS/2007 main.rds  NA
12  https://ftp.cdc.gov/pub/data/yrbs/2009/YRBS_2009_SAS_Input_Program.sas  U://YRBSS/2009 main.rds  NA
13  https://ftp.cdc.gov/pub/data/yrbs/2011/YRBS_2011_SAS_Input_Program.sas  U://YRBSS/2011 main.rds  NA
14  https://ftp.cdc.gov/pub/data/yrbs/2013/YRBS_2013_SAS_Input_Program.sas  U://YRBSS/2013 main.rds  NA
15  https://ftp.cdc.gov/pub/data/yrbs/2015smy/YRBS_2015_SAS_Input_Program.sas  U://YRBSS/2015 main.rds  NA
>
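
The call stack shows the failure coming from httr::HEAD() inside lodown's httr_filesize(), which reports only that the request failed after 3 attempts and not why. A minimal sketch for surfacing the underlying message is to issue the same HEAD request directly and catch the error. The helper name probe_url is hypothetical and is not part of lodown or httr.

```r
library(httr)

# hypothetical helper: send a HEAD request to a url and return either the
# http status it answered with or the text of the error that httr raised
probe_url <- function( url ) {
  tryCatch(
    paste( "HTTP status" , status_code( HEAD( url ) ) ) ,
    error = function( e ) conditionMessage( e )
  )
}

# usage with the url from the error above (needs a network connection):
# probe_url( "https://ftp.cdc.gov/pub/data/yrbs/2005/yrbs2005.dat" )
```

If the returned text mentions a certificate problem, the failure is in the SSL handshake on this machine and not a missing file on the server.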


Solution

  • Running these two lines before calling lodown() fixed the peer certificate error, and the code then started working:

    library(httr)
    set_config(config(ssl_verifypeer = 0L))
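
set_config() changes httr's settings for the rest of the R session, so every later httr request also skips certificate verification. A possible variant, sketched here under the assumption that the requests that fail go through httr as the call stack suggests, is to limit the setting to a single call with httr's with_config(). The wrapper name with_relaxed_ssl is hypothetical.

```r
library(httr)

# hypothetical helper (not part of lodown or httr): evaluate one expression
# with certificate verification switched off; with_config() restores the
# previous httr settings when the expression finishes
with_relaxed_ssl <- function( expr ) {
  with_config( config( ssl_verifypeer = 0L ) , expr )
}

# usage with the objects from the question (needs lodown and a network connection):
# with_relaxed_ssl( lodown( "yrbss" , yrbss_cat ) )
```

Either form turns off a security check, so it is best kept to downloads from a host you already trust, such as the CDC server here.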