Search code examples
rsurvey

Subsetting survey data based on variable values(levels)?


I am trying to do an analysis of the PIAAC (Programme for the International Assessment of Adult Competencies) data by using as a sample the occupation of the survey participants. The .csv file contains two sheets: one with the variables and the other one with the values the variables can take. The individual files for the different countries could be accessed through: https://webfs.oecd.org/piaac/puf-data/CSV/ or the whole dataset directly by:

library("devtools")
install_github("pbiecek/PISA2012lite") #Needed to access the PIAAC data

data("piaac", package = "PIAAC")

When I read the data in R, the variable "ISCO08_C" indicates the current occupation of the participant and in the data.frame in R it only takes the values takes the values 9996, 9997, 9998, 9999, which correspond to "Valid Skip", "Don't know", "Refused" and "Not stated or inferred", respectively (these can be found in the codebook provided in the OECD PIAAC website: https://www.oecd.org/skills/piaac/data/ ). In the .csv file in the values sheet there are also the codes for every occupation and my aim is to get a subset with participants who are associated with a specific ISCO08 classification code. In my case the code is "233". This is how the first 20 entries look like:

head(aut$ISCO08_C, 20)
 [1] 9996         9996         9996         9999         9996         9999         9999         9996         9999         9999        
[11] 9999         9999         9999         9999         9999         9999         9999         9996         9999         9999        
605 Levels:              9996         9999         0110         0210         0310         1111         1113         1114         ... 33

I see that there are 605 levels and maybe the solution to my problem lies somewhere there but I am not sure how to proceed. I have tried doing:

aut = subset(piaac, CNTRYID == "Austria")

teachers = aut[aut$ISCO08_C %in% "233"]

but considering that the variable takes the values from 9996 to 9999 it does not do much apart from giving me 0 variables which is understandable. Below I provide the structure and the ASCII representation of the variable of interest.

> str(piaac$ISCO08_C)
 Factor w/ 605 levels "            ",..: 2 2 2 3 2 3 3 2 3 3 ...
> dput(head(piaac$ISCO08_C))
structure(c(2L, 2L, 2L, 3L, 2L, 3L), .Label = c("            ", 
"9996        ", "9999        ", "0110        ", "0210        ", 
"0310        ", "1111        ", "1113        ", "1114        ", 
"1120        ", "1211        ", "1212        ", "1213        ", 
"1219        ", "1221        ", "1222        ", "1223        ", 
"1321        ", "1323        ", "1324        ", "1330        ", 
"1342        ", "1343        ", "1344        ", "1345        ", 
"1346        ", "1349        ", "1411        ", "1412        ", 
"1420        ", "1431        ", "1439        ", "2113        ", 
"2120        ", "2131        ", "2133        ", "2141        ", 
"2144        ", "2145        ", "2149        ", "2151        ", 
"2152        ", "2153        ", "2161        ", "2162        ", 
"2163        ", "2165        ", "2166        ", "2211        ", 
"2212        ", "2221        ", "2222        ", "2230        ", 
"2240        ", "2250        ", "2261        ", "2262        ", 
"2263        ", "2264        ", "2265        ", "2266        ", 
"2267        ", "2269        ", "2310        ", "2320        ", 
"2330        ", "2341        ", "2342        ", "2351        ", 
"2352        ", "2354        ", "2355        ", "2356        ", 
"2359        ", "2411        ", "2412        ", "2413        ", 
"2421        ", "2422        ", "2423        ", "2424        ", 
"2431        ", "2432        ", "2433        ", "2511        ", 
"2512        ", "2513        ", "2514        ", "2519        ", 
"2521        ", "2522        ", "2611        ", "2612        ", 
"2619        ", "2621        ", "2622        ", "2632        ", 
"2634        ", "2635        ", "2636        ", "2641        ", 
"2642        ", "2643        ", "2651        ", "2652        ", 
"2654        ", "2655        ", "2659        ", "3111        ", 
"3112        ", "3113        ", "3114        ", "3115        ", 
"3116        ", "3118        ", "3119        ", "3121        ", 
"3122        ", "3123        ", "3133        ", "3135        ", 
"3139        ", "3141        ", "3152        ", "3153        ", 
"3154        ", "3211        ", "3212        ", "3213        ", 
"3221        ", "3240        ", "3251        ", "3252        ", 
"3253        ", "3254        ", "3256        ", "3257        ", 
"3258        ", "3259        ", "3311        ", "3312        ", 
"3313        ", "3314        ", "3315        ", "3321        ", 
"3322        ", "3323        ", "3331        ", "3332        ", 
"3333        ", "3334        ", "3339        ", "3341        ", 
"3342        ", "3343        ", "3344        ", "3351        ", 
"3352        ", "3353        ", "3354        ", "3355        ", 
"3359        ", "3411        ", "3412        ", "3413        ", 
"3422        ", "3423        ", "3431        ", "3432        ", 
"3434        ", "3435        ", "3511        ", "3512        ", 
"3513        ", "3514        ", "4110        ", "4120        ", 
"4132        ", "4211        ", "4221        ", "4222        ", 
"4223        ", "4224        ", "4225        ", "4226        ", 
"4229        ", "4311        ", "4312        ", "4313        ", 
"4321        ", "4322        ", "4323        ", "4411        ", 
"4412        ", "4413        ", "4416        ", "4419        ", 
"5111        ", "5112        ", "5113        ", "5120        ", 
"5131        ", "5132        ", "5141        ", "5142        ", 
"5151        ", "5152        ", "5153        ", "5163        ", 
"5164        ", "5165        ", "5211        ", "5212        ", 
"5221        ", "5222        ", "5223        ", "5230        ", 
"5242        ", "5244        ", "5246        ", "5249        ", 
"5311        ", "5312        ", "5321        ", "5322        ", 
"5329        ", "5411        ", "5412        ", "5413        ", 
"5414        ", "5419        ", "6111        ", "6112        ", 
"6113        ", "6121        ", "6130        ", "7111        ", 
"7112        ", "7114        ", "7115        ", "7119        ", 
"7121        ", "7122        ", "7123        ", "7124        ", 
"7125        ", "7126        ", "7127        ", "7131        ", 
"7132        ", "7133        ", "7211        ", "7212        ", 
"7214        ", "7215        ", "7221        ", "7222        ", 
"7223        ", "7231        ", "7232        ", "7233        ", 
"7234        ", "7313        ", "7315        ", "7316        ", 
"7318        ", "7319        ", "7321        ", "7322        ", 
"7411        ", "7412        ", "7413        ", "7421        ", 
"7422        ", "7511        ", "7512        ", "7514        ", 
"7521        ", "7522        ", "7532        ", "7533        ", 
"7534        ", "7543        ", "8114        ", "8121        ", 
"8122        ", "8131        ", "8132        ", "8142        ", 
"8143        ", "8152        ", "8153        ", "8157        ", 
"8159        ", "8160        ", "8171        ", "8172        ", 
"8181        ", "8183        ", "8189        ", "8211        ", 
"8212        ", "8219        ", "8311        ", "8312        ", 
"8322        ", "8331        ", "8332        ", "8341        ", 
"8342        ", "8343        ", "8344        ", "9111        ", 
"9112        ", "9121        ", "9123        ", "9129        ", 
"9211        ", "9212        ", "9213        ", "9214        ", 
"9312        ", "9313        ", "9321        ", "9329        ", 
"9331        ", "9333        ", "9334        ", "9412        ", 
"9510        ", "9520        ", "9611        ", "9612        ", 
"9622        ", "9623        ", "9629        ", "9997        ", 
"1311        ", "1312        ", "1322        ", "1341        ", 
"21          ", "2111        ", "2132        ", "2142        ", 
"2146        ", "2164        ", "226         ", "23          ", 
"2353        ", "2434        ", "2631        ", "2656        ", 
"31          ", "3117        ", "3131        ", "3132        ", 
"3142        ", "3143        ", "3151        ", "3214        ", 
"3230        ", "3255        ", "3421        ", "3433        ", 
"35          ", "3521        ", "3522        ", "4131        ", 
"4212        ", "4213        ", "4227        ", "4415        ", 
"5169        ", "5245        ", "6123        ", "6129        ", 
"6210        ", "7213        ", "7224        ", "723         ", 
"7311        ", "7314        ", "7515        ", "7523        ", 
"7531        ", "7536        ", "7544        ", "7549        ", 
"81          ", "8111        ", "8141        ", "8151        ", 
"8155        ", "8182        ", "8350        ", "9122        ", 
"9215        ", "9311        ", "9411        ", "9613        ", 
"9621        ", "9998        ", "011         ", "021         ", 
"031         ", "1112        ", "112         ", "122         ", 
"132         ", "133         ", "142         ", "210         ", 
"211         ", "2114        ", "212         ", "214         ", 
"2143        ", "221         ", "223         ", "224         ", 
"225         ", "231         ", "232         ", "233         ", 
"234         ", "2343        ", "235         ", "2357        ", 
"241         ", "242         ", "243         ", "2443        ", 
"25          ", "251         ", "2523        ", "2529        ", 
"264         ", "2653        ", "311         ", "313         ", 
"3134        ", "314         ", "321         ", "324         ", 
"325         ", "332         ", "3324        ", "333         ", 
"334         ", "341         ", "343         ", "351         ", 
"352         ", "4           ", "41          ", "411         ", 
"412         ", "422         ", "431         ", "441         ", 
"51          ", "511         ", "512         ", "52          ", 
"522         ", "523         ", "5232        ", "53          ", 
"531         ", "532         ", "541         ", "6114        ", 
"6122        ", "613         ", "621         ", "6222        ", 
"6223        ", "631         ", "6320        ", "713         ", 
"722         ", "731         ", "7317        ", "7323        ", 
"74          ", "741         ", "75          ", "751         ", 
"7513        ", "8113        ", "813         ", "816         ", 
"82          ", "831         ", "832         ", "833         ", 
"834         ", "835         ", "91          ", "911         ", 
"921         ", "9216        ", "93          ", "951         ", 
"952         ", "961         ", "962         ", " 210        ", 
" 211        ", " 212        ", " 310        ", "2633        ", 
"3155        ", "4214        ", "4712        ", "6221        ", 
"6224        ", "6310        ", "6340        ", "7113        ", 
"7312        ", "7541        ", "7542        ", "8156        ", 
"8321        ", "3222        ", "5162        ", "7535        ", 
"121         ", "131         ", "141         ", "215         ", 
"261         ", "263         ", "331         ", "342         ", 
"421         ", "513         ", "514         ", "5243        ", 
"622         ", "753         ", "8112        ", "812         ", 
"815         ", "817         ", "912         ", "932         ", 
"933         ", "0           ", "2           ", "3           ", 
"5241        ", "611         ", "612         ", "7           ", 
"71          ", "711         ", "712         ", "72          ", 
"752         ", "8           ", "811         ", "814         ", 
"8154        ", "821         ", "83          ", "9           ", 
"931         ", "134         ", "42          ", "432         ", 
"4414        ", "61          ", "732         ", "742         ", 
"941         ", "5161        ", "6330        ", "111         ", 
"143         ", "213         ", "216         ", "222         ", 
"252         ", "262         ", "265         ", "312         ", 
"315         ", "322         ", "323         ", "335         ", 
"515         ", "516         ", "524         ", "721         ", 
"754         ", "818         ", "12          ", "632         ", 
"633         ", "2112        ", "24          ", "33          "
), class = "factor")

I hope this information is sufficient to understand the problem. Any help would be highly appreciated. Thank you in advance.


Solution

  • I think the answer here is quite simple. The piacc data has the column ISO08_C which is of type factor. It contains entries with all the factor levels you can see in your dput. However, in the subset for Austria, only the numbers 9996 and 9999 are present.

    When you subset a data frame with a factor column, the factor levels are all preserved even if some of the factor levels are not present in the subset.

    For example:

    df_with_factors <- data.frame(group = as.factor(rep(LETTERS[1:5], each = 2)))
    subsetted_df    <- subset(df_with_factors, group != "A")
    
    subsetted_df
    #>    group
    #> 3      B
    #> 4      B
    #> 5      C
    #> 6      C
    #> 7      D
    #> 8      D
    #> 9      E
    #> 10     E
    

    You can see that there are no "A" entries left in subsetted_df$group. Yet if I check the factor levels:

    levels(subsetted_df$group)
    #> [1] "A" "B" "C" "D" "E"
    

    You can see that "A" still exists as a factor level, even though it is not used in this subset.

    You can confirm that this is the case with your data if you use the link you provided to download just the Austria data as csv.

    df <- read.csv("https://webfs.oecd.org/piaac/puf-data/CSV/prgautp1.csv")
    

    Now if I tabulate the different values of ISO08_C from just the Austria data, I can see this:

    table(df$ISCO08_C)
    
    #> 9996 9999 
    #> 1392 3692
    

    So the short answer to your problem is that this particular question was not answered in the Austrian data set, and you are therefore trying to analyse data that does not exist in the data set you have.