I am writing a utility function to do some data format conversion, and I am having trouble stating it correctly, so that it applies to the data I want it to apply to, and returns a result of the right shape.
I have a test data set called HiRawTiny, whose str() output is shown below. The data in V1 is character. I have a test function called GetRank, whose job is to take all the characters to the right of a ":" and coerce them to numeric. This is also shown below. The list-of-lists indexing I used in the function to get at the output of strsplit is a bit opaque to me, and frankly I arrived at it by trial and error, but it appears to work fine when passed a single value. When I pass it a vector (a data frame column), however, it does not return a vector of the same length as the input, only a single value.
What should I do to sort this out? I am new to R (though I used to use S many decades ago), and suspect I've got into a syntax muddle. Is my function syntax wrong given what I am trying to do? Should I be looking at using "apply" or one of its friends, to do this? Or should the fn be able to handle vector in/vector out natively?
str(HiRawTiny)
'data.frame': 10 obs. of 7 variables:
 $ V1: chr "RANK:1" "RANK:2" "RANK:3" "RANK:4" ...
 $ V2: chr "SOURCEID:CWC02001632398F4C" "SOURCEID:CWC020000F0D57DD6" "SOURCEID:CWC0200214C29872E" "SOURCEID:CWC0200163206B9F2" ...
 $ V3: chr "TIME:01:04:2012-22:23:58" "TIME:01:04:2012-12:07:55" "TIME:01:04:2012-12:39:51" "TIME:02:04:2012-07:18:25" ...
 $ V4: chr "SCORE:3142" "SCORE:3040" "SCORE:2911" "SCORE:2882" ...
 $ V5: chr "TIEBREAK:4923864" "TIEBREAK:5787094" "TIEBREAK:766764" "TIEBREAK:1872936" ...
 $ V6: chr "" "" "" "" ...
 $ V7: chr "" "" "" "" ...
GetRank <- function(x) { as.numeric(strsplit(x, split=":")[[1]][2]) }
GetRank(HiRawTiny[1,1])
[1] 1
GetRank(HiRawTiny[2,1])
[1] 2
GetRank(HiRawTiny[,1])
[1] 1
# What I want is a vector: GetRank applied to every element of column 1.
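strsplit returns a list. Each element of the list contains the pieces of one input string, which is why [[1]][2] in your function only ever returns the value from the first string. You can turn the list into a matrix with do.call and rbind, and then select the second column: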
GetRank <- function(x) {as.numeric(do.call(rbind, strsplit(x, split=":"))[, 2]) }
GetRank(HiRawTiny$V1)
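
If it helps to see what is going on, here is a small sketch of why the original function returned a single value, along with two other ways to get a vector back. The vector v1 is a made-up stand-in for HiRawTiny$V1, and the names GetRankVapply and GetRankSub are hypothetical, chosen only for this illustration. This is one possible approach under those assumptions, not the only way to write it.

```r
# Hypothetical stand-in for the first column of HiRawTiny
v1 <- c("RANK:1", "RANK:2", "RANK:3", "RANK:4")

# strsplit() returns a list with one character vector per input string,
# so indexing with [[1]] only ever looks at the pieces of the first string.
parts <- strsplit(v1, split = ":", fixed = TRUE)
length(parts)   # one list element per input string
parts[[1]]      # pieces of the first string only

# Option 1: take the second piece of every list element.
# vapply() checks that each result is a single character value.
GetRankVapply <- function(x) {
  pieces <- strsplit(x, split = ":", fixed = TRUE)
  as.numeric(vapply(pieces, function(p) p[2], character(1)))
}

# Option 2: skip the split and delete everything up to and
# including the first ":" with a regular expression.
GetRankSub <- function(x) {
  as.numeric(sub("^[^:]*:", "", x))
}

GetRankVapply(v1)
GetRankSub(v1)
```

Both functions return a numeric vector of the same length as the input. They differ on strings with more than one ":" (such as the TIME column): the vapply version keeps only the second piece, while the sub version keeps everything after the first ":", which as.numeric would then turn into NA with a warning. The do.call(rbind, ...) version above assumes every string splits into the same number of pieces. If they do not, rbind recycles the shorter pieces and warns.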