
Converting relative observations into numerical values


This is my first project in R; I have only just learned Java.

I have a (large) data set that I have imported from a CSV file into a data frame.

I have identified the two relevant columns for this question: the first holds the patient's name, and the second holds the patient's answer about their level of swelling.

The level of swelling is relative, i.e. better, worse, or about the same.

Not all patients have the same number of observations.

I am having difficulty converting these relative values into numerical values that can be used as part of a greater analysis.

Below is a sketch of what I think could be an appropriate solution:

col <- 'Does.you.swelling.seem.better.or.worse'
score <- numeric(nrow(dtfr1))
patientcounter <- NA
conditioncounter <- 0
for (row in seq_len(nrow(dtfr1))) {
  # Reset the running count when a new patient starts
  # (assumes each patient's rows are contiguous and in time order)
  if (is.na(patientcounter) || dtfr1[row, 'patientname'] != patientcounter) {
    patientcounter <- dtfr1[row, 'patientname']
    conditioncounter <- 0
  }
  if (dtfr1[row, col] == 'better') {
    conditioncounter <- conditioncounter - 1
  } else if (dtfr1[row, col] == 'worse') {
    conditioncounter <- conditioncounter + 1
  }
  # 'about the same' (or any other answer) leaves the count unchanged
  score[row] <- conditioncounter
}
# Store the numbers in a new column instead of overwriting the text answers
dtfr1$swelling.score <- score

What would your advice be for a good solution to this problem? Thanks!


Solution

  • If I understand correctly, you want the difference between the number of "worse" and "better" answers for each patient? If so, something like this would work.

    # Simulated data (random, so your output will differ from what is shown)
    dtfr1 <- data.frame(patient = sample(letters[1:3], 100, replace=TRUE), 
                        condition = sample(c("better", "worse"), 100, replace=TRUE))
    head(dtfr1)
    #   patient condition
    # 1       a     worse
    # 2       b    better
    # 3       b     worse
    # 4       a    better
    # 5       c     worse
    # 6       a    better
    
    better_count <- tapply(dtfr1$condition, dtfr1$patient, function(x) sum(x == "better"))
    worse_count <- tapply(dtfr1$condition, dtfr1$patient, function(x) sum(x == "worse"))
    worse_count - better_count
    #  a  b  c 
    #  5  0 -1
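If you instead want a numeric value on every row (a running score per patient, which is what the loop in the question aims for) rather than one total per patient, the sketch below does it without an explicit loop. It is a minimal illustration under stated assumptions: each patient's rows are already in chronological order, "better" maps to -1, "worse" to +1, and "about the same" (or any other answer) to 0. The small data frame, its `patient`/`condition` column names and the `steps` lookup vector are hypothetical, made up for the example. `ave()` applies `cumsum()` separately within each patient, and summing the steps per patient reproduces the worse-minus-better totals from the `tapply` approach.

```r
# Hypothetical example data (not from the question)
dtfr1 <- data.frame(
  patient   = c("a", "a", "a", "b", "b"),
  condition = c("worse", "about the same", "better", "better", "better"),
  stringsAsFactors = FALSE
)

# Assumed coding: worse raises the score, better lowers it
steps <- c(better = -1, worse = 1, "about the same" = 0)

# as.character() matters: indexing by a factor would use its integer codes
step <- unname(steps[as.character(dtfr1$condition)])
step[is.na(step)] <- 0  # unrecognised answers count as "no change"

# Running score within each patient, in row order
dtfr1$score <- ave(step, dtfr1$patient, FUN = cumsum)
dtfr1$score
# c(1, 1, 0, -1, -2)

# Per-patient totals equal worse_count - better_count
totals <- tapply(step, dtfr1$patient, sum)
totals
```

Using a named lookup vector keeps the coding in one place, so adding or re-weighting answer categories only means editing `steps`.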