
Capture Output From Function, Convert to Dataframe


I have had a very hard time capturing the output of a function. The function and data themselves aren't important; I just need the printed values. Say I run the function and get this output:


> sim_distr(10, dft, plot_flag = FALSE)
 
P-values for each variable ('V') in each assessed trial
          V1   V2   V3   V4   V5   V6   V7  V8   V9 V10  V11
19818588 0.5 0.60 0.00 0.85 0.00 0.00 0.05 0.2 0.80  NA   NA
19825849 0.0 0.25 0.60 0.00 0.10 0.10   NA  NA   NA  NA   NA
19851772 0.2 0.55 0.10 0.10 0.15 0.30 1.00 0.0 0.75 0.8 0.25
19854713 0.9 0.85 0.75 0.90 0.40 0.25   NA  NA   NA  NA   NA
19902267 0.0 0.50 0.40 0.20 0.05 0.10   NA  NA   NA  NA   NA
 
Combined (overall) p-values for each assessed trial
           p-value
19818588 0.0000000
19825849 0.0000000
19851772       NaN
19854713 0.9140425
19902267 0.0000000

I want to capture all of that data, including the p-values for each variable. Unfortunately, the function doesn't return the big table of values; it only prints it. Assigning the result keeps just the combined p-values:


> output <- sim_distr(10, dft, plot_flag = FALSE)
 
P-values for each variable ('V') in each assessed trial
           V1   V2   V3  V4   V5   V6  V7  V8   V9 V10 V11
19818588 0.40 0.50 0.20 1.0 0.10 0.10 0.1 0.4 0.80  NA  NA
19825849 0.00 0.25 0.65 0.0 0.20 0.10  NA  NA   NA  NA  NA
19851772 0.30 0.80 0.20 0.2 0.25 0.15 0.8 0.0 0.75 0.8 0.5
19854713 0.90 0.70 0.80 0.8 0.60 0.30  NA  NA   NA  NA  NA
19902267 0.05 0.20 0.60 0.2 0.10 0.40  NA  NA   NA  NA  NA
 
Combined (overall) p-values for each assessed trial
           p-value
19818588 1.0000000
19825849 0.0000000
19851772 0.0000000
19854713 0.9055433
19902267 0.0299261

> output
           p-value
19818588 1.0000000
19825849 0.0000000
19851772 0.0000000
19854713 0.9055433
19902267 0.0299261
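
One plausible reading of the behaviour above is that the function prints both tables as a side effect and returns only the second one. The sketch below illustrates that pattern; `print_and_return()` and its small tables are hypothetical stand-ins, since the internals of `sim_distr()` are not shown here.

```r
# Hypothetical stand-in for sim_distr(); its real internals are not shown in the question.
print_and_return <- function() {
  ids <- c("19818588", "19825849")
  per_variable <- data.frame(V1 = c(0.5, 0.0), V2 = c(0.60, 0.25), row.names = ids)
  combined <- data.frame(p.value = c(0, 1), row.names = ids)
  print(per_variable)   # side effect: goes to the console, not into any object
  print(combined)       # side effect as well
  invisible(combined)   # the return value, which is all that `<-` can store
}

output <- print_and_return()   # both tables are printed
names(output)                  # only the returned table's column is kept
```

If that is what happens, anything that is only printed has to be recovered from the printed text, or by changing the function so that it returns both tables.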

The only thing that has worked so far is wrapping the call in capture.output(), but that just gives me a huge mess of strings:


> capture.output(sim_distr(10, dft, plot_flag = FALSE))
 [1] " "                                                           
 [2] "P-values for each variable ('V') in each assessed trial"     
 [3] "          V1   V2   V3   V4   V5   V6   V7  V8   V9 V10  V11"
 [4] "19818588 0.2 0.50 0.00 0.75 0.10 0.00 0.20 0.4 0.60  NA   NA"
 [5] "19825849 0.0 0.20 0.40 0.00 0.35 0.10   NA  NA   NA  NA   NA"
 [6] "19851772 0.1 0.55 0.05 0.00 0.50 0.20 0.75 0.1 0.55 0.5 0.45"
 [7] "19854713 0.8 1.00 0.70 1.00 0.50 0.15   NA  NA   NA  NA   NA"
 [8] "19902267 0.1 0.60 0.60 0.10 0.20 0.20   NA  NA   NA  NA   NA"
 [9] " "                                                           
[10] "Combined (overall) p-values for each assessed trial"         
[11] "            p-value"                                         
[12] "19818588 0.00000000"                                         
[13] "19825849 0.00000000"                                         
[14] "19851772 0.00000000"                                         
[15] "19854713 1.00000000"                                         
[16] "19902267 0.06341703"  

Is there some way to do this that does not involve recoding millions of strings? Actual dataset is >20,000

Response to @akrun:

> out <- capture.output(sim_distr(10, dft, plot_flag = FALSE)); dput(head(out, 16))
c(" ", "P-values for each variable ('V') in each assessed trial", 
"           V1   V2   V3   V4   V5   V6  V7   V8  V9 V10 V11", 
"19818588 0.30 0.40 0.20 0.80 0.10 0.00 0.0 0.40 0.5  NA  NA", 
"19825849 0.00 0.10 0.50 0.00 0.20 0.10  NA   NA  NA  NA  NA", 
"19851772 0.20 0.55 0.15 0.00 0.00 0.30 0.8 0.45 0.6 0.6 0.3", 
"19854713 0.85 0.90 0.70 1.00 0.70 0.05  NA   NA  NA  NA  NA", 
"19902267 0.05 0.35 0.70 0.35 0.05 0.40  NA   NA  NA  NA  NA", 
" ", "Combined (overall) p-values for each assessed trial", "            p-value", 
"19818588 0.00000000", "19825849 0.00000000", "19851772 0.00000000", 
"19854713 1.00000000", "19902267 0.06093486")

Solution

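  • One option is to capture the printed lines, split them into blocks at the blank lines, and read each block with read.table():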

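    # capture everything the function prints, one string per line
    out <- capture.output(sim_distr(10, dft, plot_flag = FALSE))
    out <- trimws(out)
    # blank lines mark the start of each printed table
    i1 <- out == ""
    # split into blocks, drop each block's title line, parse the rest, and
    # bind the columns; \(x) is the R >= 4.1 shorthand for function(x)
    dat <- do.call(cbind, unname(lapply(split(out[!i1], 
       cumsum(i1)[!i1]), \(x) read.table(text = x[-1]))))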
    

    -output

    > dat
              V1   V2   V3   V4   V5   V6   V7  V8   V9 V10  V11   p.value
    19818588 0.5 0.60 0.00 0.85 0.00 0.00 0.05 0.2 0.80  NA   NA 0.0000000
    19825849 0.0 0.25 0.60 0.00 0.10 0.10   NA  NA   NA  NA   NA 0.0000000
    19851772 0.2 0.55 0.10 0.10 0.15 0.30 1.00 0.0 0.75 0.8 0.25       NaN
    19854713 0.9 0.85 0.75 0.90 0.40 0.25   NA  NA   NA  NA   NA 0.9140425
    19902267 0.0 0.50 0.40 0.20 0.05 0.10   NA  NA   NA  NA   NA 0.0000000
    > str(dat)
    'data.frame':   5 obs. of  12 variables:
     $ V1     : num  0.5 0 0.2 0.9 0
     $ V2     : num  0.6 0.25 0.55 0.85 0.5
     $ V3     : num  0 0.6 0.1 0.75 0.4
     $ V4     : num  0.85 0 0.1 0.9 0.2
     $ V5     : num  0 0.1 0.15 0.4 0.05
     $ V6     : num  0 0.1 0.3 0.25 0.1
     $ V7     : num  0.05 NA 1 NA NA
     $ V8     : num  0.2 NA 0 NA NA
     $ V9     : num  0.8 NA 0.75 NA NA
     $ V10    : num  NA NA 0.8 NA NA
     $ V11    : num  NA NA 0.25 NA NA
     $ p.value: num  0 0 NaN 0.914 0
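
To make the individual steps easier to follow, here is the same approach unrolled and applied to the lines from the `dput()` output in the question, so it runs without `sim_distr()` or `dft`. The intermediate names (`is_blank`, `block_id`, `blocks`, `tables`) are only illustrative.

```r
# The captured lines, copied from the dput() output in the question.
out <- c(" ", "P-values for each variable ('V') in each assessed trial",
  "           V1   V2   V3   V4   V5   V6  V7   V8  V9 V10 V11",
  "19818588 0.30 0.40 0.20 0.80 0.10 0.00 0.0 0.40 0.5  NA  NA",
  "19825849 0.00 0.10 0.50 0.00 0.20 0.10  NA   NA  NA  NA  NA",
  "19851772 0.20 0.55 0.15 0.00 0.00 0.30 0.8 0.45 0.6 0.6 0.3",
  "19854713 0.85 0.90 0.70 1.00 0.70 0.05  NA   NA  NA  NA  NA",
  "19902267 0.05 0.35 0.70 0.35 0.05 0.40  NA   NA  NA  NA  NA",
  " ", "Combined (overall) p-values for each assessed trial",
  "            p-value",
  "19818588 0.00000000", "19825849 0.00000000", "19851772 0.00000000",
  "19854713 1.00000000", "19902267 0.06093486")

out <- trimws(out)
is_blank <- out == ""                      # TRUE at the separator lines
block_id <- cumsum(is_blank)[!is_blank]    # 1 for the first table, 2 for the second
blocks <- split(out[!is_blank], block_id)

# Drop each block's title line. read.table() then treats the next line as a
# header because it has one fewer field than the data lines, and uses the
# first field of each data line (the trial ID) as the row name.
tables <- lapply(blocks, function(x) read.table(text = x[-1]))

# unname() stops cbind() from prefixing the column names with "1." and "2.".
dat <- do.call(cbind, unname(tables))
str(dat)
```

Because this parses printed text, it can only recover what was actually printed. Values come back rounded to the digits shown, and if the function uses `print()` for a very large table, R may truncate the output at `getOption("max.print")`, so it is worth checking `nrow(dat)` against the expected number of trials.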