I found a nice explanation of data validation on RPubs: https://rpubs.com/techanswers88/validate-data-in-r. I have tried to reproduce the steps in an R notebook. All the steps work perfectly until I get to the tables with the results. The example uses flextable to tabulate the errors from validating the example dataset. The flextable shows well when I knit the notebook preview:flextable in Notebook preview. However, when I knit to PDF the same flextable looks terrible: Flextable in PDF. So I went to kable/KableExtra, as I am more familiar with there packages and I style most of my tables with these. Kable/kableExtra usually works very well with both the R Notebooks and PDF. With Kable/KableExtra I get a more compact table which I prefer. However, some of the validation criteria in the table contain the pipe character (e.g. "(DBP == 0 | SBP ==0) & Outcome == 'Survived' )". The pipe character shows as '|' in the table. This specifically happens as soon as I pass the table setup through KableExtra's styling options.
In the MWE below I have left out the graphics of the original example I tried to reproduce and only focus on the troublesome table. R version used: 4.3.2; dplyr 1.1.4; Flextable 0.9.4; kableExtra 1.4.0; validate 1.1.5.
MWE:
---
title: "kableExtra vs flextable"
output:
pdf_document:
toc: true
number_sections: true
html_notebook:
toc: true
number_sections: true
---
```{r}
#Load Libraries
library(dplyr)
library(validate)
library(flextable)
library(kableExtra)
```
```{r}
#Create dataset (from https://rpubs.com/techanswers88/validate-data-in-r)
myData <- data.frame(PatientID = c('P001', 'P002', 'P003' , NA, 'P005', 'P006', 'P007', 'P008',
'P009', 'P010', 'P008', 'P11'),
Age = c(23, 12, 5, 8, 245, NA, 23, 45, 87, 121, 56, 130),
Outcome = c('Died', 'Died', NA, 'Survived', 'Survived', 'Survived', 'Survived',
'Survived', 'Survived', '?', 'Survived', 'Unknown'),
SBP = c(0, 0, 120, 80, 45, 67, 100, 130, 350, 120, 46, 120),
DBP = c(0, 0, 80, 70, 30, 40, 80, 210, NA, 80, 0, 80)
)
#Add an id field in the data.
myData <- myData%>%
dplyr::mutate(id = row_number())
```
```{r}
#Create validation rules
myrules <- validator( "Patient Id Unique" = is_unique(PatientID),
"Patient ID" = is.na(PatientID),
"Age in range" = Age >= 0 & Age <= 120,
"Outcome validity " = Outcome %in% c('Survived', 'Died'),
"SBP in range" = SBP >= 0 & SBP <= 300,
"DBP in range" = in_range (DBP, min = 0 , max = 200),
"Survived with No BP" = (DBP == 0 | SBP ==0) & Outcome == 'Survived' )
```
```{r}
#create validation results with the confront function
output <- confront(myData, myrules, key ="id")
```
```{r}
#Show errors in table
#Convert our output into a dataframe first
dout <- as.data.frame(output)
```
```{r}
#Errors for each record id
dErrors <- dout%>%
dplyr::filter(! value == TRUE)%>%
dplyr::select(id, name, expression)%>%
dplyr::arrange(id)
#Display validation results in flextable
#This part works well in a R Notebook or HTML, but not in PDF.
ft <- flextable(dErrors)%>%
theme_booktabs()%>%
merge_v(j = ~id)%>%
set_header_labels(name = "Error",
id = "Record ID",
expression = "Validation expression")
ft
```
```{r}
#Try to reproduce the flextable with kable/kableExtra
#This produces a more compact table but unfortunately without the pipe character
dout %>%
filter(!value==TRUE) %>%
select(id, name, expression) %>%
arrange(id) %>%
kable(booktabs = TRUE, escape = TRUE) %>%
kable_classic(full_width = FALSE) %>%
column_spec(3, width = "10cm") %>%
collapse_rows(columns = 1, valign = "top")
```
I have found that if I only sent the results through kable but not kableExtra
```{r}
dout %>%
filter(!value==TRUE) %>%
select(id, name, expression) %>%
arrange(id) %>%
kable(booktabs = TRUE, escape = TRUE)
```
then the pipe symbol will show in the table . However as soon as I try to do any styling (e.g. collapse_rows() or kable_styling) then the pipe symbol will be changed to '|'. Passing "format = "latex" to kable does not solve the problem, nor does "escape = TRUE". As said, the pipe character will show in the table in PDF up until the moment I add any styling. I can't find a solution online or in the manual of kableExtra. I have tried updating R and the packages but without result.
I tried to 'fix' flextable to see if I could make that work to get a better looking table, but without success. I have tried to change the width of the flextable with ft <- fit_to_width(ft, max_width = 15, unit = "cm")
. However this does not do anything with the flextable width in the PDF and all text remains smashed together. Some advice online suggest to use 'latex_engine: xelatex' in the YAML header however this does not have any effect either.
I would prefer to use kableExtra as I am more familiar with that package. Any advice on how to show the pipe character when using kableExtra?
Thanks in advance.
This seems to work if you use the format
option of kable
:
dout %>%
filter(!value==TRUE) %>%
select(id, name, expression) %>%
arrange(id) %>%
kable(booktabs = TRUE, format = "html") %>%
kable_classic(full_width = FALSE) %>%
column_spec(3, width = "10cm") %>%
collapse_rows(columns = 1, valign = "top")
If you want to knit either to HTML or to LaTeX, you can do:
fmt <- ifelse(knitr::is_html_output(), "html", "latex")
dout %>%
filter(!value==TRUE) %>%
select(id, name, expression) %>%
arrange(id) %>%
kable(booktabs = TRUE, format = fmt) %>%
kable_classic(full_width = FALSE) %>%
column_spec(3, width = "10cm") %>%
collapse_rows(columns = 1, valign = "top")