The objective is to read from a CSV file and write the contents in a Markdown table format.
The application uses the R engine Renjin, which does not support knitr, kable, or pandoc.
The write.table
command has an eol
option, but no corresponding sol
option. Thus for the following:
f <- read.csv('planning.csv')
write.table(
format(f, digits=2), "",
sep="|", row.names=F, col.names=F, quote=F, eol="|\n")
The output appears as follows:
Geothermal|1250.0|Electricity|0.0|
Houses| 13.7|Shelter|4.2|
Compostor| 1.2|Recycling|0.2|
But each line should appear with a |
prefix, as follows:
|Geothermal|1250.0|Electricity|0.0|
|Houses| 13.7|Shelter|4.2|
|Compostor| 1.2|Recycling|0.2|
It should be possible to do something like (note the extra eol
pipe):
write.table(
format(f, digits=2), "",
sep="|", row.names=F, col.names=F, quote=F, eol="|\n|")
Then capture everything as a string, concatenate a leading pipe, and finally trim the extraneous ending pipe. That is, fix the problems with the output that'll resemble:
Geothermal|1250.0|Electricity|0.0|
|Houses| 13.7|Shelter|4.2|
|Compostor| 1.2|Recycling|0.2|
|Fire Station| -9.6|Protection|0.5|
|Roads| 0.0|Transport|0.9|
|
Such string manipulation doesn't seem very R-like, though.
What is the most efficient way to transmogrify a CSV file into a Markdown format without relying on third-party libraries?
The Markdown flavour in question looks like:
|Header|Header|Header|
|---|---|---|
|Data|Data|Data|
|Data|Data|Data|
Hints for how to write only the header data and the table header separator are also welcome.
Since you want to put it into markdown, I think it's safe to say that the table size is manageable, so performance is not a factor. (Edit #3: I had some small bugs relating to presence of row names, so to simplify things I'm going to remove them completely from the sample data.)
mtcars$rowname <- rownames(mtcars)
rownames(mtcars) <- NULL
mtcars <- mtcars[,c(ncol(mtcars), 1:(ncol(mtcars)-1))]
head(mtcars)
# rowname mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Now the work:
dashes <- paste(rep("---", ncol(mtcars)), collapse = "|")
txt <- capture.output(
write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE)
)
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|---|---|---|---|---|---|---|---|---|---|---|---|"
# [3] "|Mazda RX4|21|6|160|110|3.9|2.62|16.46|0|1|4|4|"
# [4] "|Mazda RX4 Wag|21|6|160|110|3.9|2.875|17.02|0|1|4|4|"
# [5] "|Datsun 710|22.8|4|108|93|3.85|2.32|18.61|1|1|4|1|"
# [6] "|Hornet 4 Drive|21.4|6|258|110|3.08|3.215|19.44|1|0|3|1|"
If you are concerned about alignment, you can check for character
s (and perhaps others, over to you). This uses the alignment row of the markdown table format:
(ischar <- vapply(mtcars, is.character, logical(1)))
# rowname mpg cyl disp hp drat wt qsec vs am gear carb
# TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
dashes <- paste(ifelse(ischar, ":--", "--:"), collapse = "|")
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"
# [3] "|Mazda RX4|21|6|160|110|3.9|2.62|16.46|0|1|4|4|"
# [4] "|Mazda RX4 Wag|21|6|160|110|3.9|2.875|17.02|0|1|4|4|"
# [5] "|Datsun 710|22.8|4|108|93|3.85|2.32|18.61|1|1|4|1|"
# [6] "|Hornet 4 Drive|21.4|6|258|110|3.08|3.215|19.44|1|0|3|1|"
And when you're finally ready to save, use cat(txt2, file = "sometable.md")
(or writeLines
).
Edit #1: note that the other suggested answers (including mine above) do not address pipe symbols within the content:
mtcars$mpg[1] <- "2|1.0"
ischar <- vapply(mtcars, is.character, logical(1))
dashes <- paste(ifelse(ischar, ":--", "--:"), collapse = "|")
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2, n = 3)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"
# [3] "|Mazda RX4|2|1.0|6|160|110|3.9|2.62|16.46|0|1|4|4|"
### ^ this is the problem
You can escape it manually on all character (or add in factors, too) columns:
ischar <- vapply(mtcars, is.character, logical(1))
mtcars[ischar] <- lapply(mtcars[ischar], function(x) gsub("\\|", "|", x))
dashes <- paste(ifelse(ischar, ":--", "--:"), collapse = "|")
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], dashes, txt[-1]))
head(txt2, n = 3)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"
# [3] "|Mazda RX4|2|1.0|6|160|110|3.9|2.62|16.46|0|1|4|4|"
### ^^^^^^ this is the pipe, interpreted correctly in markdown
This doesn't work well when the pipe is within a code block, though a workaround was suggested here: https://stackoverflow.com/a/17320389/3358272
At this point, as @alistaire suggested, you're somewhat reimplementing knitr::kable
. For that matter, just grab knitr/R/table.R
) and use kable_markdown
which does the pipe-escaping for you. It takes a character matrix
, not a data.frame
, so kable_markdown(as.matrix(mtcars))
. You can't just grab the single function as it uses several helper functions also in that file. You can certainly prune some functions, including kable
itself which requires functions in other files.
Edit #2: since you said renjin doesn't support *apply
functions (a comment suggests that is incorrect, but I'll continue for the sake of argument), here's a for
-loop implementation that includes alignment and |
-escaping:
mtcars$mpg[1] <- "2|1.0" # just a reminder that it's here
dashes <- rep("--:", length(mtcars))
for (i in seq_along(mtcars)) {
if (is.character(mtcars[[i]]) || is.factor(mtcars[[i]])) {
mtcars[[i]] <- gsub("\\|", "|", mtcars[[i]])
dashes[i] <- ":--"
}
}
txt <- capture.output(write.table(mtcars, stdout(), quote = FALSE, sep = "|", row.names = FALSE))
txt2 <- sprintf("|%s|", c(txt[1], paste(dashes, collapse = "|"), txt[-1]))
head(txt2, n = 3)
# [1] "|rowname|mpg|cyl|disp|hp|drat|wt|qsec|vs|am|gear|carb|"
# [2] "|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|"
# [3] "|Mazda RX4|2|1.0|6|160|110|3.9|2.62|16.46|0|1|4|4|"
For the record, my *apply
and for
-loop implementations are effectively the same performance, while @alistaire's solution is over twice as fast (with mtcars
):
Unit: microseconds
expr min lq mean median uq max neval
apply_noalign 917.881 947.9665 1031.9288 971.3060 1041.5050 1999.499 100
apply_align 945.960 975.1350 1083.2856 995.7390 1063.7500 3523.101 100
apply_align_pipes 1110.429 1148.5360 1255.5460 1176.9815 1275.2600 1905.778 100
forloop 1188.104 1217.0950 1309.2549 1261.2205 1342.3600 2979.010 100
alistaire 451.830 473.7105 511.5778 496.1370 518.5645 827.443 100
alistaire_pipes 593.687 626.6900 718.6898 652.7645 700.5360 5460.970 100
I used his original function for alistaire
and added a simple gsub
for alistaire_pipes
. There may be a more efficient way to do it, but (a) simple/straight-forward is good, and (b) I think your tables will be small enough where true performance will not be a driving force.