Suppose I have something like the following vector:
text <- as.character(c("string1", "str2ing", "3string", "stringFOUR", "5tring", "string6", "s7ring", "string8", "string9", "string10"))
I want to execute a loop that computes the pairwise edit distance for every possible combination of these strings (e.g., string 1 vs. string 2, string 1 vs. string 3, and so forth). The output should be a matrix with one row and one column per string.
Here is my code:
library(stringdist)
#Matrix of pair-wise combinations
m <- expand.grid(text,text)
#Vector to store the distances
n <- c(1:10)
#Begin loop; "method='osa'" in stringdist is default
for (i in 1:10) {
n[i] <- stringdist(m[i,1], m[i,2], method="osa")
write.csv(data.frame(distance=n[i]),file="/File/Path/output.csv",append=TRUE)
print(n[i])
flush.console()
}
The stringdist() function comes from the stringdist package. Base R's utils package has a related function, adist(), which computes the generalized Levenshtein distance (stringdist's default method is "osa", optimal string alignment).
My question is, why is my loop not writing the results as a matrix, and how do I stop the loop from overwriting each individual distance calculation (i.e., how do I save all results in matrix form)?
I would suggest using stringdistmatrix instead of stringdist (especially if you are using expand.grid):
library(stringdist)
res <- stringdistmatrix(text, text)
dimnames(res) <- list(text, text)
write.csv(res, "file.csv")
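If you'd rather not depend on the stringdist package, a minimal base-R sketch uses utils::adist(), which, when called with a single vector, compares every element with every other element and returns the full matrix. Note that it computes the Levenshtein distance rather than stringdist's default OSA distance, so individual values can differ for strings related by an adjacent transposition.

```r
# Base R only: utils::adist() ships with every R installation
text <- c("string1", "str2ing", "3string", "stringFOUR", "5tring",
          "string6", "s7ring", "string8", "string9", "string10")

# With only x supplied, adist() compares all pairs of elements of x
# (Levenshtein distance, so not necessarily identical to method = "osa")
res <- adist(text)
dimnames(res) <- list(text, text)

print(res[1:3, 1:3])
```

The result is already a square matrix, so it can go straight to write.csv() like the stringdistmatrix() version.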
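As for your concrete question: "My question is, why is my loop not writing the results as a matrix?"
It is not clear why you would expect the output to be a matrix: you calculate one distance at a time, store it in n[i], and on each iteration write a one-row data frame containing just that value to disk. Note also that 1:10 only covers the first 10 of the 100 rows of m (every string compared with "string1"), so most combinations are never computed.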
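To keep an explicit loop and still end up with a matrix, one option is to allocate the full matrix before the loop, store each distance in its own [i, j] cell so nothing gets overwritten, and write the file once at the end. In this sketch base adist() stands in for the distance call so it runs without extra packages (stringdist(text[i], text[j], method = "osa") could be used in its place), and a temporary file stands in for your real path.

```r
text <- c("string1", "str2ing", "3string", "stringFOUR", "5tring",
          "string6", "s7ring", "string8", "string9", "string10")
n <- length(text)

# Preallocate an n x n matrix; each cell is filled exactly once below
res <- matrix(NA_real_, nrow = n, ncol = n, dimnames = list(text, text))

for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    # adist() on two single strings returns a 1x1 matrix; [1, 1] extracts the value
    res[i, j] <- adist(text[i], text[j])[1, 1]
  }
}

# Write the whole matrix once, after the loop has finished
out <- tempfile(fileext = ".csv")
write.csv(res, out)
```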
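Also, be aware that write.csv ignores attempts to change append, col.names, sep, dec or qmethod (it issues a warning and keeps its fixed defaults), so append=TRUE has no effect and every iteration overwrites the file with a single value. Use write.table instead, which respects append.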
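A quick way to see the difference is to write three one-row data frames to temporary files. write.csv() drops append = TRUE (with the warning "attempt to set 'append' ignored"), so each call replaces the file, whereas write.table() with append = TRUE adds rows to the end.

```r
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")

# write.csv(): append is ignored with a warning, so only the last value survives
for (k in 1:3) {
  suppressWarnings(
    write.csv(data.frame(distance = k), file = f1, append = TRUE)
  )
}

# write.table(): write the header once, then really append each row
write.table(data.frame(distance = numeric(0)), file = f2, sep = ",",
            row.names = FALSE)
for (k in 1:3) {
  write.table(data.frame(distance = k), file = f2, sep = ",",
              append = TRUE, col.names = FALSE, row.names = FALSE)
}

print(read.csv(f1))  # a single row
print(read.csv(f2))  # three rows
```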
If you want to do this iteratively, I would do the following:
# One slot per combination: expand.grid gives nrow(m) = 100 rows, not 10
n <- numeric(nrow(m))
# Column names, written only once
write.table(data.frame(i = integer(0), distance = numeric(0)),
            file = "~/Desktop/output.csv", sep = ",",
            append = FALSE, row.names = FALSE)  # <~~ Don't append on the first write.
for (i in seq_len(nrow(m))) {
  n[i] <- stringdist(m[i, 1], m[i, 2], method = "osa")
  write.table(data.frame(i = i, distance = n[i]), file = "~/Desktop/output.csv",
              append = TRUE, sep = ",", col.names = FALSE, row.names = FALSE)
  print(n[i])
  flush.console()
}
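Because such a file holds one row per expand.grid() combination, it can be turned back into a matrix afterwards. A sketch, assuming the loop runs over all nrow(m) rows and writes the columns i and distance as above: expand.grid() varies its first column fastest, so the row number i is exactly the column-major linear index into an n x n matrix. Base adist() stands in for stringdist() so the example runs without the package, and a temporary file replaces the Desktop path.

```r
text <- c("string1", "str2ing", "3string", "stringFOUR", "5tring",
          "string6", "s7ring", "string8", "string9", "string10")
m <- expand.grid(text, text, stringsAsFactors = FALSE)
out <- tempfile(fileext = ".csv")

# Header once, then one appended row per combination
write.table(data.frame(i = integer(0), distance = numeric(0)), file = out,
            sep = ",", row.names = FALSE)
for (i in seq_len(nrow(m))) {
  # adist() stands in for stringdist(m[i, 1], m[i, 2], method = "osa")
  d <- adist(m[i, 1], m[i, 2])[1, 1]
  write.table(data.frame(i = i, distance = d), file = out, sep = ",",
              append = TRUE, col.names = FALSE, row.names = FALSE)
}

# Linear index i maps to cell [(i - 1) %% n + 1, (i - 1) %/% n + 1],
# i.e. text[Var1] (row) vs text[Var2] (column)
long <- read.csv(out)
res <- matrix(NA_real_, nrow = length(text), ncol = length(text),
              dimnames = list(text, text))
res[long$i] <- long$distance
```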