
Conditional for loop in R not recognizing conditional statement?

Assume I have a data structure like the following, where doc_id is the document identifier, text_id is the unique text/version identifier, and text is a character string:

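# cbind() returns a character matrix, so doc_id and text_id are stored as strings.
# text now lists all 10 values; the original 8 were silently recycled to fill 10 rows.
df <- cbind(doc_id  = c(1, 1, 2, 2, 3, 4, 4, 4, 5, 6),
            text_id = 1:10,
            text    = c("string1", "str2ing", "3string", "string6", "s7ring",
                        "string8", "string9", "string10", "string1", "str2ing"))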

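A quick check of how cbind() stores mixed columns may help here. It builds a single matrix, so every column is coerced to one type. A data.frame is a common alternative that keeps doc_id and text_id numeric. The sketch below is only an illustration: the names `m` and `docs` are hypothetical, and the text values are the ten that the recycled cbind() call above actually produces.

```r
# cbind() puts every column into one matrix, so the numbers become strings
m <- cbind(doc_id = c(1, 1, 2), text = c("string1", "str2ing", "3string"))
print(typeof(m))  # "character"

# A data.frame keeps a separate type for each column (hypothetical alternative)
docs <- data.frame(
  doc_id  = c(1, 1, 2, 2, 3, 4, 4, 4, 5, 6),
  text_id = 1:10,
  text    = c("string1", "str2ing", "3string", "string6", "s7ring",
              "string8", "string9", "string10", "string1", "str2ing"),
  stringsAsFactors = FALSE  # explicit, because the default was TRUE before R 4.0.0
)
str(docs)
```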
In the loop below I am trying to compute string edit distances, but only between different versions of the same document. In short, I want to find rows with matching doc_ids and compare, pairwise, only the different versions (text_ids) of each document.
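"Only different versions of the same document" can be expressed directly. Split the rows by doc_id, then take every unordered pair inside each group with combn(). The sketch below makes several assumptions. It uses a data.frame layout, and the names `docs`, `compare_versions`, `text_id_a`, `text_id_b` and `dist` are hypothetical. It also uses base R's utils::adist() in place of RecordLinkage's levenshteinDist().

```r
docs <- data.frame(
  doc_id  = c(1, 1, 2, 2, 3, 4, 4, 4, 5, 6),
  text_id = 1:10,
  text    = c("string1", "str2ing", "3string", "string6", "s7ring",
              "string8", "string9", "string10", "string1", "str2ing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all pairwise distances between versions of one document
compare_versions <- function(d) {
  if (nrow(d) < 2) return(NULL)  # a single version has nothing to compare with
  # Each column holds one unordered pair of row indices; matrix() guards the
  # single-pair case, where combn() may return a plain vector
  idx <- matrix(combn(nrow(d), 2), nrow = 2)
  a <- d$text[idx[1, ]]
  b <- d$text[idx[2, ]]
  data.frame(
    doc_id    = d$doc_id[1],
    text_id_a = d$text_id[idx[1, ]],
    text_id_b = d$text_id[idx[2, ]],
    # adist(a[k], b[k]) is a 1 x 1 matrix; as.numeric() unwraps it
    dist      = vapply(seq_along(a),
                       function(k) as.numeric(adist(a[k], b[k])),
                       numeric(1))
  )
}

per_doc <- lapply(split(docs, docs$doc_id), compare_versions)
pairs <- do.call(rbind, Filter(Negate(is.null), per_doc))
rownames(pairs) <- NULL
print(pairs)
```

Each pair is compared once, and a text is never compared with itself. Documents with a single version (doc_id 3, 5, 6) contribute no rows.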

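library(RecordLinkage)  # provides levenshteinDist()

# Results matrix
result <- matrix(ncol = 10, nrow = 10)

for (j in 1:length(df[, 2])) {
  for (i in 1:length(df[, 2])) {
    # Conditional statement: compare only versions of the same document
    if (df[j, 1] == df[i, 1]) {
      result[i, j] <- levenshteinDist(df[j, 3], df[i, 3])
    } else {
      result[i, j] <- "Not Compared"
    }
  }
  print(result[i, j])
}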


[1] "Not Compared"
[1] "Not Compared"
[1] "Not Compared"
[1] "Not Compared"
[1] "Not Compared"
[1] "Not Compared"
[1] "Not Compared"
[1] "Not Compared"
[1] "Not Compared"
[1] "0"

The levenshteinDist() function comes from the RecordLinkage package. A similar function, adist(), ships with R in the utils package.
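Because adist() is vectorised, the double loop can be replaced by one call plus a mask. With only one argument it returns the full matrix of pairwise distances. With its default costs this is the plain Levenshtein distance, and it should agree with levenshteinDist() on these strings, though that equivalence is an assumption worth checking on real data. The sketch below assumes the doc_id and text values from the data above. The names `same_doc`, `diff_version` and `res` are hypothetical. It stores NA instead of "Not Compared", because assigning a string to a matrix turns the whole matrix into character.

```r
doc_id <- c(1, 1, 2, 2, 3, 4, 4, 4, 5, 6)
text   <- c("string1", "str2ing", "3string", "string6", "s7ring",
            "string8", "string9", "string10", "string1", "str2ing")

dist_all <- adist(text)                       # 10 x 10 numeric distance matrix
same_doc <- outer(doc_id, doc_id, "==")       # TRUE where the doc_ids match
diff_version <- same_doc & row(dist_all) != col(dist_all)  # drop self-comparisons

res <- dist_all
res[!diff_version] <- NA                      # NA keeps the matrix numeric
print(res)
```

This computes all 100 distances and then discards most of them. For many documents, comparing within each doc_id group is likely cheaper.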

My question is: why is my first conditional statement (if) being ignored, and only the else portion being returned?

Any further advice on the code, or on making it run faster, would be greatly appreciated.


  • The if statement isn't being ignored; you're just not printing the results where you think you are. Run this version to watch the comparisons happen in place, and comment out the message() call once you're satisfied that everything works.

    df <- structure(c("1", "1", "2", "2", "3", "4", "4", "4", "5", "6", 
    "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "string1", 
    "str2ing", "3string", "string6", "s7ring", "string8", "string9", 
    "string10", "string1", "str2ing"), .Dim = c(10L, 3L), .Dimnames = list(
        NULL, c("doc_id", "text_id", "text")))
    library(RecordLinkage)  # provides levenshteinDist()
    result <- matrix(ncol = 10, nrow = 10)
    # nrow() and ncol() are more elegant ways of getting row/column counts.
    for(j in 1:nrow(df)) {
        for(i in 1:nrow(df)) {
            # message() already ends with a newline, so no "\n" is needed
            message(sprintf("comparing i=%s (%s), j=%s (%s)", i, df[i, 1], j, df[j, 1]))
            if(identical(df[i, 1], df[j, 1])) {
                result[i, j] <- levenshteinDist(df[j, 3], df[i, 3])
            } else {
                result[i, j] <- "Not Compared"
            }
            # printing inside the inner for loop
            print(result[i, j])