Tags: r, dataframe, delete-row, azure-machine-learning-service

How to delete all non-numeric rows in R?


I have a data frame like the one below, imported from a CSV file, where ID is a numeric value and Comment1 and comment2 are strings. After the import, a few rows come out misaligned: in the example below, "fifth comment" should be in comment2, but it ends up on its own row, in place of the ID value. This happens seemingly at random and for only a few rows. Moreover, the problem occurs only when I run my R code in Azure ML Studio; in RStudio no data is misplaced. My idea is to delete every row where the first column, ID, is not a numeric value. Because the misplaced strings are arbitrary long sentences, I cannot use string matching against known values to delete those rows, and the data frame is too large to clean up manually. Any suggestions?

    ID             Comment1               comment2
    123            This is first comment  this is second
    234            third comment          fourth comment
    fifth comment
    345            sixth comment          seventh comment

A sample of the data frame can be loaded like this:

    df <- read.csv(
      "https://docs.google.com/spreadsheets/d/171YXjzm3FsapXSkqgOSos6UGXNRcd1yxmLyvaRnCX5E/pub?output=csv"
    )
    df <- df[-1, ]     # drop the first row
    df <- df[, 1:12]   # keep only the first 12 columns
    colnames(df) <- c(
      "ID", "Created", "Comments", "Liked_By", "Disliked_By", "Recipient_Number",
      "Sender", "Recipients", "Read_By", "Subject", "Introduction", "Body"
    )

Solution

  • Subset to numeric IDs:

    subset(df, grepl('^\\d+$', df$ID))
    

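    The pattern matches only ID values made up entirely of digits, so rows whose ID contains any other character are dropped. Note that subset() returns a new data frame; assign the result back to df to keep it.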
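
To see the filter in action without downloading the sample file, here is a minimal sketch on a toy data frame. The rows are made up for illustration and modeled on the example in the question; the names keep and clean are hypothetical. It assumes the bad rows can be recognized purely by a non-digit ID, and it also converts the surviving IDs to numeric, which the original answer does not do.

```r
# Toy data frame mimicking the misplaced rows (values invented for illustration).
df <- data.frame(
  ID       = c("123", "234", "fifth comment", "345"),
  Comment1 = c("This is first comment", "third comment", "", "sixth comment"),
  comment2 = c("this is second", "fourth comment", "", "seventh comment"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose ID consists entirely of digits.
# as.character() guards against ID having been read in as a factor.
keep <- grepl("^[0-9]+$", as.character(df$ID))

# Keep only those rows.
clean <- df[keep, ]

# Convert the surviving IDs from text to numbers.
clean$ID <- as.numeric(as.character(clean$ID))

print(clean)
```

Indexing with df[keep, ] is equivalent to the subset() call above; it is used here only so the logical vector can be inspected on its own, for example with sum(!keep) to count how many rows would be dropped.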