I have a dataframe where ID is a numeric value and comment1 and comment2 are strings, which I am importing from a CSV. But the data frame comes out something like the one below, where fifth comment should be in comment2 but has instead replaced the original ID value. This happens randomly, for only a few rows. Moreover, the problem only occurs when I run my R code in Azure ML Studio; in RStudio no data is misplaced. So my idea is simply to delete every row where the first column, ID, is not a numeric value. Because the misplaced string is an arbitrary long sentence, I cannot delete the rows by string matching, and the dataframe is too big to delete them manually. Any suggestions?
ID             Comment1               comment2
123            This is first comment  this is second
234            third comment          fourth comment
fifth comment
345            sixth comment          seventh comment
A sample of the dataframe can be loaded like this:
df <- read.csv(
  "https://docs.google.com/spreadsheets/d/171YXjzm3FsapXSkqgOSos6UGXNRcd1yxmLyvaRnCX5E/pub?output=csv"
)
# Drop the first row
df <- df[-1, ]
# Keep only the first 12 columns
df <- df[, 1:12]
colnames(df) <- c(
  "ID", "Created", "Comments", "Liked_By", "Disliked_By", "Recipient_Number",
  "Sender", "Recipients", "Read_By", "Subject", "Introduction", "Body"
)
Subset to numeric IDs:
subset(df, grepl('^\\d+$', df$ID))
The pattern ^\\d+$ matches only values of ID that consist entirely of digits, from the first character to the last, so any row whose ID holds text is dropped.
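
To see the filter in action without downloading the spreadsheet, here is a minimal sketch on a toy data frame. The values are made up to mirror the sample in the question, and the name clean is just an illustrative choice; the real data may behave differently (for example, if ID contains leading or trailing spaces the pattern would need adjusting).

```r
# Toy data mirroring the sample in the question (values are illustrative only).
df <- data.frame(
  ID = c("123", "234", "fifth comment", "345"),
  Comment1 = c("This is first comment", "third comment", "", "sixth comment"),
  comment2 = c("this is second", "fourth comment", "", "seventh comment"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose ID consists entirely of digits.
clean <- subset(df, grepl("^\\d+$", ID))

# The surviving IDs are still character strings; convert them to numeric.
clean$ID <- as.numeric(clean$ID)

print(clean)
```

Note that subset() returns a new data frame, so the result has to be assigned for the rows to actually be removed. If ID was read in as a factor (the read.csv default before R 4.0.0), convert with as.numeric(as.character(clean$ID)) instead, since as.numeric on a factor returns the internal level codes.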