I have downloaded stock prices data of London Stock Exchange from datastream consisting of 5413 companies for 15 years. When you download the data from Datastream, the companies whose data is unavailable it replaces the name of company with #ERROR in the column label.I illustrate the excel sheet
Date A B #ERROR #ERROR E F
31-12-1999 1 2 3 4
3-1-2000
So, when I import it to R, I have
Date A B X. ERROR X.ERROR.1 E F
1999-12-31 1 2 3 4
2000-1-3 NA NA NA NA
As it can be seen R changes it to X.ERROR and X.ERROR.1 and the rows are blans not even NA assigned. My dataset is huge it would be very cumbersome to apply Price$X.ERROR <- NULL
. Additionally I find it to be inefficient.
So, how can find all these X:ERROR in my dataset and then delete these columns.
I create a sample data frame as follows:
df <- data.frame(Date = 1:5,
A = 1:5,
B = 1:5,
X.ERROR = 1:5,
X.ERROR.1 = 1:5,
E = 1:5,
F = 1:5)
df
## Date A B X.ERROR X.ERROR.1 E F
## 1 1 1 1 1 1 1 1
## 2 2 2 2 2 2 2 2
## 3 3 3 3 3 3 3 3
## 4 4 4 4 4 4 4 4
## 5 5 5 5 5 5 5 5
Then I search for the names of the columns to be removed:
to_del <- grepl("X.ERROR", names(df))
This returns TRUE for all columns that have "X.ERROR"
in their name. Now you can remove them from the data frame:
df <- df[!to_del]
df
## Date A B E F
## 1 1 1 1 1 1
## 2 2 2 2 2 2
## 3 3 3 3 3 3
## 4 4 4 4 4 4
## 5 5 5 5 5 5