r dataframe multiple-columns subtraction

How to subtract two columns in same dataframe with same columns headers/name in R with missing data

I have a dataframe with 8000 columns and 3785 rows. Columns are company names,so,I want to calculate to spread of a company by subtrating its ask price from bid price i.e Spread of company/stock = A.ASK- A.BID. In the column name A.ASK indicates that A is name of company and .ASK indicated it is the ASK price of the company where .BID is the BID price of A . And in my dataframe all ask and bid price of company are side by side column just as illustrated in example dataframe below. Additionally I have missing data for instance if company C started trading in year 2001 it will have NA for year 2000. So, I want is not ignore the date column for calculation and additionally where the there are NA for ask and bid of company it return me NA in resultant column

Date        A . ASK A .BID  C. ASK  C. BID
31/12/1999  NA      NA      NA      NA
03/01/2000  NA      NA      NA      NA
04/01/2000  82      77      NA      NA 
05/01/2000  82      77      NA      NA
06/01/2000  82      77      NA      NA
07/01/2000  82      77      NA      NA
10/01/2000  82      77      NA      NA
11/01/2000  82      77      NA      NA
12/01/2000  NA      NA      NA      NA
13/01/2000  NA      NA      NA      NA
14/01/2000  NA      NA      70      67
17/01/2000  NA      NA      70      67
18/01/2000  97      94      70      67
19/01/2000  97      92      70      67

df2<-df1

Date        A   C
31/12/1999  NA  NA
03/01/2000  NA  NA
04/01/2000  5   NA
05/01/2000  5   NA
06/01/2000  5   NA
07/01/2000  5   NA
10/01/2000  5   NA
11/01/2000  5   NA
12/01/2000  NA  NA
13/01/2000  NA  NA
14/01/2000  NA  3
17/01/2000  NA  3
18/01/2000  3   3
19/01/2000  5   3

Your help is highly appreciated

Solution

Having nicely formatted data which alternates between ask and bid price for each company makes this relatively straightforward. The following code should do what you're looking for.

# Import data
df <- read.table(text = 
                 "Date       A.ASK   A.BID   C.ASK   C.BID
                 31/12/1999  NA      NA      NA      NA
                 03/01/2000  NA      NA      NA      NA
                 04/01/2000  82      77      NA      NA 
                 05/01/2000  82      77      NA      NA
                 06/01/2000  82      77      NA      NA
                 07/01/2000  82      77      NA      NA
                 10/01/2000  82      77      NA      NA
                 11/01/2000  82      77      NA      NA
                 12/01/2000  NA      NA      NA      NA
                 13/01/2000  NA      NA      NA      NA
                 14/01/2000  NA      NA      70      67
                 17/01/2000  NA      NA      70      67
                 18/01/2000  97      94      70      67
                 19/01/2000  97      92      70      67",
                 header = TRUE
)

# Define a sequence which selects every second column
# ask_cols starts at column 2
# bid_cols starts at column 3
ask_cols <- (1:((ncol(df)-1)/2))*2
bid_cols <- (1:((ncol(df)-1)/2))*2+1

# Use ask_cols and bid_cols to select columns from df and calculate
df2 <- df[, ask_cols]-df[, bid_cols]

# Add the date column to df2
df2 <- cbind(df[, 1], df2)

# We will use stringr for extracting company names to define column names
library(stringr)

colnames(df2) <- c("Date", str_extract(colnames(df[, ask_cols]), "([A-Za-z]+)"))

Giving

> df2
         Date  A  C
1  31/12/1999 NA NA
2  03/01/2000 NA NA
3  04/01/2000  5 NA
4  05/01/2000  5 NA
5  06/01/2000  5 NA
6  07/01/2000  5 NA
7  10/01/2000  5 NA
8  11/01/2000  5 NA
9  12/01/2000 NA NA
10 13/01/2000 NA NA
11 14/01/2000 NA  3
12 17/01/2000 NA  3
13 18/01/2000  3  3
14 19/01/2000  5  3

Edit: A better way to define ask_cols and bid_cols is using the seq function

ask_cols <- seq(2, ncol(df), 2)
bid_cols <- seq(3, ncol(df), 2)

Edit 2: A better regular expression for matching company names is to use a lookahead to match any series of characters which is followed by .ASK.

colnames(df2) <- c("Date", str_extract(colnames(df[, ask_cols]), ".*(?=\\.ASK)"))