Search code examples
rloopsrbindlarge-data

rbind loop in R


I have data with 2 million rows and 7 columns. There are also thousands of different companies and indicators.

A sample of the data is:

company_id                                                             indicator_id     X2011
1    1000045                 AccountsPayableAndAccruedLiabilitiesCurrentAndNoncurrent   6612429
2    1000045 AccumulatedDepreciationDepletionAndAmortizationPropertyPlantAndEquipment        NA
3    1000045                                                AdjustmentForAmortization        NA
4    1000045                                                                   Assets 257236034
5    1000045                                         AssetsHeldForSaleAtCarryingValue   1373001
6    1000045                      AssetsOfDisposalGroupIncludingDiscontinuedOperation        NA
      X2012     X2013     X2014     X2015
1   7405579   8924919   7841070   5839000
2   2111343   2242703   2236449   2462000
3 -11482251 -13490892 -13852305 -13811000
4 263835468 283429579 302528591 325309000
5   1203664   1696330        NA        NA
6        NA        NA   1746887   2148000

and some of the indicators I want are:

    asset_indicators
[1] "DeferredIncomeTaxExpenseBenefit"                                         
[2] "CashAndCashEquivalentsAtCarryingValue"                                   
[3] "CashAndCashEquivalentsPeriodIncreaseDecrease"                            
[4] "NetCashProvidedByUsedInOperatingActivities"                              
[5] "NetCashProvidedByUsedInFinancingActivities"                              
[6] "NetCashProvidedByUsedInInvestingActivities"                              
[7] "PropertyPlantAndEquipmentNet"                                            

I am trying to get a new dataframe with just certain rows. Currently, I have this code:

year <-rbind(statement[which(statement$indicator_id==asset_indicators[1]),],
         statement[which(statement$indicator_id==asset_indicators[2]),],
         statement[which(statement$indicator_id==asset_indicators[3]),],
         statement[which(statement$indicator_id==asset_indicators[4]),],
         statement[which(statement$indicator_id==asset_indicators[5]),],
         statement[which(statement$indicator_id==asset_indicators[6]),],
         statement[which(statement$indicator_id==asset_indicators[7]),] )

I'm wondering if there's a way to do this more efficiently? I tried making a for loop but it took hours to run while this takes just a few seconds. It would be nice if it updated automatically if I changed the number of indicators to use.

Does anyone have any ideas how to do this?


Solution

  • Try %in%:

    year <- statement[statement$indicator_id %in% asset_indicators,]