Hi, I have an 8 GB CSV file that I need to analyze, but my machine does not have much RAM. To work with it efficiently, I decided to split the file by rows with the following code:
library(tidyverse)
sample_df <- readr::read_csv("sample.csv") #Read in the csv file
dput(sample_df)
#break the large CSV so RAM and Rstudio doesn't crash
groups <- (split(sample_df, (seq(nrow(sample_df))-1) %/% 20)) #here I want 20 rows per file until last row is reached
for (i in seq_along(groups)) {
write.csv(groups[[i]], paste0("sample_output_file", i, ".csv")) #iterate and write file
}
This worked perfectly until my senior mentor asked me to do the analysis per day. Now I have a problem: because I split by rows, each date ends up spread across multiple CSVs, and I run out of memory again when I have to read 3-4 CSVs to analyze a single day.
Sample file is here: https://github.com/THsTestingGround/SO_splitbydate_question/blob/master/sample.csv
So could someone please show me how to split the following sample CSV file, which I read in initially, based on date? I want all the April 1 rows together in one CSV file, the April 2 rows in another, and so on. I made an attempt, but I couldn't get it to work.
Also, I was wondering whether readr::read_csv_chunked can help in any way? I couldn't see anything specific in the documentation.
Here is the dput of the CSV file:
dput(sample_df)
structure(list(createdAt = c("Fri Apr 01 04:04:32 +0000 2020",
"Fri Apr 01 04:04:36 +0000 2020", "Fri Apr 01 04:04:37 +0000 2020",
"Fri Apr 02 04:04:40 +0000 2020", "Fri Apr 02 04:04:44 +0000 2020",
"Fri Apr 02 04:04:46 +0000 2020", "Fri Apr 02 04:04:54 +0000 2020",
"Fri Apr 02 04:04:56 +0000 2020", "Fri Apr 02 04:05:07 +0000 2020",
"Fri Apr 02 04:05:12 +0000 2020", "Fri Apr 03 04:05:12 +0000 2020",
"Fri Apr 03 04:05:19 +0000 2020", "Fri Apr 03 04:05:27 +0000 2020",
"Fri Apr 03 04:05:33 +0000 2020", "Fri Apr 03 04:05:36 +0000 2020",
"Fri Apr 03 04:06:11 +0000 2020", "Fri Apr 03 04:07:08 +0000 2020",
"Fri Apr 03 04:07:14 +0000 2020", "Fri Apr 03 04:07:15 +0000 2020",
"Fri Apr 03 04:07:20 +0000 2020", "Fri Apr 03 04:07:30 +0000 2020",
"Fri Apr 03 04:07:51 +0000 2020", "Fri Apr 03 04:08:04 +0000 2020",
"Fri Apr 03 04:08:09 +0000 2020", "Fri Apr 03 04:08:15 +0000 2020",
"Fri Apr 03 04:08:22 +0000 2020", "Fri Apr 03 04:08:36 +0000 2020",
"Fri Apr 03 04:08:46 +0000 2020", "Fri Apr 03 04:08:46 +0000 2020",
"Fri Apr 03 04:09:01 +0000 2020", "Fri Apr 03 04:09:08 +0000 2020",
"Fri Apr 03 04:09:10 +0000 2020", "Fri Apr 03 04:09:15 +0000 2020",
"Fri Apr 03 04:09:26 +0000 2020", "Fri Apr 03 04:09:27 +0000 2020",
"Fri Apr 03 04:09:28 +0000 2020", "Fri Apr 03 04:09:28 +0000 2020",
"Fri Apr 03 04:09:35 +0000 2020", "Fri Apr 03 04:09:36 +0000 2020",
"Fri Apr 03 04:09:41 +0000 2020", "Fri Apr 03 04:09:45 +0000 2020",
"Fri Apr 03 04:10:16 +0000 2020", "Fri Apr 03 04:10:19 +0000 2020",
"Fri Apr 03 04:10:22 +0000 2020", "Fri Apr 03 04:10:26 +0000 2020",
"Fri Apr 03 04:10:31 +0000 2020", "Fri Apr 03 04:10:48 +0000 2020",
"Fri Apr 04 04:11:19 +0000 2020", "Fri Apr 04 04:11:32 +0000 2020",
"Fri Apr 04:11:44 +0000 2020"), timestamp = c(1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12, 1.58589e+12,
1.58589e+12, 1.58589e+12, 1.58589e+12), id_str = c(1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.25e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18,
1.24593e+18, 1.24593e+18, 1.24593e+18, 1.24593e+18), text = c("Finally. Make your own mask. Protect yourself and others. #coronavirus",
"@ArvinderSoin do you feel the use of only masks for IPD rounds, in an environment where no patients have been teste…",
"India, you actually deserve him for electing him.\n\nAb batti bhujao aur #corona bhagav.\n\nNo testing kits, no masks,…",
"great picture to sum up everything\n#mask #maskefficiency #noclothmask #maskprotection #surgicalmask #N95 #FFP1…",
"The greatest hazard to public health is official misinformation.\n\nAsian countries were wearing masks from the begin…",
"#Florida official says @3M is selling face masks to foreign countries instead of his state amid #COVID19 crisis.\n",
"Wearing masks is one of the protective measures preventing catching the novel #Coronavirus as the pandemic spreads…",
"It took Americans two and a half months to start wearing masks. Think about why, maybe it could explain why the peo…",
"#coronavirus watching me put on the same surgical mask 2 shifts in a row\n\n#COVID<U+30FC>19 #nurse",
"Back in stock! NIOSH N95, go to our website.\nOnly 11,000 masks \n\n#facemask #facemasks #N95…",
"Hence the vital importance of wearing masks when outside - #coronavirus #coronavirusindia #COVID2019india…",
"@Read5000YrLeap @SenSchumer buy trump facemasks. support trump 2020 and be safe. ships from midwest. #Boycott3M… ",
"When going out for essential activities, members of the public should wear reusable, non-medical cloth face coverin…",
"@jmcmaccarr buy trump facemasks. support trump 2020 and be safe. ships from midwest. #Boycott3M @seanhannity…",
"It took Americans two and a half months to start wearing masks. Think about why, maybe it could explain why the peo…",
"@CNN Just #WearMask People wearing a mask Nationwide ... SAVES…",
"That is less than 4 million per week. In Taiwan, everyone is allocated 3 surgical masks per week. For Australia t…",
"@Constitution999 @ChuckCallesto @realDonaldTrump buy trump facemasks. support trump 2020 and be safe. ships from mi…",
"Regard the debate of face mask in general public, the evidence of effectiveness is quite clear #Covid19…",
"Normalize putting on of masks. #COVID19 came to change the world order.",
"@TwitterSafety the Honduran gov’t is lying on Twitter. Saying that they are making thousands of masks, protective v…",
"Trump explaining that if you need a mask you can go to Walmart. Also that Costco has some great deals on caskets an…",
"When lockdown is over... I just may add this to my “don’t forget..” along with my wallet, gloves, mask, hand saniti…",
"Make your own mask: #covid19\n", "Please, everyone should wear a mask in public. Use whatever you can get hold of. Something is better than nothing (…",
"@kittywuv1 So incredibly mesmerizing, even with the custom #covid19 mask!<U+0001F970><U+0001F60D><U+0001F618><U+0001F637><U+0001F497>",
"@BeauTFC Happy to report that we’ve developed a 3-D printed mask. Passed N95 equivalent fit-test with Bitrex (surgi…",
"On a lighter note. \n\nIt is questionable if these common surgical masks and cloth masks will protect us from…",
"Medical workers face big mask shortage. This UF doctor came up with way to make many \n\n…",
"Homemade face coverings. Well, I tried it didn't come out straight but it should work. <U+0001F637> #homemade #facecoverings…",
"#covid19 In Africa, \"where are no masks, no treatment, no reanimation\", \"the same way experimental treatment for AI…",
"@theblondeMD Happy to report that we’ve developed a 3-D printed mask. Passed N95 equivalent fit-test with Bitrex (s…",
"I wouldn’t do a thing anyone from #China says to do. The masks they keep sending around the world are faulty, they…",
"@TIME [covid19],important:\n1.from_air->mask->mask_reuse.\n2.from_touch->clean_hands.\n\nps1.20200328.…",
"@3M stop selling masks to foreign companies. We WILL remember this!\n#COVID19Pandemic \n#covid19\n#N95masks",
"Awareness for using mask by @WHO #recommendations @CMOTamilNadu #COVID19 #Corona @MoHFW_INDIA #TNHealth #CVB…",
"@Rakshitwa @beingdumber @taapsee Nitish Kumar asked for 10 lakh N95 masks but got 50,000. Sought five lakh PPE kits…",
"@CNN You mean the masks everyone was saying #Covit19 #COVID<U+30FC>19 #coronavirus can pass right through as per what was…",
"2 BILLION masks = global production capacity in 2.5 MONTHS = quantity of what China imported in 5 WEEKS since Jan…",
"@CDCgov @CDCDirector @SF_DPH Please remember those with #COPD #LungDisease #HeartDisease when requiring #masks for…",
"If you have to go out and can’t avoid being around people, wear a mask. Masks are a complement to social distancin…",
"@CTVVancouver According to Dr \"doom\" Bonnie Henry, masks aren't of any use to the general public, in fact, she clai…",
"@maddow Next time you talk about the government stating everyone needs to wear a mask ask a government official whe…",
"Wear a mask in you are unwell or taking care of a person with suspected 2019-nCoV infection.\nInfo source: WHO…",
"7/9 For those who need a #COVID19 mask ASAP and have no talent, time or materials to make a mask. We give you the e…",
"jasminesade_art\nIs taking orders for masks (w/ filter pocket) \nMsg jasminesade_art if interested <U+0001F496> \n.\n.\n.\n.\n.\n.",
"What China do to cut down the spread dramatically are only to make people stay at home and wear masks!!!!!@PHE_uk…",
"@CNN hey i thought we were boycotting China\nthen why the Americans need Chinese masks?\ngo fuck yourself \n#BoycottChina #coronavirus",
"@CNN @CillizzaCNN [covid19],important:\n1.from_air->mask->mask_reuse.\n2.from_touch->clean_hands.\n\nps1.20200328.…",
"@kr3at #WearMask Everyone !!!\n\n\nSimply wearing a mask Nationwide ... SAVES #CZECHOSLOVAKIA…"
), retweetCount = c(1372, 9, NA, 8, 30, NA, NA, NA, NA, NA, 34,
NA, NA, NA, NA, NA, 192, NA, NA, NA, 50, NA, 221, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 17, 1948, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 53, NA, 1948, NA), favorite_count = c(3488,
23, NA, 7, 46, NA, NA, NA, NA, NA, 62, NA, NA, NA, NA, NA, 710,
NA, NA, NA, 48, NA, 506, NA, NA, NA, NA, NA, NA, NA, NA, NA,
29, 4963, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 164,
NA, 4963, NA), url = c("twitter.com/33617860/status/1245925124483809280",
"twitter.com/1106803026/status/1245925141046935552", "twitter.com/421517829/status/1245925143479595008",
"twitter.com/1245594213795778560/status/1245925159724171264",
"twitter.com/2178012643/status/1245925173858975744", "twitter.com/1220529001241989120/status/1245925183010963456",
"twitter.com/1115874631/status/1245925217790124032", "twitter.com/1243781317747077120/status/1245925225327235072",
"twitter.com/2729830110/status/1245925273230438400", "twitter.com/1240114893178667008/status/1245925291374964736",
"twitter.com/88875512/status/1245925292972969984", "twitter.com/1245907384993812480/status/1245925320282136576",
"twitter.com/3431854829/status/1245925357116481536", "twitter.com/1245907384993812480/status/1245925380973871104",
"twitter.com/1243781317747077120/status/1245925393095217152",
"twitter.com/1230706447257751552/status/1245925541644992512",
"twitter.com/4437322348/status/1245925779117985792", "twitter.com/1245907384993812480/status/1245925802442555392",
"twitter.com/829633267942903808/status/1245925807211663360",
"twitter.com/403961389/status/1245925829755969536", "twitter.com/17183161/status/1245925869010292736",
"twitter.com/1408320152/status/1245925960550993920", "twitter.com/1245663286881902592/status/1245926011679600640",
"twitter.com/244306637/status/1245926036321103872", "twitter.com/24327965/status/1245926059318448128",
"twitter.com/1164222471639318528/status/1245926089068646400",
"twitter.com/16328861/status/1245926148967727104", "twitter.com/6125082/status/1.24592618943e+18",
"twitter.com/3685052935/status/1245926191850065920", "twitter.com/868528766355558400/status/1245926251455365120",
"twitter.com/1223273206636851200/status/1245926283093012480",
"twitter.com/16328861/status/1245926292274311168", "twitter.com/1160039103905390592/status/1245926310670565376",
"twitter.com/1236738668905127936/status/1245926356468162560",
"twitter.com/400431217/status/1245926363833532416", "twitter.com/1244269086088945664/status/1245926365116809216",
"twitter.com/850227053139853312/status/1245926366781902848",
"twitter.com/244314850/status/1245926393822605312", "twitter.com/1244446404178665472/status/1245926398578978816",
"twitter.com/3184694718/status/1245926421601509376", "twitter.com/82208845/status/1245926438143807488",
"twitter.com/1216588869530836992/status/1245926569303891968",
"twitter.com/4770303330/status/1245926579936432128", "twitter.com/1245580876047499264/status/1245926591806361600",
"twitter.com/904740870817120256/status/1245926610181574656",
"twitter.com/934146138/status/1245926629022433280", "twitter.com/1223547711468777472/status/1245926703257366528",
"twitter.com/840838036707393536/status/1245926832618131456",
"twitter.com/1236738668905127936/status/1245926888087773184",
"twitter.com/1230706447257751552/status/1245926935042994176"),
friendCount = c(1018, 326, 1205, 48, 3690, 1584, 55, 42,
580, 11, 3610, 13, 110, 13, 42, 382, 43, 13, 106, 4195, 599,
8, 89, 414, 280, 931, 5001, 1602, 1327, 227, 310, 5001, 26,
65, 2371, 31, 523, 228, 8, 671, 499, 1324, 333, 5, 852, 5457,
7, 48, 65, 382), screenNames = c("DayssiOK", "DrAmbrishMithal",
"LuvAminaKausar", "Sunnie09370280", "balajis", "World_In_Mins",
"CGTNOfficial", "a7BdaSSeyL4czNw", "ShellBell915", "remedair",
"RitasArtCafe", "trumpfacemasks", "SCC_OES", "trumpfacemasks",
"a7BdaSSeyL4czNw", "REX38225222", "e2p71828", "trumpfacemasks",
"lamsonlinshen", "SteveJumaaa", "patfloTO", "tenforadollar",
"sashir_milne", "rdesai711", "agrothey", "foreskinjim1",
"rover223", "scanman", "AlDubest2Evry1", "HurtadoMarleen",
"johnmik63542947", "rover223", "CowlSolomon", "spacetinyearth",
"jmegown52302", "DrPonnarasu", "pankajupa120", "JoaoNewman",
"LalalaHK1", "SaturniaC", "NYCMediaMix", "ToscasReturn",
"JamesDallas9175", "cornzal", "CEDRdigital", "NadraRae",
"SiluMa4", "1Wa49R41L3pVzQj", "spacetinyearth", "REX38225222"
), userID = c(33617860, 1106803026, 421517829, 1.24559e+18,
2178012643, 1.22e+18, 1115874631, 1.24e+18, 2729830110, 1.24e+18,
88875512, 1.24591e+18, 3431854829, 1.24591e+18, 1.24e+18,
1.23071e+18, 4437322348, 1.24591e+18, 8.29633e+17, 403961389,
17183161, 1408320152, 1.24566e+18, 244306637, 24327965, 1.16422e+18,
16328861, 6125082, 3685052935, 8.68529e+17, 1.22327e+18,
16328861, 1.16004e+18, 1.24e+18, 400431217, 1.24427e+18,
8.50227e+17, 244314850, 1.24445e+18, 3184694718, 82208845,
1.22e+18, 4770303330, 1.24558e+18, 9.04741e+17, 934146138,
1.22355e+18, 8.40838e+17, 1.24e+18, 1.23071e+18), language = c("en",
"en", "en", "en", "en", "en", "en", "en", "en", "en", "en",
"en", "en", "en", "en", "en", "en", "en", "en", "en", "en",
"en", "en", "en", "en", "en", "en", "en", "en", "en", "en",
"en", "en", "en", "en", "en", "en", "en", "en", "en", "en",
"en", "en", "en", "en", "en", "en", "en", "en", "en"), replyToScreenName = c("None",
"ArvinderSoin", "None", "None", "None", "World_In_Mins",
"None", "None", "None", "None", "None", "Read5000YrLeap",
"None", "jmcmaccarr", "None", "CNN", "None", "Constitution999",
"None", "None", "TwitterSafety", "None", "None", "None",
"None", "kittywuv1", "BeauTFC", "None", "None", "None", "None",
"theblondeMD", "None", "TIME", "3M", "None", "Rakshitwa",
"CNN", "None", "CDCgov", "None", "CTVVancouver", "maddow",
"None", "CEDRdigital", "None", "None", "CNN", "CNN", "kr3at"
), replyToID = c("None", "1.13442E+18", "None", "None", "None",
"1.22053E+18", "None", "None", "None", "None", "None", "154243839",
"None", "48150879", "None", "759251", "None", "1.04747E+18",
"None", "None", "95731075", "None", "None", "None", "None",
"1.21653E+18", "1.05676E+18", "None", "None", "None", "None",
"230792524", "None", "14293310", "378197959", "None", "9.81585E+17",
"759251", "None", "146569971", "None", "16313405", "16129920",
"None", "9.04741E+17", "None", "None", "759251", "759251",
"139283160"), retweetUserScreenName = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), retweetUserID = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), followersCount = c(1452,
3844, 2398, 1, 179896, 1283, 14036740, 24, 329, 3, 7133,
2, 1050, 2, 24, 121, 4, 2, 38, 2533, 235, 2, 5, 148, 2312,
265, 1572, 8067, 1265, 167, 13, 1574, 1, 2, 972, 1, 107,
7, 0, 73, 295, 1160, 849, 1, 7519, 1749, 0, 4, 2, 121), userMentions = c(NA,
"ArvinderSoin", NA, NA, NA, "3M", NA, NA, NA, NA, NA, "Read5000YrLeap",
NA, "jmcmaccarr", NA, "CNN", NA, "Constitution999", NA, NA,
"TwitterSafety", NA, NA, NA, NA, "kittywuv1", "BeauTFC",
NA, NA, NA, NA, "theblondeMD", NA, "TIME", "3M", "WHO", "Rakshitwa",
"CNN", NA, "CDCgov", NA, "CTVVancouver", "maddow", NA, NA,
NA, NA, "CNN", "CNN", "kr3at"), userMentionsID = c(NA, 1.13442e+18,
NA, NA, NA, 378197959, NA, NA, NA, NA, NA, 154243839, NA,
48150879, NA, 759251, NA, 1.05e+18, NA, NA, 95731075, NA,
NA, NA, NA, 1.21653e+18, 1.05676e+18, NA, NA, NA, NA, 230792524,
NA, 14293310, 378197959, 14499829, 9.81585e+17, 759251, NA,
146569971, NA, 16313405, 16129920, NA, NA, NA, NA, 759251,
759251, 139283160), hashtag1 = c("coronavirus", NA, "corona",
"mask", NA, "Florida", "Coronavirus", NA, "coronavirus",
"facemask", "coronavirus", "Boycott3M", NA, "Boycott3M",
NA, "WearMask", NA, NA, "Covid19", "COVID19", NA, NA, NA,
"covid19", NA, "covid19", NA, NA, NA, "homemade", "covid19",
NA, "China", NA, "COVID19Pandemic", "recommendations", NA,
"Covit19", NA, "COPD", NA, NA, NA, NA, "COVID19", NA, NA,
"BoycottChina", NA, "WearMask"), hashtag2 = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA), mediatype = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), mediaURL = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA)), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -50L), spec = structure(list(
cols = list(createdAt = structure(list(), class = c("collector_character",
"collector")), timestamp = structure(list(), class = c("collector_double",
"collector")), id_str = structure(list(), class = c("collector_double",
"collector")), text = structure(list(), class = c("collector_character",
"collector")), retweetCount = structure(list(), class = c("collector_double",
"collector")), favorite_count = structure(list(), class = c("collector_double",
"collector")), url = structure(list(), class = c("collector_character",
"collector")), friendCount = structure(list(), class = c("collector_double",
"collector")), screenNames = structure(list(), class = c("collector_character",
"collector")), userID = structure(list(), class = c("collector_double",
"collector")), language = structure(list(), class = c("collector_character",
"collector")), replyToScreenName = structure(list(), class = c("collector_character",
"collector")), replyToID = structure(list(), class = c("collector_character",
"collector")), retweetUserScreenName = structure(list(), class = c("collector_logical",
"collector")), retweetUserID = structure(list(), class = c("collector_logical",
"collector")), followersCount = structure(list(), class = c("collector_double",
"collector")), userMentions = structure(list(), class = c("collector_character",
"collector")), userMentionsID = structure(list(), class = c("collector_double",
"collector")), hashtag1 = structure(list(), class = c("collector_character",
"collector")), hashtag2 = structure(list(), class = c("collector_logical",
"collector")), mediatype = structure(list(), class = c("collector_logical",
"collector")), mediaURL = structure(list(), class = c("collector_logical",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
We can create a variable from createdAt and then use group_split to split the data into a list of data.frames. Here, we extract the month and day with str_replace: the pattern matches the first word and the whitespace after it, captures the next word, the whitespace, and the digits that follow, and the replacement keeps only that captured group (for example, "Apr 01").
library(dplyr)
library(stringr)
sample_df %>%
  mutate(month_day = str_replace(createdAt,
      "^\\w+\\s+(\\w+\\s+\\d+).*", "\\1")) %>%
  group_split(month_day)
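To get from that list to one CSV per day, something like the following sketch might work. The small sample_df, the output directory, and the sample_<day>.csv file names are assumptions made only so the example runs on its own; with real data you would use the data frame read from the file.

```r
library(dplyr)
library(stringr)
library(readr)

# Tiny stand-in for sample_df (assumption); only createdAt matters for the split.
sample_df <- tibble(
  createdAt = c("Fri Apr 01 04:04:32 +0000 2020",
                "Fri Apr 01 04:04:36 +0000 2020",
                "Fri Apr 02 04:04:40 +0000 2020"),
  text = c("a", "b", "c")
)

# Same split as above: one data frame per month_day value.
by_day <- sample_df %>%
  mutate(month_day = str_replace(createdAt,
      "^\\w+\\s+(\\w+\\s+\\d+).*", "\\1")) %>%
  group_split(month_day)

# Hypothetical output directory; start from a clean state.
out_dir <- file.path(tempdir(), "by_day")
unlink(out_dir, recursive = TRUE)
dir.create(out_dir)

# Write each day's rows to its own file, named after the day.
for (i in seq_along(by_day)) {
  day <- str_replace_all(by_day[[i]]$month_day[1], "\\s+", "_")
  write_csv(by_day[[i]], file.path(out_dir, paste0("sample_", day, ".csv")))
}
```

Note that this still needs the whole data frame in memory before the split, so on its own it does not address the RAM limit.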
NOTE: there is no need for mutate, as month_day can be created on the fly in group_split itself.
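On the readr::read_csv_chunked part of the question: one possible route, sketched here under assumptions, is to process the file chunk by chunk and append each chunk's rows to a per-day file, so that only one chunk is held in memory at a time. The input file, output directory, file names, and the append_by_day helper are all hypothetical; reading every column as character is a choice made here so that column types stay consistent across chunks and long IDs are not rounded.

```r
library(readr)
library(stringr)

# Hypothetical small input file standing in for the 8 GB CSV.
in_file <- tempfile(fileext = ".csv")
write_csv(data.frame(
  createdAt = c("Fri Apr 01 04:04:32 +0000 2020",
                "Fri Apr 02 04:04:40 +0000 2020",
                "Fri Apr 01 04:04:37 +0000 2020",
                "Fri Apr 02 04:04:44 +0000 2020"),
  text = c("a", "b", "c", "d")
), in_file)

# Hypothetical output directory; start from a clean state because files are appended to.
out_dir <- file.path(tempdir(), "chunked_by_day")
unlink(out_dir, recursive = TRUE)
dir.create(out_dir)

# Called once per chunk: work out the day of each row, then append the
# rows for each day to that day's file.
append_by_day <- function(chunk, pos) {
  day <- str_replace(chunk$createdAt, "^\\w+\\s+(\\w+\\s+\\d+).*", "\\1")
  day <- str_replace_all(day, "\\s+", "_")
  for (d in unique(day)) {
    path <- file.path(out_dir, paste0("sample_", d, ".csv"))
    # First write creates the file with a header; later writes append rows only.
    write_csv(chunk[day == d, ], path, append = file.exists(path))
  }
}

read_csv_chunked(in_file,
                 SideEffectChunkCallback$new(append_by_day),
                 chunk_size = 2,  # tiny for the demo; much larger on real data
                 col_types = cols(.default = col_character()))
```

Each resulting file then holds a single day and can be read on its own for the per-day analysis.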