Tags: python, r, pandas, data-analysis, data-mining

Converting Python dataframe manipulation code to R


I have a dataframe manipulation in Python that I am trying to translate into R. However, I'm running into issues with my R translation.

col_names = China_2.columns
provinces = col_names[0]
days = col_names[1:]
day_1 = days[0]
other_days = days[1:]
China_dc = pd.DataFrame()
China_dc['Province'] = China_2[provinces]
China_dc[day_1] = China_2[day_1]
China_dc[other_days] = daily_cases
China_dc.head(3)

I attempted to rewrite it in R as:

col_names <- names(china_2)
provinces <- col_names[1]
days <- col_names[-1]
day_1 <- days[2]
other_days <- days[-1] 
China_dc <- data.frame(Province = rep(NA_character_, nrow(China_2)))
China_dc$Province <- China_2[[provinces]]
China_dc[[day_1]] <- China_2[[day_1]]
China_dc[, other_days] <- daily_cases_df
head(China_dc, 3)

I received an error with the R code. Specifically, I think the problem is with the days <- col_names[-1] line. The output I want from R should match the output of the Python code:

[image: output of the Python code]

Any help translating the Python code to R correctly would be appreciated!


Solution

  • The line days <- col_names[-1] is not the problem. In R, a negative index drops that element, so col_names[-1] is the direct equivalent of Python's col_names[1:]. The errors are elsewhere. First, R indexing starts at 1, so the first day is days[1], not days[2] as in your translation. Second, R is case-sensitive, and your code refers to the same data frame as both china_2 and China_2; pick one name and use it throughout. The code below uses china_2, so substitute whatever your object is actually called.

    Updated R code:

    col_names <- names(china_2)
    provinces <- col_names[1]
    days <- col_names[-1]
    day_1 <- days[1]  # first day; R is 1-based, so this is Python's days[0]
    other_days <- days[-1]  # every day except the first; Python's days[1:]
    China_dc <- data.frame(Province = rep(NA_character_, nrow(china_2)))
    China_dc$Province <- china_2[[provinces]]
    China_dc[[day_1]] <- china_2[[day_1]]
    China_dc[, other_days] <- daily_cases_df
    head(China_dc, 3)
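
To make the index mapping explicit, here is a minimal sketch of the Python side with the R equivalent noted next to each line. The column names are made-up placeholders, not the ones in your data, and a plain list stands in for China_2.columns.

```python
# Hypothetical column names standing in for China_2.columns.
col_names = ["Province", "day_1", "day_2", "day_3"]

provinces = col_names[0]   # R: col_names[1]   (first element)
days = col_names[1:]       # R: col_names[-1]  (drop the first element)
day_1 = days[0]            # R: days[1]        (first day)
other_days = days[1:]      # R: days[-1]       (drop the first day)

print(provinces, day_1, other_days)
```

One thing to watch when going the other way: in Python, col_names[-1] selects the last element, whereas in R it returns everything except the first.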