Search code examples
rdataframeschemagroupingrows

i have a dataframe with more than 2000 rows. From the df, i need to create a schema in r


The format is as follows: enter image description here

I need to create the following schema for the df. I may have 2 or more instruction for an item.

{ Section: Section1, Title: Power Cables List_of_Instruction: [ { Instruction: information provided }, ] List_of_items:[ { Product: Item1, Unit:10, Rate: 100 }, { Product: Item2, Unit:2, Rate: 2000 } ] List_of_instruction:[ { Instruction:information 1.0 }, ] List_of_items:[ { Product:item.1.0, Unit:5, Rate:10 } ], Title: Cable Containment, List_of_instruction:[ { Instruction: information } ], List_of_items:[ { Product:item, Unit:100, Rate:4 } ]

} ,

{ Section: …. Title: …. }


Solution

  • Here is one take, I assume you want to group under Section, Title and Instruction, those are listed on the top level in each segment, and it makes senes from your data.

    
    
    d <- read.csv(text=
    "Description,Unit,Rate,isItem,IsSection,IsInstruction,IsTitle,sheetname
    Section1,NA,NA,FALSE,TRUE,FALSE,FALSE,Sheet1
    Power Cables,NA,NA,FALSE,FALSE,FALSE,TRUE,Sheet1
    Information Provided,NA,NA,FALSE,FALSE,TRUE,FALSE,Sheet1
    Item1,10,100,TRUE,FALSE,FALSE,FALSE,Sheet1
    Item2,2,2000,TRUE,FALSE,FALSE,FALSE,Sheet1
    Information 1.0,NA,NA,FALSE,FALSE,TRUE,FALSE,Sheet1
    Item1.0,5,10,TRUE,FALSE,FALSE,FALSE,Sheet1
    Cable containment,NA,NA,FALSE,FALSE,FALSE,TRUE,Sheet1
    Information,NA,NA,FALSE,FALSE,TRUE,FALSE,Sheet1
    item,100,4,TRUE,FALSE,FALSE,FALSE,Sheet1
    Section2,NA,NA,NA,TRUE,FALSE,FALSE,Sheet1
    Pipe,NA,NA,FALSE,FALSE,FALSE,TRUE,Sheet1
    details,NA,NA,FALSE,FALSE,TRUE,FALSE,Sheet1
    Items4,7,500,TRUE,FALSE,FALSE,FALSE,Sheet1
    ",header=TRUE)
    
    library(jsonlite)
    library(zoo)
    
    d2 <- d %>% mutate( Section = na.locf( ifelse( IsSection, Description, NA ) ) ) %>%
        group_by( Section ) %>%
        mutate( Title = na.locf( ifelse( IsTitle, Description, NA ), na.rm=FALSE ) ) %>%
        group_by( Section, Title ) %>%
        mutate( Instruction = na.locf( ifelse( IsInstruction, Description, NA ), na.rm=FALSE ) )
    
    ## A helper function to do the heavy finesse work:
    make.segment <- function(d) {
        with(
            d,
            list(
                Section = Section[1],
                Title = Title[1],
                Instruction = Instruction[1],
                List_of_items =
                    d %>% filter( isItem ) %>%
                    rename( Product=Description ) %>%
                    select( Product, Unit, Rate ) %>%
                    apply( 1, as.list )
            )
        )
    }
    
    ## Standard dplyr work to do the heavy lifting:
    l <- d2 %>% filter( !is.na(Title) & !is.na(Instruction) ) %>%
        group_by( Section, Title, Instruction ) %>%
        do( segment = make.segment(.) ) %>%
        pull( segment ) %>%
        as.list
    
    toJSON( l, pretty=TRUE, auto_unbox=TRUE )
    
    
    

    Your example does show json, so I assume that is what you want out, this produces:

    
    [
      {
        "Section": "Section1",
        "Title": "Cable containment",
        "Instruction": "Information",
        "List_of_items": [
          {
            "Product": "item",
            "Unit": "100",
            "Rate": "4"
          }
        ]
      },
      {
        "Section": "Section1",
        "Title": "Power Cables",
        "Instruction": "Information 1.0",
        "List_of_items": [
          {
            "Product": "Item1.0",
            "Unit": "5",
            "Rate": "10"
          }
        ]
      },
      {
        "Section": "Section1",
        "Title": "Power Cables",
        "Instruction": "Information Provided",
        "List_of_items": [
          {
            "Product": "Item1",
            "Unit": "10",
            "Rate": " 100"
          },
          {
            "Product": "Item2",
            "Unit": " 2",
            "Rate": "2000"
          }
        ]
      },
      {
        "Section": "Section2",
        "Title": "Pipe",
        "Instruction": "details",
        "List_of_items": [
          {
            "Product": "Items4",
            "Unit": "7",
            "Rate": "500"
          }
        ]
      }
    ]