Tags: r, dataframe, matrix, keras, normalization

Normalising matrix which contains lists


Currently, I have the following dataframe (the dput() output below covers the first 30 rows):

structure(list(PacketTime = c(0.0636830000000002, 0.0691829999999989, 
0.0639040000000008, 0.0636270000000003, 0.0656370000000024, 0.064778000000004, 
0.0616950000000003, 0.0666280000000015, 0.0630829999999989, 0.0665130000000005, 
0.0621160000000032, 0.0654010000000014, 0.0652889999999928, 0.0640989999999988, 
0.0621339999999861, 0.0645319999999998, 0.065757000000005, 0.0624459999999942, 
0.061782000000008, 0.0626439999999917, 0.0648419999999987, 0.0664910000000134, 
0.0644649999999984, 0.0654030000000034, 0.0657139999999998, 0.0642799999999966, 
0.069137000000012, 0.0631520000000023, 0.0634139999999945, 0.0615009999999927
), FrameLen = list(c(304L, 276L, 276L), c(304L, 276L, 276L), 
    c(304L, 276L, 276L), c(304L, 276L, 276L), c(304L, 276L, 276L
    ), c(304L, 276L, 276L), c(304L, 276L, 276L), c(304L, 276L, 
    276L, 276L, 276L), c(304L, 276L, 276L), c(304L, 276L, 276L, 
    276L, 276L), c(304L, 276L, 276L), c(304L, 276L, 276L), c(304L, 
    276L, 276L), c(304L, 276L, 276L), c(304L, 276L, 276L), c(304L, 
    276L, 276L), c(304L, 276L, 276L, 276L, 276L), c(304L, 276L, 
    276L), c(304L, 276L, 276L), c(304L, 276L, 276L), c(304L, 
    276L, 276L, 276L, 276L), c(304L, 276L, 276L), c(304L, 276L, 
    276L), c(304L, 276L, 276L, 276L), c(304L, 276L, 276L, 276L, 
    276L), c(304L, 276L, 276L), c(304L, 276L, 276L), c(304L, 
    276L, 276L), c(304L, 276L, 276L), c(304L, 276L, 276L)), IPLen = list(
    c(300L, 272L, 272L), c(300L, 272L, 272L), c(300L, 272L, 272L
    ), c(300L, 272L, 272L), c(300L, 272L, 272L), c(300L, 272L, 
    272L), c(300L, 272L, 272L), c(300L, 272L, 272L, 272L, 272L
    ), c(300L, 272L, 272L), c(300L, 272L, 272L, 272L, 272L), 
    c(300L, 272L, 272L), c(300L, 272L, 272L), c(300L, 272L, 272L
    ), c(300L, 272L, 272L), c(300L, 272L, 272L), c(300L, 272L, 
    272L), c(300L, 272L, 272L, 272L, 272L), c(300L, 272L, 272L
    ), c(300L, 272L, 272L), c(300L, 272L, 272L), c(300L, 272L, 
    272L, 272L, 272L), c(300L, 272L, 272L), c(300L, 272L, 272L
    ), c(300L, 272L, 272L, 272L), c(300L, 272L, 272L, 272L, 272L
    ), c(300L, 272L, 272L), c(300L, 272L, 272L), c(300L, 272L, 
    272L), c(300L, 272L, 272L), c(300L, 272L, 272L)), Movement = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -30L), class = c("tbl_df", 
"tbl", "data.frame"))

From here, I convert the dataframe (stored in the variable packets) into a matrix for use with the keras package:

packets.m <- as.matrix(packets)

However, when I attempt to pass this into the model (without normalisation) or normalise before passing, I receive the following error:

Error in py_call_impl(callable, dots$args, dots$keywords) : Matrix type cannot be converted to python (only integer, numeric, complex, logical, and character matrixes can be converted

How can I normalise the two list columns, FrameLen and IPLen, so that the data can be passed to a deep learning model built with the keras package?

EDIT: The full dput() of the packets dataframe can be found here: https://pastebin.com/cXKdSB2y


Solution

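  • It depends on how you want to feed this data to the model: each list element can become its own row (multiple instances) or its own column (multiple features).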

    library(tidyverse)
    

    Multiple instances

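    # One row per list element; assumes FrameLen and IPLen have the same length within each row
    packets %>% 
      unnest(cols = c(FrameLen, IPLen))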
    

    Multiple features

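    # One column per list position (FrameLen_1, IPLen_1, ...); shorter lists are filled with NA
    packets %>% 
      mutate(position = map(FrameLen, seq_along), id = row_number()) %>%
      unnest(cols = c(FrameLen, IPLen, position)) %>% 
      pivot_wider(names_from = position, values_from = c(FrameLen, IPLen))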
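Neither snippet above performs the normalisation or the matrix conversion the question asks about, so here is a minimal sketch of one way to finish the "multiple features" route. It is not the only option. The toy tibble stands in for packets, the min_max() helper is hypothetical, and padding missing positions with 0 before min-max scaling is an assumption you may want to replace (for example with masking or a different fill value).

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy stand-in for `packets`: same column types, only three rows.
packets <- tibble(
  PacketTime = c(0.0637, 0.0692, 0.0666),
  FrameLen   = list(c(304L, 276L, 276L), c(304L, 276L, 276L),
                    c(304L, 276L, 276L, 276L, 276L)),
  IPLen      = list(c(300L, 272L, 272L), c(300L, 272L, 272L),
                    c(300L, 272L, 272L, 272L, 272L)),
  Movement   = c(0, 0, 1)
)

# With list columns present, as.matrix() returns a list matrix,
# which is the type the conversion error rejects.
raw_type <- typeof(as.matrix(packets))

# Spread each list position into its own column (FrameLen_1, IPLen_1, ...).
wide <- packets %>%
  mutate(position = map(FrameLen, seq_along), id = row_number()) %>%
  unnest(cols = c(FrameLen, IPLen, position)) %>%
  pivot_wider(names_from = position, values_from = c(FrameLen, IPLen))

# Hypothetical helper: min-max scaling to [0, 1]; constant columns map to 0.
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

features <- wide %>%
  select(-id, -Movement) %>%
  # Assumption: pad positions that a shorter list lacks with 0.
  mutate(across(everything(), ~ replace_na(as.numeric(.x), 0))) %>%
  mutate(across(everything(), min_max))

x <- as.matrix(features)  # plain numeric matrix
y <- wide$Movement        # labels, one per original row
```

Because every column of features is now an atomic numeric vector, x is an ordinary numeric matrix rather than a list matrix, which is the kind of input the error message says can be converted. If you later split the data into training and test sets, compute the scaling ranges on the training rows only and reuse them for the test rows.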