I need to add a fingerprint to each row of a dataset so that I can compare it with a later version of the same dataset and look for differences.
I know how to add a hash to each row in R, like this:
library(digest)
data.frame(iris, hash = apply(iris, 1, digest))
I am learning to use dplyr because the dataset is getting huge and I need to store it in SQL Server. I tried something like the code below, but the hashing does not work: every row gets the same hash.
iris %>%
rowwise() %>%
mutate(hash=digest(.))
Any clue for row-wise hashing using dplyr? Thanks!
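We could use do(). In the mutate() call, the . refers to the whole data frame passed in by the pipe, so every row gets the hash of the entire table. Inside do() on a rowwise data frame, the . refers to the current row instead: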
library(dplyr)
library(digest)

res <- iris %>%
  rowwise() %>%
  do(data.frame(., hash = digest(.)))
head(res, 3)
# A tibble: 3 x 6
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species hash
#         <dbl>       <dbl>        <dbl>       <dbl>  <fctr> <chr>
#1          5.1         3.5          1.4         0.2  setosa e261621c90a9887a85d70aa460127c78
#2          4.9         3.0          1.4         0.2  setosa 7bf67322858048d82e19adb6399ef7a4
#3          4.7         3.2          1.3         0.2  setosa c20f3ee03573aed5929940a29e07a8bb
Note that in the apply procedure, all the columns are converted to a single class, because apply converts its input to a matrix and a matrix can hold only a single class. There will be a warning about converting the factor to the character class.
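If the implicit matrix conversion is a concern, one option is to do the character conversion explicitly and hash one string per row. The following is only a minimal sketch, not the method used above: row_hashes and iris_new are hypothetical names introduced for illustration, and it assumes the separator "\x1f" never occurs in the data. It also shows how the hashes might be used to spot rows that changed in a later version.

```r
library(digest)

# Hypothetical helper: returns one hash per row of df.
row_hashes <- function(df) {
  # Convert every column to character explicitly instead of relying on
  # the matrix coercion that apply() performs.
  cols <- lapply(df, as.character)
  # Build one key per row; "\x1f" is assumed never to appear in a value.
  keys <- do.call(paste, c(unname(cols), sep = "\x1f"))
  # Hash each key (digest() uses md5 by default).
  vapply(keys, digest, character(1), USE.NAMES = FALSE)
}

old <- data.frame(iris, hash = row_hashes(iris), stringsAsFactors = FALSE)

# Simulate a later version in which a single value has changed.
iris_new <- iris
iris_new$Sepal.Length[1] <- 9.9
new <- data.frame(iris_new, hash = row_hashes(iris_new), stringsAsFactors = FALSE)

# Rows of the new version whose hash does not occur in the old version.
changed <- new[!(new$hash %in% old$hash), ]
```

Because the hash depends only on the row's values, identical rows get identical hashes, so this comparison detects changed or added content but not a change in how many times a duplicate row appears.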