Tags: r, random, location, latitude-longitude, data-hiding

Data Hiding in R


I have a data set 'location' with 1000 locations containing lat/long values. I want to hide a certain percentage of the locations at random and then estimate the lat/long values of those locations with my algorithm. Say I want to hide 10% of those 1000 locations at random and make them unknown: how do I hide values from my dataset in R? Is there a package in R that will help me achieve this? So, if this is the complete dataset location:

print(location)
Longitude               Latitude
74.858863999999997 31.327629000000002
74.224755999999999 31.309773000000000
74.216177999999999 31.463429000000001
74.321051999999995 31.575917000000000
74.349832000000006 31.582062000000001
74.319663000000006 31.573923000000001
74.349384000000001 31.527654999999999
74.410433999999995 31.521415999999999
74.349609000000001 31.527670000000001
74.426238999999995 31.522907000000000
74.309755999999993 31.561537999999999
74.426238999999995 31.522907000000000
74.282814000000002 31.456077000000001
74.224754000000004 31.309773000000000
74.426238999999995 31.522907000000000
74.365804999999995 31.470144000000001
74.311349000000007 31.483550999999999
74.312512999999996 31.472501999999999
74.426238999999995 31.522907000000000
74.319362999999996 31.484127000000001
74.370300000000000 31.537609000000000
74.879557000000005 32.104958000000003
74.426238999999995 31.522907000000000
73.463269999999994 30.815715999999998
74.412903999999997 31.470146000000000
74.319362999999996 31.484127999999998
74.412891999999999 31.470144999999999
74.313017000000002 31.484044999999998
74.412890000000004 31.470147999999998
74.328925999999996 31.536244000000000
74.336599000000007 31.528677999999999

I would like only the following printed:

print(location)
Longitude               Latitude
74.858863999999997 31.327629000000002
74.224755999999999 31.309773000000000
74.216177999999999 31.463429000000001
74.321051999999995 31.575917000000000
74.349832000000006 31.582062000000001
74.319663000000006 31.573923000000001
74.349384000000001 31.527654999999999
74.410433999999995 31.521415999999999
74.349609000000001 31.527670000000001
74.426238999999995 31.522907000000000
74.309755999999993 31.561537999999999
74.426238999999995 31.522907000000000
74.282814000000002 31.456077000000001
74.224754000000004 31.309773000000000
74.426238999999995 31.522907000000000
74.365804999999995 31.470144000000001
74.311349000000007 31.483550999999999
74.312512999999996 31.472501999999999
74.426238999999995 31.522907000000000
74.319362999999996 31.484127000000001
74.370300000000000 31.537609000000000
74.879557000000005 32.104958000000003
74.426238999999995 31.522907000000000
73.463269999999994 30.815715999999998
74.412903999999997 31.470146000000000
74.319362999999996 31.484127999999998
74.412891999999999 31.470144999999999
74.313017000000002 31.484044999999998

The dataset should still contain the values that were not printed, i.e. the "hidden" ones.


Solution

  • I would just define a vector (which could be a column of the data set or could be separate) that indicates whether each row is hidden or shown. For example:

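    # Use ONE of the two options below (the second assignment overwrites the first).
    # Option 1: hide about 20% of your data
    # (each row is hidden independently with probability 0.2):
    hide_row = which(rbinom(n = nrow(location), size = 1, prob = 0.2) == 1)
    # Option 2: hide exactly 20% of your data (rounded to a whole number of rows):
    hide_row = sample(seq_len(nrow(location)), size = round(0.2 * nrow(location)))
    
    # print all but the hidden rows
    # (assumes hide_row is non-empty: location[-integer(0), ] would return zero rows)
    location[-hide_row, ]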
    

    You don't seem to want this (I'm not sure of your use case), but a more natural approach would be to make a copy of the data with the hidden rows omitted:

    partial_location = location[-hide_row, ]
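
The "column of the data set" variant mentioned above could look roughly like the sketch below. It is only an illustration: the random `location` data frame stands in for the real data, and the names `hidden`, `visible`, `masked` and `truth` are made up here. It hides 10% of the rows, as in the question, and uses a logical flag so the full data stays in place.

```r
# Hypothetical stand-in for the asker's data: 1000 random points.
set.seed(42)  # fixed seed so the same rows are hidden on every run
location <- data.frame(
  Longitude = runif(1000, min = 73.4, max = 74.9),
  Latitude  = runif(1000, min = 30.8, max = 32.2)
)

n <- nrow(location)
n_hide <- round(0.10 * n)  # hide exactly 10% of the rows

# Logical flag column: TRUE marks a hidden row.
location$hidden <- seq_len(n) %in% sample.int(n, size = n_hide)

# Show only the visible rows; the full data stays in `location`.
visible <- location[!location$hidden, c("Longitude", "Latitude")]
print(head(visible))

# Copy for the estimation step: hidden coordinates are replaced by NA.
masked <- location
masked[masked$hidden, c("Longitude", "Latitude")] <- NA

# The true values of the hidden rows are still available for comparison.
truth <- location[location$hidden, c("Longitude", "Latitude")]
```

A logical flag also avoids the negative-index pitfall: `location[!location$hidden, ]` still returns every row when nothing is hidden. The `masked` copy can be passed to the estimation algorithm, and its output compared against `truth`.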