python datetime data-masking

How do I add uniform random noise to a timestamp?


I am trying to mask datetime values by adding uniform random noise to them.

I currently use the Cape Python module to add noise to my data. However, I'd like to write my own function that behaves like the one Cape Python provides, shown below.

!pip install cape-privacy
import datetime

import pandas as pd
from cape_privacy.pandas.transformations import *

s = pd.Series([datetime.date(year=2020, month=2, day=15)])
perturb = DatePerturbation(frequency="MONTH", min=-10, max=10)
perturb(s)
# Returns 2019-07-20

Given a datetime value, is there a way to add noise (between min and max) to the DAY, MONTH, or YEAR, or any combination of them, so that the result is still a credible date?

# Input
2021-09-23

# Expected Output when noise is added to DAY between -10 and 10
2021-09-20

Solution

  • I don't know Cape Python, so this might be off...

    Here's a way to do that:

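    from datetime import date, timedelta
    from random import randint
    
    def date_pertubation(d, attribs, minimum, maximum):
        """Shift the named parts of d by uniform integers in [minimum, maximum]."""
        # Accept a single name ("DAY") or a list of names, case-insensitively.
        if isinstance(attribs, str):
            attribs = [attribs]
        attribs = [attrib.casefold() for attrib in attribs]
        year = d.year
        if "year" in attribs:
            year += randint(minimum, maximum)
        # Use a zero-based month so divmod can carry overflow into the year.
        month = d.month - 1
        if "month" in attribs:
            month += randint(minimum, maximum)
            year_delta, month = divmod(month, 12)
            year += year_delta
        month += 1
        # Treat the day as an offset from the first of the month, so noise
        # that leaves the month rolls over into a neighbouring one.
        day_delta = d.day - 1
        if "day" in attribs:
            day_delta += randint(minimum, maximum)
        
        return date(year, month, 1) + timedelta(days=day_delta)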
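
The month step in the function above works on a zero-based month index so that `divmod` can carry any overflow into the year. The day is then re-applied as an offset from the first of the month, so a day that doesn't exist in the target month rolls into the next one. Here is a small sketch of just that arithmetic, using a hypothetical helper `shift_month` (my name for illustration, not part of the function above or of Cape Python):

```python
from datetime import date, timedelta


def shift_month(d, offset):
    """Move d by offset months (hypothetical helper, for illustration only)."""
    # divmod floors, so a negative month index borrows from the year.
    year_delta, month0 = divmod(d.month - 1 + offset, 12)
    first = date(d.year + year_delta, month0 + 1, 1)
    # Re-apply the day as an offset; overflow rolls into the following month.
    return first + timedelta(days=d.day - 1)


print(shift_month(date(2020, 2, 15), -3))  # 2019-11-15
print(shift_month(date(2020, 11, 15), 3))  # 2021-02-15
print(shift_month(date(2020, 1, 31), 1))   # 2020-03-02, since 2020-02-31 does not exist
```

If the rollover in the last case is unwanted, you could clamp the day to the length of the target month instead. Which of the two looks more "credible" depends on your data.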
    

    This

    d = date(year=2020, month=2, day=15)
    for _ in range(5):
        print(date_pertubation(d, "DAY", -20, 20).strftime("%Y-%m-%d"))
    for _ in range(5):
        print(date_pertubation(d, "YEAR", -3, 3).strftime("%Y-%m-%d"))
    for _ in range(5):
        print(date_pertubation(d, ["YEAR", "MONTH", "DAY"], -3, 3).strftime("%Y-%m-%d"))
    

    will produce something like

    2020-02-11
    2020-02-09
    2020-01-29
    2020-02-29
    2020-03-01
    
    2022-02-15
    2022-02-15
    2020-02-15
    2017-02-15
    2017-02-15
    
    2016-12-12
    2016-12-14
    2021-01-13
    2019-11-15
    2021-05-14
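
If the values are full timestamps rather than dates, and whole-number steps are not required, one simpler option is continuous uniform noise expressed in days. The sketch below assumes that is acceptable for your masking needs. `add_uniform_noise` and its parameters are hypothetical names, and this is not a reimplementation of Cape Python's `DatePerturbation`.

```python
from datetime import datetime, timedelta
from random import Random


def add_uniform_noise(ts, lo_days, hi_days, rng):
    """Shift ts by a uniform random amount between lo_days and hi_days."""
    # rng.uniform draws a float, so the shift includes fractional days.
    return ts + timedelta(days=rng.uniform(lo_days, hi_days))


rng = Random(0)  # fixed seed only so repeated runs give the same values
ts = datetime(2021, 9, 23, 12, 0, 0)
noisy = [add_uniform_noise(ts, -10, 10, rng) for _ in range(5)]
for value in noisy:
    print(value.isoformat(sep=" ", timespec="seconds"))
```

Passing the random generator in explicitly keeps the function easy to test. For real masking you would drop the fixed seed.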