python-3.x machine-learning classification noise

Adding Gaussian noise to a dataset of floating-point values and saving it (Python)


I'm working on a classification problem where I need to add different levels of Gaussian noise to my dataset and run classification experiments until my ML algorithms can no longer classify it. Unfortunately, I have no idea how to do that. Any advice or coding tips on how to add the Gaussian noise?


Solution

  • You can follow these steps:

    1. Load the data into a pandas DataFrame: clean_signal = pd.read_csv("data_file_name")
    2. Use NumPy to generate Gaussian noise with the same shape as the dataset.
    3. Add the Gaussian noise to the clean signal: signal = clean_signal + noise

    Here's a reproducible example:

    import numpy as np
    import pandas as pd

    # Create a sample dataset with shape (2, 2).
    # In your case, replace this with:
    # clean_signal = pd.read_csv("your_data.csv")
    clean_signal = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), dtype=float)
    print(clean_signal)
    """
    print output:
         A    B
    0  1.0  2.0
    1  3.0  4.0
    """
    mu, sigma = 0, 0.1  # mean and standard deviation of the noise
    # Create noise with the same shape as the dataset, here (2, 2)
    noise = np.random.normal(mu, sigma, clean_signal.shape)
    print(noise)
    """
    print output (values differ on every run):
    [[-0.11114313  0.25927152]
     [ 0.06701506 -0.09364186]]
    """
    signal = clean_signal + noise
    print(signal)
    """
    print output:
              A         B
    0  0.888857  2.259272
    1  3.067015  3.906358
    """
    

    Overall code without the print statements:

    import numpy as np
    import pandas as pd

    # clean_signal = pd.read_csv("your_data.csv")
    clean_signal = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), dtype=float)
    mu, sigma = 0, 0.1
    noise = np.random.normal(mu, sigma, clean_signal.shape)
    signal = clean_signal + noise
    

    To save the result back to a CSV file:

    signal.to_csv("output_filename.csv", index=False)
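
Since the question asks for several noise levels, the same steps can be wrapped in a loop. The following is only a minimal sketch under a few assumptions: the helper name add_gaussian_noise, the sigma values in noise_levels, and the output file names are all hypothetical and not part of the original answer. The optional columns argument is there in case the DataFrame also holds a label column that should stay untouched.

```python
import numpy as np
import pandas as pd


def add_gaussian_noise(df, sigma, rng, columns=None):
    """Return a copy of df with zero-mean Gaussian noise (std = sigma)
    added to the given columns (all columns if columns is None)."""
    cols = list(df.columns) if columns is None else list(columns)
    noisy = df.copy()
    noise = rng.normal(loc=0.0, scale=sigma, size=(len(df), len(cols)))
    noisy[cols] = noisy[cols] + noise
    return noisy


# Sample data; in practice: clean_signal = pd.read_csv("your_data.csv")
clean_signal = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"), dtype=float)

rng = np.random.default_rng(seed=0)  # fixed seed so runs are repeatable
noise_levels = [0.1, 0.5, 1.0]       # hypothetical values, tune for your data

noisy_datasets = {}
for sigma in noise_levels:
    noisy = add_gaussian_noise(clean_signal, sigma, rng)
    noisy_datasets[sigma] = noisy
    # One file per noise level, e.g. noisy_sigma_0.1.csv (hypothetical name)
    noisy.to_csv(f"noisy_sigma_{sigma}.csv", index=False)
```

If the data contains a label column, passing only the feature names, for example add_gaussian_noise(df, 0.5, rng, columns=["A", "B"]), leaves the labels unchanged. Because features often live on different scales, a single sigma affects each column differently, so it may be worth standardizing the features first or choosing sigma relative to each column's spread.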