I have a sine wave generator that yields two values at each step, like `yield time, sine`.
I would like to be able to chain methods with dot notation to add steps to this generator, like the following:

    my_generator.add_noise(mean=0, std=1).shuffle().to_pandas()
Here `add_noise` would, for example, add uniform random noise to the `sine` value only, leaving `time` untouched. The output of `my_generator.add_noise(mean=0, std=1)` would be another generator, but with a noisy `sine`.
My idea is to build it up incrementally, in a similar way to a TensorFlow Dataset. However, I can't find out how to do it, probably because I don't know the right terms to search for.
Also, is this good practice, or is there a better method? I am writing a dataset generator to try out some algorithms and I want it to be scalable, so that if I change the generator to, for example, a logarithmic generator, I don't need to change the noise function.
I had a partial solution like this:
    import math
    import random

    import pandas as pd


    class SineWaveGenerator:
        def __init__(self, freq, amplitude, sampling_rate, num_samples):
            self.freq = freq
            self.amplitude = amplitude
            self.sampling_rate = sampling_rate
            self.num_samples = num_samples

        def __iter__(self):
            for i in range(self.num_samples):
                time = i / self.sampling_rate
                sine = self.amplitude * math.sin(2 * math.pi * self.freq * time)
                yield time, sine

        def add_noise(self, noise_amplitude):
            for time, sine in self:
                noisy_sine = sine + noise_amplitude * random.uniform(-1, 1)
                yield time, noisy_sine

        def to_pandas(self):
            return pd.DataFrame(list(self), columns=["Time", "Sine"])
This works with:
    sin_generator = SineWaveGenerator(freq=10, amplitude=1, sampling_rate=1000, num_samples=1000)
    df = sin_generator.to_pandas()
or
    sin_generator = SineWaveGenerator(freq=10, amplitude=1, sampling_rate=1000, num_samples=1000)
    noisy_sine_wave = sin_generator.add_noise(noise_amplitude=0.1)
but the following breaks:
    sin_generator = SineWaveGenerator(freq=10, amplitude=1, sampling_rate=1000, num_samples=1000)
    df_noisy_sine_wave = sin_generator.add_noise(noise_amplitude=0.1).to_pandas()

It fails with: AttributeError: 'generator' object has no attribute 'to_pandas'
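A minimal sketch of why the chained call fails, as far as I understand it: any method whose body contains `yield` returns a plain generator object when called, and that object has none of the class's own methods. The `Source` class and its `offset` parameter are hypothetical stand-ins, reduced to the essentials, and not part of the original code.

```python
import types


class Source:
    """Hypothetical stand-in for the sine wave class, reduced to the essentials."""

    def __iter__(self):
        yield 0.0, 0.0
        yield 0.25, 1.0

    def add_noise(self, offset):
        # This body contains `yield`, so calling add_noise() returns a plain
        # generator object rather than a Source instance.
        for time, value in self:
            yield time, value + offset

    def to_pandas(self):
        # Placeholder: only its presence on the class matters for this demo.
        raise NotImplementedError


result = Source().add_noise(offset=0.5)
is_plain_generator = isinstance(result, types.GeneratorType)
has_to_pandas = hasattr(result, "to_pandas")
print(is_plain_generator, has_to_pandas)  # True False
```

So for the chain to continue, each step has to return an object that still carries the chainable methods, not a bare generator.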
So, based on the comments (and after also looking at the source code of `tf.data.Dataset`), I came up with the following solution:
    import numpy as np


    class SineWaveGenerator:
        # ... all the methods

        def add_noise(self, mean: float = 0., std: float = 1.):
            # Subclassing SineWaveGenerator keeps the chainable methods available
            class SineNoisyNormalGenerator(SineWaveGenerator):
                def __init__(self, generator):
                    self.generator = generator

                def __iter__(self):
                    for time, sine in self.generator:
                        noisy_sine = sine + np.random.normal(mean, std)
                        yield time, noisy_sine

            return SineNoisyNormalGenerator(self)
I can now do:
    sin_generator = SineWaveGenerator(freq=10, amplitude=1, sampling_rate=1000, num_samples=1000)
    noisy_sine_generator = sin_generator.add_noise(mean=0., std=0.1)
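One possible way to keep the noise step independent of the signal source is to put the chainable methods on a separate wrapper that accepts any source of `(time, value)` pairs. This is only a sketch under my own assumptions: the `Pipeline` class, the `sine_wave` function, and the `seed` parameters are hypothetical names I made up for illustration, and this is not how `tf.data.Dataset` is actually implemented.

```python
import math
import random


class Pipeline:
    """Hypothetical chainable wrapper around any source of (time, value) pairs."""

    def __init__(self, make_iter):
        # make_iter: zero-argument callable returning a fresh iterator,
        # so the pipeline can be iterated more than once.
        self._make_iter = make_iter

    def __iter__(self):
        return self._make_iter()

    def add_noise(self, mean=0.0, std=1.0, seed=None):
        def make_iter():
            rng = random.Random(seed)
            for time, value in self:
                # Only the value is perturbed; time passes through untouched.
                yield time, value + rng.gauss(mean, std)

        return Pipeline(make_iter)  # a new Pipeline, so chaining keeps working

    def shuffle(self, seed=None):
        def make_iter():
            rows = list(self)  # shuffling needs every row in memory
            random.Random(seed).shuffle(rows)
            return iter(rows)

        return Pipeline(make_iter)

    def to_pandas(self, columns=("Time", "Value")):
        import pandas as pd  # imported lazily; only needed for this step

        return pd.DataFrame(list(self), columns=list(columns))


def sine_wave(freq, amplitude, sampling_rate, num_samples):
    """Plain generator function; knows nothing about noise or shuffling."""
    for i in range(num_samples):
        time = i / sampling_rate
        yield time, amplitude * math.sin(2 * math.pi * freq * time)


pipeline = Pipeline(
    lambda: sine_wave(freq=10, amplitude=1, sampling_rate=1000, num_samples=1000)
)
noisy = pipeline.add_noise(mean=0.0, std=0.1, seed=0).shuffle(seed=0)
rows = list(noisy)
```

With this layout, swapping in a logarithmic source would only mean passing a different generator function to `Pipeline`; `add_noise` and `shuffle` stay unchanged, and `noisy.to_pandas()` would end the chain when pandas is installed.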