Generator map methods implementation


So I have a generator of a sine wave that yields two values, as in yield time, sine. I would like to be able to chain methods with dot notation to add steps to this generator, like the following:

my_generator.add_noise(mean=0, std=1).shuffle().to_pandas()

Here add_noise would, for example, add random noise to the sine value only, leaving the time untouched. The output of my_generator.add_noise(mean=0, std=1) would be another generator, but with a noisy sine.

My idea is to build it up incrementally, in a similar way to a TensorFlow Dataset. However, I can't find how to do it, probably because I don't know the right words to google.

Also, is this good practice, or is there a better method? I am building a dataset generator to try some algorithms and I want it to be scalable, so that if I change the generator to, for example, a logarithmic generator, I don't need to change the noise function.

I had a partial solution like this:

import math
import random
import pandas as pd

class SineWaveGenerator:
    def __init__(self, freq, amplitude, sampling_rate, num_samples):
        self.freq = freq
        self.amplitude = amplitude
        self.sampling_rate = sampling_rate
        self.num_samples = num_samples
    
    def __iter__(self):
        for i in range(self.num_samples):
            time = i / self.sampling_rate
            sine = self.amplitude * math.sin(2 * math.pi * self.freq * time)
            yield time, sine
    
    def add_noise(self, noise_amplitude):
        for time, sine in self:
            noisy_sine = sine + noise_amplitude * random.uniform(-1, 1)
            yield time, noisy_sine
    
    def as_pandas(self):
        return pd.DataFrame(list(self), columns=["Time", "Sine"])

This works with:

sin_generator = SineWaveGenerator(freq=10, amplitude=1, sampling_rate=1000, num_samples=1000)
df = sin_generator.as_pandas()

or

sin_generator = SineWaveGenerator(freq=10, amplitude=1, sampling_rate=1000, num_samples=1000)
noisy_sine_wave = sin_generator.add_noise(noise_amplitude=0.1)

but the following breaks:

sin_generator = SineWaveGenerator(freq=10, amplitude=1, sampling_rate=1000, num_samples=1000)
df_noisy_sine_wave = sin_generator.add_noise(noise_amplitude=0.1).as_pandas()

It fails with: AttributeError: 'generator' object has no attribute 'as_pandas'
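
For what it's worth, the error seems to come from how Python treats any function whose body contains yield: calling it returns a built-in generator object, which knows nothing about the class the method was defined in. Below is a minimal sketch of the same effect, using a hypothetical Numbers class that is not part of the code above:

```python
class Numbers:
    def __iter__(self):
        yield from range(3)

    def doubled(self):
        # Because this body contains `yield`, calling doubled() returns a
        # plain generator object, not a Numbers instance.
        for n in self:
            yield 2 * n

    def total(self):
        return sum(self)


result = Numbers().doubled()
print(type(result).__name__)     # generator
print(hasattr(result, "total"))  # False
```

So a chained call only works if each step returns an object that carries the methods itself.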


Solution

  • Based on the comments (and after looking at the source code of tf.data.Dataset), I came up with the following solution:

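    import numpy as np

    class SineWaveGenerator:
    
        # ... all the methods
    
        def add_noise(self, mean: float = 0., std: float = 1.):
            # Return a SineWaveGenerator subclass instead of a bare generator,
            # so the result keeps the other methods of the class.
            class SineNoisyNormalGenerator(SineWaveGenerator):
                def __init__(self, generator):
                    # Wrap the upstream generator; the base __init__ is not called.
                    self.generator = generator
    
                def __iter__(self):
                    for time, sine in self.generator:
                        # Gaussian noise on the sine value only
                        noisy_sine = sine + np.random.normal(mean, std)
                        yield time, noisy_sine
            return SineNoisyNormalGenerator(self)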
    

    I can now do:

    sin_generator = SineWaveGenerator(freq=10, amplitude=1, sampling_rate=1000, num_samples=1000)
    noisy_sine_generator = sin_generator.add_noise(mean=0., std=0.1)
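
On the question of keeping the noise function independent of the source (sine, logarithmic, and so on), one possible approach is to separate the chaining machinery from the data source. The following is only a sketch under stated assumptions: Stream, map, to_list and sine_wave are hypothetical names, random.gauss stands in for np.random.normal to avoid a NumPy dependency, and shuffle loads everything into memory, unlike the buffered shuffle of tf.data.Dataset.

```python
import math
import random


class Stream:
    """Hypothetical wrapper around a zero-argument callable that returns a fresh iterator."""

    def __init__(self, make_iter):
        self._make_iter = make_iter

    def __iter__(self):
        return self._make_iter()

    def map(self, fn):
        # Returns a new Stream; nothing is computed until it is iterated.
        return Stream(lambda: (fn(*item) for item in self))

    def add_noise(self, mean=0.0, std=1.0, rng=random):
        # Perturb only the second field; the first one (time) passes through.
        return self.map(lambda t, y: (t, y + rng.gauss(mean, std)))

    def shuffle(self, rng=random):
        def make():
            items = list(self)  # materializes the whole stream in memory
            rng.shuffle(items)
            return iter(items)
        return Stream(make)

    def to_list(self):
        return list(self)

    def to_pandas(self, columns=("Time", "Value")):
        import pandas as pd  # imported lazily; only needed for this method
        return pd.DataFrame(list(self), columns=list(columns))


def sine_wave(freq, amplitude, sampling_rate, num_samples):
    # The source only knows how to produce (time, value) pairs.
    def make():
        for i in range(num_samples):
            t = i / sampling_rate
            yield t, amplitude * math.sin(2 * math.pi * freq * t)
    return Stream(make)


noisy = sine_wave(freq=10, amplitude=1, sampling_rate=1000, num_samples=1000) \
    .add_noise(mean=0.0, std=0.1) \
    .shuffle() \
    .to_list()
```

With pandas installed, ending the chain with .to_pandas() instead of .to_list() gives the DataFrame from the question. A logarithmic source would only need its own factory function returning a Stream; add_noise and shuffle would not change.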