How to sample data in the vicinity of existing data?

I have XOR data as below:

x	y	z	x ^ y ^ z
0	0	1	1
0	1	0	1
1	0	0	1
1	1	1	1

I kept only the rows where the XOR of all three values equals 1.

I want to generate synthetic data around the existing data, uniformly at random within some range. The table above can be thought of as seed data. An example of the expected table is as follows:

x	y	z	x ^ y ^ z
0.1	0.3	0.8	0.9
0.25	0.87	0.03	0.99
0.79	0.09	0.28	0.82
0.97	0.76	0.91	0.89

The table above was sampled from the range 0 to 0.3 for a value of 0 and from the range 0.7 to 1 for a value of 1.
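Restated as a formula (the symbols are introduced here for illustration only: $b \in \{0, 1\}$ is the seed bit, $w$ the width of each range, $u$ a uniform draw), each synthetic value $s$ is:

```latex
s = (1 - w)\,b + w\,u, \qquad u \sim \mathcal{U}[0, 1), \quad w = 0.3
```

With $w = 0.3$, a seed bit of 0 gives $s \in [0, 0.3)$ and a seed bit of 1 gives $s \in [0.7, 1)$.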

I want to achieve this using PyTorch.

Solution

For a problem this simple, you can synthesise the data entirely without using the seed table, because the valid rows are fully determined: x and y can take any value, and z must equal NOT(x XOR y) for x ^ y ^ z to be 1. For a 0 (range 0-0.3), use torch.rand to generate uniform values in [0, 1) and scale them by 0.3. For a 1 (range 0.7-1), do the same and add an offset of 0.7:

import torch

N = 5
p = 0.5  # change this to bias your outputs
x_is_1 = torch.rand(N) > p  # decide whether x is 1 or 0
y_is_1 = torch.rand(N) > p  # decide whether y is 1 or 0
# x ^ y ^ z = 1 requires z = NOT(x XOR y), so every (x, y) pair gives a valid row
z_is_1 = ~torch.logical_xor(x_is_1, y_is_1)
x = torch.rand(N) * 0.3  # uniform in [0, 0.3)
x = torch.where(x_is_1, x + 0.7, x)  # shift the 1s to [0.7, 1)
y = torch.rand(N) * 0.3
y = torch.where(y_is_1, y + 0.7, y)
z = torch.rand(N) * 0.3
z = torch.where(z_is_1, z + 0.7, z)
triple_xor = 1 - torch.rand(N) * 0.3  # the label is always 1, so sample from (0.7, 1]
print(torch.stack([x, y, z, triple_xor]).T)
# one row per sample: x, y, z, x^y^z (values differ on every run)
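A quick sanity check for generated rows is to threshold them at 0.5, recover the underlying bits, and confirm that x ^ y ^ z is 1 for every row. The sketch below wraps the same scale-and-offset idea in a hypothetical helper `sample_xor_rows` (the name and its `width` parameter are illustrative, not from the question) and builds all four columns with one expression, bits * (1 - width) + U[0, width).

```python
import torch

def sample_xor_rows(n, width=0.3, p=0.5):
    """Hypothetical helper: n noisy rows [x, y, z, x^y^z] whose bits satisfy x ^ y ^ z == 1."""
    x_is_1 = torch.rand(n) > p
    y_is_1 = torch.rand(n) > p
    z_is_1 = ~torch.logical_xor(x_is_1, y_is_1)  # forces x ^ y ^ z == 1
    label = torch.ones(n, dtype=torch.bool)      # the XOR column is always 1
    bits = torch.stack([x_is_1, y_is_1, z_is_1, label], dim=1)
    # a 0 maps to [0, width), a 1 maps to [1 - width, 1)
    return bits.float() * (1 - width) + torch.rand(n, 4) * width

rows = sample_xor_rows(1000)
# Threshold at 0.5 to recover the bits, then check the XOR constraint on every row
bits = rows > 0.5
xor_xyz = bits[:, 0] ^ bits[:, 1] ^ bits[:, 2]
print(bool(xor_xyz.all()), bool(bits[:, 3].all()))  # prints: True True
```

Any row that fails this check indicates that z was derived incorrectly from x and y.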

Alternatively, to use your data as the basis (which helps with more complex data), there is a data-augmentation technique known as Gaussian noise injection that may be what you're after. Or you can simply define a function that resamples each value uniformly within its range and call it as many times as you need:

import torch

def add_noise(x, y, z, triple_xor, width=0.3):
    def proc(dat, width):
        # 0-valued entries -> [0, width), 1-valued entries -> [1 - width, 1)
        low = torch.rand(dat.shape) * width
        return torch.where(dat > 0.5, low + 1 - width, low)
    return proc(x, width), proc(y, width), proc(z, width), proc(triple_xor, width)

# 5 noisy copies of every row in x, y, z, triple_xor
gen_new_data = torch.cat([torch.stack(add_noise(x, y, z, triple_xor)).T for _ in range(5)])
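If you would rather start from the seed table itself, the same idea can be applied to all four columns at once. This is a minimal sketch, assuming the seed rows are stored as a float tensor with one row per example; `jitter_seed`, `copies` and `width` are hypothetical names chosen for illustration.

```python
import torch

# Seed table from the question: columns x, y, z, x ^ y ^ z
seed = torch.tensor([[0., 0., 1., 1.],
                     [0., 1., 0., 1.],
                     [1., 0., 0., 1.],
                     [1., 1., 1., 1.]])

def jitter_seed(seed, copies=5, width=0.3):
    """Hypothetical helper: each 0 -> U[0, width), each 1 -> U[1 - width, 1)."""
    base = seed.repeat(copies, 1)  # tile the seed rows `copies` times along dim 0
    return base * (1 - width) + torch.rand(base.shape) * width

gen_new_data = jitter_seed(seed)
print(gen_new_data.shape)  # torch.Size([20, 4])
```

Because Tensor.repeat tiles the whole table, the output keeps the seed rows in their original order, repeated `copies` times; shuffle with torch.randperm if the order matters.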