python numpy train-test-split

Even-Odd Train-Test Split with 2D array input and return two tuples of the form (X_train, y_train), (X_test, y_test)


I'm trying to complete this function, and any help would be appreciated. This is my work so far.

The function should take a 2-D NumPy array as input:

array([[ 1.961e+03,  2.263e-02],
       [ 1.962e+03,  1.420e-02],
       [ 1.963e+03,  8.360e-03],
       [ 1.964e+03,  5.940e-03],
       [ 1.965e+03,  5.750e-03],
       [ 1.966e+03,  6.190e-03],
       [ 1.967e+03,  5.890e-03],
       [ 1.968e+03,  5.700e-03],
       [ 1.969e+03,  5.820e-03],
       [ 1.970e+03,  5.740e-03]...

It should return two tuples of the form (X_train, y_train), (X_test, y_test). (X_train, y_train) should consist of data from even years, and (X_test, y_test) should consist of data from odd years.

My code:

def feat_resp_split(arr):
    X_odd, y_odd = arr[1::2,::].T
    X_even, y_even = arr[::2,::].T
    X_even_train, y_even_train, X_even_test, y_even_test = train_test_split(X_even, y_even, test_size = 0.2, random_state = 42)
    X_odd_train, y_odd_train, X_odd_test, y_odd_test = train_test_split(X_odd, y_odd, test_size = 0.2, random_state = 42)
    return (X_even_train, y_even_train), (X_odd_test, y_odd_test)

Input code:

feat_resp_split(data)

My output:

((array([2003., 1961., 2013., 1987., 1991., 1983., 1995., 1963., 1969.,
         1971., 1965., 2009., 1967., 2007., 2011., 1997., 2017., 2001.,
         1975., 1981., 1989., 1999., 1973.]),
  array([2015., 1993., 1985., 2005., 1977., 1979.])),
 (array([ 0.0358 ,  0.00801,  0.01021, -0.01219,  0.05591,  0.00594,
          0.00574,  0.00673,  0.00619,  0.05787,  0.00131,  0.0057 ,
          0.00589,  0.00213,  0.02137,  0.00461,  0.02254, -0.00117,
          0.01285,  0.0183 ,  0.02076,  0.00473]),
  array([ 0.00193,  0.00513, -0.00436,  0.01773,  0.0142 , -0.00606])))

Expected output:

X_train == array([1962., 1964., 1966., 1968., 1970., 1972., 1974., 1976., 1978.,
       1980., 1982., 1984., 1986., 1988., 1990., 1992., 1994., 1996.,
       1998., 2000., 2002., 2004., 2006., 2008., 2010., 2012., 2014.,
       2016.])
y_train ==  array([ 0.01419604,  0.00594409,  0.00618898,  0.00570149,  0.00573851,
        0.00672948,  0.00473084, -0.00117052, -0.00435676,  0.00193398,
        0.01284528,  0.01020884, -0.00606099, -0.01219414,  0.01830187,
        0.05590975,  0.05787267,  0.03580499,  0.02136897,  0.02076288,
        0.02254085,  0.01772885,  0.00800752,  0.00131397,  0.00212906,
        0.00513459,  0.00589222,  0.00460988])
X_test == array([1961., 1963., 1965., 1967., 1969., 1971., 1973., 1975., 1977.,
       1979., 1981., 1983., 1985., 1987., 1989., 1991., 1993., 1995.,
       1997., 1999., 2001., 2003., 2005., 2007., 2009., 2011., 2013.,
       2015., 2017.])
y_test == array([ 0.02263378,  0.00835927,  0.00575116,  0.00589102,  0.00582331,
        0.00638301,  0.00673463,  0.00213125, -0.0036312 , -0.00204649,
        0.00783746,  0.01395387,  0.00302374, -0.01294617, -0.0007695 ,
        0.03979147,  0.0625632 ,  0.04724902,  0.02705529,  0.01979903,
        0.02250889,  0.02131758,  0.01310552,  0.00384798,  0.00098665,
        0.00377696,  0.00594675,  0.00526037,  0.00421667])    

Solution

  • Given your example data

    import numpy as np

    data = np.array([[ 1.961e+03,  2.263e-02],
                     [ 1.962e+03,  1.420e-02],
                     [ 1.963e+03,  8.360e-03],
                     [ 1.964e+03,  5.940e-03],
                     [ 1.965e+03,  5.750e-03],
                     [ 1.966e+03,  6.190e-03],
                     [ 1.967e+03,  5.890e-03],
                     [ 1.968e+03,  5.700e-03],
                     [ 1.969e+03,  5.820e-03],
                     [ 1.970e+03,  5.740e-03]])
    

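    For your desired output, you only need:

    # The data starts at 1961, so rows at odd indices hold the even years.
    X_train = data[1::2, 0]
    y_train = data[1::2, 1]
    # Rows at even indices hold the odd years.
    X_test = data[::2, 0]
    y_test = data[::2, 1]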
    

    I do not understand why you want to use sklearn's train_test_split. Do you want to further split the odd-year and even-year data into 80%/20% samples?
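
If the goal is just the even/odd split, the slices above can be wrapped in the function from the question. The version below is a minimal sketch that selects rows by the parity of the year itself rather than by row position, so it does not depend on the data starting at 1961. It assumes the first column holds whole-number years stored as floats. As a side note, train_test_split returns its results in the order X_train, X_test, y_train, y_test, which is why the first tuple in the question's output contains two arrays of years.

```python
import numpy as np


def feat_resp_split(arr):
    """Split an (n, 2) array of [year, value] rows by year parity.

    Returns (X_train, y_train), (X_test, y_test), where the training
    pair holds the even years and the test pair holds the odd years.
    """
    years = arr[:, 0]
    values = arr[:, 1]
    # Assumption: years are whole numbers stored as floats (e.g. 1961.0),
    # so casting to int is lossless.
    is_even = years.astype(int) % 2 == 0
    return (years[is_even], values[is_even]), (years[~is_even], values[~is_even])


# The ten rows shown in the question.
data = np.array([[1.961e+03, 2.263e-02],
                 [1.962e+03, 1.420e-02],
                 [1.963e+03, 8.360e-03],
                 [1.964e+03, 5.940e-03],
                 [1.965e+03, 5.750e-03],
                 [1.966e+03, 6.190e-03],
                 [1.967e+03, 5.890e-03],
                 [1.968e+03, 5.700e-03],
                 [1.969e+03, 5.820e-03],
                 [1.970e+03, 5.740e-03]])

(X_train, y_train), (X_test, y_test) = feat_resp_split(data)
```

Because no shuffling is involved, each output keeps the rows in their original chronological order.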