How to create a tensorflow dataset from a list of filenames that need to be loaded and transformed and their corresponding labels

Given a list of npy filenames x:

x = ['path/to/file1.npy', 'path/to/file2.npy']

and a list of labels y:

y = [1, 0]

I want to create a tensorflow Dataset that consists of pairs of the labels and the loaded and transformed numpy arrays contained within the npy files.

Constraints

Each npy file must be loaded, the numpy array contained within must undergo an arbitrary transformation (irrelevant to the question) and then the array must be finally added to the Dataset along with its corresponding label.
A Dataset is necessary because the files are too large to be loaded into memory all at once.
The npy files are not all contained in a single directory.

Existing answers and why they don't fit my case:

Tensorflow dataset from lots of .npy files
a) does not offer clear directions on how to construct the mapping function of the loading and b) focuses on a function that only handles arrays and not their corresponding labels.
What is the best way to load data with tf.data.Dataset in memory efficient way
This answer does not provide what I ask about (the mapping function to load both x and y, along with transformations of x) but instead has a placeholder for that function (PARSE_FUNCTION).

Solution

In response to the comment on the question: you need a tf.py_function wrapper to run non-TensorFlow functions, because they can't be used directly in the .map method. (Most of the code comes from this question.)

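import numpy as np
import skimage.transform
import tensorflow as tf

WIDTH, HEIGHT = 224, 224  # target size; placeholder values
PARALLEL_CALLS = tf.data.AUTOTUNE


def load_files_py(filename, width, height):
    # py_function passes eager tensors, so convert them to Python values first
    image = np.load(filename.numpy().decode("utf-8"))
    image = skimage.transform.resize(image, (int(height), int(width)))
    return image.astype(np.float32)  # must match Tout below


def parse_function(filename, label):
    image = tf.py_function(load_files_py, inp=[filename, WIDTH, HEIGHT], Tout=tf.float32)
    return image, label


# x and y are the lists of filenames and labels from the question
dataset = tf.data.Dataset.from_tensor_slices((x, y))
dataset = dataset.map(parse_function, num_parallel_calls=PARALLEL_CALLS)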

You most likely meant skimage.transform.resize; skimage.io.resize does not exist. If you want to preserve the number of channels, leave it out of the output shape argument.
Note that num_parallel_calls may bring little benefit here, because tf.py_function acquires the Python GIL (global interpreter lock), which prevents the Python code from running in parallel across threads.

You could replace skimage.transform.resize with tf.image.resize, or use a tf.keras.layers.Resizing layer directly in the model. They do practically the same thing, although defaults such as anti-aliasing may differ.
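
A minimal sketch of the tf.image.resize variant, under a few assumptions: the arrays have 3 channels, the 8x8 target size is a hypothetical value, and the two dummy files exist only so that the snippet runs on its own. Only np.load stays inside tf.py_function. Because the output of tf.py_function carries no static shape, the rank is restored with set_shape before resizing.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

HEIGHT, WIDTH = 8, 8  # hypothetical target size

# Write two small dummy .npy files so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
x = []
for i, shape in enumerate([(16, 16, 3), (20, 12, 3)]):
    path = os.path.join(tmp_dir, f"file{i}.npy")
    np.save(path, np.random.rand(*shape).astype(np.float32))
    x.append(path)
y = [1, 0]


def load_npy(path):
    # Inside tf.py_function the argument is an eager tensor holding bytes.
    return np.load(path.numpy().decode("utf-8")).astype(np.float32)


def parse_function(path, label):
    image = tf.py_function(load_npy, inp=[path], Tout=tf.float32)
    # py_function drops static shape information; restore the known rank.
    image.set_shape([None, None, 3])
    # Resize with a native TensorFlow op instead of Python code.
    image = tf.image.resize(image, (HEIGHT, WIDTH))
    return image, label


dataset = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .map(parse_function, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(2)
)

images, labels = next(iter(dataset))
print(images.shape, labels.numpy())
```

With this split, the only part that still runs under the GIL is reading the file. Resizing to a fixed size also makes it possible to batch source arrays of different sizes.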