I have two separate files: one is a text file in which each line is a single text, and the other contains the class label for the corresponding line. How do I load this into PyTorch and carry out further tokenization, embedding, etc.?
What have you tried already? What you described is not really PyTorch-specific yet: you can write a pre-processing script that loads all the sentences into a single data structure, e.g. a list of (text, label) tuples. You can also split your data into training and hold-out sets in this step, and then dump everything into .csv files.
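A rough sketch of such a pre-processing script, using only the standard library. The function names, the 80/20 split, and the file names in the usage note are my own assumptions, not something from your setup, so adjust them as needed:

```python
import csv
import random


def read_lines(path):
    """Read a file and return its lines without the trailing newline."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def pair_and_split(texts, labels, holdout_fraction=0.2, seed=0):
    """Pair each text with its label, shuffle, and split into train / hold-out."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same number of lines")
    pairs = list(zip(texts, labels))
    random.Random(seed).shuffle(pairs)  # fixed seed keeps the split reproducible
    n_holdout = int(len(pairs) * holdout_fraction)
    return pairs[n_holdout:], pairs[:n_holdout]


def write_csv(path, pairs):
    """Dump (text, label) tuples to a .csv file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        writer.writerows(pairs)
```

With hypothetical file names, you would call it roughly as `train, holdout = pair_and_split(read_lines("texts.txt"), read_lines("labels.txt"))` followed by `write_csv("train.csv", train)` and `write_csv("holdout.csv", holdout)`.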
Then, one way to do it is in 3 steps:
You can then use this to produce a vector representation of your sentences and pass it to a neural network.
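To make the last part concrete, here is a minimal sketch in plain PyTorch. It assumes whitespace tokenization, a vocabulary built from the training texts, and a sentence vector obtained by averaging word embeddings; the class name `AveragedEmbeddingClassifier` and all sizes are made up for illustration, and this is only one of many ways to do it:

```python
import torch
import torch.nn as nn


def build_vocab(texts):
    """Map each whitespace token to an integer id (0 = padding, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(texts, vocab):
    """Turn texts into a padded LongTensor of shape (batch, max_len)."""
    ids = [[vocab.get(t, 1) for t in text.lower().split()] for text in texts]
    max_len = max(len(seq) for seq in ids)
    padded = [seq + [0] * (max_len - len(seq)) for seq in ids]
    return torch.tensor(padded, dtype=torch.long)


class AveragedEmbeddingClassifier(nn.Module):
    """Embed tokens, average them into one sentence vector, then classify."""

    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.linear = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        mask = (token_ids != 0).unsqueeze(-1).float()  # 1.0 for real tokens
        embedded = self.embedding(token_ids) * mask    # zero out padding
        lengths = mask.sum(dim=1).clamp(min=1.0)       # avoid division by zero
        sentence_vectors = embedded.sum(dim=1) / lengths
        return self.linear(sentence_vectors)           # unnormalized class scores


texts = ["a good movie", "bad"]  # toy data for illustration only
vocab = build_vocab(texts)
batch = encode(texts, vocab)
model = AveragedEmbeddingClassifier(len(vocab), embed_dim=8, num_classes=2)
logits = model(batch)  # shape (batch, num_classes)
```

The logits can then go into `nn.CrossEntropyLoss` together with integer class labels for training.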
Look into this notebook to understand all this in more detail: