python python-3.x nlp text-classification

Identifying Grammatically Correct Nonsense Sentences

I have two files, file1.csv and file2.csv. Each row of file1.csv contains a pair of sentences, one of which is stupid. file2.csv identifies which column holds the stupid sentence (0 means type0, 1 means type1). I want to do an NLP classification task, and I usually know how to do that. In this case, though, I am a bit confused about how to arrange and organize my dataset so that I can train on my sentences and labels. I would appreciate a hint on how to proceed.

file1.csv is in the following format:

id,type0,type1
0,He married to a dinosaur.,He married to a women.
1,She drinks a beer.,She drinks a banana.
2,He lifted a 500 tons.,He lifted a 50kg.

file2.csv is in the following format:

id,stupid
0,0
1,1
2,0

My purpose is to classify the stupid sentences.

Solution

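Assuming that every row always contains exactly one sentence that is semantically correct and one that isn't, you can split the type0 and type1 sentences into two separate examples and classify them individually. The label for each sentence follows from file2.csv: the sentence in the column named by the stupid value is the stupid one, and the other is not. For example: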

id,type0,type1
0,He married to a dinosaur.,He married to a women.
1,She drinks a beer.,She drinks a banana.
2,He lifted a 500 tons.,He lifted a 50kg.

Becomes:

id,sentence
0,He married to a dinosaur.
1,He married to a women.
2,She drinks a beer.
3,She drinks a banana.
4,He lifted a 500 tons.
5,He lifted a 50kg.
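
A minimal sketch of that split, using only Python's standard csv module. The function name flatten and the convention that label 1 means "stupid" are my own assumptions, not something given in the question, and the two files are inlined as strings purely so the snippet runs on its own:

```python
import csv
import io

# Inline copies of the two files from the question, so the sketch runs as-is.
FILE1 = """id,type0,type1
0,He married to a dinosaur.,He married to a women.
1,She drinks a beer.,She drinks a banana.
2,He lifted a 500 tons.,He lifted a 50kg.
"""

FILE2 = """id,stupid
0,0
1,1
2,0
"""


def flatten(sentences_file, labels_file):
    """Return one (sentence, label) tuple per sentence; label 1 means stupid (assumed convention)."""
    # Map each pair id to the column index (0 or 1) that holds the stupid sentence.
    stupid_col = {row["id"]: int(row["stupid"]) for row in csv.DictReader(labels_file)}
    examples = []
    for row in csv.DictReader(sentences_file):
        for col in (0, 1):
            label = 1 if stupid_col[row["id"]] == col else 0
            examples.append((row[f"type{col}"], label))
    return examples


examples = flatten(io.StringIO(FILE1), io.StringIO(FILE2))
for sentence, label in examples:
    print(label, sentence)
```

With the real files you would pass open("file1.csv", newline="") and open("file2.csv", newline="") instead of the StringIO objects. The resulting list of (sentence, label) tuples can then go into whatever text classifier you normally use.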

However, splitting the pairs this way won't work if your data contains records where one sentence is merely less stupid than the other, i.e. where the label only makes sense when the two sentences are compared with each other.
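
If you do need the comparison, one possible arrangement (a sketch under my own assumptions, not the only option) is to keep each pair together as a single example and let the label be the position of the stupid sentence. The function name pair_examples is hypothetical, and adding the swapped order is my own suggestion so that position alone carries no signal:

```python
import csv
import io

# Inline copies of the two files from the question, so the sketch runs as-is.
FILE1 = """id,type0,type1
0,He married to a dinosaur.,He married to a women.
1,She drinks a beer.,She drinks a banana.
2,He lifted a 500 tons.,He lifted a 50kg.
"""

FILE2 = """id,stupid
0,0
1,1
2,0
"""


def pair_examples(sentences_file, labels_file):
    """Return (first, second, label) triples; label is the position (0 or 1) of the stupid sentence."""
    stupid_col = {row["id"]: int(row["stupid"]) for row in csv.DictReader(labels_file)}
    pairs = []
    for row in csv.DictReader(sentences_file):
        label = stupid_col[row["id"]]
        # Original order, exactly as it appears in file1.csv.
        pairs.append((row["type0"], row["type1"], label))
        # Swapped order, with the label flipped to match the new position.
        pairs.append((row["type1"], row["type0"], 1 - label))
    return pairs


pairs = pair_examples(io.StringIO(FILE1), io.StringIO(FILE2))
for first, second, label in pairs:
    print(label, first, "|", second)
```

A model trained on these triples sees both sentences at once, so it can learn "which of the two is worse" rather than judging each sentence in isolation.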