python nlp classification

Generating multi-class classifier training data from a document


I am looking for guidance on generating training data for a multi-class classifier from a document. For example, suppose a document has three sections of 10 pages each (30 pages in total).

I am looking for an open-source library to which I can pass the document, explicitly specifying which pages belong to section 1, section 2 and section 3, and which then gives me a list of important words to use as training data for telling "section 1" from "section 2" from "section 3" (multi-class classification).


Solution

  • I looked at this quite a long time ago, so I am not sure how much it will help, but the book "Deep Learning with Python" by François Chollet (2018) could give you some clues about how to generate such data samples from your document. The drawback might be that you have to prepare the document in a certain way before you can generate the samples. I read it a long time ago, so I may be misremembering the details. Good luck!
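
For what it is worth, here is a minimal sketch of one way the "important words per section" step could look, using only the Python standard library. It is not taken from the book above. It assumes the pages have already been extracted as plain text and labelled by section, and it scores each word with a simple TF-IDF-style weight (frequency within the section times the log of how rare the word is across sections). The names `tokenize`, `top_words_per_section` and the sample pages are hypothetical, made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def top_words_per_section(sections, k=5):
    """sections: dict mapping a section label to a list of page strings.

    Returns a dict mapping each label to at most k words that score highest
    for that section. Words that occur in every section score zero and are dropped.
    """
    # Word counts per section, pooling all pages of that section.
    counts = {
        label: Counter(tok for page in pages for tok in tokenize(page))
        for label, pages in sections.items()
    }
    n_sections = len(counts)
    # For each word, the number of sections it appears in at least once.
    section_freq = Counter(word for c in counts.values() for word in c)

    result = {}
    for label, c in counts.items():
        total = sum(c.values()) or 1
        scores = {
            word: (n / total) * math.log(n_sections / section_freq[word])
            for word, n in c.items()
        }
        # Highest score first; ties broken alphabetically so output is stable.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[label] = [w for w in ranked if scores[w] > 0][:k]
    return result


# Hypothetical sample data: two short "pages" per section.
sections = {
    "section 1": ["Invoice total and payment terms.",
                  "Payment is due on the invoice date."],
    "section 2": ["Shipping address and delivery schedule.",
                  "Delivery is tracked by the shipping carrier."],
    "section 3": ["Warranty coverage and the returns policy.",
                  "Returns are accepted under the warranty."],
}

keywords = top_words_per_section(sections, k=3)
for label, words in keywords.items():
    print(label, words)
```

With real documents you would probably want a proper tokenizer and stop-word handling. If installing a library is an option, scikit-learn's `TfidfVectorizer` covers the weighting step, and each page with its section label can then be used directly as a training sample for a multi-class classifier.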