Here is a simple way to classify text without much human effort and get a impressive performance. It can be divided into two steps: Get train data by using keyword classification Generate a more accurate classification model by using doc2vec and label spreading Keyword-based Classification Keyword based classification is a simple but effective method. Extracting the target keyword is a monotonous work. I use this method to automatic extract keyword candidate.
Here are some parameter in gensim’s doc2vec class. window window is the maximum distance between the predicted word and context words used for prediction within a document. It will look behind and ahead. In skip-gram model, if the window size is 2, the training samples will be this:(the blue word is the input word) min_count If the word appears less than this value, it will be skipped