All Posts

Deploy Nikola Org Mode on Travis

Recently I have been enjoying Spacemacs, so I decided to switch from Markdown to Org files for writing this blog. After several attempts, I managed to get Travis to convert Org files to HTML. Here are the steps.
Install Org Mode plugin
First, install the Org Mode plugin on your computer by following the official guide: Nikola orgmode plugin.
Edit conf.el
Org files are converted to HTML for display in Nikola. The Org Mode plugin calls Emacs to do this job.

Using Chinese Characters in Matplotlib

After searching on Google, this is the easiest solution I found. It should also work for other languages:
```
import matplotlib.pyplot as plt

# Jupyter/IPython magics: draw inline and at retina resolution
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.font_manager as fm

# Font file that contains the required glyphs (macOS path)
f = "/System/Library/Fonts/PingFang.ttc"
prop = fm.FontProperties(fname=f)
plt.title("你好", fontproperties=prop)
plt.show()
```
Output:

LSTM and GRU

LSTM
To mitigate the vanishing and exploding gradient problems of the vanilla RNN, the LSTM was proposed; it can remember information over longer periods of time. The computation is:
\[\begin{aligned}
f_t&=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)\\
i_t&=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)\\
o_t&=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)\\
\tilde{C}_t&=\tanh(W_C\cdot[h_{t-1},x_t]+b_C)\\
C_t&=f_t\ast C_{t-1}+i_t\ast \tilde{C}_t\\
h_t&=o_t \ast \tanh(C_t)
\end{aligned}\]
\(f_t\), \(i_t\) and \(o_t\) are the forget gate, input gate and output gate respectively, \(\sigma\) is the sigmoid function and \(\ast\) is element-wise multiplication. \(\tilde{C}_t\) is the new memory content. \(C_t\) is the cell state.
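The equations above can be sketched as a single time step in NumPy. This is only a minimal illustration: the `params` dictionary layout, the helper `init_params` and the small random initialization are assumptions made for the example, not part of the original post.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical helper: small random weights and zero biases for each gate."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "o", "C"):
        # Each W acts on the concatenation [h_{t-1}, x_t]
        params[f"W_{gate}"] = rng.normal(0.0, 0.1, (hidden_size, hidden_size + input_size))
        params[f"b_{gate}"] = np.zeros(hidden_size)
    return params


def lstm_step(x_t, h_prev, c_prev, params):
    """Compute (h_t, C_t) from x_t, h_{t-1} and C_{t-1}."""
    concat = np.concatenate([h_prev, x_t])                        # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])         # forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])         # input gate
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])         # output gate
    c_tilde = np.tanh(params["W_C"] @ concat + params["b_C"])     # new memory content
    c_t = f_t * c_prev + i_t * c_tilde                            # cell state
    h_t = o_t * np.tanh(c_t)                                      # hidden state
    return h_t, c_t


params = init_params(input_size=3, hidden_size=4)
h_t, c_t = lstm_step(np.ones(3), np.zeros(4), np.zeros(4), params)
```

Running the step over a sequence just means feeding `h_t` and `c_t` back in as `h_prev` and `c_prev` for the next input.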

Models and Architectures in Word2vec

Models
CBOW (Continuous Bag of Words)
Use the context to predict the probability of the current word.
Context words' vectors are \(\upsilon_{c-m}, \dots, \upsilon_{c+m}\) (\(m\) is the window size).
Context vector: \(\hat{\upsilon}=\frac{\upsilon_{c-m}+\upsilon_{c-m+1}+\dots+\upsilon_{c+m}}{2m}\).
Score vector: \(z_i = u_i\hat{\upsilon}\), where \(u_i\) is the output vector representation of word \(\omega_i\).
Turn scores into probabilities: \(\hat{y}=\mathrm{softmax}(z)\).
We want the probabilities \(\hat{y}\) to match the true probabilities \(y\), and we use the cross entropy \(H(\hat{y},y)\) to measure the distance between these two distributions.
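The CBOW steps above can be sketched as one forward pass in NumPy. The matrix names `V` (input vectors) and `U` (output vectors), the function names and the toy sizes are assumptions made for this illustration; the true distribution \(y\) is taken to be one-hot at the current word.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a score vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def cbow_forward(context_ids, target_id, V, U):
    """Return (y_hat, loss) for one training example.

    V: input word vectors, shape (vocab_size, dim)
    U: output word vectors, shape (vocab_size, dim)
    """
    v_hat = V[context_ids].mean(axis=0)   # average of the 2m context vectors
    z = U @ v_hat                         # z_i = u_i . v_hat
    y_hat = softmax(z)                    # scores -> probabilities
    loss = -np.log(y_hat[target_id])      # cross entropy against a one-hot y
    return y_hat, loss


rng = np.random.default_rng(0)
V = rng.normal(size=(5, 3))   # toy vocabulary of 5 words, 3 dimensions
U = rng.normal(size=(5, 3))
y_hat, loss = cbow_forward([0, 1, 3, 4], target_id=2, V=V, U=U)
```

With a one-hot \(y\), the cross entropy \(H(\hat{y},y)\) reduces to the negative log-probability of the current word, which is what `loss` holds.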

Semi-supervised text classification using doc2vec and label spreading

Here is a simple way to classify text with little human effort and still get impressive performance. It can be divided into two steps:
1. Get training data by using keyword classification
2. Generate a more accurate classification model by using doc2vec and label spreading
Keyword-based Classification
Keyword-based classification is a simple but effective method. Extracting the target keywords is monotonous work. I use this method to automatically extract keyword candidates.
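As a rough sketch of the first step, keyword-based labeling can be as simple as counting keyword hits per class. The keyword lists, class names and the function `keyword_label` below are hypothetical and only illustrate the idea; texts with no hit or with a tie stay unlabeled so that the second step can handle them.

```python
def keyword_label(text, keywords_by_class):
    """Return the class whose keywords occur most often in text, or None."""
    tokens = text.lower().split()
    scores = {
        label: sum(tokens.count(kw) for kw in keywords)
        for label, keywords in keywords_by_class.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    top = scores[best]
    # No keyword found, or two classes tie: leave the text unlabeled
    if top == 0 or list(scores.values()).count(top) > 1:
        return None
    return best


# Hypothetical keyword lists, for illustration only
KEYWORDS = {
    "sports": ["match", "goal"],
    "tech": ["python", "compiler"],
}

label = keyword_label("He scored a goal in the match", KEYWORDS)
```

The labeled texts become the training seed, and everything that returns `None` is treated as unlabeled data.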

Parameters in doc2vec

Here are some parameters of gensim's doc2vec class.
window
window is the maximum distance between the predicted word and the context words used for prediction within a document. It looks both behind and ahead. In the skip-gram model, if the window size is 2, the training samples will look like this (the blue word is the input word):
min_count
Words that appear fewer times than this value are skipped.
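To make the two parameters concrete, here is a small pure-Python sketch. It does not call gensim; `skipgram_pairs` and `drop_rare_words` are hypothetical helpers that mimic what a fixed window and a minimum count do, under the simplifying assumption that the full window is always used.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """List (input_word, context_word) pairs within `window` positions on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def drop_rare_words(docs, min_count):
    """Remove words whose total count over all documents is below min_count."""
    counts = Counter(word for doc in docs for word in doc)
    return [[word for word in doc if counts[word] >= min_count] for doc in docs]


pairs = skipgram_pairs(["the", "quick", "brown", "fox"], window=2)
filtered = drop_rare_words([["a", "b", "a"], ["a", "c"]], min_count=2)
```

For the input word "the", the pairs are ("the", "quick") and ("the", "brown"): the window reaches two words ahead, and there is nothing behind the first word.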

Brief Introduction of Label Propagation Algorithm

As I said before, I'm working on a text classification project. I use doc2vec to convert text into vectors, then I use LPA to classify the vectors. LPA is a simple, effective semi-supervised algorithm. It can use the density of unlabeled data to find a hyperplane to split the data. Here are the main steps of the algorithm: Let $(x_1,y_1) \dots (x_l,y_l)$ be the labeled data, where $Y_L = \{y_1 \dots y_l\}$ are the class labels.
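Since the excerpt stops after the notation, the following is only a sketch of one common formulation of label propagation, not necessarily the exact variant the post goes on to describe. It assumes an RBF affinity with a hypothetical bandwidth `sigma`, a row-normalized transition matrix, and `-1` as the marker for unlabeled points.

```python
import numpy as np


def label_propagation(X, y, sigma=1.0, n_iter=100):
    """Propagate labels from labeled points (y >= 0) to unlabeled ones (y == -1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y >= 0
    classes = np.unique(y[labeled])

    # Pairwise squared distances -> RBF affinities -> row-normalized transitions
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    T = W / W.sum(axis=1, keepdims=True)

    # One-hot rows for labeled points, uniform rows for unlabeled points
    Y = np.full((len(X), len(classes)), 1.0 / len(classes))
    Y[labeled] = (y[labeled][:, None] == classes[None, :]).astype(float)
    Y_init = Y.copy()

    for _ in range(n_iter):
        Y = T @ Y                       # each point absorbs its neighbours' labels
        Y[labeled] = Y_init[labeled]    # clamp the known labels
    return classes[Y.argmax(axis=1)]


X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y = [0, -1, -1, 1, -1, -1]
predicted = label_propagation(X, y)
```

In this toy example each cluster has a single labeled point, and the unlabeled points pick up the label of the cluster they sit in.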

Enable C Extension for gensim on Windows

These days I'm working on some text classification tasks, and I use gensim's doc2vec function. When using gensim, it shows this warning message:
```
C extension not loaded for Word2Vec, training will be slow.
```
I searched for this online and found that gensim has rewritten some parts of its code in `cython` rather than `numpy` to get better performance. A compiler is required to enable this feature. I tried installing MinGW and adding it to the path, but it did not work.
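One way to check whether the compiled routines were picked up is to read the `FAST_VERSION` flag from `gensim.models.word2vec`. This is a sketch under an assumption: to my knowledge, gensim versions that print the warning above set this flag to `-1` when they fall back to the slow path, but the attribute may differ between versions. The function name is hypothetical, and it returns `None` if gensim cannot be imported or the attribute is missing.

```python
import importlib


def gensim_fast_version():
    """Return gensim's FAST_VERSION flag, or None if it cannot be determined."""
    try:
        word2vec = importlib.import_module("gensim.models.word2vec")
    except Exception:
        # gensim is not installed or failed to import
        return None
    # Assumption: -1 means the C extension was not loaded
    return getattr(word2vec, "FAST_VERSION", None)


version = gensim_fast_version()
if version is None:
    print("Could not determine whether the C extension is loaded.")
elif version == -1:
    print("C extension not loaded; training will be slow.")
else:
    print("C extension loaded, FAST_VERSION =", version)
```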

Some Useful Shell Tools

Here are some shell tools I use that can boost your productivity.
Prezto
A zsh configuration framework. It provides auto-completion, prompt themes and lots of modules that work with other useful tools. I especially love the agnoster theme.
Fasd
Helps you navigate between folders and launch applications. Here are the official usage examples:
```
v def conf => vim /some/awkward/path/to/type/default.conf
j abc => cd /hell/of/a/awkward/path/to/get/to/abcdef
m movie => mplayer /whatever/whatever/whatever/awesome_movie.
```

Start

Over the years, I have read many programmers' blogs, which have helped me a lot. Now I think it's time to start my own blog. I hope this will push me to review what I have learned, and it would be even better if someone else can benefit from it.