Learning Semantic Templates

Learning semantic templates from raw text.

Extraction and Labeling

Inducing slots through sense unit extraction and labeling.

Pattern Mining and Pattern Determination

Frequent pattern mining for semantic template generation and linguistic pattern determination.

Model for Class Categorization

Semantic-infused Convolutional Neural Network for class categorization.


We propose the Semantic Template-based Convolutional Neural Network (STCNN), a convolutional neural network for text categorization that uses a semantic template-based distributed representation and imitates the perceptual behavior of human comprehension. STCNN is a highly automatic approach: it learns semantic templates that characterize a domain from raw text, then recognizes document categories with a semantic-infused convolutional neural network that allows a template to be partially matched through a statistical scoring system. Our experimental results show that STCNN effectively classifies about 140,000 Chinese news articles into predefined categories by capturing the most prominent and expressive patterns, and that it achieves the best performance among all compared methods for Chinese topic classification. Finally, the same knowledge can be directly used to perform a semantic analysis task.
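The idea of partially matching a template can be illustrated with a small alignment routine. This is a minimal sketch and not the STCNN scoring system itself: the source does not give the scoring function, so the function name `partial_match_score`, the weights `match`, `skip_slot`, and `extra_label`, and the example slot labels are all hypothetical.

```python
from typing import Sequence


def partial_match_score(
    template: Sequence[str],
    labels: Sequence[str],
    match: float = 1.0,        # assumed reward for a matched slot
    skip_slot: float = -0.5,   # assumed penalty for a template slot left unmatched
    extra_label: float = -0.1, # assumed penalty for a text label outside the template
) -> float:
    """Return the best in-order alignment score between a template and a label sequence."""
    n, m = len(template), len(labels)
    # dp[i][j] holds the best score for aligning template[:i] with labels[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + skip_slot
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + extra_label
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(dp[i - 1][j] + skip_slot, dp[i][j - 1] + extra_label)
            if template[i - 1] == labels[j - 1]:
                best = max(best, dp[i - 1][j - 1] + match)
            dp[i][j] = best
    return dp[n][m]


# Hypothetical slot labels, for illustration only.
template = ["TEAM", "DEFEAT", "TEAM"]
full = partial_match_score(template, ["TEAM", "TIME", "DEFEAT", "TEAM"])
partial = partial_match_score(template, ["TEAM", "TEAM"])
```

Under these assumed weights, a sentence that fills every slot scores higher than one that misses a slot, but the second still receives a positive score instead of being rejected outright, which is the behavior partial matching is meant to provide.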

The Topic Detection Dataset

The dataset is composed of Chinese news articles collected from 2010 to 2014 and includes news topics, news headlines, and article text. The articles fall into six topic categories: Sports, Tech, Politics, Travel, Edu, and Health. The data distribution is shown in Table 1 below. Of the 132,258 observations in the original dataset, 60,000 rows (10,000 for each of the six topics) were selected for the training set, and the remaining 72,258 rows form the test set. Each instance contains a topic category, a news title, and the article content.

Table 1. Data distribution of topic detection.
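The balanced training split described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the source does not say how the 10,000 rows per topic were chosen, so random sampling, the function name `balanced_split`, the `(topic, title, content)` tuple layout, and the toy rows are all assumptions.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Row = Tuple[str, str, str]  # assumed layout: (topic, title, content)

TOPICS = ["Sports", "Tech", "Politics", "Travel", "Edu", "Health"]


def balanced_split(rows: List[Row], per_class: int, seed: int = 0):
    """Sample `per_class` rows per topic for training; all other rows form the test set."""
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for idx, row in enumerate(rows):
        by_topic[row[0]].append(idx)

    train_idx = set()
    for topic, indices in by_topic.items():
        if len(indices) < per_class:
            raise ValueError(f"topic {topic!r} has only {len(indices)} rows")
        train_idx.update(rng.sample(indices, per_class))

    train = [row for i, row in enumerate(rows) if i in train_idx]
    test = [row for i, row in enumerate(rows) if i not in train_idx]
    return train, test


# Toy data standing in for the real articles, with uneven topic sizes.
rows = [
    (topic, f"title {i}", f"content {i}")
    for k, topic in enumerate(TOPICS)
    for i in range(3 + k)
]
train, test = balanced_split(rows, per_class=2)
```

With the real dataset, `per_class=10000` over six topics would give the 60,000 training rows, leaving 132,258 - 60,000 = 72,258 rows for testing.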


Please cite the following papers if you use this dataset in your research:

1. Yung-Chun Chang, Yu-Lun Hsieh, Cen-Chieh Chen, and Wen-Lian Hsu, "A Semantic Frame-based Intelligent Agent for Topic Detection," Soft Computing, vol. 21, no. 2, pp. 392-401, 2017.

2. Yung-Chun Chang, Cen-Chieh Chen, Yu-Lun Hsieh, Chien Chin Chen, and Wen-Lian Hsu, "Linguistic Template Extraction for Recognizing Reader-Emotion and Emotional Resonance Writing Assistance," in Proceedings of the 53rd ACL, 2015.

3. Yung-Chun Chang, Yu-Lun Hsieh, Cen-Chieh Chen, Chad Liu, Chun-Hung Lu, and Wen-Lian Hsu, "Semantic Frame-based Statistical Approach for Topic Detection," in Proceedings of the 28th PACLIC, 2014.

Apply for the Dataset

Fill out the user form to apply for the dataset.