KMS Chongqing Institute of Green and Intelligent Technology, CAS
Keyword spotting in handwritten Chinese documents using semi-Markov conditional random fields
Zhang, Heng; Zhou, Xiang-Dong; Liu, Cheng-Lin
2017-02-01
Abstract | This paper proposes a document indexing method for keyword spotting based on semi-Markov conditional random fields (semi-CRFs), which provide a theoretical framework for fusing information from different contexts. The candidate segmentation-recognition lattice is first augmented using linguistic context to improve recognition results. To speed up retrieval and save storage space, the lattice is then pruned by a forward-backward procedure. In the reduced lattice, character similarity scores are estimated with the semi-CRF model, whose parameters are learned with a binary classification objective, namely cross-entropy (CE), that discriminates between candidate characters in the lattice. To locate misrecognized character instances in the lattice, confusable (similar) characters are used as proxies, and the index file is searched for these proxy characters. This proxy-character driven search can significantly improve performance compared with our previous character-synchronous dynamic search (CSDS) method. Experimental results on the online handwriting database CASIA-OLHWDB demonstrate the effectiveness of the proposed method. |
Keywords | Online handwritten Chinese documents; Semi-Markov conditional random fields; Keyword spotting; Proxy-character driven search |
DOI | 10.1016/j.engappai.2016.11.006 |
Journal | Engineering Applications of Artificial Intelligence |
ISSN | 0952-1976 |
Volume | 58 |
Pages | 49-61 |
Indexed in | SCI |
WOS accession number | WOS:000392684200004 |
Language | English |
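The abstract names a forward-backward pruning step but does not give its criterion. The following is a minimal sketch of one common posterior-based variant, assuming a lattice whose nodes are candidate segmentation points (numbered left to right) and whose edges are candidate characters carrying log-domain scores. The `Edge` type, the `forward_backward_prune` function, its `threshold` parameter and the toy scores are all hypothetical illustrations, not the paper's implementation; the actual method's scores (which fuse several contexts) and pruning rule may differ.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One candidate character in a segmentation-recognition lattice (hypothetical layout)."""
    start: int    # segmentation point where the candidate begins
    end: int      # segmentation point where it ends; assumed end > start
    label: str    # candidate character class
    score: float  # log-domain score for this candidate (assumed already computed)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_backward_prune(edges, num_points, threshold=1e-3):
    """Keep edges whose posterior over all start-to-end lattice paths is >= threshold."""
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for e in edges:
        incoming[e.end].append(e)
        outgoing[e.start].append(e)

    last = num_points - 1
    alpha = [-math.inf] * num_points  # log-sum of path scores from node 0 to each node
    beta = [-math.inf] * num_points   # log-sum of path scores from each node to the last node
    alpha[0] = 0.0
    beta[last] = 0.0

    # Forward pass: edges only go left to right, so ascending order is topological.
    for node in range(1, num_points):
        if incoming[node]:
            alpha[node] = logsumexp([alpha[e.start] + e.score for e in incoming[node]])

    # Backward pass: descending order visits every successor before its predecessors.
    for node in range(last - 1, -1, -1):
        if outgoing[node]:
            beta[node] = logsumexp([beta[e.end] + e.score for e in outgoing[node]])

    total = alpha[last]
    if total == -math.inf:
        return []  # no complete path through the lattice
    # Edge posterior = (sum of scores of paths through the edge) / (sum over all paths).
    return [
        e for e in edges
        if math.exp(alpha[e.start] + e.score + beta[e.end] - total) >= threshold
    ]


if __name__ == "__main__":
    lattice = [
        Edge(0, 1, "A", math.log(0.9)),
        Edge(0, 1, "B", math.log(0.1)),
        Edge(1, 2, "C", math.log(1.0)),
        Edge(0, 2, "D", math.log(0.001)),
    ]
    kept = forward_backward_prune(lattice, num_points=3, threshold=0.01)
    print([e.label for e in kept])  # prints ['A', 'B', 'C']
```

In this toy lattice the single-segment candidate "D" lies only on a path with posterior of about 0.001, so it is removed, while "A", "B" and "C" survive. Summing over paths (posteriors) is one design choice; a Viterbi-style alternative would replace `logsumexp` with `max` and keep edges whose best-path score falls within a beam of the overall best path.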