Journal of Computers, Vol 3, No 10 (2008), 86-93, Oct 2008
doi:10.4304/jcp.3.10.86-93

Conversation Extraction in Dynamic Text Message Stream

Le Wang, Yan Jia, Yingwen Chen

Abstract


Text message streams produced by Instant Messenger and Internet Relay Chat pose interesting and challenging problems for information technologies. Extracting the conversations in this kind of chat message stream is beneficial for information management and knowledge discovery. However, the data in a text message stream are usually very short and incomplete, and monitoring thousands of continuous chat sessions demands efficiency, so many existing text mining methods encounter difficulties. This paper focuses on conversation extraction in dynamic text message streams. We design a dynamic representation for messages that combines text content information with the linguistic features of the message stream. A memory structure of reversed maximal similar relationships is developed to allow renewable assignments when grouping messages into conversations. Finally, we propose a double time window algorithm based on the above methods to extract conversations in dynamic text message streams. Experiments on a real dataset show that our method outperforms two baseline methods introduced in a recent related paper by about 47% and 15%, respectively, in terms of F-measure.
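
As a rough illustration of the general idea of grouping short messages into conversations by content similarity within a time window, a minimal sketch follows. It is not the double time window algorithm of the paper, and it does not model the dynamic representation or the reversed maximal similar relationship structure. The function names, the bag-of-words cosine similarity, the single window of 60 seconds, and the threshold of 0.3 are all assumptions chosen for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def group_messages(messages, window=60.0, threshold=0.3):
    """Assign each (timestamp, text) message a conversation id.

    Hypothetical single-window heuristic: a message joins the conversation
    of its most similar earlier message inside the time window, or starts
    a new conversation if no earlier message is similar enough.
    """
    recent = []   # (timestamp, bag, conversation id) of messages still in the window
    labels = []
    next_id = 0
    for ts, text in messages:
        bag = Counter(text.lower().split())
        # Drop messages that have fallen out of the time window.
        recent = [r for r in recent if ts - r[0] <= window]
        best_sim, best_conv = 0.0, None
        for _, other, conv in recent:
            sim = cosine(bag, other)
            if sim > best_sim:
                best_sim, best_conv = sim, conv
        if best_conv is None or best_sim < threshold:
            best_conv = next_id
            next_id += 1
        recent.append((ts, bag, best_conv))
        labels.append(best_conv)
    return labels


stream = [
    (0, "anyone know python sorting"),
    (5, "lunch plans today"),
    (10, "python sorting uses sorted"),
    (12, "lunch today at noon"),
    (200, "python sorting again"),
]
print(group_messages(stream))
```

In this sketch an assignment is final once made. The abstract's mention of renewable assignments suggests that the paper's method can revise earlier groupings as new messages arrive, which this example does not attempt.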



Keywords


text message; conversation extraction; content similarity; linguistic feature


Journal of Computers (JCP, ISSN 1796-203X)

Copyright @ 2006-2012 by ACADEMY PUBLISHER – All rights reserved.