The Boundaries of Natural Language Processing Techniques in Extracting Knowledge from Emails
Abstract
The aim of this research is to determine whether natural language processing techniques can fully automate the extraction of knowledge from emails. The paper reviews five generations of knowledge-sharing systems and highlights the challenges common to all of them. It then discusses the system built by the authors and shows that, although its f-measure results are world leading, user intervention is still required to make the system accurate enough to be of use to an organisation.


