Mastering New Challenges in Text Analytics
This paper briefly defines text analytics, describes various approaches to text analytics, and then focuses on the natural language processing techniques used by text analytics solutions.
Published: Mar 31, 2009
Type: White Paper
Length: 24 pages
Technical report
Making unstructured data ready for predictive analytics
Table of contents
Introduction
What is text analytics and how is it used?
Approaches to understanding text
The SPSS text analytics process
Applying text analytics at the enterprise level
Conclusion
SPSS products for text analytics
About SPSS Inc.
Appendix A: An explanation of some text analytics terms
Appendix B: Algorithms used for assigning equivalence classes
Appendix C: Examples of Text Link Analysis
Additional reading on text analytics
SPSS is a registered trademark and the other SPSS products named are trademarks of SPSS Inc. All other names are trademarks of their respective owners. © 2008 SPSS Inc. All rights reserved. MCTWP-0408
Introduction
It's no secret that the world has seen an explosion of information in the past 15 years, an explosion that experts predict will continue as the millions of people who use online resources continue to expand their usage, and the millions of people who do not yet have access to such resources gain it. Similarly, information stored as text in both business and government organizations has grown exponentially.
To name just a few examples:
- Opinion surveys are increasingly conducted online and results shared in real time
- The boom in software applications supporting sales, customer service, or call center operations has led to massive amounts of text stored electronically in these applications' notes fields
- Technology analysts at IDC estimate that 62 billion e-mails are sent every day
- Searchable Web sites generate enough information every day to fill millions of books
- Web logs (blogs) and wikis, created by individuals and groups for personal and professional purposes, are increasing exponentially: as of this writing, there may be more than 100 million blogs, with a new one created every second
Such a vast expansion of the scale of global information exchange would have been almost unimaginable 40 years ago, when most business and government communications, as well as news reports and advertising, were paper-based.
Yet it was 40 years ago that visionary researchers began to seek ways to enrich the knowledge of those working in medicine and other sciences, in government agencies, and in business by using computer technologies to uncover previously unknown connections in large collections of textual documents. They created the discipline known as computational linguistics, which is now practiced at numerous universities and public and private research centers worldwide. Computational linguists initially focused their efforts on finding ways to categorize and explore concepts found in books, scholarly journals, legal briefs, patent applications, newspapers, reports, and other paper-based records that could be converted to digital formats. More recently, their efforts have expanded to include ways to "mine" the vast amount of textual information that is published digitally, such as online editions of newspapers, academic journals, and conference proceedings. In addition, there is a wealth of content that originates in digital form, such as Web sites, blogs, wikis, e-mails, and instant messaging (IM), as well as text embedded in forms, surveys, and scientific, government, or corporate databases.
There is a growing recognition t...