Word Frequency Distributions (Text, Speech and Language Technology, Volume 18) Review

"This book is an introduction to the statistical analysis of word frequency distributions, intended for linguists, psycholinguists, and researchers working in the field of quantitative stylistics and anyone interested in quantitative aspects of lexical structure... The aim of this book is to make these techniques accessible to nonspecialists." - R. Harald Baayen
Harald Baayen prepares readers to conduct practical analyses of word frequencies in text samples. The book's closing chapter explores several scenarios in depth, including comparing documents of different overall length, identifying authorship of disputed documents, and profiling references to calendar years in newspaper articles. The first five chapters prepare the reader to understand these extended examples using careful explanation, derivation of key formulas, practice exercises (with answers in the back, of course), and numerous "learner-friendly" sample analyses based on the just-complex-enough text of Lewis Carroll's Alice in Wonderland. Good instruction with just the right amount of literary whimsy.
Three points represent the essence of the book's technical content:
The central graphical and statistical representation of word frequency analysis is the "frequency spectrum" defined--adequately for even beginners--on page 8. Frequency spectra are lists of all words in a text sample along with how many times each word appears in the sample. These lists are sorted from most to least frequent words. High frequency words are usually function words such as "the" and "a." Key content words, which best summarize a text's topic, are in middle frequency positions. Word frequency spectra also contain numerous very low frequency words, many appearing only once. The tail of low frequency words stretches off to seeming infinity. From a sampling perspective, this long tail is made theoretically longer by the virtual existence of "zero" frequency words which would achieve a frequency of "one" if the text sample size were sufficiently increased. The first chapter further defines frequency spectra and presents other concepts basic to word frequency analysis.
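For readers who want to see the idea concretely, here is a minimal Python sketch, not taken from the book or its CD. It builds the sorted word frequency list described above, plus a grouped view that counts how many distinct words occur once, twice, and so on. The function names, the simple tokenizer, and the toy sentence are my own assumptions for illustration.

```python
from collections import Counter
import re

def word_frequencies(text):
    """Count how often each lowercased word form occurs in the text."""
    # Assumption: a crude tokenizer that keeps only letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

def frequency_spectrum(freqs):
    """Map each frequency m to the number of distinct words occurring m times."""
    return dict(sorted(Counter(freqs.values()).items()))

# Toy sample, invented for illustration only.
sample = "the cat saw the dog and the dog saw a bird"
freqs = word_frequencies(sample)

# Words sorted from most to least frequent.
ranked = freqs.most_common()

# Grouped view: how many words appear once, twice, three times, ...
spectrum = frequency_spectrum(freqs)
```

Even in this tiny sample, the function word "the" sits at the top of the list and most words occur only once, which is the long tail in miniature.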
The long tail property underlies the first problem in frequency spectra analysis. The number of unique words in a text sample systematically increases with the sample's length in words. This introduces bias into statistical procedures that compare text samples of different sizes. The bias distorts analysis across the full range of document lengths, from brief open-ended survey responses to book-length narratives. Baayen demonstrates these biasing effects in commonly used frequency comparison statistics. He then introduces corrective procedures that draw on information such as the "vocabulary growth rate" observed as the text sample is incrementally lengthened. The well-organized narrative of describing, demonstrating and solving this problem plays out across chapters two, three and four.
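The dependence on sample size is easy to observe directly. The sketch below is my own illustration rather than the author's software: it tracks the number of distinct words as a sample grows, and computes one simple growth indicator, the share of tokens that are words seen exactly once. Treating that share as the "vocabulary growth rate" is an assumption on my part, and the function names and toy data are hypothetical.

```python
from collections import Counter

def vocabulary_growth(tokens, step):
    """Return (N, V) pairs: the number of distinct words V in the first N tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        # Record a point every `step` tokens and at the end of the sample.
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve

def hapax_growth_rate(tokens):
    """Share of tokens that are words occurring exactly once in the sample."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(tokens)

# Toy sample, invented for illustration only.
tokens = "the cat saw the dog and the dog saw a bird".split()
curve = vocabulary_growth(tokens, step=4)
rate = hapax_growth_rate(tokens)
```

Because the distinct-word count keeps climbing as tokens are added, any statistic built on it will differ between a short and a long sample even when both come from the same source.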
A second problem stems from the statistical assumption that words are randomly distributed within text samples. Obviously this is never true in naturally occurring text. Less obvious is the type of bias this assumption introduces and how to correct for it. Using clever experimentation, Baayen traces the primary biasing effects to the mid-frequency key content words, which tend to cluster in all texts, more so in longer documents. He concludes chapter five with recommendations for adjusted procedures that reduce the biasing effects of these "underdispersed" words.
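One rough way to see clustering, offered as my own sketch and not as the procedure Baayen uses, is to cut a text into equal chunks and count how many chunks contain a given word. That count can be compared with what would be expected if the word's occurrences fell independently and uniformly across chunks. The helper names and the artificial token list are assumptions for illustration.

```python
def chunks_containing(tokens, word, n_chunks):
    """Count how many of n_chunks equal-sized slices contain the word."""
    # Any leftover tokens beyond n_chunks * size are ignored.
    size = len(tokens) // n_chunks
    return sum(
        1 for k in range(n_chunks)
        if word in tokens[k * size:(k + 1) * size]
    )

def expected_chunks(freq, n_chunks):
    """Expected chunks hit if freq occurrences fell independently and uniformly."""
    return n_chunks * (1 - (1 - 1 / n_chunks) ** freq)

# Artificial sample: "rabbit" appears four times, all at the start.
tokens = ["rabbit"] * 4 + ["filler"] * 16
observed = chunks_containing(tokens, "rabbit", n_chunks=4)
expected = expected_chunks(freq=4, n_chunks=4)
```

A word found in far fewer chunks than the random-placement expectation is clustered, which is the behavior the review describes for key content words.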
The reader is now prepared to conduct unbiased analysis of word frequency distributions. The software on the included CD implements the improved frequency analyses devised by the author. It requires Linux, but his web site at the University of Alberta contains a downloadable Windows version. More recent research on the properties of word frequency distributions is available from this web site and is worth reading. Readers interested in the wider field of statistical natural language processing will benefit from Manning and Schütze's Foundations of Statistical Natural Language Processing.
I personally found this book quite valuable, even though I do not conduct any of the specific analyses demonstrated. Much of my word frequency work is semi-automated content analysis of open-ended survey responses (primarily using QDA Miner and WordStat from Provalis Research). Baayen's book has taught me useful corrections that improve the accuracy of my analyses. He achieves his stated goal of making his corrected statistical procedures understandable and accessible to nonlinguists.
