Text mining is the process of extracting interesting and significant patterns from textual data sources in order to discover knowledge. Approximately 90% of the world’s data is held in unstructured formats, and in the 21st century the volume of unstructured data is growing exponentially. Computational text analysis has become an exciting research field with many applications in communication research. It can be a difficult method to apply, because it requires knowledge of various techniques, and the tools needed for most of these techniques are not readily available in common statistical software packages. This report takes a quick look at how to organize and analyze large volumes of unstructured text data using the R programming language. The Coronavirus Corpus data set was used for the evaluation. Features obtained during data management, which comprised tokenization, punctuation removal, stemming, and construction of the document-term matrix (DTM), were then used in the analysis. The analysis includes visualization and the identification of associations, networks, and groups among the extracted features. Overall, this report provides a practical demonstration of text mining using a real data set.