Overview of Text Mining Techniques
Text Mining Process Phase | Algorithm | Selected Question | Motive | Techniques |
---|---|---|---|---|
Text Preprocessing | Tokenization | How can I transform a text into words or a text format? | Transforming strings into single textual tokens | White space separation |
Text Preprocessing | Compound word identification | How can I identify words that have a joint meaning? | Identifying words with a joint meaning that is lost when the words are separated | n-grams |
Text Preprocessing | Normalization and noise reduction | How can I cope with too many variables in my Document‐Term‐Matrix? | Reducing the dimensionality of the Document‐Term‐Matrix | Stemming, lemmatization, deletion of stop words and infrequent terms |
Text Preprocessing | Linguistic analysis | How can I identify words with a special meaning or grammatical function? | Tagging of words | Named‐entity recognition, part‐of‐speech tagging |
Content Analysis | Dictionary‐based | How can I identify how latent sociological or psychological traits and states are reflected in natural language? | Measuring contextual, psychological, linguistic, or semantic concepts and constructs | Pre‐defined dictionaries and customized dictionaries |
Content Analysis | Algorithmic techniques | How can I assign texts to predefined classes? | Classifying textual entities into predefined categories | Supervised learning techniques such as binary or multi‐class classifiers |
Content Analysis | Algorithmic techniques | How can I group similar documents? | Clustering of textual entities into previously undefined and unknown groups | Unsupervised learning techniques such as LDA, k‐means, or non‐negative matrix factorization |
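
To make the preprocessing rows of the table more concrete, here is a minimal sketch in plain Python. It covers white space tokenization, n-grams (bigrams), stop word deletion, and a small Document‐Term‐Matrix with optional removal of infrequent terms. The function names, the tiny stop word list, and the `min_doc_freq` parameter are assumptions made for illustration only; real projects usually rely on a library such as NLTK, spaCy, or scikit-learn for these steps.

```python
from collections import Counter

# Illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "in", "to"}


def tokenize(text):
    """Lowercase, replace punctuation with spaces, then split on white space."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
    )
    return cleaned.split()


def remove_stop_words(tokens):
    """Drop very common words that carry little content."""
    return [t for t in tokens if t not in STOP_WORDS]


def bigrams(tokens):
    """Join adjacent tokens so compound words such as 'new york' stay together."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def document_term_matrix(docs, min_doc_freq=1):
    """Build a count matrix; terms found in fewer than min_doc_freq docs are dropped."""
    tokenized = [remove_stop_words(tokenize(d)) for d in docs]
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


docs = [
    "Text mining is a part of data mining.",
    "Data mining finds patterns in the data.",
]

vocab, matrix = document_term_matrix(docs)
print(vocab)   # ['data', 'finds', 'mining', 'part', 'patterns', 'text']
print(matrix)  # [[1, 0, 2, 1, 0, 1], [2, 1, 1, 0, 1, 0]]

# Keeping only terms that occur in both documents shrinks the matrix.
print(document_term_matrix(docs, min_doc_freq=2))
# (['data', 'mining'], [[1, 2], [2, 1]])

print(bigrams(tokenize("New York City")))  # ['new_york', 'york_city']
```

Raising `min_doc_freq` is one simple way to reduce the dimensionality of the matrix, which is the motive the table lists for normalization and noise reduction. Stemming and lemmatization are left out here because they need language-specific rules or resources.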
Text Mining in Data Mining
In this article, we will learn about text mining, the stage at which most NLP-related tasks begin and a basic building block for all of them.