Statistics
Data mining has an inherent connection with statistics. Statistics studies the collection, analysis, interpretation, and presentation of data. A statistical model is a set of mathematical functions that describes the behavior of the objects in a target class in terms of random variables and their probability distributions. Statistical models are widely used to model data and data classes. A statistical model can be the outcome of a data mining task such as classification or data characterization. Alternatively, a mining task can be built on top of an existing statistical model.
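To make the idea of a statistical model for a data class concrete, here is a minimal sketch in Python. It assumes a single numeric feature, fits one normal distribution per class, and assigns a new value to the class whose distribution gives it the highest density. The class names, the sample values, and the `classify` helper are hypothetical and chosen only for illustration; equal class priors are assumed.

```python
from statistics import NormalDist

# Hypothetical one-feature training data for two classes (illustrative values only).
samples = {
    "short_session": [2.1, 2.5, 1.9, 2.8, 2.2],
    "long_session": [7.9, 8.4, 9.1, 7.5, 8.8],
}

# Fit one normal distribution per class; each fitted distribution is
# the statistical model describing how objects of that class behave.
models = {label: NormalDist.from_samples(values) for label, values in samples.items()}


def classify(x: float) -> str:
    """Return the class whose fitted distribution gives x the highest density.

    Assumes equal prior probability for every class.
    """
    return max(models, key=lambda label: models[label].pdf(x))


print(classify(2.4))
print(classify(8.0))
```

Here the model is the outcome of a simple characterization step (estimating a mean and standard deviation per class), and the classification task is then built on top of it.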
Advantage:
- Statistics can be used to model noise and missing data values. It provides tools for forecasting, prediction, and data summarization, and it is useful for mining patterns. After a classification model has been mined, statistical hypothesis testing can be used to verify it. A hypothesis test makes a statistical decision using test data. A result is called statistically significant if it is unlikely to have occurred by chance.
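The verification step described above can be sketched as a one-sided exact binomial test. This is only one possible choice of test, not a prescribed method. It assumes a binary classifier evaluated on a balanced test set, so that pure guessing would be correct half of the time. The function name `binomial_p_value`, the counts, and the 0.05 significance level are assumptions made for this example.

```python
from math import comb


def binomial_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided p-value of an exact binomial test.

    Returns the probability of getting at least `correct` predictions right
    out of `total` if the classifier were only guessing with success
    rate `chance` (the null hypothesis).
    """
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical evaluation: 70 of 100 test records classified correctly.
p_value = binomial_p_value(correct=70, total=100)
alpha = 0.05  # assumed significance level

# The accuracy is statistically significant if guessing alone
# would rarely produce a result at least this good.
print(p_value < alpha)
```

A small p-value means the observed accuracy is unlikely to have occurred by chance, so the null hypothesis of random guessing is rejected.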
Disadvantage:
- Applying a statistical model to a large data set increases computational cost and complexity. When data mining must handle large volumes of real-time or streaming data, computation costs increase dramatically.
Technologies Used in Data Mining
Pre-requisites: Data Mining Techniques
Data mining has incorporated many techniques from other domains, such as machine learning, statistics, information retrieval, data warehousing, pattern recognition, algorithms, and high-performance computing. Because data mining is a highly application-driven field, its interdisciplinary nature is significant: research and development across these domains contributes directly to how data mining is implemented and applied. This article covers the major technologies used in data mining.