Representation of concept hierarchies
Concept hierarchies make the data mining process more precise. They work by organizing data through relationships and groupings. A concept hierarchy must be flexible enough to change dynamically when new datasets are encountered.
- At schema level:
Using the relationships between the attributes of a dataset, the conceptual hierarchy can be specified at the schema level.
Query:
define hierarchy student_result_hierarchy on marks as [year, department, class, section]
In the above example, the order of the attributes defines the levels of the hierarchy: the attribute department is more general than the student's year but less general than the class and section in which the student studies. A built-in hierarchy can be specified at the schema level in the same way:
Query:
define hierarchy period_hierarchy on date as [day, month, year]
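As a rough illustration of how a schema-level hierarchy such as period_hierarchy might be used, the following Python sketch generalizes dates from day to month to year and counts records at a chosen level. The names PERIOD_LEVELS, roll_up, and count_by_level are hypothetical and are not part of DMQL.

```python
from datetime import date

# Levels of the period hierarchy, most specific first,
# in the same order as [day, month, year] in the query.
PERIOD_LEVELS = ["day", "month", "year"]


def roll_up(d: date, level: str) -> str:
    """Return the value of d generalized to the given hierarchy level."""
    if level == "day":
        return d.isoformat()
    if level == "month":
        return f"{d.year}-{d.month:02d}"
    if level == "year":
        return str(d.year)
    raise ValueError(f"unknown level: {level}")


def count_by_level(dates, level):
    """Count how many dates fall under each concept at the given level."""
    counts = {}
    for d in dates:
        key = roll_up(d, level)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Calling count_by_level with "month" or "year" gives progressively coarser summaries of the same records, which is the effect of moving up the hierarchy.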
- By set grouping:
Here the hierarchy is specified by grouping values into sets of concepts, which makes the lower and higher levels explicit.
define hierarchy age_hierarchy for book on audience as
level1: {children, young_adult, adult} < level0: all
level2: {8, ..., 12} < level1: children
level2: {13, ..., 18} < level1: young_adult
level2: {19, ..., 100} < level1: adult
In the above example, the audience of a book is grouped into age sets or ranges.
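A set-grouping hierarchy like this one can be pictured as a lookup from a value to its group and then to the root. The Python sketch below uses the age ranges from the definition above; the names AGE_GROUPS, audience_group, and audience_path are hypothetical helpers introduced only for illustration.

```python
# Age ranges and the audience concept each belongs to,
# taken from the age_hierarchy definition.
AGE_GROUPS = {
    "children": range(8, 13),      # ages 8 to 12
    "young_adult": range(13, 19),  # ages 13 to 18
    "adult": range(19, 101),       # ages 19 to 100
}


def audience_group(age: int) -> str:
    """Return the audience concept that contains the given age."""
    for group, ages in AGE_GROUPS.items():
        if age in ages:
            return group
    raise ValueError(f"age {age} is outside the hierarchy")


def audience_path(age: int) -> list:
    """Return the path from the age up to the root concept 'all'."""
    return [age, audience_group(age), "all"]
```

For example, audience_path(10) walks from the raw age to children and then to all.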
- Operation derived hierarchy:
This hierarchy applies to numerical attributes. Its levels are derived by an operation on the data, such as comparing value ranges or clustering the values with an algorithm.
define hierarchy age_hierarchy for book on audience as {age_group(1), ..., age_group(3)} := cluster(default, age, 3) < all(age)
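The query asks for three age groups from a default clustering method without saying which algorithm that is. As a minimal sketch, assuming a basic one-dimensional k-means stands in for the default, the groups could be derived as follows. The function cluster_1d and its parameters are hypothetical.

```python
def cluster_1d(values, k, iterations=100):
    """Group numbers into k clusters using a basic 1-D k-means.

    This is only an assumed stand-in for DMQL's unspecified
    'default' clustering method.
    """
    data = sorted(values)
    # Deterministic start: pick initial centers spread over the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            # Assign each value to the nearest current center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Recompute each center as the mean of its group.
        new_centers = [
            sum(g) / len(g) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return groups


ages = [8, 9, 10, 11, 15, 16, 17, 40, 45, 50, 60]
age_groups = cluster_1d(ages, 3)
```

Each resulting list plays the role of one age_group(i), and all of them sit below the root all(age).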
- Rule-based hierarchy:
In this hierarchy, the levels are specified by rules. There are few rules at the lower levels, and the number of rules increases at the higher levels.
define hierarchy book_royalty_hierarchy on book as
level_1: low_royalty < level_0: all
if (maximum_selling_price) < Rs. 300
level_1: moderate_royalty < level_0: all
if ((maximum_selling_price) > Rs. 300) and ((maximum_selling_price) ≤ Rs. 275)
level_1: high_royalty < level_0: all
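In the above example, a rule-based hierarchy describes the royalty that an author receives for a book, based on the range in which the book's maximum selling price falls. Note that the moderate band needs an upper bound greater than Rs. 300 for its rule to match any book; the bound of Rs. 275 as written cannot be satisfied together with the lower bound.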
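The rules can be sketched as an ordinary classification function. In the Python sketch below, the Rs. 300 limit comes from the definition above, while MODERATE_LIMIT is a hypothetical placeholder chosen only so that the three bands are distinguishable. Treating a price of exactly Rs. 300 as moderate, and treating everything above the moderate band as high royalty, are also assumptions.

```python
LOW_LIMIT = 300       # from the definition: low royalty below Rs. 300
MODERATE_LIMIT = 500  # hypothetical upper bound of the moderate band


def royalty_level(maximum_selling_price: float) -> str:
    """Return the level_1 royalty concept for a book's maximum selling price."""
    if maximum_selling_price < LOW_LIMIT:
        return "low_royalty"
    # Assumption: a price of exactly LOW_LIMIT counts as moderate.
    if maximum_selling_price <= MODERATE_LIMIT:
        return "moderate_royalty"
    # Assumption: any price above the moderate band is high royalty.
    return "high_royalty"
```

Every result generalizes further to the single level_0 concept all.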
Data Mining Query Language
Data mining is a process in which user data are extracted and processed from a heap of unprocessed raw data. By aggregating these datasets into a summarized format, many problems arising in finance, marketing, and other fields can be solved. In a modern world with enormous volumes of data, data mining is a growing field of technology with applications in many industries that we depend on in daily life. Much research and development has been carried out in this field, and many systems have been released.
Because data mining involves numerous processes and functions, it needs a well-developed user interface. Although many well-developed user interfaces exist for relational systems, Han, Fu, Wang, et al. proposed the Data Mining Query Language (DMQL) to support the development of further systems and research in this field. DMQL cannot be considered a standard language. It is a derived language that serves as a general query language for performing data mining tasks. DMQL is implemented in the DBMiner system for mining data from several layers of databases.