data-prepartion
Properly preparing the data ensures that it is clean, consistent, and ready for analysis.
data-cleaning
Meaning to identify and correct errors.
* Handling missing data
    * Ignore if insignificant
    * Fill with a global constant (such as “Unknown”, “N/A”, etc.)
    * Fill with mean or median
    * Fill with the most probable value taken from similar data points (using decision trees or Bayesian methods)
* Smoothing noisy data
    * binning
    * regression
    * clustering

Status: #idea
Tags: data-mining, kdd, data-prepartion
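Two of the data-cleaning options listed above (filling missing values with the mean or median, and smoothing by binning) can be sketched in a few lines. This is only a minimal illustration: the function names, the use of `None` to mark a missing value, and the sample numbers are assumptions made for the example, and the binning variant shown is equal-frequency bins smoothed by bin means.

```python
from statistics import mean, median


def fill_missing(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


# Hypothetical sample data, not taken from the notes.
ages = [25, None, 31, 40, None, 22]
filled = fill_missing(ages, strategy="median")

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)
```

The median is usually the safer fill when the attribute has outliers, since a few extreme values can pull the mean far from typical entries.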
data-integration
Meaning to combine data from multiple sources into a unified dataset while ensuring consistency and resolving conflicts that arise from merging.
* Schema Integration: making sure that the format and structure of data are the same across all sources.
* Entity Identification: linking together entries that represent the same thing, even if they have different names.

In many cases, entity identification happens before schema integration.
1. First, you need to know which
data-transformation
Meaning to convert the data into a suitable format for mining.
* Normalization: Scaling data to fit within a specific range (e.g., between 0 and 1).
* Discretization: Dividing continuous attributes into intervals or categories.
* Attribute/feature construction: Creating new attributes from existing ones to improve the mining process.
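Normalization and discretization as described for data-transformation can be illustrated with a small sketch. It assumes min-max scaling for the normalization and equal-width intervals for the discretization; the function names and the sample incomes are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def discretize_equal_width(values, n_bins):
    """Map each continuous value to the index of an equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample data, not taken from the notes.
incomes = [20, 35, 50, 80, 120]
normalized = min_max_normalize(incomes)
bins = discretize_equal_width(incomes, n_bins=4)
```

Here the interval width is (120 - 20) / 4 = 25, so 20 and 35 share the first interval and 120 falls into the last one.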
data-reduction
Meaning to reduce the volume of data while maintaining its integrity. This is important because large datasets can be time-consuming and expensive to analyze.
* Dimensionality reduction: Removing irrelevant or redundant attributes.
* Numerosity reduction: Using methods such as regression or clustering to summarize data into fewer data points.
* Data compression: Reducing the size of the dataset without losing important information.
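One simple way to picture removing redundant attributes is to drop any column that is almost perfectly correlated with a column already kept. The sketch below assumes Pearson correlation as the redundancy measure and a threshold of 0.95; both are choices made for this example, and the column names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as 0 here.
    return 0.0 if sd_x == 0 or sd_y == 0 else cov / (sd_x * sd_y)


def drop_redundant(columns, threshold=0.95):
    """Keep a column only if it is not strongly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical sample data: height_in is height_cm in different units.
columns = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "age": [30, 22, 41, 35],
}
reduced = drop_redundant(columns)
```

The result depends on column order, since the first column of a correlated pair is the one that survives.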
Status: #idea
Tags: data-mining, kdd

Data Mining
* [x] data-mining-uts-quiz
knowledge discovery in databases
data-warehousing
schema
Apriori Algorithm
Step 1: Count Distinct Items
Step 2: Identify Association Rules
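The two Apriori steps named above (counting items, then identifying association rules) can be sketched as follows. This is a minimal illustration and not the worked example from the original figures: the function names, the absolute `min_support` count, the `min_confidence` threshold, and the sample transactions are all assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Step 1: count distinct items and keep those meeting min_support.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: derive rules lhs -> rhs with confidence = support(itemset) / support(lhs)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical sample transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
frequent = apriori(transactions, min_support=2)
rules = association_rules(frequent, min_confidence=0.9)
```

With these transactions, the only rule that reaches a confidence of 0.9 is butter -> bread, because every transaction containing butter also contains bread.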
FP Growth Algorithm
Step 1: Count Distinct Items
Step 2: Rearrange Items by Count in Descending Order
Step 3: Make FP Growth Tree
1. Make a null root node
2. Add each transaction's items as child nodes sequentially, following the sorted order
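The three FP Growth steps above can be sketched as a tree-building routine. This covers only the construction of the tree, not the mining of frequent patterns from it. The `FPNode` class, the `build_fp_tree` function, the alphabetical tie-break between items with equal counts, and the sample transactions are all assumptions made for the example.

```python
from collections import Counter


class FPNode:
    """One node of the tree: an item, its count, and its children by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Step 1: count distinct items (once per transaction).
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Step 3: start from a null root node.
    root = FPNode(None)
    for t in transactions:
        # Step 2: keep frequent items, ordered by descending count
        # (ties broken alphabetically so the order is deterministic).
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        # Add the items as children sequentially, sharing existing prefixes.
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


# Hypothetical sample transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
root, frequent = build_fp_tree(transactions, min_support=2)
```

Because the items are sorted by count first, transactions that share their most common items also share a path from the root, which is what keeps the tree compact.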
kdd
data-prepartion
data mining
pattern-evaluation
knowledge-presentation
Status: #idea
Tags: data-mining
References