3 Data Preparation
Preparing the data is one of the most time-consuming parts of any data analysis/data mining project.
3.1 DATA SOURCES
- Surveys or polls
- Experiments
- Observational and other studies
- Operational databases (CRM, etc.)
- Data warehouses
- Historical databases
- Purchased data
3.2 DATA UNDERSTANDING
- Data Tables
- Continuous and Discrete Variables
- Scales of Measurement (Nominal/Ordinal/Interval/Ratio)
- Roles in Analysis (Labels/Descriptors/Response)
- Frequency Distribution
3.3 DATA PREPARATION
- Normalization
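  - Min-max: $$Value' = \frac{Value - OriginalMin}{OriginalMax - OriginalMin} \times (NewMax - NewMin) + NewMin$$
  - z-score: $$Value' = \frac{Value - \bar{x}}{s}$$ where $$\bar{x}$$ is the mean and $$s$$ is the standard deviation of the variable
  - Decimal scaling: $$Value' = \frac{Value}{10^n}$$ where $$n$$ is typically the smallest integer for which every normalized absolute value is below 1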
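
The three normalization formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`min_max`, `z_score`, `decimal_scaling`) are hypothetical, the z-score uses the sample standard deviation (the notes do not say whether sample or population is meant), and decimal scaling assumes that n is the smallest integer that brings every absolute value below 1.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly from [OriginalMin, OriginalMax] to [new_min, new_max]."""
    original_min, original_max = min(values), max(values)
    if original_max == original_min:
        raise ValueError("min-max normalization is undefined for a constant variable")
    span = original_max - original_min
    return [(v - original_min) / span * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Express each value as a number of standard deviations from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # assumption: sample standard deviation
    if s == 0:
        raise ValueError("z-score normalization is undefined for a constant variable")
    return [(v - mean) / s for v in values]


def decimal_scaling(values):
    """Divide by 10**n, with n the smallest integer giving max |value| < 1 (assumed)."""
    max_abs = max(abs(v) for v in values)
    n = 0
    while max_abs / 10 ** n >= 1:
        n += 1
    return [v / 10 ** n for v in values]


if __name__ == "__main__":
    data = [10, 20, 30]
    print(min_max(data))                    # rescaled to [0, 1]
    print(z_score(data))                    # centered on the mean
    print(decimal_scaling([-150, 40, 999])) # divided by 10**3
```

Min-max preserves the relative spacing of the values inside a fixed range, z-score is convenient when variables with different units must be compared, and decimal scaling only shifts the decimal point, so the values stay easy to read back.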