Knowledge and understanding: The aim of Multivariate Statistical Analysis and Data Mining is to provide advanced tools for analysing multivariate statistical data and for data mining.
Applying knowledge and understanding: At the end of the course the student will know and be able to use the most appropriate techniques for making decisions based on empirical data relating to real events, and will be able to extract the important information contained in observed data.
Making judgements: At the end of the course the student will be able to analyze multivariate data with the most appropriate methodologies of multivariate statistics and data mining.
Communication skills: At the end of the course the student will have acquired the basic knowledge needed to understand the nature of different multivariate statistical phenomena and to interpret the associations between them.
Learning skills: At the end of the course the student will have the knowledge of multivariate statistics and data mining needed to take advanced courses in data science.
1. Data: structure and manipulation • Statistical data • IT data • Data manipulation: random and systematic errors; missing data; outliers; logically inconsistent data; anomalous data; imputation techniques for missing data.
2. Data summaries and their transformations
3. Multivariate distributions: • multivariate normal distribution • multinomial distribution
4. Unsupervised classification (cluster analysis) - Non-hierarchical methods: K-means, PAM and fuzzy K-medoids - Variable selection and choice of the optimal number of clusters - Agglomerative hierarchical methods
5. Data reduction methodologies: • principal component analysis (PCA); • factor analysis • correspondence analysis and multiple correspondence analysis
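As an illustration of the data reduction topic in item 5, the following is a minimal sketch of how the first principal component of a small data set could be extracted. It is not taken from the course material: the function name `first_principal_component`, the use of power iteration on the sample covariance matrix, and the data values are all assumptions chosen for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, share) for the first principal component.

    direction is a unit vector; share is the fraction of the total
    variance explained by that component. Hypothetical helper, written
    only to illustrate the idea behind PCA.
    """
    n = len(rows)
    p = len(rows[0])

    # Center each variable on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Sample covariance matrix (divisor n - 1).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

    # Power iteration: repeatedly apply the covariance matrix and
    # renormalise. Assumes the starting vector is not orthogonal to
    # the leading eigenvector and that the data are not constant.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # The Rayleigh quotient of a unit vector estimates the largest eigenvalue.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / total_variance


# Made-up data in which the second variable is roughly twice the first.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]
direction, share = first_principal_component(data)
print(direction, share)
```

Because the two variables in this made-up data set are almost perfectly collinear, a single component captures nearly all of the variance, which is the situation in which data reduction is most effective. In practice a statistical library would normally be used rather than a hand-written routine.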