ANALISI E GESTIONE DEI BIG DATA PER L'INFORMAZIONE
(objectives)
The course aims to give students the tools to produce analytical reports based on data found on the internet (data journalism), starting from the open data published by public administrations (PAs). It provides an introduction to Python programming, SQL and NoSQL databases, the main features of the Apache Spark Python framework in combination with Spark SQL, and the use of the framework together with SQL and NoSQL databases and visualization tools. With these tools, students can import data from different sources and formats, explore and clean it for subsequent analysis using the main statistical measures, and produce infographics (with Tableau and Google Data Studio).
Knowledge and understanding: Knowledge of computer programming in a high-level language, used to develop data analysis tasks with basic statistical measures.
Applying knowledge and understanding: The student will learn how to carry out a data analysis task: data collection (on the chosen topic), data cleaning, data transformation (data analysis) using statistical measures and machine learning algorithms, and visualization of the outputs with Tableau.
Making judgements: The student, as a communication expert, will be able to evaluate the results of an analysis and understand the starting data.
Communication skills: The student will learn the correct terminology to communicate with domain experts in order to develop the requested analysis (with Python and Spark) and to visualize the outputs (with the available data visualization tools).
Learning skills: The student will develop computational thinking and be able to apply analysis algorithms in their own work.
Code | 18515
Language | ITA
Type of certificate | Profit certificate
Credits | 8
Scientific Disciplinary Sector Code | INF/01
Contact Hours | 48
Type of Activity | Core compulsory activities
Teacher | Pasquini Daniele
(syllabus)
Introduction to data journalism, open data, Python programming for data analysis, and the SQL language. How to analyze big data with the Apache Spark framework; data visualization with Tableau and Google Data Studio.
(reference books)
Lecturer's slides; the free PDF book Pensare in Python; (recommended) Learning Spark, 2nd Edition (Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee), published by O’Reilly.
Dates of beginning and end of teaching activities | From 04/10/2021 to 07/01/2022
Delivery mode | Traditional, At a distance
Attendance | not mandatory
Evaluation methods | Written test, Oral exam, A project evaluation
Note | CdD 16.04.21: course taught under contract; CCS LM-91 14.05.21: contract renewal for Pasquini; 17.05.21: renewal accepted; CdD 17.05.21: final resolution