ANALISI E GESTIONE DEI BIG DATA PER L'INFORMAZIONE
(objectives)
The course aims to give students the tools they need to independently produce data-driven analyses (data journalism), starting from data available on the web, whether published by institutions in open formats or obtained through scraping techniques. The course includes an introduction to programming in Python, database basics, and querying methods, so that students can import data from different sources and formats, explore it, and clean it for subsequent analysis using key statistical measures.
The analyzed data will be visualized in Tableau to create interactive infographics. The course will examine ways to communicate data more immediately and effectively, in terms of graph style, color contrast, fonts, and interaction methods, since good graphic style is essential both for the readability of the data and for the ability to discover causal relationships within it (data discovery). To facilitate the management of large amounts of data (Big Data) and the extraction of information from it, the course will illustrate the main features of the Apache Spark framework for Python in combination with Spark SQL, as well as the use of the framework together with visualization tools.
Additionally, students will learn how to use AI tools, especially GitHub Copilot, as programming assistants to produce analyses through prompts in natural language.
Knowledge and understanding: Knowledge of programming in a high-level language to perform data analysis tasks using basic statistical measures.
Utilization of knowledge and understanding: Students will learn how to carry out a data analysis task: data collection (on the chosen topic), data cleaning, data transformation, analysis using statistical measures and machine learning algorithms, and visualization of results with Tableau.
Autonomy of judgment: As communication experts, students will be able to evaluate the results of an analysis and understand the initial data.
Communicative skills: Students will learn the appropriate terminology to communicate with domain experts about data analysis (with Python and Spark) and data visualization tools.
Learning ability: Students will develop computational thinking, enabling them to apply analysis algorithms in their own work.