In the age of digital transformation, a significant share of the information that companies can use comes from unstructured data stored in business documents (Word, PowerPoint, and PDF), on the web (websites, blogs, and online communities), or on social media. These data are an important pool of information for understanding the environmental phenomena and dynamics that affect the life of organizations. However, this information is mostly textual and unstructured. Moreover, it has the velocity, variety, and volume that characterize big data.
This course addresses the technical challenges that organizations face, and the tools they can apply, when automatically identifying, collecting, extracting, and analyzing unstructured data from web pages and social media.
The course introduces the theory behind the fundamental organizational processes (sense making, decision making, and knowing) and the fundamentals of qualitative analysis and automatic text analysis. It also provides theoretical knowledge of how web technologies work, including web protocols and standards such as HTML, the main operating mechanisms of social media, APIs for accessing data on social media platforms, and the main formats for data interchange between platforms (CSV, JSON).
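As an illustration of the data interchange formats mentioned above, the following minimal sketch converts a JSON array of records into CSV text using only the Python standard library. The sample posts, the field names, and the helper `json_records_to_csv` are hypothetical and serve only as an example; real data returned by a social media API will typically be nested and need flattening first.

```python
import csv
import io
import json

# Hypothetical sample: two social media posts as a JSON array (made-up data).
raw_json = """
[
  {"id": 1, "user": "alice", "text": "Great service!", "likes": 12},
  {"id": 2, "user": "bob", "text": "Slow delivery, but good quality", "likes": 3}
]
"""


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    # Assumption: every record has the same keys as the first one.
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = json_records_to_csv(raw_json)
print(csv_text)
```

Note that the second post contains a comma, so the `csv` module wraps that field in quotes; this is one reason to rely on a CSV library rather than joining strings by hand.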
To put this into practice, students will carry out hands-on activities in which they extract, manipulate, and analyze data taken from social media or websites. The knowledge, techniques, and tools learned during the course can also be applied to other types of unstructured data, such as reports, documents, or document collections stored in databases.
During the course, students will take part in learning activities, both theoretical and applied, individually and in groups. Participation in the course is intended to help students develop the following knowledge and skills.
Knowledge and understanding skills. Understand the opportunities and limitations of big data, unstructured data, and automatic text analysis. Know the basic organizational processes and the role that information plays within them. Know the theoretical foundations and techniques required for the automatic extraction and analysis of data from unstructured text sources.
Applied knowledge and understanding skills. Recognize areas of practical application of techniques for extracting, manipulating, and analyzing unstructured data. Recognize the opportunities and risks of using information from unstructured data sources, the web, and social media.
Autonomy of judgment. Understand whether, and which, data from unstructured sources (web and social media) can meet the information needs of individuals or organizations. Understand which structured and unstructured information sources to integrate to meet a given information need. Know how to evaluate the information content of unstructured data, including any biases it may contain. Know how to interpret objectively the information obtained from automatic text analysis.
Communication skills. During the course, students will practice presenting, arguing, and discussing in public both automatic text analysis tools and the interpretation of the results of the analyses performed.
Learning skills. Ability to approach the learning process in a fully autonomous and self-managed manner.
The course program is divided into a theoretical part and an application/practical part. In the theoretical part, students will study the organizational processes of sense making, decision making, and knowing. They will examine the technical problems and the potential of unstructured data analysis and automatic text analysis. They will also cover the theory behind the operation of web technologies and social media.
In the application/practical part, students will develop a range of practical skills in using data collection, manipulation, and analysis tools to meet the information needs of organizations. The following aspects will be addressed in the application/practical part:
- Web scraping and social media mining
- Data manipulation: cleaning, coding, decoding, and transforming data
- Basic text mining operations: text corpora creation, token extraction, lexicons, roots, lemmas, n-grams, term-document (TDM) and document-term (DTM) matrices
- Advanced operations: semantic annotation; extraction of topic, sentiment, and emotion from text; classification
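
Some of the basic text mining operations listed above (token extraction, n-grams, and a document-term matrix) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the tooling used in the course: the two-sentence corpus is made up, the function names `tokenize`, `ngrams`, and `document_term_matrix` are hypothetical, and the tokenizer handles only unaccented lowercase letters.

```python
import re
from collections import Counter

# Hypothetical mini-corpus (made-up example posts).
corpus = [
    "The delivery was fast and the staff was friendly",
    "Slow delivery and unfriendly staff",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs):
    """Build a DTM: one row per document, one column per vocabulary term."""
    tokenized = [tokenize(doc) for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # A Counter returns 0 for terms that do not occur in the document.
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


vocabulary, dtm = document_term_matrix(corpus)
bigrams = ngrams(tokenize(corpus[1]), 2)

print(vocabulary)
for row in dtm:
    print(row)
print(bigrams)
```

Transposing the rows of the DTM gives the corresponding term-document matrix (TDM), with one row per term and one column per document.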