Covid-19 Big Data Analysis
Period: 04/2022 - 05/2022
Project Name: Covid-19 Big Data Analysis
Report: Check out this PDF
Cloud Computing Big Data Python Spark SQL PySpark
What's the Project For?
- Course project for 'Cloud Computing and Big Data Systems', a degree course at HKUST.
- Conduct a descriptive analysis of global COVID-19 data using the cloud computing tools and platforms learnt in class.

Project Description
COVID-19, first identified in 2019, has grown into a global pandemic that has caused vast and irreparable devastation in all parts of the world. Although the situation has improved thanks to vaccination, mask-wearing, social distancing and similar measures, there is still a long road ahead before we can truly rid ourselves of the disease. It is important to learn from past data and to understand the current situation in order to adopt the best measures against the disease. Hence, our project analyzes global COVID-19 data to gain insights from it.
Most Challenging Part of the Project?
I think the most challenging part of the project was gaining insights from the tables and charts produced by the code, and writing the report. Extracting the information we wanted from the dataset was not difficult, because understanding the API was enough to do the job. However, drawing meaningful conclusions and explaining the reasons behind the results we obtained was the hardest part.
In this project, I realized the importance of good English writing skills. I was responsible for the coding part, and I wrote a draft report just to explain to my groupmates what I had obtained from my code. They wrote another report on top of mine, and theirs was far better because they explained everything more clearly, in a more organized way, and with more accurate vocabulary. After this project, I saw the need to further improve my language skills, and that was the biggest reward from this project.