PSB PARIS SCHOOL OF BUSINESS
Thesis
Submitted for the degree of
Master of Science
in Data Management
Presented by: SHIN Woohyun
TOPIC
The impact of COVID-19 : To predict the breaking point of the disease from Big
Data by Neural Networks
Thesis Supervisor: OMRANI Nessrine
Academic Year: 2019-2020
Abstract
Weather data is generated every second and constitutes Big Data that is
difficult to process on a home computer. Supercomputers such as those used by
the National Weather Service can process it, but they are more expensive than
clusters that connect multiple ordinary computers in parallel. To overcome the
limitations of a single machine, a cluster environment was built using the
Big Data frameworks Hadoop and Spark. Subsequently, a deep learning prediction
model was created using temperature data to predict the reduction point of
COVID-19. The model takes the daily maximum temperature of the past decade as
its input and predicts the 2020 weather, in the hope of an early end to
COVID-19. As a result, the predicted reduction point of COVID-19 was
consistent with the actual breaking point.
Keywords: BigData, Hadoop, Spark, Deep-Learning
Table of Contents
I. INTRODUCTION
II. LITERATURE REVIEW
1. Coronavirus
1.1. SARS and MERS
1.2. Seasonal Virus
1.3. 2019 Novel Coronavirus
2. Weather Prediction Model
2.1. Numerical Weather Prediction (NWP)
2.2. Deep Learning Model
3. Applied Technologies
3.1. Hadoop
3.1.1. HDFS
3.1.2. MapReduce
3.1.3. Yarn
3.2. Spark
3.2.1. Low-Level API
3.2.2. Structured API
3.2.3. Machine Learning on Spark
3.3. Docker
3.3.1. Micro-Service
3.3.2. Image and Container
3.3.3. Networks
3.3.4. Kubernetes
4. Conclusion
III. METHOD AND DATA
1. Development Environment
2. Data
2.1. Daily COVID-19 confirmed cases
2.2. Max Temperature Data
3. EDA, Exploratory data analysis
4. Prediction Modeling
4.1. Data Preprocessing
4.2. Multi-Layer Perceptron
IV. Result
V. Discussion
VI. Conclusion
I. INTRODUCTION
Coronavirus, which began to infect humans through bats, has appeared in
various forms since 2003. The successive coronaviruses, which appeared as SARS
in 2003, MERS in 2012 and COVID-19 in 2019, have had social and economic
consequences as well as effects on human health. COVID-19 in particular is so
severe that the World Health Organization (WHO) has declared it a pandemic.
Since the first confirmed case of COVID-19 appeared in Wuhan, China on
December 8, 2019, WHO data show that 2,314,621 confirmed cases had been
reported worldwide by April 20, 2020. Among them, 157,847 people died, a
fatality rate of around 7 percent [19]. Fortunately, in some countries with a
well-regarded medical response to COVID-19, the number of new confirmed cases
remains in double digits each day and continues to decline. However, COVID-19
has set back global economic and social infrastructure, and a tentative
recovery period is expected, as was the case after the 2008 global financial
crisis. Economically, the U.S. stock market plunged following the outbreak of
the new coronavirus infection, and massive unemployment followed within a
week. In Europe, many countries that share borders, from Italy to Spain,
France and Germany, closed them because of COVID-19. This has slowed the
growth rate of many countries tied to the euro this year.
In contrast to this disastrous reality, my own view is that these negative
effects will create many new opportunities in the long term. Historically,
many developments and advances have been driven by the constraints of their
times. For example, World War II brought many advances in the communication
and other technologies that we use today, and the commercialization of
penicillin, a medical technology, was one of the developments that changed the
world. We are now adopting and developing technologies that already existed
but were not widely used, so that they fit the times. Among them are remote
education, telecommuting, online grocery shopping and online food ordering.
These technologies existed in the past, but there was little felt need for
them and they were not mature enough for commercialization. They are now
growing and adapting to the new environment created by the current situation.
Therefore, using neural network learning based on temperature and humidity, we
will predict when the spread of COVID-19 will end. By predicting this timing,
we aim to seize the economic opportunities that the new social culture brought
about by COVID-19 will offer.
I conducted this research to answer the question of when we will be free of
the coronavirus, that is, when economic recovery will begin.
First, I will discuss what the coronavirus is and how the weather can be
predicted. I will also discuss the technologies needed to make climate
predictions. These technologies and the data will then be used to predict the
climate and to find the point in time at which coronavirus cases will decrease.
II. LITERATURE REVIEW
In a 2015 TED talk, Bill Gates, co-founder of Microsoft and co-chair of the
Bill & Melinda Gates Foundation, predicted large numbers of infections and an
economic decline caused by a virus. He warned that such a disease could kill
more than 10 million people within a few decades, and that infection could
spread anywhere, not just on planes and in markets. The World Bank estimates
that if we have a worldwide flu epidemic, global wealth will go down by over
three trillion dollars and we'd have millions and millions of deaths [1].
1. CORONAVIRUS
Coronavirus (CoV) is an RNA virus with a genome size of 27 to 32 kb that can
infect humans and various animals [8]. There are four types of coronavirus
(Alpha, Beta, Gamma, Delta): Alpha and Beta can infect both humans and
animals, while Gamma and Delta infect animals. So far, a total of six types of
coronavirus that infect humans have been known. There are types that cause
colds (229E, OC43, NL63, HKU1) and types that can cause severe pneumonia
(SARS-CoV, MERS-CoV) [8].
1.1. SARS and MERS
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which emerged in
China at the end of 2019, is a coronavirus of bat origin. Phylogenetic
analysis revealed that SARS-CoV-2 is 79% similar to SARS-CoV, which occurred
in China in 2003, and 50% similar to MERS-CoV [4].