
The impact of covid-19: to predict the breaking point of the disease from big data by neural networks


by Woohyun SHIN
Paris School of Business - MSc Data Management 2001
Category: Computer Science and Telecommunications > Artificial Intelligence
   

1. DEVELOPMENT ENVIRONMENT

Climate data call for a big data framework because much of the data from meteorological stations is stored in a semi-structured format. Instead of using a big data service already available online, we built our own environment. Hadoop connects several machines in parallel to form a single cluster [Figure 2]. Spark is a framework that reads the data stored in the Hadoop Distributed File System (HDFS) and supports preprocessing and deep learning analysis within the cluster environment. The cluster resources are managed by YARN, Hadoop's resource manager. Docker was used to install and deploy Hadoop and Spark easily on several physical machines. Zeppelin is a web-based notebook environment that works with Spark and allows the analysis to be carried out in a web interface [Figure 1]. Finally, Elephas, an external package for distributed deep learning on Spark, was installed in the cluster, and the weather prediction model was built through deep learning.

Component        Version / architecture
Raspberry Pi 4   Arm 64 bits
Ubuntu           18.04 LTS
Hadoop           2.7.7
Spark            2.4.0
Zeppelin         0.8.2
Python           3.6.9

[Table 2] Development Environment Versions

[Figure 1] Logical Architecture


[Figure 2] Physical Architecture

2. DATA

2.1. Daily COVID-19 confirmed cases

Before building the weather forecasting model, we examined the distribution of confirmed cases by date for the ongoing COVID-19 outbreak. First, we collected COVID-19 datasets from Wikipedia [17] and the Johns Hopkins Coronavirus Resource Center [16], then organized them by date and visualized them. For Iran, city-level data were not available from March 26 onward, because since that date only the national total of confirmed cases has been announced rather than counts by city. The graph below shows the new confirmed cases reported each day and how that number changed in four cities. It also made it possible to identify the dates on which the number of confirmed cases rose.


[Figure 3] Trend of confirmed cases by date
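
The date-wise organization behind Figure 3 can be sketched as follows. This is a minimal illustration, assuming the collected data are cumulative totals per city and date; the record layout, the function name `daily_new_cases` and the sample values are hypothetical and are not taken from the datasets used in this study.

```python
from collections import defaultdict
from datetime import date


def daily_new_cases(records):
    """Convert cumulative (city, day, total) records into per-day new cases.

    `records` is an iterable of (city, date, cumulative_total) tuples
    (a hypothetical layout). Returns {city: [(date, new_cases), ...]}
    with each list sorted by date.
    """
    by_city = defaultdict(list)
    for city, day, total in records:
        by_city[city].append((day, total))

    result = {}
    for city, rows in by_city.items():
        rows.sort()  # chronological order
        previous = 0  # assumption: the first day's total counts as new cases
        series = []
        for day, total in rows:
            # Clamp at zero: later corrections can make a difference negative.
            series.append((day, max(total - previous, 0)))
            previous = total
        result[city] = series
    return result


# Illustrative values only, not real case counts.
sample = [
    ("CityA", date(2020, 3, 2), 5),
    ("CityA", date(2020, 3, 1), 2),
    ("CityA", date(2020, 3, 3), 9),
    ("CityB", date(2020, 3, 1), 10),
    ("CityB", date(2020, 3, 2), 10),
]
new_cases = daily_new_cases(sample)
```

Sorting inside the function means the input records do not have to arrive in date order, which is convenient when two sources are merged.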

In Seoul, the number of confirmed cases has declined since March 10, when it reached 52. Madrid has seen a decline since recording 3,419 confirmed cases on March 30. Tehran has seen a downward trend since peaking at 347 confirmed cases on March 14. In San Francisco, the number of confirmed cases has risen steadily since early March [Figure 3].
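
One way to locate turning points like those described above is to take the date with the largest daily count and check that a smoothed series is lower at the end than at the peak. The sketch below assumes a list of (date, new_cases) pairs; the function names, the 3-day window and the sample numbers are illustrative assumptions, not the method or data used in this study.

```python
from datetime import date, timedelta


def peak_day(series):
    """Return the (date, count) pair with the highest daily count."""
    return max(series, key=lambda item: item[1])


def moving_average(values, window=3):
    """Trailing moving average; early positions use the values seen so far."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


def declines_after_peak(series, window=3):
    """True if the smoothed count at the end is below the smoothed count at the peak."""
    counts = [count for _, count in series]
    smoothed = moving_average(counts, window)
    peak_index = counts.index(max(counts))
    # A peak on the last day means no decline has been observed yet.
    return peak_index < len(counts) - 1 and smoothed[-1] < smoothed[peak_index]


# Hypothetical daily counts, not real data.
start = date(2020, 3, 1)
counts = [3, 8, 15, 30, 50, 40, 31, 20, 12]
series = [(start + timedelta(days=i), c) for i, c in enumerate(counts)]
peak = peak_day(series)
declining = declines_after_peak(series)
```

Smoothing before comparing reduces the chance that a single low reporting day is mistaken for the start of a downward trend.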

2.2. Max Temperature Data

We initially intended to use daily average temperature and daily average humidity, but few weather stations have recorded these since 1950, so the amount of data was limited. Daily minimum temperature data were available, but the minimum is generally recorded at night, and because the floating population is small around dawn, human-to-human transmission of the COVID-19 virus was assumed to be unlikely at that time. For this reason, the study used the daily maximum temperature, which is recorded in the daytime when the floating population is large, instead of average temperature and humidity data [18].
