3.2.3.1. Transformer
A transformer is a function that converts raw data in various
ways. This could be creating a new interaction variable, normalizing a column,
or casting an Integer column to a Double type so that the data can enter the model.
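As an illustration, the following is a minimal PySpark sketch of transformers at work; the DataFrame contents and column names are assumed for the example, and a SparkSession named spark is taken as given.

    from pyspark.ml.feature import VectorAssembler

    # Assumed toy DataFrame with an Integer column and a Double column.
    df = spark.createDataFrame([(1, 2.0), (2, 3.5)], ["id", "value"])

    # Cast the Integer column to Double so it can enter the model.
    df = df.withColumn("id_d", df["id"].cast("double"))

    # VectorAssembler is a transformer: it combines input columns into a feature vector.
    assembler = VectorAssembler(inputCols=["id_d", "value"], outputCol="features")
    transformed = assembler.transform(df)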
3.2.3.2. Estimator
The estimator has two meanings. First, it refers to a kind of
transformer that must be initialized with the data. For example, to normalize numeric data,
the transformation is initialized using the current values in the
column to be normalized. Second, the algorithms that users apply to learn a
model from the data are also called estimators.
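To make both meanings concrete, the following PySpark sketch fits a StandardScaler (an estimator that is initialized from column statistics) and a LogisticRegression (a learning algorithm); the training DataFrame train_df and its columns are assumed.

    from pyspark.ml.feature import StandardScaler
    from pyspark.ml.classification import LogisticRegression

    # StandardScaler is an estimator: fit() computes the column statistics and
    # returns a transformer (StandardScalerModel) that applies the normalization.
    scaler = StandardScaler(inputCol="features", outputCol="scaled", withMean=True)
    scaler_model = scaler.fit(train_df)
    scaled_df = scaler_model.transform(train_df)

    # A learning algorithm is also an estimator: fit() learns the model parameters
    # and returns a fitted LogisticRegressionModel.
    lr = LogisticRegression(featuresCol="scaled", labelCol="label")
    lr_model = lr.fit(scaled_df)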
3.2.3.3. Evaluator
It allows us to see how well a given model performs according to one
criterion, such as the area under the Receiver Operating Characteristic (ROC) curve.
After the best model among those tested has been selected using the evaluator, the
final prediction can be made with that model.
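Continuing the sketch above, a BinaryClassificationEvaluator can score the fitted model on a held-out DataFrame; test_df is assumed here.

    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    # Score the fitted model on held-out data using area under the ROC curve.
    predictions = lr_model.transform(test_df)
    evaluator = BinaryClassificationEvaluator(labelCol="label", metricName="areaUnderROC")
    auc = evaluator.evaluate(predictions)
    print("Area under ROC:", auc)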
3.2.3.4. External libraries
Spark can run various projects using external libraries as well
as its built-in packages. Among them, a variety of external deep learning libraries
can be used, especially in this relatively new field, such as TensorFrames, BigDL,
TensorFlowOnSpark, DeepLearning4J, and Elephas. There are two ways to develop a
new deep learning model. One is to use a Spark cluster to parallelize training
of a single model across multiple servers and to update the final result through
communication between the servers. The other is to use a specific library
to train multiple model objects in parallel and to explore different model
architectures and hyperparameters so that the final model can be selected and
optimized efficiently.
Library             | Framework based on DL | Case of application
TensorFrames        | TensorFlow            | Inference, Transfer learning
BigDL               | BigDL                 | Distributed learning, Inference
TensorFlowOnSpark   | TensorFlow            | Distributed learning
DeepLearning4J      | DeepLearning4J        | Inference, Transfer learning, Distributed learning
Elephas             | Keras                 | Distributed learning
[Table 1] Deep Learning External Libraries
Elephas is a library designed to run the Keras deep learning
framework on Spark. It keeps Keras's simplicity and high usability while supporting
distributed models that can be trained on large datasets. Elephas is implemented
on top of Keras as a class of data-parallel algorithms that use Spark's RDDs and
DataFrames: the model is initialized on the Spark driver, the data are serialized
and passed to the executors, and the parameters the model needs are exchanged with
the executors through Spark's distributed shared variables, broadcast variables and
accumulators. The learned parameters and hyperparameters are then passed back to
the driver, where the optimizer on the master node synchronizes the updates and
training continues.
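As a rough illustration of this workflow, the sketch below wraps a compiled Keras model with Elephas and trains it on an RDD; the SparkContext sc, the compiled Keras model model, and the NumPy arrays x_train and y_train are assumed.

    from elephas.utils.rdd_utils import to_simple_rdd
    from elephas.spark_model import SparkModel

    # Serialize the training data into an RDD so it can be shipped to the executors.
    rdd = to_simple_rdd(sc, x_train, y_train)

    # Wrap the compiled Keras model; weight updates flow back to the driver,
    # where they are synchronized before training continues.
    spark_model = SparkModel(model, frequency='epoch', mode='asynchronous')
    spark_model.fit(rdd, epochs=10, batch_size=32, verbose=0, validation_split=0.1)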
3.3. Docker
Docker is an open-source project that makes it easier to run
applications as containers by adding multiple features on top of Linux containers.
Docker is written in the Go language. Unlike virtual machines, the
traditional method of virtualization, Docker containers incur little
performance loss, drawing attention from many developers as a next-generation
cloud infrastructure solution. There are many projects related to Docker,
including Docker Compose, Private Registry, Docker Machine, Kitematic, and so
on, but typically Docker refers to the Docker Engine. The Docker Engine is the
main Docker project: it creates and manages containers, provides a
variety of functions, and controls the containers on its own [7].
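As a brief illustration of driving the Docker Engine programmatically, the following sketch uses the Python Docker SDK (the docker package, assumed to be installed) against a locally running Docker daemon.

    import docker

    # Connect to the local Docker Engine through its default socket.
    client = docker.from_env()

    # Create and run a container from the ubuntu image, then read its output.
    container = client.containers.run("ubuntu", "echo hello from a container", detach=True)
    container.wait()                  # wait until the command has finished
    print(container.logs().decode())  # fetch the container's stdout
    container.remove()                # remove the stopped container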
Traditional virtualization technology uses a hypervisor to create
and run multiple operating systems on a single host. These operating systems
are identified as virtual machines, and each virtual machine runs a full guest
operating system such as Ubuntu or CentOS. The guest operating systems created
and managed by the hypervisor each use completely independent space and system
resources. Typical virtualization tools of this kind include VirtualBox,
VMware, and others. However, virtualizing a whole machine and creating an
independent space necessarily requires a hypervisor, which results in performance
loss compared to an ordinary host. So while virtual machines have the advantage of
providing a complete operating system, they tend to lose
performance compared to ordinary hosts, and it is hard to deploy gigabyte-sized
virtual machine images for applications.
In comparison, a Docker container has little performance loss
because it creates a process-level isolation environment using Linux's own
features (chroot, namespaces, and cgroups) to build the virtualized space. Because
the container shares and uses the host's kernel,
and the container holds only the libraries and executable files
needed to run the application, the image size is also significantly reduced
when the container is packaged as an image. As a result, containers are faster than
virtual machines and have the advantage of almost no performance loss when using the
virtualized space.