
The power of technology during this outbreak

The new coronavirus outbreak is going global. Emerging technologies such as big data are seen as having played a notable part in preventing and containing the spread of the novel coronavirus, which has raged across China.

How is technology helping fight the impact of the coronavirus?

What are the tools and methods behind it?

Today, let’s dive into five examples that are being used right now.

AutoNavi
As China’s workforce resumes work and production in the coming weeks, one of most people’s top concerns is how crowded public transportation will be.

According to AutoNavi (known in Chinese as Gaode Ditu), in addition to showing real-time passenger flow on subway lines, it plans to launch more real-time traffic information in the near future to help users make better decisions about their travel arrangements.

Functionality:
Real-time passenger-density tracking on subway and train lines.

Dataset:
Data is provided by the Beijing Municipal Transportation Commission and covers all subway lines and stations in the city. It is integrated with a map API for GIS location mapping.

Method:

Technology used:
Real-time streaming (most likely; the details depend on the company's internal tech stack).

Data pipeline:
A message queue connecting producers and consumers (such as Kafka), Spark Streaming for real-time analysis (for example, aggregating traffic counts), and a datastore such as HBase, plus other in-house tools. Note that HBase is a NoSQL store, so SQL access would need a layer on top, such as Apache Phoenix.
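The aggregation step in such a pipeline can be sketched without any streaming infrastructure. The example below is a minimal illustration in plain Python, not AutoNavi's implementation: the event format, the 60-second window, and the passenger numbers are all assumptions made for the example (the station names are only used as labels). In production, the events would arrive from a message queue and the same grouping would run in a streaming engine.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling-window size, chosen for illustration


def window_counts(events, window_seconds=WINDOW_SECONDS):
    """Sum passengers per (station, window start) over a list of events.

    Each event is a (timestamp_seconds, station, passengers) tuple.
    """
    counts = defaultdict(int)
    for timestamp, station, passengers in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(station, window_start)] += passengers
    return dict(counts)


# Hypothetical gate-entry events: (seconds, station, passengers entering).
events = [
    (0, "Xizhimen", 12),
    (30, "Xizhimen", 8),
    (45, "Guomao", 20),
    (61, "Xizhimen", 5),
    (119, "Guomao", 7),
]

counts = window_counts(events)
for (station, start), total in sorted(counts.items()):
    print(f"{station}, window starting at {start}s: {total} passengers")
```

A tumbling window keeps the per-station totals small enough to push to a map layer at a fixed interval, which is one plausible way to drive a density display.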


MIT researchers used a machine-learning algorithm to identify a drug called halicin that kills many strains of bacteria. Halicin (top row) prevented the development of antibiotic resistance in E. coli, while ciprofloxacin (bottom row) did not. (Image courtesy of the Collins Lab at MIT)
Functionality:
The researchers developed machine-learning models that can be trained to analyze the molecular structures of compounds and correlate them with particular traits, such as the ability to kill bacteria.

Method:
Predicting from data whether a compound can kill bacteria can be framed as a classification problem. A random forest is a good baseline, and a support vector machine (SVM) is another candidate algorithm. A random forest is an ensemble of decision trees, each trained on a bootstrapped sample of the training data.

In a random forest classifier, the score for a compound can be taken as the fraction of trees that vote for the positive class. A plain majority vote corresponds to a classification threshold of 0.5; a stricter threshold such as 0.8 requires agreement from more of the trees. To evaluate the model, plot the receiver operating characteristic (ROC) curve, which shows the true positive rate against the false positive rate as the classification threshold is varied from 0 to 1. The closer the ROC curve is to the top-left corner, the better the prediction quality. A common metric for overall prediction quality is, therefore, the area under the ROC curve (AUC).
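To make the threshold and the ROC curve concrete, here is a small sketch in plain Python. The tree votes and labels are invented for illustration and do not come from the study; the function names are likewise hypothetical. It scores each compound by the fraction of trees voting positive, applies a majority-vote threshold of 0.5, and computes the AUC with the trapezoidal rule.

```python
def forest_score(tree_votes):
    """Fraction of trees voting for the positive class."""
    return sum(tree_votes) / len(tree_votes)


def roc_points(labels, scores):
    """(false positive rate, true positive rate) points as the threshold drops."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC points using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up example: 6 compounds, 5 trees each (1 = "kills bacteria").
labels = [1, 1, 1, 0, 0, 0]
votes = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
scores = [forest_score(v) for v in votes]

# Majority vote is the same as thresholding the score at 0.5.
predictions = [int(s > 0.5) for s in scores]
roc_auc = auc(roc_points(labels, scores))

print("scores:", scores)
print("majority-vote predictions:", predictions)
print("AUC:", round(roc_auc, 4))
```

With these numbers, one positive compound (score 0.4) is ranked below one negative compound (score 0.6), so 8 of the 9 positive-negative pairs are ordered correctly and the AUC is 8/9.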

This application uses a library of about 6,000 compounds from the Broad Institute’s Drug Repurposing Hub as a test dataset, along with 100 million molecules selected from the ZINC15 database. When a dataset is high-dimensional, with many features, feature importance analysis can show which variables are the most important predictors. The scikit-learn library exposes importance scores for tree-based models through the feature_importances_ attribute; these scores reflect how much each feature reduces impurity across the trees, not a simple correlation between the feature and the class.
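One model-agnostic way to see which variables matter is permutation importance: shuffle one feature column and measure how much accuracy drops. This is a different technique from impurity-based scores, shown here only because it fits in a few lines of plain Python. The data, the toy model, and all function names below are assumptions made for the sketch.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild every row with feature j replaced by its shuffled value.
        shuffled_rows = [
            row[:j] + (value,) + row[j + 1:] for row, value in zip(rows, column)
        ]
        importances.append(baseline - accuracy(model, shuffled_rows, labels))
    return importances


# Synthetic data: the label depends only on feature 0; feature 1 is noise.
data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [int(row[0] > 0.5) for row in rows]


def toy_model(row):
    """Stand-in for a trained classifier; it only looks at feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, rows, labels, n_features=2)
print("importance of feature 0:", importances[0])
print("importance of feature 1:", importances[1])
```

Shuffling the noise feature leaves accuracy unchanged, so its importance is zero, while shuffling the informative feature causes a clear drop.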

Security staff members check passengers’ temperature at Jinggangshan Airport in Ji’an, East China’s Jiangxi province, Feb 10, 2020. The airport has taken measures such as increasing disinfection frequency and testing passengers’ temperature to curb the spread of the novel coronavirus. Photo from Xinhua.
Functionality:
A screening, identity recognition, and body temperature warning system. It is a typical computer vision application combined with thermography to monitor human body temperature.

Dataset: 
FLIR dataset

Tool:
A FLIR (forward-looking infrared) thermal camera

Algorithm: 
YOLO can be used for object detection; Darknet, the framework in which YOLO was originally implemented, is a good candidate.

For training, starting with transfer learning is highly recommended. This is the more common approach in industry; only large labs are likely to train their own models from scratch. Reusing pre-trained convolutional weights (for example, from a model trained on ImageNet) saves time and cost when training on a custom dataset. TensorFlow and its model zoo provide pre-trained models, which are one option for the training and inference stages. The underlying architecture is a convolutional neural network (CNN), built from neurons arranged in convolutional and dense layers.

After training, the test dataset is used to check the mean average precision (mAP) and intersection over union (IoU) scores. The train-and-test cycle is repeated until the model meets the performance target. That target is defined together with the business problem before the project starts; a proof of concept (PoC) is normally built in that phase, so the model should meet the objective set at that stage. Deploying the model into an application is commonly done with a cloud deployment service such as those offered by AWS.
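IoU is simple enough to compute by hand. The sketch below assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) in pixels; the box coordinates are invented for illustration and the function name is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth boxes around a face.
predicted = (10, 10, 50, 50)
ground_truth = (30, 30, 70, 70)

print("IoU:", round(iou(predicted, ground_truth), 4))
print("identical boxes:", iou(predicted, predicted))
print("disjoint boxes:", iou((0, 0, 10, 10), (20, 20, 30, 30)))
```

Here the overlap is 20 x 20 = 400 pixels and the union is 1600 + 1600 - 400 = 2800, giving an IoU of 1/7. Detection benchmarks typically count a prediction as correct only when its IoU with the ground truth exceeds a chosen cutoff, and mAP is then computed over those matches.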


Author

Chloe Ji
A self-taught programmer who codes mainly in Python, also codes in JavaScript, and is new to Scala. She currently works as a data scientist in the blockchain industry and previously worked on computer vision tasks. She is interested in open source projects and big data, and is a crazy biker in town.
Solomon Soh

Solomon Soh is a Data Science Consultant for UpLevel Singapore. He specializes in operational, marketing and financial analysis, with a strong flair for applying ML and RL models to understand consumer behavior. An ex-management consultant, he appreciates the importance of applying data science correctly to solve business or societal problems.


At this moment, hundreds of data scientists from around the world are working on data science projects about the coronavirus. 

Stay healthy and positive, spring is coming soon!

