The accuracy was once 100%! Tsinghua alumnus predicted the epidemic in the United States


Recently, a blog with daily updates on divine predictions of epidemic data in the United States and Europe has become popular on the Internet.

How impressive is it? A few examples:

- In the 10 consecutive days starting from March 27, the blog’s accuracy in predicting the number of infections in the United States was above 90%, with the accuracy on April 4 close to 100%.

- On March 31, the blog predicted that the U.S. epidemic would fall off a cliff within 8-10 days once the number of people tested exceeded 2 million; 7 days later, on April 6, the U.S. epidemic data did fall off a cliff, with the growth rate dropping from 12.43% to 8.13%. This article drew a huge response, with more than 1.34 million reads.

- Since March 27, the blog’s daily prediction accuracy of the number of infections in Europe has reached an average of 97%. In the first five days of April, the prediction accuracy was close to 100%.
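The article does not say how "accuracy" or "growth rate" were computed. The sketch below shows one common way to define them, assuming accuracy means one minus the relative error and growth rate means the day-over-day change in the cumulative count. Both definitions, the function names, and the sample numbers are assumptions for illustration, not Li Zhibin's published method.

```python
def prediction_accuracy(predicted: float, actual: float) -> float:
    """Accuracy as 1 minus the relative error (an assumed definition)."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return max(0.0, 1.0 - abs(predicted - actual) / actual)


def daily_growth_rate(previous_total: float, current_total: float) -> float:
    """Day-over-day growth of a cumulative count, as a fraction."""
    if previous_total <= 0:
        raise ValueError("previous_total must be positive")
    return (current_total - previous_total) / previous_total


# Hypothetical numbers, chosen only to illustrate the arithmetic.
acc = prediction_accuracy(predicted=97_000, actual=100_000)
growth = daily_growth_rate(previous_total=100_000, current_total=108_130)
print(f"accuracy: {acc:.2%}, growth rate: {growth:.2%}")
```

Under these definitions, a forecast that misses the reported figure by 3% scores 97%, regardless of whether it was too high or too low.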

Li Zhibin’s prediction of the number of infections in the United States is 90% accurate

In response, some netizens commented: Amazing, even the virus listens to you. Incredible.

The outbreak of the COVID-19 epidemic is a major global public event involving many complex factors such as politics, economy, and geography. Predicting specific case counts sounds like a tale from the Arabian Nights, and accuracy seems almost a matter of mysticism. So, to achieve the results above, the blogger behind this blog can fairly be called a modern fortune teller.

So, how was this divine fortune teller made?

Tsinghua University graduate + 8 years of experience in market forecasting

The blogger behind this blog is the fortune teller himself, named Li Zhibin.

Li Zhibin studied at the Computer Science Department of Tsinghua University from 1980 to 1985. From 1985 to 1994, he studied and worked at the Chinese Academy of Sciences. At the age of 30, he served as an associate researcher, director of the product department, and assistant to the director. In 1994, he moved to New Zealand, and then settled in Hong Kong. Currently, he is the general manager of Hong Kong Zhijia Logistics Software Co., Ltd. and Hong Kong Yijing Technology Co., Ltd.

Screenshot of Li Zhibin’s blog

Of the two companies where Li Zhibin works, the former’s main business is logistics system development; the latter has a background at the Chinese University of Hong Kong, and its main business is market demand forecasting: providing companies with data analysis and forecasts of product demand, price fluctuations, and so on in specific regions over the next 3 to 6 months.

Li Zhibin said that he entered the field of data analysis and prediction in 2012. Because of Yijing’s ties to the Chinese University of Hong Kong, he also learned a great deal from its professors.

In addition, from a technical perspective, his studies in the Computer Science Department of Tsinghua University gave him a complete knowledge system in software modeling, big data analysis, and related areas; at the same time, Tsinghua’s scientific style and tradition led him to pay more attention to data, evidence, and examples than to conclusions.

All these combined make Li Zhibin very sensitive to data.

At the end of last year and the beginning of this year, cases began to be reported in Wuhan, and suspected new coronavirus patients also appeared in Hong Kong. This put Li Zhibin, a long-time Hong Kong resident, on alert. On January 7, 2020, the Hong Kong Special Administrative Region Government announced that COVID-19 was a notifiable infectious disease, and epidemic data began to be reported to the public. From that point, Li Zhibin began tracking data related to COVID-19.

From then on, Li Zhibin got up every morning to collect data. At first it was only data from Wuhan, Hubei, and Hong Kong, and later data from other mainland regions. In late January he began to collect overseas data and organize it in an Excel table. At the same time, he began to use his professional knowledge to build data models, and to use figures reported in the news to analyze and assess the officially reported data.

At first, Li Zhibin only shared data and opinions with his classmates at Tsinghua University. Later, he also spent 30 minutes a day writing blog posts and published them on Sina Blog. Now, it has become a daily habit.

Of course, in addition to collecting, organizing, and analyzing conventional data, Li Zhibin was also drawing on his professional knowledge to build a data model, constantly supplementing and verifying its parameters until it achieved the expected results.

On March 27, once the data model had stabilized, Li Zhibin published forecast data for infections in the United States for the first time; on March 28, he published forecast data for infections in Europe.

Li Zhibin’s prediction of the number of infections in Europe has an average accuracy of 97%

His predictions include not only the number of infected cases, but also the infection growth rate, peak time, total number of infections, total number of deaths, death rate, and other data. Of course, the number of infections is the most important indicator he uses to measure the accuracy of predictions.

Even Li Zhibin himself did not expect that his prediction data would be so accurate.

However, Li Zhibin emphasized that no one can predict the future with 100% accuracy, and rolling predictions must be made.

He said: Forecasting is a dynamic process, because many immediate measures, events, and other unexpected factors are unpredictable. These emergencies and decisions need to be turned into parameter adjustments, which are fed back into the prediction model to make it run more accurately. My prediction model and prediction parameters are also in the process of continuous improvement.
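The idea of rolling prediction can be illustrated with a minimal sketch. It assumes a very simple model, in which tomorrow's total is today's total grown by the recent average daily growth rate, plus a hand-set `adjustment` factor standing in for the way new measures or events are turned into parameter changes. The model, the function names, and the numbers are all assumptions for illustration; the article describes Li Zhibin's actual model as far more complex.

```python
from typing import Sequence


def estimate_growth(totals: Sequence[float], window: int = 3) -> float:
    """Average day-over-day growth over the most recent `window` days."""
    if len(totals) < 2:
        raise ValueError("need at least two observations")
    recent = totals[-(window + 1):]
    rates = [(b - a) / a for a, b in zip(recent, recent[1:])]
    return sum(rates) / len(rates)


def rolling_forecast(totals: Sequence[float],
                     adjustment: float = 1.0,
                     window: int = 3) -> float:
    """One-day-ahead forecast; `adjustment` scales the estimated growth
    to reflect new events (a hypothetical, hand-set parameter)."""
    growth = estimate_growth(totals, window) * adjustment
    return totals[-1] * (1.0 + growth)


# Hypothetical cumulative counts.
observed = [100.0, 110.0, 121.0]
tomorrow = rolling_forecast(observed)

# The next day's figure arrives, so the growth estimate is refreshed,
# and a newly announced measure is reflected by damping growth by 20%.
observed.append(130.0)
next_day = rolling_forecast(observed, adjustment=0.8)
print(round(tomorrow, 1), round(next_day, 1))
```

Each forecast uses only the data available on the day it is made, and the parameters are revisited whenever new data or events arrive.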

No matter how good the software is, it cannot predict 100% accurately

Li Zhibin’s prediction is inseparable from two core elements: data and prediction model.

The first is the credibility of the data. In the interview, Li Zhibin said that he started collecting data every day in January. At the beginning, only Wuhan and Hong Kong had data. Today he collects data from hundreds of countries and regions every day.

Li Zhibin emphasized that data collection and analysis must include screening for data conflicts; especially when the volume of officially reported data is large, many methods, including news data, are used as cross-checks. Data from different regions may conflict. The more conflict points there are, the lower the credibility of the data.

At the same time, judging the authenticity of the data also depends on how quickly it is released; the higher the release frequency, the higher the credibility. South Asia and Southeast Asia release less data, and more slowly, so its credibility is discounted.

Epidemic situation from the official website of the US CDC

In addition, when judging the credibility of the data, you can also use news data for comparison. Li Zhibin told Lei Feng.com that, for example, if the ratio between doctors and patients is relatively stable, then the number of medical personnel reported in the news can be used to infer the number of patients.
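The cross-check described here is simple arithmetic, and can be sketched as follows. The ratio, the tolerance, and the sample figures are hypothetical; the article gives no actual values, and the functions are only an illustration of the reasoning.

```python
def infer_patients(staff_reported: int, patients_per_staff: float) -> float:
    """Estimate patient count from reported medical staff, assuming a
    stable patients-per-staff ratio (the ratio itself is an assumption)."""
    if staff_reported < 0 or patients_per_staff <= 0:
        raise ValueError("inputs must be non-negative / positive")
    return staff_reported * patients_per_staff


def conflicts_with_official(inferred: float, official: float,
                            tolerance: float = 0.2) -> bool:
    """Flag a data conflict when the two figures differ by more than
    `tolerance` relative to the official number (threshold is hypothetical)."""
    if official <= 0:
        raise ValueError("official must be positive")
    return abs(inferred - official) / official > tolerance


# Hypothetical example: news reports 500 medical staff deployed and a
# stable ratio of 4 patients per staff member is assumed.
inferred = infer_patients(staff_reported=500, patients_per_staff=4.0)
conflict = conflicts_with_official(inferred, official=1200)
print(inferred, conflict)
```

A flagged conflict does not say which figure is wrong; it only marks the region's data as a candidate for lower credibility.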

He said that in fact all data may contain some human or statistical errors, and no region is 100% reliable. Relatively speaking, though, the United States has fewer data conflicts and its reliability is higher. European data is less reliable than U.S. data; because of the imbalance between Western and Eastern Europe, an average is taken. Data from India, Southeast Asia, Japan, and other regions, however, seems to have some problems: release is slow and conflicts are frequent, which affects the credibility assigned to it.

By the end of February, building on earlier modeling and verification with domestic data, Li Zhibin began to predict epidemic data for the United States and Europe. On the basis of this data he created a prediction model. It is in fact an extremely complex model with hundreds of parameters in total, of which twenty or thirty are important, divided into the following three categories:

The first category is epidemic parameters for different regions/countries/cities: the number of confirmed cases, population, number of daily new diagnoses, number of suspected cases, number of daily tests, number of deaths, number of cured patients, number of patients, and number of hospitalizations.

The second category is parameters related to regional/city/country characteristics, such as city type, population density, temperature, weather, proportion of the urban population over 60 years old, average urban age, and the state of urban construction.

The third category is parameters about resources and governance capabilities: medical resources, number of hospital beds, social organization capabilities, information transparency, management methods, and so on.
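To make the three categories concrete, here is a minimal sketch of how such parameters might be grouped in code. The class names, the selection of fields, the units, and the sample values are all assumptions for illustration; the article does not describe how the actual model stores its parameters.

```python
from dataclasses import dataclass


@dataclass
class EpidemicParams:
    """Category 1: epidemic figures for a region (subset of those listed)."""
    confirmed: int
    daily_new: int
    daily_tests: int
    deaths: int


@dataclass
class RegionParams:
    """Category 2: regional characteristics (units are assumed)."""
    population_density: float  # people per square kilometre
    temperature_c: float       # degrees Celsius
    share_over_60: float       # fraction between 0 and 1


@dataclass
class GovernanceParams:
    """Category 3: resources and governance capability."""
    hospital_beds: int
    information_transparency: float  # subjective score between 0 and 1


@dataclass
class RegionRecord:
    """One region's full parameter set, grouped by category."""
    name: str
    epidemic: EpidemicParams
    region: RegionParams
    governance: GovernanceParams


# Hypothetical values, for illustration only.
example = RegionRecord(
    name="Example City",
    epidemic=EpidemicParams(confirmed=1000, daily_new=80,
                            daily_tests=5000, deaths=20),
    region=RegionParams(population_density=6000.0, temperature_c=10.0,
                        share_over_60=0.2),
    governance=GovernanceParams(hospital_beds=3000,
                                information_transparency=0.7),
)
print(example.name, example.epidemic.daily_new / example.epidemic.daily_tests)
```

Grouping by category keeps measured figures separate from subjective scores such as transparency, which have to be set by hand.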

Li Zhibin said that in practice he usually collects data in Excel, imports it into a backend database, and then uses the software model he developed to draw three conclusions. Finally, he makes manual adjustments. He emphasized that many parameters, such as public sentiment, cannot be quantified; therefore, human participation is required.

He also said: No matter how good the software is, it cannot predict 100% accurately.

When big ships and small boats meet icebergs at the same time

Li Zhibin, who graduated from Tsinghua University, has advanced insights and thinking that go beyond data analysis.

For example, in the modeling process, Li Zhibin started with domestic data. This data not only shaped his modeling, but also led him to some observations. So, the day before Wuhan was locked down, he shared two ideas in the group of his Tsinghua classmates who enrolled in 1980:

First, Wuhan should be locked down immediately, because the increase in the data was too alarming;

Second, 20 or 30 grid-style field hospitals should be quickly established in Hubei, especially Wuhan, as isolation and treatment centers. These field hospitals were later known as Fangcang shelter hospitals. Because the epidemic was developing so rapidly, isolating patients was a more critical prevention and control measure than treating them.

These ideas prompted a lot of discussion among his classmates. There were doubts and objections, of course, but more importantly the classmates took an active part and put forward many better ideas and suggestions, from which he benefited a great deal. Later events showed that these ideas were sound, and they were borne out by the measures officials subsequently took. The idea of field hospitals in particular came two weeks ahead of the official measures.

In addition to the above suggestions, Li Zhibin also discovered during the process of data analysis and model construction that cities that become outbreak points often have several characteristics:

Old cities;

The climate is humid;

The temperature is 5-15 degrees Celsius;

The sewer system is aging;

There is a high proportion of elderly people.

It is worth mentioning that epidemic outbreak cities in different countries, such as Wuhan in China, Daegu in South Korea, Milan in Italy, Tehran in Iran, and New York in the United States, all roughly meet these characteristics.

As for how these characteristics were identified, Li Zhibin emphasized that they involve some subjective but reasonable guesswork, and that they were verified against a series of results before being reflected in the predictions.

He also said that the parameters involve social organization methods, management models, information transparency, and other such issues, so his predictions also label results as pessimistic or optimistic.

Measured against the pessimistic prediction Li Zhibin gave on April 4, his overall prediction accuracy for the number of infections in the United States is as high as 96%.

Li Zhibin’s prediction of the number of infections in the United States was 96% accurate

However, in the exclusive interview, despite the role of human judgment, Li Zhibin still emphasized the absolute position of data in decision-making. He said that even setting the epidemic aside, in everyday decision-making the importance of data can be said to be 100%; the data must be not only true, but also comprehensive and transparent. Even when people are involved in later steps, their involvement rests on judgments drawn from the data, which is the basis for decision-making.

So, how widespread is the coverage of data-based decision-making?

Li Zhibin believes that even mass public incidents such as the COVID-19 epidemic, which are quite accidental and contain complex social factors such as politics and economy, can be predicted.

He said that events like infectious diseases follow specific patterns as they develop. There are regularities even among accidents. We may not be able to grasp a pattern with 100% accuracy, but to a certain degree we can still make judgments and decisions. The premise, of course, is a huge amount of valid data.

Li Zhibin then offered an interesting metaphor:

When a big ship and a small boat suddenly encounter an iceberg, both are bound to turn; but relatively speaking, the fate of the big ship is clearly more predictable. The small boat corrects its course in an instant, but the big ship is so large and carries so much inertia that it is more likely to hit the iceberg. This inertia is the pattern, and the ship's size is the amount of data.

The larger the amount of data, the more accurate the data, and the more transparent the relevant information, the easier it is to predict when such mass incidents occur, and the more accurate the prediction, Li Zhibin said in the end.


Source of this article: Deep Space Games. Editor: Anonymous