The demand for storage in big data and high-performance environments
For a long time, the main purpose of high-performance computing (HPC) has been to increase computing speed in order to solve large-scale scientific computing and massive data processing problems. Teraflop-scale computing power has made HPC an important technology choice in fields such as petroleum exploration, biology, weather prediction, and life science research. As data volumes and the value of data continue to grow, demand for HPC in finance, telecommunications, the Internet, and other fields is also rising. As the technology develops, HPC systems keep getting faster, task computation times keep shrinking, and the value delivered to the business keeps increasing. But fast task processing depends critically on the storage system: at the start of a job, input data must be read from storage, and at the end, results must be written back. If read and write speeds cannot keep up, they not only delay project completion cycles; the resulting high latency also seriously undermines HPC's ability to create value. In general, HPC requires a storage system that meets performance and scalability requirements in order to protect the return on investment: throughput of several to dozens of GB/s; capacity expandable to the PB level; transparent access and data sharing; centralized, intelligent management at a good price/performance ratio; and capacity and performance that can each be scaled independently as needed. Zhongqiao analysts field-tested EMC Isilon products in the HPC environment of the BGI Research Institute in Shenzhen and recorded the results.
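A quick back-of-the-envelope check shows why those throughput figures matter at PB scale. The capacities and rates below are illustrative examples, not measured Isilon benchmarks:

```python
# Illustrative arithmetic: time to stream a large dataset at a given
# aggregate throughput. Figures are examples, not vendor benchmarks.

def read_time_hours(capacity_tb: float, throughput_gb_s: float) -> float:
    """Hours needed to stream capacity_tb terabytes at throughput_gb_s GB/s."""
    seconds = capacity_tb * 1000 / throughput_gb_s  # 1 TB = 1000 GB (decimal units)
    return seconds / 3600

# A 1 PB (1000 TB) dataset at 1 GB/s vs. 20 GB/s aggregate throughput:
print(round(read_time_hours(1000, 1), 1))   # single-digit GB/s: ~277.8 hours
print(round(read_time_hours(1000, 20), 1))  # tens of GB/s: ~13.9 hours
```

Moving from single-digit to tens of GB/s turns a multi-day data load into an overnight one, which is exactly the difference between storage pacing the computation and storage becoming the bottleneck.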
Background
High-performance computing (HPC) refers to computing systems and environments that use many processors (within a single machine) or several computers organized into a cluster (operating as a single computing resource). For a long time, the main application areas of HPC have been scientific and engineering computing, such as high-energy physics, nuclear explosion simulation, weather forecasting, oil exploration, earthquake prediction, earth simulation, drug development, simulation and modeling in CAD design, and fluid mechanics calculations. Today, demand for HPC in fields such as financial securities, government informatization, telecommunications, education, enterprise computing, and online games is also growing rapidly.
Applications of high-performance computing
High-performance computing has a broad base of industry applications. Below are the HPC application requirements of several industries:
1. Aerospace Industry
In the aerospace industry, the rapid development of China's aerospace sector, and especially the success of its manned space program, have led researchers to place ever more demanding requirements on numerical simulation of aerodynamics; conventional computing power falls far short of the enormous demands of complex, large-scale aircraft design. In the design process, researchers typically divide the aircraft surface into millions or even tens of millions of discrete grid points, then solve equations on an HPC platform to obtain the temperature, velocity, friction, and other parameters at each grid point, fitting continuous curves from the results and thereby providing valuable reference data for aircraft design. For this kind of calculation, the finer the grid, the more accurate the results. But these large-scale design problems not only demand a huge amount of computation for a single job; they also require continual adjustment and repeated calculation. High-performance computing therefore occupies a pivotal position in the aerospace industry.
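A toy version of this grid-based approach is solving a steady-state temperature field on a small 2D mesh with Jacobi iteration. This is a deliberately simplified sketch of the idea only; real aerodynamic codes solve far richer equations on meshes of millions of points, which is precisely why they need HPC:

```python
# Toy steady-state heat solve on a 2D grid via Jacobi iteration:
# a drastically simplified stand-in for the large grid-based solvers
# described above. Real CFD codes use far larger meshes and more physics.

def jacobi_laplace(n=20, iters=500, hot=100.0):
    """n x n grid; top edge held at `hot`, the other edges at 0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [hot] * n
    for _ in range(iters):
        new = [row[:] for row in grid]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Each interior point relaxes toward the average of its
                # four neighbours (discrete Laplace equation).
                new[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                    grid[i][j-1] + grid[i][j+1])
        grid = new
    return grid

field = jacobi_laplace()
# Interior temperatures fall off smoothly away from the hot edge.
```

Halving the grid spacing quadruples the number of points (and increases the iterations needed), so refining the mesh for accuracy drives the compute cost up sharply — the trade-off the paragraph above describes.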
2. Energy industry
As a strategic national resource, petroleum has great significance for the economy, national security, the military, and beyond. Petroleum exploration undertakes the important tasks of locating oil-bearing structures and determining well positions. The current mainstream approach is to artificially generate seismic events of an appropriate scale (depending on the area and depth of the exploration zone) while deploying numerous seismic-wave collection points across the corresponding strata. Because different geological materials affect seismic waves in regular, well-understood ways, the geological structure can be "computed" from the propagation of the seismic waves using the relevant algorithms, revealing where the energy resources we need are located. The amount of computation involved is undoubtedly enormous: the data collected by seismic exploration is usually measured in terabytes, and in recent years the data collected in offshore oil and gas exploration has even begun to approach the petabyte scale. Only with the help of high-performance computing can such massive data be processed in the shortest possible time.
3. Life Sciences
The field of modern life sciences is undergoing a huge, data-driven transformation. Analysis of massive biological data will enhance our ability to monitor diseases in real time and respond to potential epidemics, but the mining, processing, and storage of that data face unprecedented challenges. In particular, with the rapid development of next-generation sequencing technology, the volume of data generated by genomics research is growing roughly tenfold every 12-18 months, far outpacing the famous Moore's law. As a result, many biological companies and scientific research institutions face formidable data analysis and storage requirements.
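The gap between those two growth rates compounds quickly. A hedged sketch of the arithmetic, taking 10x per 18 months for sequencing data (the figure cited above) and the commonly quoted 2x per 18 months for Moore's law — both rates are rough approximations:

```python
# Compare compound growth: sequencing data (~10x per 18 months, per the
# figure cited above) vs. compute (~2x per 18 months, a common statement
# of Moore's law). Both rates are illustrative approximations.

def growth_factor(fold_per_period: float, period_months: int, years: int) -> float:
    periods = years * 12 / period_months
    return fold_per_period ** periods

years = 6  # 6 years = four 18-month periods
data = growth_factor(10, 18, years)    # sequencing data volume
compute = growth_factor(2, 18, years)  # Moore's-law compute
print(f"after {years} years: data x{data:.0f}, compute x{compute:.0f}")
# after 6 years: data x10000, compute x16
```

Over the same six years, data grows 10,000-fold while compute grows 16-fold, which is why storage and analysis capacity, not sequencing itself, become the bottleneck.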
In China, the momentum of the biotechnology and genomics industry should not be underestimated.
On January 30, 2011, the National Development and Reform Commission approved the establishment of a national gene bank in Shenzhen, hosted by BGI. This is the first national gene bank established in China, with an initial investment of 15 million yuan. The Shenzhen National Gene Bank is a national-level, public-welfare scientific research and industrial infrastructure project serving national strategic needs. It is currently the only national-level gene bank approved in China, and the fourth in the world, after the national gene banks of the United States, Japan, and Europe. At present the gene bank has collected 1 million GB of biological data, including genome, transcriptome, proteome, metabolome, and phenotype data, and has accumulated roughly 400,000 biological samples; it is ultimately expected to reach a data capacity of 1 billion GB. Compared with existing gene banks elsewhere in the world, the Shenzhen National Gene Bank is distinctive in having both a "wet bank" and a "dry bank": the former collects tens of millions of physical resources and samples, such as animals, plants, microorganisms, and human tissue cells, integrated into a network; the latter collects huge amounts of nucleic acid, gene expression, protein, phenotype, and other data, becoming a powerful tool for studying biological growth and development, disease, aging, and death, and for promoting industrialization in the era of "big data" biology.
4. Financial industry
Finance is, in the final analysis, data. In financial markets, speed means greater productivity and more market share. Financial calculation models are quite complex, and the more data collected, the more accurate the results. Financial analysts urgently need computational tools that can simulate complex real-world environments and process them accurately, so that they can evaluate expected returns and measure risk for each investment product in time to secure better returns. For this reason, high-performance computing is being applied ever more widely in the global capital markets, enabling dynamic response to market changes in the shortest possible time.
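One common compute-heavy pattern here is Monte Carlo simulation of returns to estimate a risk measure such as value-at-risk (VaR). The sketch below is purely illustrative — the normal-returns assumption and every parameter are hypothetical, not a production risk model:

```python
# Minimal Monte Carlo value-at-risk (VaR) sketch. The normal-returns
# assumption and all parameters are illustrative, not a real model.
import random

def monte_carlo_var(mean, stdev, n_paths=100_000, confidence=0.95, seed=42):
    rng = random.Random(seed)
    # Simulate one-period returns; real models simulate full market paths
    # for whole portfolios, which is where the HPC demand comes from.
    returns = sorted(rng.gauss(mean, stdev) for _ in range(n_paths))
    # VaR at 95%: the loss that is exceeded in only 5% of scenarios.
    cutoff = returns[int((1 - confidence) * n_paths)]
    return -cutoff

var_95 = monte_carlo_var(mean=0.05, stdev=0.20)
# e.g. ~0.28: a 28% loss is exceeded in only 5% of simulated scenarios.
```

Scaling this from one instrument with 100,000 paths to thousands of instruments with millions of paths, re-run as the market moves, is what pushes such workloads onto HPC clusters.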
5. Weather Forecast
By the early 1920s, the basic equations of weather forecasting had been established, but numerical weather prediction became practical only with the advent of computers. Before parallel computer systems came into use, limited processing power meant that only 24-hour forecasts could be produced. High-performance computing is a necessary means of handling the large-scale scientific calculations in numerical prediction, and it can improve forecast accuracy by allowing higher resolution.
6. Game animation and film and television industry
With the rise of 3D and 4D movies and the popularity of high-definition animation, the "render farm" built from high-performance computing (HPC) clusters has become an indispensable production tool for 3D animation and visual-effects companies. Rendering turns an animation design into finished images through a defined pipeline, combining models, lighting, materials, shadows, and other elements. Take "Toy Story" as an example: if only a single workstation (a single processor) were used for rendering, this 77-minute film would have taken about 43 years to render; with a cluster rendering system, it takes only about 80 days.
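Taking the 43-year and 80-day figures quoted above at face value, a quick sanity check shows the implied cluster speedup:

```python
# Sanity-check the rendering figures quoted above: ~43 years on a single
# processor vs. ~80 days on a cluster rendering system.
single_days = 43 * 365          # 15,695 days on one workstation
cluster_days = 80
speedup = single_days / cluster_days
print(round(speedup))           # roughly a 196x speedup
```

A speedup near 200x is consistent with a cluster of a few hundred render nodes, since frame rendering parallelizes almost perfectly (each frame is independent).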