Project scope definition in the Data Vault 2.0 methodology
Sometimes, this requires building some dependencies first, such as setting up the initial data warehouse infrastructure. However, building the final data warehouse infrastructure up front should be avoided. Start small, but make it scalable, so that the infrastructure can grow with demand.
Typically, architects try to create data warehouse solutions layer by layer: they decide on an initial set of source systems needed to deliver all the required reports or OLAP cubes. They then implement the entire staging area to capture all source tables, including the ETL that loads them. Once the staging area is complete, they model as much of the enterprise data warehouse as possible, because it is deemed too expensive to engage IT again in the future. After the ETL loads have been implemented, data marts are created to finally satisfy the business users. This approach usually takes several months, if not years, including several rounds of debugging and error fixing. Problems arise, however, when requirements change (the architect in this example would say "too frequently") or when the business needs additional functionality (in many cases, the original architect has since left the organization).
Given the extensible model and architecture and the agile method of the Data Vault 2.0 standard, data warehouse architects no longer need to follow this approach. The data warehouse is not built horizontally, layer by layer, but vertically, feature by feature. The overall goal of the data warehouse initiative remains the same, but it is now achieved through the incremental delivery of features. The goal of the BI team is to deliver the required functionality in rapid and frequent releases, as described in the earlier sections of this chapter. To do this, the scope of a feature must be limited to a single function, separated from other features as much as possible.
The suggested way to achieve this goal is to scope the requirements engineering and the implementation to individual features, as shown in Figure 3.11.
The feature to be implemented in the example shown in the figure is a report, but it could be any other artifact that users need: for example, a new dimension or attribute in an OLAP cube, a whole new OLAP cube (with a minimal set of dimensions), or a corpus for text mining. Once the business has scoped and described the artifact, identification begins of the sources (the tables in the source systems) needed to build this single report. Next, the target information mart is determined in order to assess the appropriate location for delivering the required report (or other feature). Once this identification has been performed, engineers can stage the required data, model and load the Data Vault entities, and build the mart. When this process is followed, all data from a source table is loaded and modeled in the Data Vault, not a partial attribute set. That way, each source table has to be touched only once, not multiple times. To assess the availability of data, it should be possible to track which data has already been loaded into the enterprise data warehouse. Partially loaded data makes this assessment more complicated, which we want to avoid. In addition, loading only part of the data from a source table produces more complex Data Vault satellites.
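To make this flow concrete, here is a minimal, self-contained sketch in Python with SQLite. It is purely illustrative: the table names, columns, and the MD5 hashing choice are assumptions for this example, not prescriptions of the Data Vault 2.0 standard. It stages a complete source table (all columns, so the source is touched only once) and loads a hub and a satellite carrying the full attribute set:

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")

# Stage the FULL source table (all columns), as recommended above.
# `crm_customer` with business key `customer_no` is a made-up example.
conn.executescript("""
    CREATE TABLE stg_crm_customer (customer_no TEXT, name TEXT, city TEXT);
    INSERT INTO stg_crm_customer VALUES
        ('C001', 'Alice', 'Berlin'),
        ('C002', 'Bob',   'Hamburg');

    CREATE TABLE hub_customer (
        customer_hkey TEXT PRIMARY KEY,   -- hash of the business key
        customer_no   TEXT,
        load_dts      TEXT,
        record_source TEXT
    );
    CREATE TABLE sat_customer_crm (
        customer_hkey TEXT,
        load_dts      TEXT,
        name          TEXT,               -- all descriptive attributes,
        city          TEXT,               -- not a partial column set
        record_source TEXT
    );
""")

def hash_key(business_key: str) -> str:
    """Deterministic hash key, as commonly used in Data Vault 2.0 models."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

load_dts = datetime.now(timezone.utc).isoformat()
for customer_no, name, city in conn.execute("SELECT * FROM stg_crm_customer"):
    hkey = hash_key(customer_no)
    # Hub: insert the business key only if it is not yet known.
    conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, 'CRM')",
                 (hkey, customer_no, load_dts))
    # Satellite: load ALL descriptive attributes of the source row.
    conn.execute("INSERT INTO sat_customer_crm VALUES (?, ?, ?, ?, 'CRM')",
                 (hkey, load_dts, name, city))
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0],
      "hub rows loaded")
```

Because the satellite receives every descriptive attribute of the source row, a later feature that needs another column does not force a second extraction from the source system or a restructured satellite.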
Defining the artifacts to be developed in an iteration (that is, scoping the change of requirements) is an important prerequisite for the success of the iteration. Proper scoping reduces the risk that the team cannot complete and deploy the changes within the sprint's time frame. Without certainty about the required scope of a change, short sprints of two weeks or even one week are impossible. In addition, because of the Data Vault 2.0 model, teams can now build the solution incrementally across lines of business, which keeps them flexible within the scope of the implementation.
Two objections deserve attention. The first is to implement all tables from a source system at once in order to keep the cost of integrating that source system low; in this case, data not needed by the current solution is loaded as well. Loading this data requires additional ETL capacity, which in turn requires a larger initial infrastructure. In addition, implementing all source tables of a source system may not be achievable within one sprint, and it binds manpower that could instead be used to implement the features to be delivered to the business. This effort often exceeds the effort of assessing which data already exists in the data warehouse (which is easy when only complete tables are loaded). Another issue is that when a source table is implemented in the staging area, it is good practice to also integrate its data into the Data Vault. Otherwise, additional complexity is needed to assess the current state of the data warehouse, because the two layers may be out of sync. If this practice is followed, loading all source tables requires completely modeling and loading the corresponding Data Vault tables. A simple load registry, sketched below, illustrates this assessment.
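The following small Python sketch illustrates the assessment just described. It is a hypothetical illustration, not part of the Data Vault 2.0 standard: the registry structure and table names are assumptions made for this example.

```python
# Hypothetical load registry: records, per source table, whether it has been
# staged AND modeled/loaded into the Data Vault, so the current state of the
# data warehouse can be assessed at a glance.
loaded_tables: dict[str, dict[str, bool]] = {
    "crm.customer": {"staged": True, "in_data_vault": True},
    "crm.contract": {"staged": True, "in_data_vault": False},  # out of sync!
}

def assess(table: str) -> str:
    state = loaded_tables.get(table)
    if state is None:
        return "not loaded: must be scoped into a sprint"
    if state["staged"] and not state["in_data_vault"]:
        # The situation the text warns about: staging and Data Vault diverge.
        return "staged only: staging and Data Vault are out of sync"
    return "fully integrated: reusable by the next feature"

for table in ("crm.customer", "crm.contract", "erp.invoice"):
    print(table, "->", assess(table))
```

Keeping both layers in sync keeps this assessment a simple per-table lookup; partially loaded tables would force a much finer-grained, column-level bookkeeping.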
The second objection is that touching a source system multiple times on the way to the final solution is very expensive. This may be true, but the ultimate goal is to provide operational and useful functionality to the business within a sprint, because doing so reduces the risk of failure: for example, the risk that the business does not accept the solution because it does not meet the written requirements, that the requirements have changed in the meantime, or that the solution proves to be wrong once business users actually work with it.
This vertical approach to delivering information is implemented in sprints. Depending on the capabilities of the organization, a sprint may take two to three weeks. Modeling the Data Vault should therefore not take months; instead, the model should be created within the sprint. If it takes longer, that is a good indicator that the sprint is over-scoped. In this case, features should be removed from the sprint: everything that is not needed to deliver the single feature in the sprint's focus should be taken out. Make sure business users understand that such a feature is not dropped entirely, only deferred to a future sprint. Business users often assume that a feature has been canceled completely because it was removed from the sprint plan. This is wrong, because the missing functionality will be delivered soon, in the next iteration or the one after. Business users will readily accept this approach once they see the steady progress of the project.
New features have to be defined before they can be implemented in a sprint. The requirements-gathering process, however, is very similar to the implementation process. Usually, the business has a general idea of the features to be implemented in the data warehouse, but many questions remain open, such as the data sources, the business rules for aggregating or transforming the data, data types, use cases, and so on. A set of requirements is used to answer these questions.
To support an agile approach to requirements gathering, requirements are collected along the way, unlike in traditional data warehousing, where all requirements are collected at the beginning of the project. The most effective method in our projects has been to use raw marts and push data quickly into the requirements meetings for review. These raw marts are used to create reports or cubes for the limited number of business users attending the requirements meetings, but not for general distribution, because the raw data in the raw marts may or may not fully implement the business rules. Show these draft reports to the users and ask them, "What is wrong with this report?" Experience shows that business users can easily point out the problems in a report and, in doing so, provide the business rules needed to implement the final report.
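The sketch below shows, under stated assumptions, what such a raw-mart draft report can look like. The data, column names, and the specific issues planted in the rows are all made up for illustration; the point is that no business rules are applied, so flaws in the raw data surface in front of the business users:

```python
# Hypothetical "raw mart" draft report: raw data is shown as-is, with NO
# cleansing, deduplication, or business rules applied. The visible defects
# are what elicit the business rules in the requirements meeting.
raw_mart_rows = [
    {"customer_no": "C001", "name": "Alice", "revenue": 1200.0},
    {"customer_no": "C002", "name": "Bob",   "revenue": -50.0},  # negative?
    {"customer_no": "C002", "name": "bob",   "revenue": 300.0},  # duplicate?
]

print(f"{'customer_no':<12} {'name':<8} {'revenue':>8}")
for row in raw_mart_rows:
    # Deliberately no corrections: users spotting "revenue cannot be
    # negative" or "names are case-insensitive" yields the business rules
    # needed for the final report.
    print(f"{row['customer_no']:<12} {row['name']:<8} {row['revenue']:>8.1f}")
```

Asking "What is wrong with this report?" against such output is far more concrete for business users than reviewing an abstract requirements document.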
The steps of this requirements-gathering method are as follows:
So far, IT has been in control of the delivery and responsible for being this agile. The next steps are driven by the business side of the project:
Once the requirements have been gathered, at least in part, IT drives the project again by implementing the business rules and other requirements:
After these business rules have been implemented by IT, the business side of the project can review and test the output and, if not satisfied with the result, request further modifications. Such changes, however, become requirements changes and are implemented in a subsequent sprint. The agile requirements-gathering process described here helps business users express their business rules. For many of them, the traditional focus on requirements documents is too abstract and prevents them from identifying the problems that become obvious in a draft report.
The suggested method is to document these requirements meetings on a wiki that is available to everyone in the organization. Meeting minutes, including descriptions of the discovered business rules, should be published on the site to ensure the transparency of the requirements-gathering process. Web 2.0 mechanisms enable participants to comment on the business rules according to their own understanding, and even to modify them. This approach helps ensure that the requirements are correct in the first place. If a rule triggers extensive discussion on the site, another requirements meeting may be needed to clarify any open issues before implementation can begin. Holding these discussions before the actual implementation provides great benefits to the organization and improves productivity, contributing to the overall success of the project. For the team to correctly assume that a feature can be completed within one sprint, proper scope definition is essential, and the team must be able to estimate accurately the effort required to complete a feature. This topic is discussed in the next section.