Contributed April 12, 2001 by Volkan PEKINCE The primary components of a data warehouse are: * Operational/Legacy Systems - The sources of data for the warehouse. * Data Extractors - The programs that offload data from operational systems * to the warehouse. * Meta Data - Data about the warehouse. * Data Warehouse - The actual data repository. * Decision Support System Engine - The software used to create data structures. * Decision Support System Application - The software used to access the Warehouse. * Personal Productivity Applications - Software tools for final report formats. * Network - Connections among components. OPERATIONAL / SYSTEMS LEGACY The application systems that process the ma ority of the organization's transactions are known as operational or legacy systems. These systems are the primary source for extracting information for the Data Warehouse. BUSINESS RISKS Operational systems can be a "black hole" when it comes to describing or assessing business risk. The vast number of applications currently in use and the infinite variety of administration methods make their risks virtually impossible to describe at a generic level. However, we have identified several areas of risk that would affect most organizations. The risk assessment for your organization must be tailored to your environment. Technological Obsolescence - Many organizations' operational systems are generations old, and have been modified many times over to support changing requirements. These systems are risky by their very nature. Inflexibility, volatility and excessive maintenance requirements are common features found in many legacy operational systems. As the systems become older and more outdated, they become riskier in terms of support and reliability. Inflexibility - Lack of Capacity to Adjust to Dynamic Business Environments - The operational systems must be adaptable to changing economic and business environments. Companies are becoming increasingly market-driven and demand delivery of products and services in ever-decreasing lead times. Data Integrity - Operational Data - The varied and complex nature of applications in many large organizations indicates that data integrity is a significant concern. If an organization's mission-critical systems reside on multiple platforms in multiple locations, if they are technologically obsolete, or if an organization relies on multiple applications for similar processes (i.e., multiple loan applications for a bank due to mergers and acquisitions), then the data may not be compatible across application boundaries. The bottom line is if data integrity or compatibility issues exist in the operational systems, then they necessarily will exist in the warehouse, unless data conditioning is performed in the Data Extractor component. Control Environment - The control environment surrounding the operational systems has implications as the data is ported to the warehouse. If, for example, the access controls are weak in the operational systems, then unauthorized entries into these systems may be reflected in the warehouse as well. Control improvements can be made and conditioning of data can be done in the Extract process, but should not be considered a substitute for sound controls in the operational systems. CONTROL ACTIONS * - Procedures or mechanisms should be established to notify all responsible Data Warehouse managers when operational system changes occur. * - The selection of data to be extracted from operational systems should be coordinated and periodically reviewed with system users to obtain the most complete and accurate list. * - Data from disparate systems must be reconciled and conditioned to ensure referential integrity, or to translate the different meanings of data across operational systems to a common definition in the warehouse. * - Warehouse must be reconciled to source systems during the development process. * - Data reengineering may be required as a means to automate the investigation, standardization and integration of actual data values from multiple legacy systems. * - Significant control weaknesses in any of the above-listed areas must be identified and resolved prior to implementation of the warehouse. DATA EXTRACTOR The links between the operational systems and the relational Data Warehouse are called Data Extractors. They are the control points that contain the means and methodologies for extracting the required underlying data from the operational systems and transforming the data into cross-functional, integrated integrate corporate information on the warehouse. Transformation programs can be written to extract, filter, , condense and/or restructure data. BUSINESS RISKS Risks for the Data Extractor Component depend directly on the usage of the extracted data. If, for example, the data are used to plot the direction of the business through strategic decisions, then the quality of the data is of the utmost importance. For the purposes of this discussion, we assume that the data will be used to support enterprise strategic decisions. As always, the organization must assess the risk based on its unique characteristics and risk factors. Loss of Market Share - If the organization makes strategic decisions based on the information gained from the Data Warehouse, then the completeness, accuracy and timeliness of this information is critical. Inability to Perform Business Functions - The organization's dependence on the Data Warehouse to conduct "business as usual" dictates this risk. As dependence on warehouse information and processes increases, the reliability of the extractors must increase proportionately. Efficiency and Effectiveness - Businesses use Data Warehouse technology to improve their ability to access data in a timely and efficient manner. Extractors are a key process in providing quality data in a timely manner. Competitive Disadvantage - The advantages that the organization gains from the warehouse can be negated if the extractors do not provide complete and accurate data in a timely manner. CONTROL ACTIONS * As operational systems and the Data Warehouse change, the extractors must change accordingly. A process must be defined to enable proper notification of all system managers in the event of change. * Transformation programs or algorithms must follow the same change management process as other production programs. * The extracted data must be compared back to the operational systems to ensure completeness and accuracy. * In some cases, data must be validated by field, with validation based on data types or specific values. * The extractors must be run on schedule, with controls to ensure that prerequisite work has been completed and that correct versions of files are being input. * If the extraction process encounters problems and the decision is made to allow "dirty" data to be populated to the warehouse, a mechanism must be designed to notify users of the problem and the effect it will have on their data. * Access to extraction programs must be limited to authorized personnel. * Change's to extraction programs must be performed by authorized personnel, and be consistent with user and database requirements. * Extraction processes must be recoverable in the event of a process interruption. REPOSITORY COMPONENT The Repository Component is where the data is actually stored. The Repository Component does not support day-to-day operations. Instead, its purpose is to be a mechanism for information retrieval and analysis. BUSINESS RISKS The repository for the Data Warehouse typically is a relational database management system (RDBMS) that may support distributed databases on mainframe, midrange, and Local Area Network platforms. Normally, data elements can be accessed by many users in this environment. This necessitates access controls to specific data elements. In the warehouse, the application programs no longer access data or perform data management functions directly; rather, the warehouse RDBMS acts as the interface to the DSS engine and DSS application. This access mechanism is the technique that enables control of data independence, sharing, and manipulation. Technological Obsolescence - Distributed environments are risky by their very nature. Multiple platforms make it more difficult to stay "'current" with technology. As a result, inflexibility and volatility can become commonplace. Inflexibility - Lack Of Capacity to Adjust to Dynamic Business Environments - The repositories must be adaptable to changing economic and business environments. Companies are becoming increasingly market-driven and demand delivery of products and services in ever-decreasing lead times, which can drive dynamic change in the warehouse environment. Efficiency and Effectiveness - User knowledge of the warehouse data structures and the tools provided to access the data is a critical piece of the warehouse puzzle. Appropriate training, on an ongoing basis, is required to ensure that quality systems or queries are being used as a basis for enterprise decisions. Data Integrity - Repository Data - The varied and complex nature of applications in many large organizations indicates that data integrity is a significant concern. If data integrity or compatibility issues exist in the operational systems, then they necessarily will exist in the warehouse, unless data conditioning is performed in the Data Extractor component. Loss of Market Share - If the organization makes strategic decisions based on the information gained from the Data Warehouse, then the completeness, accuracy and timeliness of this information is critical. Inability to Perform Business Functions - The organization's dependence on the Data Warehouse to conduct "business as usual" dictates this risk. As dependence on warehouse information and processes increases, the reliability of the RDBMS must increase proportionately. Competitive Disadvantage - The advantages that the organization gains from the warehouse can be negated if the RDBMS does not ensure availability of complete and accurate data in a timely manner. CONTROL ACTIONS • Procedures or mechanisms should be established to notify all responsible Data Warehouse managers when operational system changes occur. • User access capabilities must be established. • Controls must be in place to ensure that the data in the warehouse remains current and accurate over time. Periodic refresh of warehouse data is an effective method of ensuring this continuity. • RDBMS programs and utilities must be protected from unauthorized modification or execution. • Backup and recovery facilities must be in place and tested to ensure availability of data. • Users must be trained in the warehouse data structures. This training process must be ongoing to reflect the dynamic warehouse environment and the nature of the surrounding tools. META DATA Meta Data is "data about data' " This component provides standardized data definitions and indicates the nature of data transformations from operational systems. Transformation rules, including summarization, consolidation and derivation, support business rules. Mapping provides information on the source of data. Meta Data provides users with information on how to find data, what the data means, who is responsible for the data, how the data was derived, and if the data is up-to-date (i.e., time stamped data). This component can also include definition of access privileges. Meta Data contains a variety of data types: • Structure of data • Keys and attributes of data • Source before entering the warehouse • Mapping of data from the source to the warehouse • Extract history of data as it enters the warehouse • Alias information about data in the warehouse • Logic and algorithms used to relate the levels of summarization within the warehouse • History of structural changes of data stored in the warehouse BUSINESS RISKS Meta Data is a critical piece of the Data Warehouse. As the complexity of the source (operational) systems increases, the importance of quality and integrity of the Meta Data increases proportionately. A basic reason for implementation of the Data Warehouse is to simplify access to data from various operational systems. The Meta Data is the key to this simplification. All control issues, confidentiality, availability and integrity apply directly to Meta Data as well as the data that it describes. Inability to Perform Business Functions - The organization's dependence on the Data Warehouse to conduct "business as usual" dictates this risk. As dependence on warehouse information and processes increases, the integrity of the Meta Data must increase proportionately. fficiency and Effectiveness - Businesses implement Data Warehouse technology to improve their ability to access data in a timely and efficient manner. Meta Data is a key component in providing quality data in a usable and understandable format. Competitive Disadvantage - If Meta Data are not protected from unauthorized access, competitors could gain access to the "road map" of data stored in the warehouse, thus enabling them an understanding of potentially sensitive corporate information. CONTROL ACTIONS • Procedures or mechanisms should be established to notify those responsible for maintenance of Meta Data when operational systems or the Data Warehouse change. • Programs associated with Meta Data must be subject to the same change management process as other production programs. • Enterprise data naming and definition standards must be established and closely followed. • Enterprise policies for data usage and coded key values must be established and closely followed. • Users must be educated in the access and usage of Meta Data. • Access to Meta Data for warehouse users must be defined as "read only." Update access must be tightly restricted to protect the integrity of the Meta Data. • Viable backup and recovery procedures must be developed and tested for the Meta Data. • Procedures must be established for the ongoing maintenance and change control of Meta Data. • For all data elements, a CRUD (Create, Read, Update, Delete) analysis should be performed to ensure that user access capabilities are consistent with the intentions of the organization. DECISION SUPPORT SYSTEM ENGINE The Decision Support System (DSS) Engine is the transmission between the Warehouse and the Decision Support Application. It provides automatic query creation, multi-dimensional data analysis capabilities, mathematical manipulation and cross-tabulation. Based on user-desired metrics and attributes, it will locate the table with the "best fit" within the enterprise data warehouse. BUSINESS RISKS As the main program that drives user inquiries of the Data Warehouse, the DSS Engine is a critical component primarily from the perspective of availability. If the DSS Engine is not available for inquiries, or if it is inefficiently used, performance of other warehouse components can be adversely affected. Inability to Perform Business Functions - The organization's dependence on the Data Warehouse to conduct "business as usual" dictates this risk. As dependence on warehouse information and processes increases, the reliability and availability of the DSS Engine must increase proportionately. Efficiency and Effectiveness - Businesses implement Data Warehouse technology to improve their ability to access data in a timely and efficient manner. Use of the DSS Engine must be controlled to maximize its availability and minimize the potential of inefficient queries. Users must be knowledgeable in the use of the DSS Engine and cognizant of the effect on other processes if large, cumbersome queries are initiated. CONTROL ACTIONS • Use of a query governor should be considered to control runaway user queries. • Application and system programs associated with the DSS Engine must be subject to the same change management process as other production programs. • Users must be educated in the efficient usage of the DSS Engine. • Viable backup and recovery procedures must be developed and tested for the DSS Engine to ensure its timely availability in the event of an interruption or disaster. DECISION SUPPORT SYSTEM APPLICATIONS The Decision Support Application is the non technical dialog used by the knowledge worker in selecting the data to be retrieved, including the criteria used in selecting the data and the design in which the data will be retrieved. It provides point-and-click construction of complex Strucutred Query Language (SQL) queries. BUSINESS RISKS When developed using a powerful DSS engine, the DSS application construction should consist only of custom data view creation and formatting. The application should provide facilities for metric selection, criteria selection, and data rotation specification. As metrics, dimensions, and attributes are added to or deleted from the warehouse repository, there is the risk that applications will become obsolete. Loss of Market Share - If the organization makes strategic decisions based on the information gained from the Data Warehouse, then the completeness, accuracy and timeliness of this information is critical. Inability to Perform Business Functions - The organization's dependence on the Data Warehouse to conduct "business as usuar' dictates this risk. As dependence on warehouse information and processes increases, the reliability of the DSS application must increase proportionately. Efficiency and Effectiveness - User knowledge of the warehouse data structures along with the tools provided to access the data is a critical piece of the warehouse puzzle. Appropriate training, on an ongoing basis, is required to ensure that quality systems or queries are being used as a basis for enterprise decisions. Competitive Disadvantage - The advantages that the organization gains from the warehouse can be negated if the application does not provide complete and accurate data in a timely manner. CONTROL ACTIONS • Users must be trained in use of the DSS Application and related tools and data structures. This training process must be ongoing to reflect the dynamic warehouse environment and the nature of the surrounding tools. • As DSS engines and personal productivity tools change, the applications must change accordingly. A process must be defined to enable proper notification of all appropriate personnel in the event of change. • DSS Application programs must follow the same change management process as other production programs. • Controls should be in place to ensure that correct versions of warehouse files are used. • Access to application programs must be limited to authorized personnel. • Changes to application programs must be performed by authorized personnel, and be consistent with user and database requirements. PERSONAL PRODUCTIVITY APPLICATIONS Personal Productivity Applications are the tools, such as word processing, spreadsheet, personal database and electronic mail, which provide knowledge workers with access to the Data Warehouse through "friendly" graphical user interfaces. BUSINESS RISKS In the data warehouse environment, the DSS infrastructure can be tightly integrated with personal productivity applications such as word processing, spreadsheet, personal databases, and electronic mail. Integrated data routing can allow query results to be routed directly to these applications. Business risks are manifested in organizational experience, and its failure to keep pace with new technologies and changing environments. Inability to Perform Business Functions - The organization's dependence on the Data Warehouse to conduct "business as usual" dictates this risk. As dependence on warehouse information and processes increases, the requisite knowledge and reliability of end-user tools must increase proportionately. Efficiency and Effectiveness - User knowledge of the warehouse data structures and the tools provided to access the data is a critical piece of the warehouse puzzle. Appropriate training, on an ongoing basis, is required to ensure that quality systems or queries are being used as a basis for enterprise decisions. Data Integrity - Data integrity and correctness of processing results are major concerns. Poor controls can result in the use of improper assumptions or models, inadvertent reliance on inaccurate or incorrect data, inadequate exception and error handling, and inconsistent data. CONTROL ACTIONS • Users must be trained in use of the Personal Productivity Applications, related tools and data structures. Thi: training process must be ongoing to reflect the dynamic warehouse environment and the nature of th( surrounding tools. • As end-user requirements change, personal productivity tools must change accordingly. A process must bi defined to enable proper notification of all users in the event of change or upgrade of tools. Change must bg accompanied by appropriate training. • Access to end-user applications and data must be limited to authorized personnel. • Completeness and accuracy of data used in Personal Productivity Applications must be ensured by comparison to upstream sources. • Procedures must be in place to ensure recoverability of data. NETWORK The Network, as defined in this model, is the communications infrastructure that enables each component t( provide information to or receive information from other components. Because the Data Warehouse spread information resources across more system and application platforms than traditional data processing architectures, more network is implied. BUSINESS RISKS Since the network is the functional component of the warehouse architecture that enables the other component to communicate among themselves, the controls implemented here directly affect all facets of the warehouse Because the information exported to the warehouse typically is a conglomeration of data from across man, applications, its appeal to unauthorized users may be increased. For example, a list of customer numbers alon is of limited value if accidentally or intentionally disclosed. However, a list containing customer numbers, names, addresses, contacts, and sales history may be of great value to a competitor. The Data Warehouse, whil improving efficiency within the organization, can also potentially increase the efficiency of those who wout attempt to steal critical information. The network is the first line of defense against this type of intrusion. Customer PerceptionlPablic Image - Customers are not likely to react mildly if their confidential information is inadvertently released for publication. Adverse Regulatory Action - Regulators are likely to closely watch organizations who experience significar losses through control weaknesses. They may impose burdensome controls or financial penalties as a result c such losses. Efficiency and Effectiveness - Network design plays a key role in the efficient operation of the Data Warehouse. Response times and query turnaround time will depend partly on the capacity of the network to process large amounts of data. Also, backup and redundancy features of the network will ensure its availability in the face of line outages or physical disasters. CONTROL ACTIONS • The network should be designed with security in mind. Proper network administration, logical access controls, and physical protection of network devices should be implemented. Consideration must be given to the potential for various communication platforms in network components and the security implications therein. • Network software should be administered by the appropriate information systems staff, and controlled in a similar fashion to other operating system software. • Network security must be set up such that only those authorized to perform security activities can access the security tables. • The network must be designed with adequate redundancy in all components to reduce the risk of communication outages. • The network must be accompanied by a contingency plan which defines the procedures to be followed in case of a disaster or network outage. • Users and system administrators must be informed of their network-related responsibilities and procedures must be set up to ensure that these responsibilities are carried out.