
Analytical data processing: methods and the OLAP technology of online analytical processing


Introduction

This graduation project continues the development of the automated information retrieval system "International Scientific and Technical Cooperation of Russian Universities", created as part of research work for the Ministry of Education and Science of the Russian Federation. At this stage, one of the main tasks is to find solutions that enable rapid analytical processing of information, given the large volumes of data, the complexity of the relationships between them, and the limited time of the user.

The project is devoted to developing multidimensional data models for building OLAP cubes, creating a software mechanism for accessing this data, and developing a web-based user interface that lets the operator assemble output data structures to suit the task at hand and visualize the results.

The developed application is a relevant tool for supporting the organizational activities of the corresponding departments of the Federal Agency for Education, both in daily practical work (prompt preparation of current references and working materials on various problems of international scientific and technical cooperation, statistical data for meetings, sessions, etc.) and when summing up results (monthly, quarterly, annual, and other reports). Its functionality can also be useful in statistics and analytics.

The decision to use the web application format makes the database accessible from anywhere in the world, with no need to install additional client software.

Review and analysis of software technologies for developing web applications for analytical data processing

Technologies of operational analytical data processing


Currently, huge amounts of data are accumulated in accounting, so-called transactional, systems.

Such systems are built on modern DBMSs with a well-developed transaction management mechanism, which has made them the main means of creating online transaction processing systems (OLTP, On-Line Transaction Processing).
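As a minimal illustration of this transaction mechanism, here is a sketch using Python's built-in sqlite3 module; the stock table and the transfer scenario are hypothetical:

```python
import sqlite3

# A hypothetical OLTP-style operation: move 30 units of stock between
# two warehouses. Either both updates commit together or neither does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (warehouse TEXT PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("A", 100), ("B", 20)])

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE stock SET qty = qty - 30 WHERE warehouse = 'A'")
        conn.execute("UPDATE stock SET qty = qty + 30 WHERE warehouse = 'B'")
except sqlite3.Error:
    pass  # the rollback has already restored a consistent state

print(dict(conn.execute("SELECT warehouse, qty FROM stock")))  # {'A': 70, 'B': 50}
```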

The main task of such systems is to ensure the execution of operations against the database. They almost always provide search functions, including ones that display summary and aggregated information.

But the ability of such systems to perform the complex, in-depth data analysis needed for informed decisions is limited.

Without productive processing and analysis, these colossal flows of "information ore", i.e. raw data, pile up as a dump that no one needs.

In this regard, there was a need to create analytical systems that would allow turning raw data into useful information and knowledge on the basis of which management decisions can be made.

Data analysis is carried out to some extent in many information systems, including OLTP systems, but the types of analysis differ in their flexibility and depth.

Information retrieval analysis - data analysis carried out according to predetermined, i.e. predefined, types of queries (regulated queries).

Operational-analytical analysis is data analysis that requires the formation of ad hoc queries, when it is impossible to predict in advance which queries the user will need.

Intellectual analysis - deep data analysis that makes it possible to obtain knowledge hidden from the user in the accumulated data, such as:

§ functional and logical patterns in the accumulated data;

§ models and rules explaining the patterns found;

§ forecasts for the development of processes.

A comparison of the characteristics of different types of data analysis is illustrated in Table 1.1.

Table 1.1 - Comparison of types of data analysis

| Characteristic | Information retrieval analysis | Operational-analytical analysis | Intelligent analysis |
|---|---|---|---|
| Query types | Regulated | Unregulated (ad hoc) | Deep scan |
| Type of received data | Samples of raw data | Generalized, grouped, aggregated information | Models, templates, regularities, knowledge |
| Tasks to be solved | Retrieving data samples | Rough exploratory analysis, testing pre-formulated hypotheses | Obtaining new, non-trivial, hidden knowledge |
| Level of interactivity | | Constant interactive interaction with information | |

In accordance with the types of data analysis discussed above, analytical systems can be divided into the following groups:

1. Corporate reporting systems:

§ used to control the operational situation and analyze deviations (answer the question “what is happening”);

§ provide operational data on the results of activities in the form of predefined reporting forms;

§ based on information retrieval analysis of data;

§ may not use a data warehouse, but take data directly from OLTP systems;

§ designed for a wide range of end users (customers, partners, fiscal institutions).

2. Systems for analytical data processing and analytical reporting (OLAP systems, On-Line Analytical Processing):

§ allow you to perform multidimensional data analysis across various slices;

§ have advanced tools for analytical reporting and for data visualization in the form of various types of tables, graphs, and diagrams;

§ based on operational-analytical data analysis;

§ most often use a data warehouse optimized for multidimensional data analysis tasks;

§ focused on users who need constant interactive interaction with information (managers, analysts).

3. Systems of deep data analysis:

§ have advanced tools for in-depth analysis;

§ allow you to get non-trivial, hidden knowledge;

§ use the data warehouse as a source of information;

§ based on data mining;

§ are intended for analysts with knowledge in the field of data analysis methods;

§ allow you to create complete applications for end users in the form of built models, templates, and reports.

A schematic description of the division of analytical systems into the above groups is shown in Figure 1.1.1.

OLAP (On-Line Analytical Processing) is a technology for online analytical data processing that uses methods and tools for collecting, storing and analyzing multidimensional data in order to support analytical activities and the possibility of generating ad hoc queries and reports based on them.

Figure 1.1.1 - Types of analytical systems

OLAP systems are built for end users and analysts, providing them with the tools to analyze data and test emerging hypotheses.

A well-known test, created in 1995, defines the criteria by which a system can be attributed to the class of OLAP systems.

This test is called FASMI (Fast Analysis of Shared Multidimensional Information) and is still widely used.

According to the FASMI test, OLAP is defined by five keywords:

§ Fast;

§ Analysis;

§ Shared;

§ Multidimensional;

§ Information.

A schematic representation of the test is shown in Figure 1.1.2.


Figure 1.1.2 - FASMI test.

1. Fast.

The OLAP system should be able to respond to most queries within approximately 5 seconds. For simple queries this figure can be 1 second, while queries of rare complexity may take up to 20 seconds.

Research shows that if a response is not received within 30 seconds, the user ceases to consider the system useful and is liable to simply abort the operation if the system does not warn that processing requires more time.

But even if the system warns the user that an analytical query will take long to process, the user may get distracted and lose their train of thought, which negatively affects the quality of the analysis.

Such processing speed is not easy to achieve on huge data arrays, especially if non-standard and complex queries are required that are formed on the fly.

To achieve this goal, OLAP system developers use different methods:

Dynamic data preprocessing;

Creation of special software and hardware solutions;

The use of hardware platforms with greater performance.

The speed criterion is the most critical in determining whether a system belongs to the OLAP class.

2. Analysis.

An OLAP system must be able to handle any logical and statistical analysis that is specific to a given application area.

All required analysis functionality must be provided in a way that is understandable to the user.

An OLAP system should be flexible in displaying graphical analysis results and be able to generate reports in any desired way without the need for programming.

3. Shared.

An OLAP system must work in a multi-user mode, which raises the issue of ensuring the confidentiality of information and the availability of information protection tools in such systems (access rights, access authorization, etc.).

4. Multidimensional.

An OLAP system must provide a multidimensional view of data. The test does not fix the number of dimensions of the multidimensional model or the size of each dimension: these depend on the specific application area and the analytical tasks being solved.

5. Information.

The OLAP system must provide the necessary information under the conditions of a real application.

The power of an OLAP system is determined by the amount of input data it can process. The information processing capabilities of OLAP systems differ by as much as a factor of 1000, which is determined by many factors, including required RAM, disk space usage, and integration with data warehouses and other analytical components.

Thus, in the FASMI test, emphasis is placed on such important properties of OLAP systems as processing speed, multi-user access, relevance of information, the availability of statistical analysis tools and multidimensionality, i.e. representation of the analyzed facts as functions of a large number of their characterizing parameters.

Analytical data processing is data analysis that requires appropriate methodological support and a certain level of specialist training.

Modern information technologies make it possible to automate the analysis of accumulated primary information, build analytical models, obtain ready-made solutions, and put them into practice. The basic requirements for analysis methods are efficiency, simplicity, and automation. This concept underlies two modern technologies: Data Mining and Knowledge Discovery in Databases (KDD).

Data Mining is the process of discovering in raw data previously unknown, non-trivial, practically useful, and interpretable knowledge needed for decision-making in various areas of human activity (the definition by G. Piatetsky-Shapiro, one of the founders of the field).

Data Mining technology is aimed at finding non-obvious patterns. The main classes of Data Mining tasks are the following (two of them are sketched in code after the list):

  • 1) classification - detection of the features that characterize groups of objects in the data set under study (classes). Methods used for the classification task: nearest neighbor and k-nearest neighbor methods, Bayesian networks, induction of decision trees, neural networks;
  • 2) clustering - splitting objects into groups when the classes of objects are not defined in advance. An example method is the self-organizing Kohonen map, a neural network with unsupervised learning; an important feature of these maps is their ability to project multidimensional feature spaces onto a plane, presenting the data as a two-dimensional map;
  • 3) association - identifying patterns between related events in a data set. These patterns are revealed not from the properties of a single analyzed object but between several events that occur simultaneously; an example is the Apriori algorithm;
  • 4) sequence, or sequential association - search for temporal patterns between transactions, i.e. patterns established not between simultaneous events but between events connected in time. An association is a sequence with a time lag of zero. A sequence rule states that after event X, event Y will occur within a certain time;
  • 5) forecasting - estimating missing or future values of target numerical indicators from the features of historical data. Methods of mathematical statistics, neural networks, etc. are used to solve forecasting tasks;
  • 6) deviation (outlier) detection - detection and analysis of the data that differ most from the general data set;
  • 7) estimation - prediction of continuous feature values;
  • 8) link analysis - the task of finding dependencies in a data set;
  • 9) visualization (graph mining) - creation of a graphic image of the analyzed data, for example presenting the data in 2D and 3D, so that the presence of patterns can be seen;
  • 10) summarization - description of specific groups of objects from the analyzed data set.
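Two of the tasks above can be illustrated with a minimal sketch, assuming scikit-learn is available; the data is synthetic, and k-means stands in for the Kohonen maps named in item 2 purely for brevity:

```python
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Synthetic two-dimensional data with three latent groups.
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Task 1 (classification): k-nearest neighbors, named in item 1 above.
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(clf.predict(X[:5]))  # predicted class labels for five objects

# Task 2 (clustering): no classes are given; k-means recovers the groups.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:5])      # cluster assignments for the same objects
```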

KDD is the process of extracting useful knowledge from a collection of data. This technology encompasses data preparation, selection of informative features, data cleaning, application of Data Mining (DM) methods, post-processing of the data, and interpretation of the results.

The Knowledge Discovery in Databases process consists of the following steps (a compressed sketch of steps 3-5 follows the list):

  • 1) problem statement - analysis of the user's tasks and of the features of the application area, selection of the set of input and output parameters;
  • 2) preparation of the initial data set - creation of a data warehouse and organization of a scheme for collecting and updating the data;
  • 3) data preprocessing - cleaning the data, which, from the point of view of the Data Mining method to be applied, must be of high quality and correct;
  • 4) transformation and normalization of the data - bringing the information to a form suitable for subsequent analysis;
  • 5) Data Mining - automatic analysis of the data using various knowledge-discovery algorithms (neural networks, decision trees, clustering algorithms, association mining, etc.);
  • 6) data post-processing - interpretation of the results and application of the acquired knowledge in business applications.
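Steps 3-5 of this process can be compressed into a short sketch, assuming pandas and scikit-learn; the column names and figures are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Step 3: preprocessing - drop records that would corrupt the analysis.
df = pd.DataFrame({"income": [50, 62, None, 48, 75],
                   "age":    [34, 41, 29, None, 52],
                   "churn":  [0, 0, 1, 1, 1]})
df = df.dropna()

# Step 4: transformation - normalize the input features.
X = StandardScaler().fit_transform(df[["income", "age"]])
y = df["churn"]

# Step 5: Data Mining proper - here via a decision tree, one of the
# algorithm families named above.
model = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(model.predict(X))
```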

There are two classes of systems that provide analytical data processing. Historically, systems implementing statistical analysis emerged first. The results of their operation are graphs, charts, and regulated reports that have a fixed form and lack flexibility: the presentation of the data cannot be changed (swapping columns and rows, aggregating, drilling down, etc.). Experience shows that when viewing such reports managers arrive not at decisions but at questions, which entail building additional reports; this takes time and resources and reduces the efficiency of decision-making. The need to respond quickly to the ad hoc requests that arise in the course of data analysis led to the emergence of systems for online analytical data processing.

OLAP is a class of software that provides the user with the ability to receive real-time answers to arbitrary analytical queries.

OLAP provides analysts with flexible mechanisms for manipulating data and for visual display, with which they can compare various business indicators with each other and reveal hidden relationships. In fact, from the decision maker's point of view, OLAP is a convenient graphical shell for navigation, visualization, and analysis across various sections of a huge amount of interrelated information about the organization's activities coming from its information systems.

OLAP is based on the concept of a multidimensional data cube, in whose cells the analyzed (numerical) data is stored, for example sales volumes in units or in monetary terms, stock balances, costs, etc. These numbers are called measures, or facts. The axes of the multidimensional coordinate system are the main attributes of the analyzed business process, which are called dimensions. Examples of dimensions are product, region, customer type, and time.

In the simplest case, a cube contains two dimensions and can be represented as a two-dimensional table, for example, it includes sales data for different products for different time periods. In the case of three dimensions, the cube can be represented graphically, as shown in Fig. 3.4. Strictly speaking, from the point of view of mathematics, such an array will not always be a cube, since the number of elements in different dimensions, which are "sides" of the cube, may not be the same - an OLAP cube has no such restrictions.

Fig. 3.4.
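A minimal sketch of such a cube, built from a flat table with pandas; the dimension and measure names are illustrative:

```python
import pandas as pd

# Flat sales records: three dimensions (product, city, month)
# and one measure (amount).
sales = pd.DataFrame({
    "product": ["tea", "tea", "coffee", "coffee", "tea", "coffee"],
    "city":    ["Moscow", "SPb", "Moscow", "SPb", "Moscow", "SPb"],
    "month":   ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "amount":  [100, 80, 120, 90, 110, 130],
})

# The "cube": each cell holds the aggregated measure for one
# combination of dimension values.
cube = sales.groupby(["product", "city", "month"])["amount"].sum()
print(cube)
```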

In addition to containing an unlimited number of dimensions, the cube can have more complex cell contents: it can hold several facts, for example not only the number of goods sold but also their cost and the warehouse balance. In this case each cell will contain multiple values.

If a three-dimensional cube can be represented graphically, a cube with more than three dimensions can no longer be visualized. Therefore, in practice, cube slices are used for analysis. A slice is the result of selecting cube data by user-chosen dimension values, which are called members. For example, an analyst wants to compare sales of three product groups in Moscow and St. Petersburg for January and February. In this case, he must arrange the values of the "Product" dimension in rows, the values of the "City" and "Time" dimensions in columns, and select the positions of interest in each dimension. The slice of the cube will have the form shown in Fig. 3.5.


Fig. 3.5.
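The slice just described can be sketched in pandas; the figures are illustrative:

```python
import pandas as pd

sales = pd.DataFrame({
    "product": ["tea", "tea", "tea", "coffee", "coffee", "coffee"],
    "city":    ["Moscow", "SPb", "Moscow", "Moscow", "SPb", "SPb"],
    "month":   ["Jan", "Jan", "Feb", "Jan", "Jan", "Feb"],
    "amount":  [100, 80, 110, 120, 95, 130],
})

# Members selected by the analyst: two cities and two months.
sel = sales[sales["city"].isin(["Moscow", "SPb"]) &
            sales["month"].isin(["Jan", "Feb"])]

# The slice: "Product" on rows, "City" and "Time" on columns.
print(sel.pivot_table(index="product", columns=["city", "month"],
                      values="amount", aggfunc="sum"))
```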

It is also possible that the analyst needs data corresponding to a single dimension value, or to all of its values at once. In this case the dimension is called fixed: it is placed neither in rows nor in columns but acts as a report parameter (Fig. 3.6).


Fig. 3.6.

Some dimensions may have multiple levels. For example, a year is divided into quarters, quarters into months, months into weeks, and weeks into days; a country consists of regions, regions of settlements, and within cities one can single out districts and specific retail outlets; goods can be grouped into product groups. In OLAP terms, such multilevel groupings are called hierarchies. Hierarchical dimensions allow information to be analyzed at different levels of aggregation. For example, an analyst might compare total annual sales and then "go down" to the quarter level to see how sales changed by quarter.
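A drill-down along a Year → Quarter hierarchy might look like this; the sales figures are illustrative:

```python
import pandas as pd

sales = pd.DataFrame({
    "year":    [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
    "quarter": ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4"],
    "amount":  [200, 240, 210, 300, 220, 260, 230, 320],
})

# Top level of the hierarchy: total annual sales.
print(sales.groupby("year")["amount"].sum())

# "Going down" one level: the same measure by quarter within each year.
print(sales.groupby(["year", "quarter"])["amount"].sum())
```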

OLAP provides a convenient shell for navigating through multidimensional data. The interface introduces the following basic operations (a few are illustrated in the sketch after this list):

  • turn - transposition, as a result of which the rows and columns of the table are swapped;
  • projection - aggregation of the values in cells lying along the projection axis according to a certain rule (summation, averaging, counting non-empty cells, etc.);
  • disclosure, or drill-down - replacement of one of the dimension values with a set of values from the next level of the dimension hierarchy;
  • convolution, or roll-up (drill-up) - the operation inverse to drill-down;
  • section (slice-and-dice) - obtaining a "slice" of the data by setting the parameters for selecting it from the cube.
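A few of these operations sketched over a small crosstab in pandas (an illustration, not a full OLAP engine):

```python
import pandas as pd

# A small crosstab: products in rows, cities in columns.
tab = pd.DataFrame({"Moscow": [210, 120], "SPb": [80, 220]},
                   index=["tea", "coffee"])

print(tab.T)             # "turn": rows and columns are swapped
print(tab.sum(axis=1))   # "projection": summation along the city axis
print(tab.loc[["tea"]])  # "section": only the selected member remains
```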

In general, the OLAP algorithm includes the following steps:

  • obtaining data in the form of a flat table or the result of an SQL query;
  • caching the data in RAM and converting it into a multidimensional cube;
  • displaying the constructed cube using a crosstab or chart. In general, an arbitrary number of displays can be connected to one cube.

For the first time, the definition of OLAP technology was given by E. Codd in 1993. Codd described the possibilities of multidimensional analysis and formulated 12 OLAP rules, to which a few more were added a little later (in 1995). Let's consider them in more detail.

  • 1. Multidimensional conceptual view (Multi-Dimensional Conceptual View). An OLAP product uses a multidimensional data representation model in which categorical data attributes are treated as dimensions and quantitative attributes as facts.
  • 2. Transparency (Transparency). How the multidimensional model is implemented, what specific means are used to store and process the data, how the data is organized, and where it comes from should be hidden from the user.
  • 3. Accessibility (Accessibility). The OLAP toolkit must give the user access to data regardless of its location and storage method, while maintaining a single, consistent data model.
  • 4. Consistent performance (Consistent Reporting Performance). High OLAP performance must be ensured regardless of the number of dimensions of the multidimensional model and the size of the database.
  • 5. Client-server architecture (Client-Server Architecture). To provide online analytical processing of distributed data, an OLAP product must work on a client-server architecture. To generalize and consolidate data from different, physically separated corporate databases, the tool must support the construction of a common conceptual data schema.
  • 6. Generic dimensionality (Generic Dimensionality). All dimensions in a multidimensional cube must have the same set of available features. Additional characteristics can be added to any dimension if necessary. The underlying data structure, calculation formulas, and reporting formats should not be tied to any one dimension.
  • 7. Dynamic handling of sparse matrices (Dynamic Sparse Matrix Handling). Since the crosstabs generated by an OLAP tool are often sparse, they must be processed optimally. The tool must provide high processing speed regardless of the location of the data cells, the number of dimensions in the cube, and the sparseness of the data.
  • 8. Multi-user support (Multi-User Support). An OLAP tool must allow several users to work with the same data simultaneously while ensuring data integrity and protection.
  • 9. Unrestricted cross-dimensional operations (Unrestricted Cross-dimensional Operations). When performing data manipulations (slicing, rotating, consolidating, drilling down), the functional relationships between the cells of the multidimensional cube, described by formulas, must be preserved. The system must transform the established relations itself, without the user having to redefine them.
  • 10. Intuitive data manipulation (Intuitive Data Manipulation). The user interface for data manipulation should be as convenient, natural, and comfortable as possible.

  • 11. Flexible reporting (Flexible Reporting). An OLAP tool must support various ways of visualizing data (tables, graphs, maps) in any possible orientation.

  • 12. Unlimited dimensions and aggregation levels (Unlimited Dimensions and Aggregation Levels). An OLAP tool must support an analytical data model containing at least 15-20 dimensions, and must allow the user to define, for each dimension, an unlimited number of aggregation levels in any direction of consolidation.

To define OLAP as an analytical tool, the FASMI test (Fast Analysis of Shared Multidimensional Information) is used as a universal criterion. Let us consider each component of this abbreviation.

Fast. User queries should be processed by the OLAP system at high speed: the average query should take no more than 5 s, most queries should be processed within 1 s, and the most computation-heavy queries within at most 20 s.

Analysis. An OLAP tool must provide the user with statistical analysis tools and ensure that the results are saved in a form accessible to the end user. Analysis tools may include procedures for time series analysis, analysis of variance, calculation of growth and increment rates, calculation of structural indicators, conversion to various units of measurement, etc.

Shared. An OLAP tool must be able to work in multi-user mode.

Multidimensional. An OLAP application must provide a multidimensional view of data with support for hierarchical dimensions.

Information. An OLAP tool must provide the user with access to information regardless of which electronic data warehouse it resides in.

Depending on whether the multidimensional cube exists as a separate physical structure or only as a virtual data model, MOLAP (Multidimensional OLAP) and ROLAP (Relational OLAP) systems are distinguished. MOLAP implements the multidimensional representation of data at the physical level in the form of multidimensional cubes. ROLAP systems use the classic relational model common to OLTP systems: the data is stored in relational tables, while special structures emulate its multidimensional representation. There are also hybrid systems (HOLAP, Hybrid OLAP), in which detailed data is stored in relational tables and aggregated data in multidimensional cubes. This combination allows the high performance characteristic of the multidimensional model to be combined with the ability of the relational model to store arbitrarily large data arrays.

  • Codd E. Providing OLAP to User-Analysts: An IT Mandate // Computerworld. 1993. Vol. 27, No. 30.

8.3.1. On-Line Analytical Processing (OLAP) Tools

On-Line Analytical Processing - means of operational (real-time) analytical processing of information, aimed at decision support and at helping analysts answer the question "Why are objects, environments, and the results of their interaction the way they are and not otherwise?". The analyst himself forms hypotheses about the relationships among the various pieces of information and checks them against the data available in the corresponding structured information bases.

ERP systems are characterized by the presence of analytical components as part of functional subsystems. They provide the formation of analytical information in real time. This information is the basis of most management decisions.

OLAP technologies use hypercubes - specially structured data (also called OLAP cubes). In the hypercube data structure the following are distinguished:

Measures - quantitative indicators (base attributes) used to form summary statistical results;

Dimensions - descriptive categories (feature attributes) in the context of which the measures are analyzed.

The dimensionality of a hypercube is determined by the number of dimensions for one measure. For example, a SALES hypercube might contain:

Dimensions: consumers, dates of operations, groups of goods, nomenclature, modifications, packaging, warehouses, types of payment, types of shipment, tariffs, currency, organizations, divisions, responsible persons, distribution channels, regions, cities;

Measures: planned quantity, actual quantity, planned amount, actual amount, planned payments, actual payments, planned balance, actual balance, sales price, order execution time, refund amount.

Such a hypercube is intended for analytical reports such as the following (one of them is sketched in code after the list):

Classification of consumers by volume of purchases;

Classification of goods sold according to the ABC method;

Analysis of the terms of execution of orders of various consumers;

Analysis of sales volumes by periods, products and product groups, regions and consumers, internal divisions, managers and distribution channels;

Forecast of mutual settlements with consumers;

Analysis of the return of goods from consumers; etc.
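One of these reports, the ABC classification of goods sold, could be sketched as follows; the conventional 80/15/5 cumulative-share thresholds and all item names are assumptions:

```python
import pandas as pd

# Actual sales amount aggregated over the "nomenclature" dimension
# of the hypothetical SALES hypercube.
by_item = pd.Series({"item1": 500, "item2": 300, "item3": 120,
                     "item4": 50, "item5": 30}).sort_values(ascending=False)

# Cumulative share of total sales, then the conventional ABC split:
# class A covers the first ~80%, B the next ~15%, C the rest.
share = by_item.cumsum() / by_item.sum()
abc = pd.cut(share, bins=[0, 0.80, 0.95, 1.0], labels=list("ABC"))
print(pd.DataFrame({"amount": by_item, "cum_share": share, "class": abc}))
```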

Analytical reports can have an arbitrary combination of dimensions and measures and are used to analyze management decisions. Analytical processing is provided by instrumental and linguistic means. "Pivot Table" information technology is available in the widely used MS Excel spreadsheet; the initial data for creating a pivot table can be:

A list (database) in MS Excel - a relational table;

Another MS Excel pivot table;

A consolidated range of MS Excel cells located in the same or in different workbooks;

An external relational database or OLAP cube data source (.dsn, .odc files).

To build pivot tables based on external databases, ODBC drivers and the MS Query program are used. A pivot table for a source MS Excel database has the structure shown in Fig. 8.3.

The pivot table layout has the following data structure (Fig. 8.4): dimensions - department code and position; measures - length of service, salary, and bonus. Below is pivot table 8.2, which allows the relationships between average length of service and salary, average length of service and bonuses, and salary and bonuses to be analyzed.

Table 8.2 - Pivot table for relationship analysis

To continue the analysis using the pivot table, you can (a pandas analog is sketched after this list):

Add new totals (for example, average salary, average bonus, etc.);

Use filtering of the records and totals of the pivot table (for example, on the "Gender" attribute, which is placed in the "Page" area of the layout);

Calculate structural indicators (for example, the distribution of the wage fund and the bonus fund across departments, using the pivot table's additional processing facilities such as the share of the column total); etc.
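The layout described above can also be reproduced outside Excel; a sketch in pandas with hypothetical column names and figures:

```python
import pandas as pd

staff = pd.DataFrame({
    "department": ["D1", "D1", "D2", "D2", "D2"],
    "position":   ["engineer", "manager", "engineer", "manager", "engineer"],
    "gender":     ["F", "M", "M", "F", "F"],
    "experience": [5, 12, 3, 15, 7],  # length of service, years
    "salary":     [900, 1500, 800, 1600, 950],
    "bonus":      [90, 300, 60, 350, 100],
})

# Dimensions: department code and position; measures: averages of
# length of service, salary, and bonus - the layout of Fig. 8.4.
print(staff.pivot_table(index="department", columns="position",
                        values=["experience", "salary", "bonus"],
                        aggfunc="mean"))

# The "Page"-area filter, e.g. on the "Gender" attribute:
print(staff[staff["gender"] == "F"].pivot_table(
    index="department", values=["salary", "bonus"], aggfunc="mean"))
```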

The MS Office suite allows spreadsheet data, including pivot tables and charts, to be published in XML format.

The Microsoft Office Web Components component supports working with the published data in Internet Explorer, allowing the analysis to continue (changing the structure of the pivot table, calculating new summary totals).

8.3.2. Data Mining (DM) tools

DM tools involve extracting ("digging", "mining") data and are aimed at identifying relationships among the information stored in an enterprise's digital databases, which an analyst can use to build models that quantify the influence of the factors of interest. In addition, such tools can be useful for forming hypotheses about the possible nature of the information relationships in enterprise databases.

Text Mining (TM) technology is a set of tools that allows you to analyze large sets of information in search of trends, patterns and relationships that can help make strategic decisions.

Image Mining (IM) technology contains tools for recognition and classification of various visual images stored in enterprise databases or obtained as a result of online search from external information sources.

To solve the problems of processing and storing all these data, the following approaches are used:

1) creating several backup systems or one distributed workflow system, which preserves the data but gives slow access to stored information at the user's request;

2) building Internet systems, which are highly flexible but poorly adapted to storing and searching text documents;

3) introducing Internet portals, which are well oriented to user queries but lack descriptive information about the text data loaded into them.

Text information processing systems free from the problems listed above can be divided into two categories: linguistic analysis systems and text data analysis systems.

The main elements of Text Mining technology are (two of them are sketched in code after the list):

Summarization;

Feature extraction (thematic search);

Clustering;

Classification;

Question answering;

Thematic indexing;

Keyword searching;

Creation and maintenance of taxonomies and thesauri.
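Two of these elements, feature extraction and clustering, in a minimal sketch assuming scikit-learn; the documents are toy examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["sales grew in the moscow region",
        "regional sales fell in february",
        "the new server cluster is online",
        "server downtime was caused by the network"]

# Feature extraction: each document becomes a TF-IDF weighted vector.
X = TfidfVectorizer().fit_transform(docs)

# Clustering: group the documents by topic without predefined classes.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # e.g. sales texts in one cluster, server texts in the other
```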

Software products that implement Text Mining technology include:

IBM Intelligent Miner for Text - a set of separate utilities launched from the command line or from scripts, independent of each other (the main emphasis is on data mining mechanisms, i.e. information retrieval);

Oracle InterMedia Text - a set of tools integrated into the DBMS that works most efficiently with user queries (it allows modern relational DBMSs to be used for complex multi-purpose search and analysis of text data);

Megaputer TextAnalyst - a set of COM objects that can be embedded in applications, designed to solve Text Mining problems.

8.3.3. Intelligent Information Technology

Today, in the field of management automation, the analysis of information dominates at the preliminary stage of decision preparation: the processing of primary information and the decomposition of the problem situation, which reveals only fragments and details of the processes rather than the situation as a whole. To overcome this shortcoming, one must learn to build knowledge bases using the experience of the best specialists, as well as to generate the missing knowledge.

The use of information technologies in various spheres of human activity, the exponential growth of information volumes and the need to respond quickly in any situation required the search for adequate ways to solve emerging problems. The most effective of them is the way of intellectualization of information technologies.

Intelligent information technologies (IIT) are usually understood as information technologies that provide the following capabilities:

The presence of knowledge bases that reflect the experience of specific people, groups, societies, humanity as a whole, when solving creative problems in certain areas of activity, traditionally considered the prerogative of the human intellect (for example, such poorly formalized tasks as decision making, design, extraction of meaning, explanation, training, etc.);

The presence of thinking models based on knowledge bases: rules and logical conclusions, argumentation and reasoning, recognition and classification of situations, generalization and understanding, etc.;

The ability to form quite clear decisions based on fuzzy, non-strict, incomplete, underdetermined data;

Ability to explain conclusions and decisions, i.e. the presence of an explanation mechanism;

The ability to learn, retrain and, consequently, to develop.

Knowledge Discovery (KD) technologies for the informal search for hidden patterns in data and information are based on the latest technologies for forming and structuring information images of objects, which is closest to the principles of information processing by intelligent systems.

Decision Support (DS) information technologies are shells of expert systems, or specialized expert systems, that enable analysts to determine the relationships between information structures in an enterprise's structured information bases and to predict the possible outcomes of decisions.

Trends in IIT development. Communication systems. Global information networks and IIT can fundamentally change our ideas about companies and about knowledge work itself. The presence of employees at the workplace will become almost unnecessary: people can work from home and interact with each other through networks when needed. Known, for example, is the successful experience of creating a new modification of the Boeing 747 aircraft by a distributed team of specialists interacting via the Internet. The location of the participants in a development will matter less and less, while the importance of their skill level will grow. Another reason for the rapid development of IIT is the growing complexity of communication systems and of the tasks solved on their basis, which requires a qualitatively new level of "intellectualization" of such software products as systems for analyzing heterogeneous and non-rigorous data, ensuring information security, and developing solutions in distributed systems.

Education. Distance learning is beginning to play an important role in education today, and the introduction of IIT will significantly individualize this process in accordance with the needs and abilities of each student.

Life. Informatization of everyday life has already begun, but with the development of IIT, fundamentally new opportunities will appear. Gradually, more and more new functions will be transferred to the computer: control over the user's health, control of household appliances such as humidifiers, air fresheners, heaters, ionizers, music centers, medical diagnostic tools, etc. In other words, the systems will also become diagnosticians of the state of a person and his home. A comfortable information space will be provided in the premises, where the information environment will become part of the human environment.

Prospects for the development of IIT. It seems that IIT has now approached a fundamentally new stage of its development. Over the past 10 years, the capabilities of IIT have expanded significantly thanks to the development of new types of logical models and the appearance of new theories and ideas. The key points in the development of IIT are:

The transition from logical inference to models of argumentation and reasoning;

Search for relevant knowledge and generation of explanations;

Understanding and synthesis of texts;

Cognitive graphics, i.e. graphic and figurative representation of knowledge;

Multi-agent systems;

Intelligent network models;

Calculations based on fuzzy logic, neural networks, genetic algorithms, probabilistic calculations (implemented in various combinations with each other and with expert systems);

The problem of metaknowledge.

Multi-agent systems have become a new paradigm for creating promising IIT. Here an agent is assumed to be an independent intellectual system with its own system of goal-setting and motivation and its own area of action and responsibility. Interaction between agents is provided by a higher-level system: metaintelligence. Multi-agent systems model a virtual community of intelligent agents: objects that are autonomous and active and enter into various social relations, such as cooperation (friendship), competition, and enmity. The social aspect of solving modern problems is the fundamentally novel feature of advanced intellectual technologies: virtual organizations and the virtual society.

Control questions and tasks

1. Give a description of the enterprise as an object of informatization. What are the main indicators characterizing the development of the enterprise management system?

2. List the leading information technologies for industrial enterprise management.

3. Name the main information technologies of organizational and strategic development of enterprises (corporations).

4. What are the foundations of strategic management standards aimed at improving business processes? How are the BPM and BPI information technologies related?

5. Define the philosophy of total quality management (TQM). How are the phases of quality development and information technology related?

6. Name the main provisions of the organizational development of the enterprise, describe the stages of strategic management. Name the group strategies.

7. How is the business model of the enterprise created? What are the main approaches to evaluating the effectiveness of a business model?

8. What is a balanced scorecard? Name the main components of the BSC. What are the relationships between groups of BSC indicators?

9. List the methodological foundations for creating information systems. What is a systems approach?

10. What is an information approach to the formation of information systems and technologies?

11. What is a strategic approach to the formation of information systems and technologies?

12. What is the content of an object-oriented approach to describing the behavior of agents in the market? Give the definition of the object, indicate analogues of agent systems.

13. What are the methodological principles for improving enterprise management based on information and communication technologies? What is the purpose of ICT?

14. Give definitions of a document, document flow, workflow, document management system.

15. How is the document form layout designed? Name the zones of the document, the composition of their details.

16. Name the basic information technologies of the document management system.

17. What is a unified documentation system? What are the general principles of unification?

18. Describe organizational and administrative documentation, give examples of documents.

19. What requirements must an electronic document management system meet?

20. What is a corporate information system? Name the main control loops and the composition of the functional modules.

21. Name the software products for corporate information systems that you know, and give their comparative description.


The current level of development of hardware and software has made it possible for some time now to maintain databases of operational information at various levels of management. In the course of their activities, industrial enterprises, corporations, departmental structures, public authorities and administrations have accumulated large amounts of data. They contain great potential for extracting useful analytical information, on the basis of which you can identify hidden trends, build a development strategy, and find new solutions.

In recent years, a number of new concepts for storing and analyzing corporate data have taken shape in the world:

1) data warehouses (Data Warehouse);

2) online analytical processing (On-Line Analytical Processing, OLAP);

3) intelligent data analysis, or data mining (Data Mining).

OLAP analytical data processing systems are decision support systems focused on executing more complex queries that require statistical processing of historical data accumulated over a certain period. They serve to prepare business reports on sales and marketing for management purposes, and they support so-called Data Mining: a way of analyzing information in a database to find anomalies and trends without elucidating the semantic meaning of the records.

Analytical systems built on OLAP include information processing tools based on artificial intelligence methods and tools for the graphical presentation of data. These systems are designed for large volumes of historical data, allowing meaningful information to be extracted from them, i.e. knowledge to be obtained from data.

Processing efficiency is achieved through powerful multiprocessor hardware, sophisticated analysis methods, and specialized data storage.

Relational databases store entities in separate, usually well-normalized tables. This structure is convenient for operational databases (OLTP systems), but complex multi-table queries execute relatively slowly in it. A better model for querying (rather than modifying) data is the multidimensional database.

An OLAP system takes a snapshot of a relational database and structures it into a multidimensional model for queries. The claimed processing time for queries in OLAP is about 0.1% of that for similar queries in a relational database.

An OLAP structure created from production data is called an OLAP cube. A cube is built by joining tables in a star schema. At the center of the "star" is the fact table, containing the key facts on which queries are made. Multiple dimension tables are attached to the fact table; they show how the aggregated relational data can be analyzed. The number of possible aggregations is determined by the number of ways in which the original data can be displayed hierarchically.
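Such a star schema can be sketched with Python's built-in sqlite3 module: a central fact table, two dimension tables, and the kind of aggregating join a ROLAP engine would generate; all table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, grp TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, region_id INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'tea', 'drinks'), (2, 'coffee', 'drinks');
INSERT INTO dim_region  VALUES (1, 'Moscow'), (2, 'SPb');
INSERT INTO fact_sales  VALUES (1, 1, 100), (1, 2, 80), (2, 1, 120), (2, 2, 90);
""")

# A typical ROLAP-style aggregation: join the fact table to its
# dimensions and consolidate the measure along them.
for row in conn.execute("""
        SELECT p.grp, r.name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_region  r ON r.region_id  = f.region_id
        GROUP BY p.grp, r.name"""):
    print(row)
```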

Both classes of systems (OLAP and OLTP) are based on a DBMS, but the types of queries differ greatly. The OLAP mechanism is one of the most popular data analysis methods today. There are two main approaches to implementing it: Multidimensional OLAP (MOLAP), which implements the mechanism in a multidimensional database on the server side, and Relational OLAP (ROLAP), which builds cubes on the fly from SQL queries to a relational DBMS. Each approach has its advantages and disadvantages. The general scheme of a desktop OLAP system is shown in the figure.

The algorithm of work is the following:

1) obtaining data in the form of a flat table or the result of executing an SQL query;

2) caching data and converting them to a multidimensional cube;

3) displaying the constructed cube using a cross-tab or chart, etc.

In general, an arbitrary number of mappings can be connected to one cube. Displays used in OLAP systems are most often of two types: cross tables and charts.

Star schema. Its idea is that there is a table for each dimension, while all the facts are placed in one table indexed by a composite key made up of the keys of the individual dimensions. Each ray of the star sets, in Codd's terminology, the direction of data consolidation along the corresponding dimension.

In complex tasks with multilevel dimensions, it makes sense to turn to extensions of the star schema: the fact constellation schema and the snowflake schema. In these cases, separate fact tables are created for possible combinations of summarization levels of different dimensions. This allows better performance but often leads to data redundancy and to significant complexity in the database structure, which comes to contain a huge number of fact tables.

Figure - Constellation schema

