04.01.2021

Big Data: analytics and solutions. Big data examples


Everyone talks about Big data these days, yet few understand what it actually is and how it works. Let's start with the simplest thing - terminology. Put plainly, Big data is the set of tools, approaches and methods for processing both structured and unstructured data so that it can be used for specific tasks and purposes.

Unstructured data is information that has no predefined structure or is not organized in a specific order.

The term "big data" was coined by the editor of the journal Nature Clifford Lynch back in 2008 in a special issue devoted to the explosive growth of the world's volumes of information. Although, of course, big data itself existed before. According to experts, the majority of data streams over 100 GB per day belong to the Big data category.


Today, this seemingly simple term boils down to just two things - storing data and processing it.

Big data - in simple words

In the modern world, Big data is a socio-economic phenomenon that emerged because new technologies made it possible to analyze huge amounts of data.


For ease of understanding, imagine a supermarket where all the goods are shelved in no familiar order: bread next to fruit, tomato paste next to frozen pizza, lighters in front of a rack that holds tampons, avocados, tofu and shiitake mushrooms. Big data puts everything in its place and helps you find the nut milk, learn its price and expiration date, and also see who besides you buys such milk and why it might be better than cow's milk.

Kenneth Kukier: Big data is the best data

Big data technology

Huge amounts of data are processed so that a person can obtain specific, usable results from them.


In essence, Big data is both a problem-solving approach and an alternative to traditional data management systems.

Techniques and methods of analysis applicable to Big data according to McKinsey:

  • Data Mining;
  • Crowdsourcing;
  • Data mixing and integration;
  • Machine learning;
  • Artificial neural networks;
  • Pattern recognition;
  • Predictive analytics;
  • Simulation modeling;
  • Spatial analysis;
  • Statistical analysis;
  • Analytical data visualization.

The fundamental principle of big data processing is horizontal scalability: data is distributed across computational nodes and processed without degradation of performance. McKinsey also includes relational database management systems and Business Intelligence in the scope of applicable technologies.

Technologies:

  • NoSQL;
  • MapReduce;
  • Hadoop;
  • Hardware solutions.


Big data is traditionally described by the defining characteristics that Meta Group formulated back in 2001, known as the "three Vs":

  1. Volume - the size of the physical data volume.
  2. Velocity - the speed of data growth and the need for fast processing to obtain results.
  3. Variety - the ability to process different types of data simultaneously.

Big data: applications and opportunities

Heterogeneous, rapidly arriving digital information in such volumes cannot be processed with traditional tools. Analyzing the data reveals subtle patterns that a person would never notice, and this makes it possible to optimize every area of our life - from public administration to manufacturing and telecommunications.

For example, some companies already used it a few years ago to protect their clients from fraud: looking after the client's money is also looking after their own.

Susan Etleiger: What about Big Data?

Big data-based solutions: Sberbank, Beeline and other companies

Beeline has a huge amount of data about its subscribers, which it uses not only to work with them but also to create analytical products such as external consulting and IPTV analytics. Beeline segmented its database and protected customers from financial fraud and viruses, using HDFS and Apache Spark for storage, and Rapidminer and Python for data processing.
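To make that kind of pipeline more concrete, here is a minimal sketch of how subscriber segmentation over data in HDFS might look with Apache Spark and Python. The file paths, column names and thresholds are illustrative assumptions, not the operator's actual schema or rules.

```python
# Hypothetical sketch: segmenting subscribers stored in HDFS with Apache Spark.
# Column names (msisdn, data_mb, voice_min, topups) and thresholds are illustrative
# assumptions, not a real operator schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("subscriber-segments").getOrCreate()

# Read raw usage records from HDFS (path is hypothetical).
usage = spark.read.parquet("hdfs:///data/subscribers/usage")

# Aggregate per subscriber and assign a coarse segment.
segments = (
    usage.groupBy("msisdn")
    .agg(F.sum("data_mb").alias("data_mb"),
         F.sum("voice_min").alias("voice_min"),
         F.count("topups").alias("topups"))
    .withColumn(
        "segment",
        F.when(F.col("data_mb") > 10_000, "heavy_data")
         .when(F.col("voice_min") > 1_000, "heavy_voice")
         .otherwise("casual"),
    )
)

segments.write.mode("overwrite").parquet("hdfs:///data/subscribers/segments")
```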


Or take Sberbank with its older case, AS SAFI: a system that analyzes photographs to identify bank customers and prevent fraud. The system was introduced back in 2014; at its heart is the comparison of photographs in a database with images captured by webcams mounted on counter stands, using computer vision on top of a biometric platform. Thanks to this, the number of fraud cases dropped tenfold.

Big data in the world

According to forecasts, by 2020 humanity will have generated 40-44 zettabytes of information, and by 2025 that volume will grow tenfold, according to The Data Age 2025 report prepared by analysts at IDC. The report notes that most of the data will be generated by businesses, not consumers.

Research analysts believe that data will become a vital asset and security will become a critical foundation in life. The authors of the work are also confident that the technology will change the economic landscape, and the average user will communicate with connected devices about 4800 times a day.

Big data market in Russia

In 2017, global revenue in the big data market is expected to reach $150.8 billion, 12.4% more than the year before. On the global scale, the Russian market for big data services and technologies is still very small: in 2014 the American company IDC estimated it at $340 million. In Russia the technology is used in banking, energy, logistics, the public sector, telecom and industry.


As for the data market itself, it is only just emerging in Russia. Within the RTB ecosystem, data providers are the owners of programmatic data management platforms (DMPs) and data exchanges. Telecom operators are, in pilot mode, sharing consumer information about potential borrowers with banks.

Typically, big data comes from three sources:

  • Internet (social networks, forums, blogs, media and other sites);
  • Corporate archives of documents;
  • Readings from sensors, instruments and other devices.

Big data in banks

In addition to the system described above, Sberbank's 2014-2018 strategy speaks of the importance of analyzing massive amounts of data for quality customer service, risk management and cost optimization. Today the bank uses Big data for risk management, fighting fraud, segmenting and assessing customer creditworthiness, personnel management, forecasting queues in branches, calculating employee bonuses and other tasks.

VTB24 uses big data to segment customers and manage churn, generate financial statements, and analyze reviews on social networks and forums. For this it relies on Teradata, SAS Visual Analytics and SAS Marketing Optimizer.

Based on materials from research & trends

Big Data, "Big Data" have become the talk of the town in the IT and marketing press for several years now. And it is understandable: digital technologies have permeated the life of a modern person, “everything is written”. The volume of data on the most diverse aspects of life is growing, and at the same time, the possibilities for storing information are growing.

Global technologies for storing information

Source: Hilbert and Lopez, "The world's technological capacity to store, communicate, and compute information," Science, 2011.

Most experts agree that the acceleration of data growth is an objective reality. Social networks, mobile devices, measuring devices, business information - these are just a few of the sources capable of generating gigantic amounts of information. According to the IDC Digital Universe study published in 2012, within the next 8 years the amount of data in the world will reach 40 ZB (zettabytes), equivalent to 5,200 GB for every inhabitant of the planet.

Growth of Collected Digital Information in the United States


Source: IDC

A significant share of information is created not by people but by robots interacting with each other and with other data networks - sensors and smart devices, for example. At such growth rates, researchers forecast that the amount of data in the world will double every year. The number of virtual and physical servers in the world will grow tenfold as data centers expand and new ones are built. Against this backdrop, the need to use and monetize this data effectively is growing. Since applying Big Data in business requires considerable investment, you need a clear picture of the situation. And it is, in essence, a simple one: you can increase the efficiency of your business by reducing costs and/or increasing sales.

What is Big Data for?

The Big Data paradigm defines three main types of tasks.

  • Storing and managing hundreds of terabytes or petabytes of data that conventional relational databases cannot efficiently use.
  • Organizing unstructured information consisting of text, images, videos, and other types of data.
  • Big Data analysis, which raises the question of how to work with unstructured information, generate analytical reports, and implement predictive models.

The market for Big Data projects overlaps with the business intelligence (BI) market, whose global volume in 2012 was, according to experts, about $100 billion. It includes networking technology, servers, software and technical services.

Big Data technologies are also relevant for revenue assurance (RA) solutions designed to automate company operations. Modern revenue assurance systems include tools for detecting inconsistencies and for in-depth data analysis, which allow timely detection of possible losses or distortions of information that could reduce financial results. Against this background, Russian companies, confirming that demand for Big Data technologies exists on the domestic market, note that the factors stimulating the development of Big Data in Russia are data growth, faster managerial decision-making and higher decision quality.

What prevents working with Big Data

Today, only 0.5% of the accumulated digital data is analyzed, despite the fact that there are objectively industry-wide tasks that could be solved using analytical solutions of the Big Data class. Developed IT markets already have results that can be used to assess the expectations associated with the accumulation and processing of big data.

One of the main factors hindering the implementation of Big Data projects, apart from their high cost, is the problem of choosing which data to process: determining which data needs to be extracted, stored and analyzed, and which should be ignored.

Many business representatives note that difficulties in implementing Big Data projects are associated with a shortage of specialists - marketers and analysts. The return on investment in Big Data depends directly on the quality of work of the employees engaged in deep and predictive analytics. The tremendous potential of data already existing in an organization often cannot be used effectively by marketers themselves because of outdated business processes or internal regulations. Big Data projects are therefore often perceived by businesses as difficult not only to implement but also to evaluate, i.e. to assess the value of the collected data. The specifics of working with data require marketers and analysts to shift their attention from technology and report generation to solving specific business problems.

Because of the large volume and high speed of the data flow, data collection involves ETL procedures running in real time. For reference: ETL (from the English Extract, Transform, Load - literally "extraction, transformation, loading") is one of the main processes in data warehouse management; it includes extracting data from external sources, transforming and cleaning it to fit the target needs, and loading it. ETL should be seen not only as a process of transferring data from one application to another, but also as a tool for preparing data for analysis.
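As a rough illustration of those three steps, here is a minimal ETL sketch in Python. The file name, column names and the SQLite target are assumptions made purely for the example; a production pipeline would stream data and handle errors far more carefully.

```python
# Minimal ETL sketch: extract from an external source, clean/transform, load into a store.
# File names, column names, and the SQLite target are illustrative assumptions.
import csv
import sqlite3
from datetime import datetime

def extract(path):
    """Extract: read raw rows from an external CSV source."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Transform: fix types, drop unusable records, normalize fields."""
    for row in rows:
        try:
            yield {
                "customer_id": int(row["customer_id"]),
                "amount": round(float(row["amount"]), 2),
                "ts": datetime.fromisoformat(row["timestamp"]).isoformat(),
            }
        except (KeyError, ValueError):
            continue  # skip malformed records instead of failing the whole load

def load(records, db_path="warehouse.db"):
    """Load: write cleaned records into the analytical store."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (customer_id INT, amount REAL, ts TEXT)")
    con.executemany("INSERT INTO sales VALUES (:customer_id, :amount, :ts)", records)
    con.commit()
    con.close()

load(transform(extract("raw_transactions.csv")))
```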

Then there is security: data coming from external sources needs protection commensurate with the amount of information collected. And since Big Data analysis methods so far develop only in the wake of data volume growth, the ability of analytical platforms to adopt new methods of data preparation and aggregation plays an important role. This suggests that, for example, data on potential customers, or a massive store of clickstream history from online store sites, can be of interest for solving a variety of problems.

Difficulties do not stop

Despite all the difficulties of implementing Big Data, businesses intend to increase investment in this area. According to Gartner data, in 2013 64% of the world's largest companies had already invested, or were planning to invest, in Big Data technologies for their business, up from 58% in 2012. According to the Gartner study, the industries leading Big Data investment are media, telecom, banking and service companies. Successful Big Data results have already been achieved by many major retail players using data obtained from RFID tools, logistics and replenishment systems, and loyalty programs. Successful retail experience stimulates other industries to look for new and effective ways to monetize big data and turn its analysis into a resource for business development. Thanks to this, experts forecast that by 2020 investment in data management and storage will fall from $2 to $0.2 per gigabyte, while investment in studying and analyzing the technological properties of Big Data will grow by only 40%.

The costs of different Big Data investment projects vary in nature; cost items depend on the types of products chosen for particular decisions. According to experts, the largest share of costs in such projects falls on products for collecting, structuring, cleaning and managing data.

How it's done

There are many combinations of software and hardware that allow you to create effective Big Data solutions for various business disciplines: from social media and mobile applications to business data mining and visualization. An important advantage of Big Data is the compatibility of new tools with databases widely used in business, which is especially important when working with cross-disciplinary projects, for example, such as organizing multi-channel sales and customer support.

The sequence of working with Big Data consists of collecting data, structuring the information received into reports and dashboards, generating insights and context, and formulating recommendations for action. Since working with Big Data implies high data collection costs whose outcome is not known in advance, the main task is to understand clearly what the data is for, not how much of it is available. In that case, data collection becomes a process of obtaining exactly the information needed to solve specific problems.

For example, telecommunications providers aggregate a huge amount of data, including constantly updated geolocation data. This information may be of commercial interest to advertising agencies, which can use it to serve targeted and local advertisements, as well as to retailers and banks. Such data can play an important role in deciding whether to open a retail outlet in a particular location, based on evidence of a strong target flow of people. There is an example of measuring the effectiveness of advertising on outdoor billboards in London: today the reach of such advertising can be measured only by stationing people with special counting devices near the advertising structures. Compared with this kind of measurement, a mobile operator has far more opportunities: it knows exactly where its subscribers are, and it knows their demographic characteristics - gender, age, marital status and so on.

Based on such data, the prospect opens up of changing the content of an advertising message to match the preferences of the particular person walking past the billboard. If the data shows that a passer-by travels a lot, they can be shown an ad for a resort. The organizers of a football match can estimate the number of fans only when people actually arrive at the match; but if they could ask the mobile operator where the visitors had been an hour, a day or a month before the match, the organizers could plan advertising placements for the next matches.

Another example is how banks can use Big Data to prevent fraud. If a client claims to have lost a card, yet at the moment a purchase is made with it the bank sees in real time that the client's phone is in the area where the transaction is taking place, the bank can check the client's claim and see whether it is an attempt at deception. In the opposite situation, when a customer makes a purchase in a store and the bank sees that the card used for the transaction and the customer's phone are in the same place, it can conclude that the card is being used by its owner. Advantages like these push the boundaries of what traditional data warehouses can do.
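A simplified sketch of that check might look as follows; the 5 km threshold and the record fields are illustrative assumptions rather than any bank's real rules.

```python
# Simplified sketch of the check described above: compare the location of a card
# transaction with the last known location of the cardholder's phone.
# The 5 km threshold and the record fields are illustrative assumptions.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def assess_transaction(txn, phone_location, max_distance_km=5.0):
    """Flag the transaction if the phone is far from the point of sale."""
    distance = haversine_km(txn["lat"], txn["lon"],
                            phone_location["lat"], phone_location["lon"])
    return "likely_cardholder" if distance <= max_distance_km else "needs_verification"

txn = {"card": "****1234", "lat": 55.751, "lon": 37.618}   # point of sale
phone = {"lat": 55.753, "lon": 37.622}                      # subscriber's phone
print(assess_transaction(txn, phone))  # -> likely_cardholder
```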

To decide confidently on implementing Big Data solutions, a company needs to build an investment case, and this causes great difficulty because of the many unknowns. The paradox of analytics in such cases is that it forecasts the future on the basis of the past, for which data are often lacking. Here, clear planning of your initial actions becomes an important factor:

  • First, identify one specific business problem that Big Data technologies will be used to solve; this task will become the core for judging whether the chosen concept is correct. Focus on collecting the data related to that particular task, and the proof of concept will then let you try a variety of tools, processes and management techniques that help you make better decisions in the future.
  • Second, a company without data analytics skills and experience is unlikely to implement a Big Data project successfully. The necessary knowledge always comes from previous analytical experience, which is the main factor affecting the quality of work with data. The culture of data use also plays an important role, since analysis often reveals harsh truths about the business, and accepting and working with those truths requires well-developed data practices.
  • Third, the value of Big Data technologies lies in providing insights. Good analysts remain in short supply on the market; they are usually understood to be specialists with a deep grasp of the commercial meaning of data who know how to use it correctly. Data analysis is a means of achieving business goals, and to understand the value of Big Data you need an appropriate model of behavior and an understanding of your own actions. In that case, big data will yield a lot of useful information about consumers, on the basis of which you can make decisions that benefit your business.

Despite the fact that the Russian Big Data market is just beginning to form, individual projects in this area are already being implemented quite successfully. Some of them are successful in the field of data collection, such as projects for the Federal Tax Service and Tinkoff Credit Systems Bank, others in terms of data analysis and practical application of its results: this is the Synqera project.

Tinkoff Credit Systems Bank carried out a project to deploy the EMC2 Greenplum platform, a tool for massively parallel computing. In recent years the bank's requirements for processing speed and real-time analysis of accumulated information have grown, driven by the rapid growth in the number of credit card users. The bank has announced plans to expand its use of Big Data technologies, in particular for processing unstructured data and working with corporate information obtained from various sources.

The Federal Tax Service of Russia is currently creating an analytical layer for its federal data warehouse, which will provide a single information space and a common technology for accessing tax data for statistical and analytical processing. During the project, work is under way to centralize analytical information from more than 1,200 local-level sources of the tax inspectorates.

Another interesting example of real-time big data analysis is the Russian startup Synqera, which developed the Simplate platform. The solution is based on processing large amounts of data: the program analyzes information about customers, their purchase history, age, gender and even mood. At the checkout counters of a cosmetics store chain, touch screens were installed with sensors that recognize customers' emotions. The program detects a person's mood, analyzes information about them, determines the time of day and scans the store's discount database, then shows the buyer targeted messages about promotions and special offers. This solution increases customer loyalty and lifts the retailer's sales.

Among successful foreign cases, the experience of Dunkin' Donuts, which uses real-time data to sell its products, is an interesting one. Digital displays in its stores show offers that rotate every minute depending on the time of day and product availability. From cash register receipts the company learns which offers get the best response from buyers. This approach to data processing has increased profits and warehouse turnover.

As the experience of implementing Big Data projects shows, this area is designed to successfully solve modern business problems. At the same time, an important factor in achieving commercial goals when working with big data is choosing the right strategy, which includes analytics that identify consumer demands, as well as the use of innovative technologies in the field of Big Data.

According to a global survey that Econsultancy and Adobe have conducted annually among company marketers since 2012, "big data" about how people behave on the Internet can do a lot. It can optimize offline business processes, help understand how owners of mobile devices use them to find information, or simply "make marketing better", i.e. more efficient. And that last function grows more popular year after year, as the diagram below shows.

Key areas of work for internet marketers in terms of customer relationships


Source: Econsultancy and Adobe; published by emarketer.com

Note that the respondents' nationality does not matter much. According to a survey conducted by KPMG in 2013, the share of "optimists" - those who use Big Data in developing business strategy - is 56%, and the variation from region to region is small: from 63% in North America to 50% in EMEA.

Using Big Data in different regions of the world


Source: KPMG; published by emarketer.com

Meanwhile, the attitude of marketers to such "fashion trends" is somewhat reminiscent of the well-known anecdote:

- Tell me, Vano, do you like tomatoes?
- I like eating them, but otherwise not really.

Although marketers say they "love" Big Data and even seem to use it, in fact "it's complicated", as people write about their romantic attachments on social networks.

According to a survey conducted by Circle Research in January 2014 among European marketers, 4 out of 5 respondents do not use Big Data (even though, of course, they "love" it). The reasons vary. Inveterate skeptics are few - 17% - and there are exactly as many of their antipodes, i.e. those who confidently answer "Yes". The rest are the hesitant and doubtful "swamp", who avoid a direct answer under plausible pretexts like "not yet, but soon" or "we'll wait until the others start".

Big Data use by marketers, Europe, January 2014


Source: dnx; published by emarketer.com

What puts them off? Sheer trifles. Some (exactly half) simply do not trust the data. Others (also numerous - 55%) find it hard to match the "data" and the "users" to each other. Some simply have (to put it politely) an internal corporate mess: data wanders between marketing departments and IT structures. For others, the software cannot cope with the workload. And so on. Since the shares add up to well over 100%, the situation of "multiple barriers" clearly occurs quite often.

Barriers hindering the use of Big Data in marketing


Source: dnx; published by emarketer.com

Thus, we have to admit that for now "Big Data" is mostly great potential that has yet to be put to use. That, incidentally, may be why Big Data is losing its halo as a "fashion trend", as shown by the data of a survey conducted by the already mentioned Econsultancy.

The most significant trends in digital marketing 2013-2014


Source: Econsultancy and Adobe

Its place is being taken by another king: content marketing. For how long?

This is not to say that Big Data is some kind of fundamentally new phenomenon. Big data sources have been around for years: databases on customer purchases, credit histories, lifestyle. And for years, scientists have used this data to help companies assess risk and predict future customer needs. However, today the situation has changed in two aspects:

More sophisticated tools and techniques have emerged for analyzing and combining different datasets;

These analytical tools are complemented by an avalanche of new data sources brought about by the digitalization of virtually all data collection and measurement methods.

The range of information available is both inspiring and intimidating for researchers raised in a structured research environment. Consumer sentiment is captured by websites and all kinds of social media. The fact of viewing ads is recorded not only by set-top boxes, but also using digital tags and mobile devices that communicate with the TV.

Behavioral data (such as call counts, shopping habits and purchases) is now available in real time. Thus, much of what was previously available through research can now be learned through big data sources. And all these information assets are generated constantly, regardless of any research processes. These changes make us wonder if big data can replace classic market research.

It's not about the data, it's about the questions and answers

Before sounding the death knell for classic research, we must remind ourselves that what is critical is not the presence of any particular data asset but something else: our ability to answer questions. An amusing feature of the new world of big data is that results from new information assets generate even more questions, and those questions are usually best answered by traditional research. Thus, as big data grows, we see a parallel increase in the availability of, and demand for, "small data" that can answer the questions raised by the world of big data.

Consider a situation: a large advertiser constantly monitors store traffic and sales in real time. Existing research techniques (in which we survey panelists about their buying motivations and point-of-sale behavior) help us target specific customer segments better. These techniques can be extended to include a wider range of big data assets, to the point where big data becomes a tool of passive observation and research becomes a method of continuous, narrowly focused investigation of changes or events that need study. This is how big data can free research from routine. Primary research no longer has to record what is going on (big data will do that); instead it can focus on explaining why we see particular trends or deviations from them. The researcher will be able to think less about obtaining data and more about how to analyze and use it.

At the same time, we see that big data is solving one of our biggest problems - the problem of excessively long research. Examining the studies themselves has shown that overly inflated research tools have a negative impact on data quality. Although many experts have long acknowledged this problem, they invariably responded by saying, “But I need this information for senior management,” and long polls continued.

In the world of big data, where quantitative indicators can be obtained through passive observation, this issue becomes controversial. Again, let's take a look at all of these studies related to consumption. If big data gives us insights about consumption through passive observation, then primary research in the form of surveys no longer needs to collect this kind of information, and we can finally support our vision of short surveys not only with good wishes, but also with something real.

Big Data needs your help

Finally, "big" is just one characteristic of big data; it refers to the size and scale of the data. That is, of course, the main characteristic, since the amount of this data goes beyond everything we have worked with before. But other properties of these new data streams matter too: they are often poorly formatted, unstructured (or at best partially structured), and full of uncertainty. The emerging field of data management aptly called entity analytics is designed to overcome the noise in big data: its task is to analyze these datasets and work out how many observations refer to the same person, which observations are current, and which of them are usable.
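At its very simplest, the entity analytics idea amounts to grouping noisy observations that appear to describe the same person. The toy sketch below matches records by a normalized e-mail address; real entity resolution uses much richer matching logic, and the fields here are invented for illustration.

```python
# A deliberately simple sketch of the entity-analytics idea: group noisy observations
# that appear to describe the same person by a normalized key. Real entity resolution
# uses far richer matching; the fields here are illustrative assumptions.
from collections import defaultdict

observations = [
    {"name": "Anna Petrova", "email": "ANNA.PETROVA@example.com", "source": "web"},
    {"name": "A. Petrova",   "email": "anna.petrova@example.com", "source": "twitter"},
    {"name": "Ivan Sidorov", "email": "ivan.s@example.com",       "source": "crm"},
]

def entity_key(obs):
    """Normalize the attribute we match on (here, just the e-mail address)."""
    return obs["email"].strip().lower()

entities = defaultdict(list)
for obs in observations:
    entities[entity_key(obs)].append(obs)

for key, group in entities.items():
    print(key, "->", len(group), "observations from", {o["source"] for o in group})
```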

This type of data cleansing is necessary to remove noise or erroneous data when working with large or small data assets, but it is not enough. We also need to create context around big data assets based on our previous experience, analytics, and category knowledge. In fact, many analysts point to the ability to manage the uncertainty inherent in big data as a source of competitive advantage, as it enables better decision making.

And this is where primary research not only finds itself freed from routine by big data, but also contributes to content creation and analysis within the framework of big data.

A prime example of this is the application of our fundamentally new brand equity framework to social media (we are referring to The Meaningfully Different Framework, a new approach to measuring brand value developed at Millward Brown). This model has been validated against behavior in specific markets, is implemented on a standard basis, and is easy to apply in other marketing areas and decision-support information systems. In other words, our survey-driven (though not survey-limited) brand equity model has all the properties needed to overcome the unstructured, disjointed and uncertain nature of big data.

Consider the data on consumer sentiment provided by social media. In raw form, peaks and valleys in consumer sentiment very often correlate only weakly with offline brand equity and behavior metrics: there is simply too much noise in the data. But we can reduce that noise by applying our models of consumer meaning, brand differentiation and dynamics to the raw consumer sentiment data - a way of processing and aggregating social media data along these dimensions.

Once the data is organized according to our framework model, the trends identified usually match up with offline brand equity and behavior metrics. Essentially, social media data can't speak for itself. Using them for this purpose requires our experience and models built around brands. When social media provides us with unique information expressed in the language consumers use to describe brands, we must use that language in our research to make primary research much more effective.

The benefits of freed-up research

This brings us back to the fact that big data does not replace research so much as it frees it up. Researchers will be relieved of the need to create a new study for each new case. The ever-growing big data assets can be leveraged across multiple research topics, allowing subsequent primary research to delve deeper into the topic and fill in the gaps. Researchers will be relieved of the need to rely on overblown polls. Instead, they will be able to use short surveys and focus on the most important parameters, which improves the quality of the data.

With this release, researchers will be able to use their proven principles and ideas to add precision and meaning to big data assets, leading to new areas for survey research. This cycle should lead to deeper understanding on a range of strategic issues and, ultimately, towards what should always be our main goal - to inform and improve the quality of decisions regarding brand and communications.

Foreword

“Big data” is a fashionable term that appears at almost all professional conferences devoted to data analysis, predictive analytics, data mining, CRM. The term is used in areas where work with qualitatively large amounts of data is relevant, where the speed of data flow into the organizational process is constantly increasing: economics, banking, manufacturing, marketing, telecommunications, web analytics, medicine, etc.

Along with the rapid accumulation of information, technologies for data analysis are also rapidly developing. If a few years ago it was possible, say, only to segment customers into groups with similar preferences, now it is possible to build models for each customer in real time, analyzing, for example, his movement on the Internet to search for a specific product. The interests of the consumer can be analyzed, and in accordance with the built model, a suitable advertisement or specific offers are displayed. The model can also be tuned and rebuilt in real time, which was unthinkable a few years ago.

In telecommunications, for example, technologies have been developed to determine the physical location of cell phones and their owners, and it seems that the idea shown in the 2002 science fiction film Minority Report - shopping-mall advertising tailored to the interests of the specific people passing by - will soon become a reality.

At the same time, there are situations where passion for new technologies can lead to disappointment. Sometimes, for example, sparse data that provide an important understanding of reality are far more valuable than Big Data describing mountains of often non-essential information.

The purpose of this article is to clarify and reflect on the new possibilities of Big Data and to illustrate how the STATISTICA analytical platform by StatSoft can help use Big Data effectively to optimize processes and solve problems.

How big is Big Data?

Of course, the correct answer to this question should be - "it depends ..."

In modern discussions, Big Data is described as data whose volume is on the order of terabytes.

In practice (when it comes to gigabytes or terabytes), such data is easy to store and manage using "traditional" databases and standard hardware (database server).

The STATISTICA software uses multi-threaded algorithms for data access (reading), transformation, and construction of predictive (and scoring) models, so such data samples can be analyzed easily and do not require specialized tools.

Some current StatSoft projects process samples on the order of 9-12 million rows. Multiply that by 1,000 parameters (variables), collected and organized in a data warehouse to build risk or predictive models, and such a file will be "only" about 100 gigabytes in size - not a small data warehouse, of course, but one whose size does not exceed the capabilities of standard database technology.
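A quick back-of-envelope check of that figure, assuming every value is stored as an 8-byte number (the actual on-disk format is an assumption):

```python
# Back-of-envelope size of the dataset described above, assuming every value
# is stored as an 8-byte number (the exact on-disk format is an assumption).
rows = 12_000_000          # ~9-12 million observations
columns = 1_000            # parameters (variables) per observation
bytes_per_value = 8

size_gb = rows * columns * bytes_per_value / 1024 ** 3
print(f"{size_gb:.0f} GB")   # ~89 GB, i.e. on the order of 100 gigabytes
```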

The STATISTICA product line for batch analysis and scoring-model building (STATISTICA Enterprise), real-time solutions (STATISTICA Live Score), and analytical tools for creating and managing models (STATISTICA Data Miner, Decisioning) scales easily across multiple servers with multi-core processors.

In practice, this means that sufficient speed of analytical models (for example, forecasts of credit risk, likelihood of fraud, reliability of equipment components, etc.) to support operational decisions can almost always be achieved with standard STATISTICA tools.

From Big Data to Extremely Big Data

Typically, the discussion on Big Data is centered around data warehouses (and doing analysis based on such warehouses) that are much larger than just a few terabytes.

In particular, some data stores can grow up to thousands of terabytes, i.e., up to petabytes (1000 terabytes = 1 petabyte).

Beyond petabytes, data accumulation can be measured in exabytes; in the manufacturing sector worldwide, for example, an estimated 2 exabytes of new information accumulated in 2010 (Manyika et al., 2011).

There are industries where data is collected and accumulated very intensively.

For example, in a manufacturing environment such as power plants, a continuous stream of data is sometimes generated for tens of thousands of parameters every minute or even every second.

In addition, over the past few years, so-called “smart grid” technologies have been introduced, allowing utilities to measure the electricity consumption of individual households every minute or every second.

For this kind of applications, in which data must be stored for years, the accumulated data is classified as Extremely Big Data.

The number of Big Data applications among commercial and government sectors is also growing, where the amount of data in storage can be hundreds of terabytes or petabytes.

Modern technology makes it possible to "track" people and their behavior in various ways. When we browse the Internet, shop at online stores or large chains such as Walmart (according to Wikipedia, Walmart's data store is estimated at more than 2 petabytes), or move around with our mobile phones, we leave a trail of actions that leads to the accumulation of new information.

Different ways of communicating - from simple phone calls to uploading information through social networking sites such as Facebook (according to Wikipedia, 30 billion items of information are exchanged each month) or sharing video on sites such as YouTube (YouTube claims that 24 hours of video are uploaded every minute; see Wikipedia) - generate massive amounts of new data every day.

Likewise, modern medical technology generates large amounts of data related to health care delivery (images, video, real-time monitoring).

So, the classification of data volumes can be represented as follows:

  • Large datasets: from 1,000 megabytes (1 gigabyte) to hundreds of gigabytes;
  • Huge datasets: from 1,000 gigabytes (1 terabyte) to several terabytes;
  • Big Data: from several terabytes to hundreds of terabytes;
  • Extremely Big Data: from 1,000 to 10,000 terabytes, i.e. 1 to 10 petabytes.

Big Data Tasks

There are three types of tasks related to Big Data:

1. Storage and management

Hundreds of terabytes or petabytes of data are difficult to store and manage with traditional relational databases.

2. Unstructured information

Most Big Data is unstructured. That is, how do you organize text, video, images, and so on?

3. Analysis of Big Data

How to analyze unstructured information? How to create simple reports based on Big Data, build and implement in-depth predictive models?

Storage and management of Big Data

Big Data is usually stored and organized on distributed file systems.

In general terms, the information is stored on several (sometimes thousands of) hard drives on standard computers.

A so-called "map" keeps track of where (on which computer and/or disk) each particular piece of information is stored.

To ensure fault tolerance and reliability, each piece of information is usually stored several times, for example, three times.

So, for example, suppose you have collected individual transactions from a large retail chain of stores. The details of each transaction will be stored on different servers and hard drives, and a map indexes exactly where the transaction information is stored.

With standard hardware and open source software tools to manage this distributed file system (for example, Hadoop), it is relatively easy to implement reliable petabyte-scale data stores.
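Conceptually, the "map" described above can be pictured as follows: a file is split into blocks, and each block is written to several nodes (three replicas here, as HDFS does by default). The node names and block size are illustrative assumptions.

```python
# Conceptual sketch of the "map" described above: a file is split into blocks and each
# block is placed on several nodes (three replicas here, as HDFS does by default).
# The node names and block size are illustrative assumptions.
import random

NODES = [f"node-{i:02d}" for i in range(1, 11)]   # ten storage servers
REPLICAS = 3
BLOCK_SIZE = 128 * 1024 * 1024                     # 128 MB blocks

def place_file(name, size_bytes):
    """Return the block map: block id -> list of nodes holding a replica."""
    n_blocks = -(-size_bytes // BLOCK_SIZE)        # ceiling division
    return {
        f"{name}#block{i}": random.sample(NODES, REPLICAS)
        for i in range(n_blocks)
    }

block_map = place_file("transactions.log", size_bytes=700 * 1024 * 1024)
for block, replicas in block_map.items():
    print(block, "->", replicas)
```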

Unstructured information

Most of the collected information in a distributed file system consists of unstructured data such as text, images, photographs, or videos.

This has its advantages and disadvantages.

The advantage is that the ability to store big data allows you to store “all the data” without worrying about how much of the data is relevant for later analysis and decision making.

The disadvantage is that in such cases, post-processing of these huge amounts of data is required to extract useful information.

While some of these operations may be simple (eg, simple calculations, etc.), others require more complex algorithms that must be specially designed to work efficiently on a distributed file system.

One senior executive once told StatSoft that he had "spent a fortune on IT and data storage but still hadn't started making money" because he had not thought about how best to use that data to improve his core business.

So, while the amount of data can grow exponentially, the ability to extract information and act on that information is limited and will asymptotically reach a limit.

It is important that methods and procedures for building, updating models, and automating decision making are developed alongside storage systems to ensure that such systems are useful and beneficial to the enterprise.

Big data analysis

Analyzing unstructured Big Data poses a genuinely hard problem: how to analyze it usefully. Far less has been written about this than about data storage and Big Data management technologies.

There are a number of issues to consider.

Map-Reduce

When analyzing hundreds of terabytes or petabytes of data, it is not possible to extract the data to some other place for analysis (for example, in STATISTICA Enterprise Analysis Server).

The process of transferring data through channels to a separate server or servers (for parallel processing) will take too long and requires too much traffic.

Instead, analytical calculations must be performed physically close to where the data is stored.

The Map-Reduce algorithm is a model for distributed computing. It works as follows: the input data are distributed to the worker nodes of the distributed file system for preprocessing (the map step), and the preprocessed data are then merged, or "folded" together (the reduce step).

So, say, to compute the grand total, the algorithm will compute subtotals in parallel at each of the nodes of the distributed file system, and then add those subtotals.
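A toy version of that grand-total example, run locally rather than on a real cluster, shows the two steps: each "node" computes a subtotal over its own chunk of data (map), and the subtotals are then merged (reduce).

```python
# Toy illustration of the map/reduce idea from the text: each "node" computes a
# subtotal over its own chunk of the data (map step), and the subtotals are then
# combined into the grand total (reduce step). Run locally here; on a real cluster
# the chunks would live on different machines.
from functools import reduce

# Pretend each inner list is the data held by one node of the distributed file system.
chunks = [
    [12.5, 3.0, 7.25],      # node 1
    [100.0, 0.75],          # node 2
    [42.0, 8.0, 1.5, 9.0],  # node 3
]

def map_step(chunk):
    """Each node independently computes a subtotal over its local data."""
    return sum(chunk)

def reduce_step(subtotals):
    """Subtotals are merged into the final result."""
    return reduce(lambda a, b: a + b, subtotals, 0.0)

subtotals = [map_step(c) for c in chunks]   # would run in parallel on the nodes
print(reduce_step(subtotals))               # 184.0 - the grand total
```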

There is a wealth of information on the Internet about how you can perform various calculations using the map-reduce model, including for predictive analytics.

Simple Statistics, Business Intelligence (BI)

For simple BI reports, there are many open source products that calculate sums, averages, proportions and the like using map-reduce.

Thus, it is very easy to get accurate counts and other simple statistics for reporting.

Predictive modeling, advanced statistics

At first glance, it might seem that building predictive models in a distributed file system is more difficult, but this is not at all the case. Let's consider the preliminary stages of data analysis.

Data preparation. Some time ago StatSoft ran a series of large and successful projects involving very large datasets describing the minute-by-minute metrics of the plant's operation. The purpose of this analysis was to improve plant efficiency and reduce emissions (Electric Power Research Institute, 2009).

It is important that although datasets can be very large, the information they contain is much smaller.

For example, while data is accumulated every second or every minute, many parameters (temperature of gases and furnaces, flows, position of dampers, etc.) remain stable over long intervals of time. In other words, the data recorded every second is basically a repetition of the same information.

Thus, "smart" data aggregation is needed: producing a dataset for modeling and optimization that contains only the necessary information about the dynamic changes affecting plant efficiency and the amount of emissions.
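A small sketch of such aggregation with pandas: per-second readings are collapsed into 10-minute statistics, so long stable stretches stop repeating the same value. The parameter names and the 10-minute window are assumptions made for the example.

```python
# Sketch of "smart" aggregation: per-second sensor readings are collapsed into
# 10-minute statistics, so long stable stretches no longer repeat the same value.
# The parameter names and the 10-minute window are illustrative assumptions.
import numpy as np
import pandas as pd

# Fake one hour of per-second readings for two mostly stable parameters.
idx = pd.date_range("2021-01-04 00:00:00", periods=3600, freq="s")
raw = pd.DataFrame({
    "furnace_temp": 850 + np.random.normal(0, 0.3, size=3600),
    "gas_flow":     120 + np.random.normal(0, 0.1, size=3600),
}, index=idx)

# Keep only what matters for modelling: mean, spread and extremes per 10-minute window.
aggregated = raw.resample("10min").agg(["mean", "std", "min", "max"])
print(aggregated.shape)   # (6, 8): six 10-minute windows instead of 3600 raw rows
```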

Text classification and data preprocessing. Let us illustrate once more how a large dataset can contain far less useful information than its size suggests.

For example, StatSoft has taken part in projects that mine text from tweets reflecting passengers' satisfaction with airlines and their services.

Although large numbers of relevant tweets were retrieved hourly and daily, the sentiment expressed in them was fairly simple and monotonous. Most messages are complaints - short, one-sentence reports of a "bad experience". Moreover, the number and "strength" of these sentiments are relatively stable over time and across specific issues (e.g., lost luggage, poor food, flight cancellations).

Thus, reducing actual tweets to a score of sentiment using text mining techniques (such as those implemented in STATISTICA Text Miner), results in much less data, which can then be easily correlated with existing structured data (actual ticket sales, or frequent flyer information). The analysis allows you to divide clients into groups and examine their typical complaints.

There are many tools for doing this kind of data aggregation (eg, sentiment rate) on a distributed file system, making this analytic process easy.
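The sketch below shows the idea in miniature: raw messages are reduced to a per-airline sentiment score that can then be joined with structured data. The keyword lists stand in for a real text-mining model and are purely illustrative.

```python
# Crude sketch of reducing raw tweets to a sentiment score that can be joined with
# structured data (e.g. ticket sales). The keyword lists stand in for a real
# text-mining model such as the one described above and are purely illustrative.
NEGATIVE = {"lost", "delayed", "cancelled", "bad", "rude"}
POSITIVE = {"great", "friendly", "on-time", "comfortable"}

tweets = [
    {"airline": "AirX", "text": "Bad experience, luggage lost again"},
    {"airline": "AirX", "text": "Crew was friendly and we landed on-time"},
    {"airline": "AirY", "text": "Flight cancelled, rude staff"},
]

def sentiment(text):
    """Score a message: positive keyword hits minus negative keyword hits."""
    words = set(text.lower().replace(",", " ").split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

# Aggregate per airline: far less data than the raw message stream.
scores = {}
for t in tweets:
    scores.setdefault(t["airline"], []).append(sentiment(t["text"]))

for airline, vals in scores.items():
    print(airline, sum(vals) / len(vals))
```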

Building models

Often the challenge is to quickly build accurate models for data stored on a distributed file system.

There are map-reduce implementations of various data mining and predictive analytics algorithms suitable for large-scale parallel processing of data in a distributed file system (and these can be supported by StatSoft's STATISTICA platform).

However, does the mere fact that you have processed a very large amount of data guarantee that the final model is really more accurate?

In practice, it is often more convenient to build models on smaller segments of the data in the distributed file system.

As a recent Forrester report says, “Two plus two equals 3.9 is usually good enough” (Hopkins & Evelson, 2011).

The statistical and mathematical point is that a linear regression model with, say, 10 predictors, built on a correctly drawn probability sample of 100,000 observations, will be just as accurate as a model built on 100 million observations.
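The claim is easy to check on synthetic data: fit the same linear regression on a proper random sample and on the full dataset and compare the coefficients. The data below is generated for illustration only.

```python
# Sketch illustrating the claim above: a linear regression fitted on a proper random
# sample of 100,000 rows recovers essentially the same coefficients as one fitted on
# millions of rows. Synthetic data; 10 predictors as in the example from the text.
import numpy as np

rng = np.random.default_rng(0)
n_full, n_sample, p = 2_000_000, 100_000, 10

X = rng.normal(size=(n_full, p))
true_beta = rng.uniform(-1, 1, size=p)
y = X @ true_beta + rng.normal(scale=0.5, size=n_full)

def ols(X, y):
    """Ordinary least squares via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

beta_full = ols(X, y)
idx = rng.choice(n_full, size=n_sample, replace=False)
beta_sample = ols(X[idx], y[idx])

print("max coefficient difference:", np.abs(beta_full - beta_sample).max())  # tiny
```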

It was predicted that the total global volume of created and replicated data in 2011 could be about 1.8 zettabytes (1.8 trillion gigabytes) - about 9 times more than what was created in 2006.

More complex definition

However, "big data" involves more than just analyzing vast amounts of information. The problem is not that organizations create huge amounts of data, but that most of it comes in formats that fit poorly into the traditional structured database format - web logs, videos, text documents, machine code, or geospatial data, for example. All of this is stored in many different repositories, sometimes even outside the organization. As a result, corporations may have access to a huge amount of their own data yet lack the tools to establish relationships within it and draw meaningful conclusions. Add the fact that data is now updated more and more often, and you get a situation in which traditional methods of information analysis cannot keep up with huge volumes of constantly refreshed data - which ultimately opens the way for big data technologies.

Best definition

In essence, the concept of big data implies working with information that is huge in volume and diverse in composition, very frequently updated and spread across different sources, with the aim of increasing operational efficiency, creating new products and improving competitiveness. The consulting company Forrester puts it succinctly: "Big data combines techniques and technologies that make sense of data at the extreme limit of usability."

How big is the difference between business intelligence and big data?

Craig Batey, Chief Marketing Officer and Chief Technology Officer of Fujitsu Australia, pointed out that business analysis is a descriptive process of analyzing the results a business has achieved over a period of time, whereas the processing speed of big data makes analysis predictive, able to offer the business recommendations for the future. Big data also makes it possible to analyze more types of data than business intelligence tools can, allowing a focus beyond structured data stores.

Matt Slocum of O "Reilly Radar believes that although big data and business intelligence have the same goal (finding answers to a question), they differ from each other in three aspects.

  • Big data is designed to handle more information than business intelligence, and this is, of course, in line with the traditional definition of big data.
  • Big data is designed to process information that is received and changing more quickly, which means deep exploration and interactivity. In some cases, the results are generated faster than the web page loads.
  • Big data is designed to handle unstructured data, the ways of using which we are only beginning to learn after we have been able to establish its collection and storage, and we need algorithms and the ability to dialogue to facilitate the search for trends contained within these arrays.

According to the Oracle Information Architecture: An Architect's Guide to Big Data white paper published by Oracle, we approach information differently when working with big data than when doing business analysis.

Working with big data is unlike the usual business intelligence process, where simply adding up known values produces a result: for example, summing the data on paid invoices gives the year's sales volume. When working with big data, the result is obtained in the course of cleaning the data through sequential modeling: a hypothesis is put forward; a statistical, visual or semantic model is built; the hypothesis is checked against it; and then the next hypothesis is put forward. This process requires the researcher either to interpret visual values, or to compose interactive queries based on domain knowledge, or to develop adaptive machine-learning algorithms capable of obtaining the desired result. Moreover, the lifetime of such an algorithm can be quite short.

Big data analysis techniques

There are many different methods for analyzing datasets, based on tools borrowed from statistics and computer science (machine learning, for example). The list below does not claim to be complete, but it reflects the approaches most popular across industries. It should also be understood that researchers keep creating new techniques and improving existing ones. Moreover, some of the techniques listed below are not applicable exclusively to big data and can be used successfully on smaller datasets (A/B testing and regression analysis, for example). Of course, the larger and more diverse the dataset being analyzed, the more accurate and relevant the resulting output.

A/B testing. A technique in which a control sample is compared, one at a time, with other samples. It helps identify the optimal combination of indicators to achieve, for example, the best consumer response to a marketing offer. Big data makes it possible to run a huge number of iterations and thus obtain a statistically reliable result.

Association rule learning. A set of techniques for discovering relationships (association rules) between variables in large data sets. Used in data mining.

Classification. A set of techniques for predicting consumer behavior in a particular market segment (decisions about purchasing, churn, consumption volume, etc.). Used in data mining.

Cluster analysis. A statistical method of classifying objects into groups by discovering common features that were not known in advance. Used in data mining.

Crowdsourcing. A methodology for collecting data from a large number of sources.

Data fusion and data integration. A set of techniques for analyzing comments from social network users and comparing them with sales results in real time.

Data mining. A set of methods for determining which categories of consumers are most receptive to the product or service being promoted, identifying the traits of the most successful employees, and predicting consumers' behavioral models.

Ensemble learning. A method that uses many predictive models, thereby improving the quality of the forecasts.

Genetic algorithms. In this technique, possible solutions are represented as "chromosomes" that can combine and mutate. As in natural evolution, the fittest survive.

Machine learning. A field of computer science (historically labeled "artificial intelligence") that aims to create self-learning algorithms based on the analysis of empirical data.

Natural language processing (NLP). A set of techniques, borrowed from computer science and linguistics, for recognizing human natural language.

Network analysis. A set of techniques for analyzing the links between nodes in networks. Applied to social networks, it makes it possible to analyze relationships between individual users, companies, communities, and so on.

Optimization. A set of numerical methods for redesigning complex systems and processes to improve one or more metrics. It supports strategic decisions such as the composition of the product line brought to market, investment analysis, and so on.

Pattern recognition. A set of techniques with elements of self-learning for predicting consumer behavior patterns.

Predictive modeling. A set of techniques for building a mathematical model of a predefined likely scenario, for example analyzing a CRM database for conditions that might push subscribers to switch provider.

Regression. A set of statistical methods for identifying the relationship between changes in a dependent variable and one or more independent variables. Often used for forecasting and prediction. Used in data mining.

Sentiment analysis. Techniques for assessing consumer sentiment based on natural language processing. They make it possible to pull messages related to a subject of interest (for example, a consumer product) out of the general information stream and then assess the polarity of the judgment (positive or negative), its emotional intensity, and so on.

Signal processing. A set of techniques, borrowed from radio engineering, for recognizing a signal against background noise and analyzing it further.

Spatial analysis. A set of methods, partly borrowed from statistics, for analyzing spatial data - terrain topology, geographic coordinates, object geometry. Geographic information systems (GIS) are often the source of big data in this case.

  • Revolution Analytics (based on the R language for mathematical statistics).

Of particular interest on this list is Apache Hadoop, open source software that has been tried and tested as a data analysis engine over the past five years. As soon as Yahoo released the Hadoop code to the open source community, a whole line of Hadoop-based products quickly emerged in the IT industry. Almost all modern big data analysis tools provide integration with Hadoop, and their developers range from startups to well-known global companies.

Big data management markets

Big data platforms (BDP, Big Data Platform) as a means of combating digital hoarding

The ability to analyze big data, colloquially called Big Data, is perceived as a blessing, and unambiguously so. But is it really? What can the unchecked accumulation of data lead to? Most likely, to what domestic psychologists call pathological hoarding, syllogomania, or, figuratively, "Plyushkin's syndrome". In English, the compulsive urge to collect everything is called hoarding (from the English hoard, "a stockpile"). According to the classification of mental diseases, hoarding is categorized as a mental disorder. In the digital era, digital hoarding is added to traditional material hoarding, and both individuals and entire enterprises and organizations can suffer from it.

World and Russian market

Big data Landscape - Major Suppliers

Almost all leading IT companies have shown interest in tools for collecting, processing, managing and analyzing big data, which is quite natural. Firstly, they directly encounter this phenomenon in their own business; secondly, big data opens up excellent opportunities for developing new market niches and attracting new customers.

Many startups have appeared on the market that do business on processing huge amounts of data. Some of them use off-the-shelf cloud infrastructure provided by major players like Amazon.

Theory and Practice of Big Data in Industries

The history of development

2017

TmaxSoft forecast: the next "wave" of Big Data will require modernization of the DBMS

Businesses know that the vast amounts of data they have accumulated contain important information about their business and customers. If a company can successfully apply this information, it will have a significant advantage over the competition and will be able to offer better products and services. However, many organizations are still unable to use big data effectively because their legacy IT infrastructure cannot provide the necessary storage capacity, data exchange processes, utilities and applications needed to process and analyze large amounts of unstructured data and extract valuable information from it, TmaxSoft noted.

In addition, the increased processing power required to analyze ever-increasing amounts of data can require significant investment in an organization's legacy IT infrastructure, as well as additional maintenance resources that could be used to develop new applications and services.

2015

On February 5, 2015, the White House released a report discussing how companies use "big data" to set different prices for different buyers - a practice known as "price discrimination" or "differentiated (personalized) pricing". The report describes the benefits of big data for both sellers and buyers, and its authors conclude that many of the problematic issues arising from big data and differential pricing can be resolved within the framework of existing anti-discrimination and consumer-protection laws.

At the same time, the report notes, there is currently little evidence of how companies use big data in the context of personalized marketing and differentiated pricing. The available information shows that sellers use pricing methods that fall into three categories:

  • study of the demand curve;
  • Steering and differentiated pricing based on demographic data; and
  • behavioral targeting and individualized pricing.

Examining the demand curve: Marketers often experiment with demand and consumer behavior by randomly assigning customers to one of two possible price tiers. "Technically, these experiments are a form of differential pricing because they result in different prices for customers, even if they are 'non-discriminatory' in the sense that all customers are equally likely to 'hit' a higher price."

Steering: the practice of presenting products to consumers based on their demographic group. For example, a computer company's website may offer the same laptop to different types of buyers at different prices based on the information they provide about themselves (for example, whether the user represents a government agency, a scientific or commercial institution, or is an individual) or on their geographic location (for example, determined by the computer's IP address).

Targeted behavioral marketing and personalized pricing: In these cases, buyers' personal data is used for targeted advertising and personalized pricing of certain products. For example, online advertisers use data collected by ad networks and through third-party cookies to target users on the Internet with tailored advertisements. On the one hand, this approach enables consumers to receive advertisements for goods and services of interest to them. However, it may cause concern for those consumers who do not want certain types of their personal data (such as information about visits to websites related to medical and financial issues) to be collected without their consent.

While targeted behavioral marketing is widespread, there is relatively little evidence of personalized pricing in the online environment. The report suggests that this may be due to the fact that appropriate methods are still being developed, or the fact that companies are in no hurry to use individual pricing (or prefer to keep quiet about it) - perhaps for fear of negative reaction from consumers.

The authors of the report believe that "for the individual consumer, the use of big data is undoubtedly associated with both potential returns and risks." While recognizing that there are transparency and discrimination issues in the use of big data, the report argues that existing anti-discrimination and consumer protection laws are sufficient to address them. However, the report also emphasizes the need for “ongoing monitoring” when companies use confidential information in an opaque way or in ways that are not covered by the existing regulatory framework.

This report is an extension of the White House's efforts to examine the use of big data and discriminatory pricing on the Internet, and their implications for American consumers. Earlier it was reported that the White House working group on big data published its report on this issue in May 2014. The Federal Trade Commission (FTC) also addressed these issues during its September 2014 seminar on discrimination in relation to the use of big data.

2014

Gartner dispels Big Data myths

Gartner's Fall 2014 Policy Brief lists a number of common myths about Big Data among CIOs and refutes them.

  • Everyone is implementing Big Data processing systems faster than us

Interest in Big Data technologies is at a record high: 73% of organizations surveyed by Gartner analysts this year are already investing in related projects or are going to. But most of these initiatives are still in their early stages, and only 13% of those surveyed have already implemented such solutions. The hardest part is figuring out how to generate income from Big Data, deciding where to start. Many organizations get stuck in the pilot phase because they cannot tie new technology to specific business processes.

  • We have so much data that there is no need to worry about small errors in it.

Some CIOs believe that small data gaps do not affect the overall results of large volumes of analysis. When there is a lot of data, each individual error really affects the result less, analysts say, but the errors themselves become more numerous. In addition, most of the analyzed data is external, of unknown structure or origin, so the probability of errors increases. Thus, in the world of Big Data, quality is actually much more important.

  • Big data technologies will eliminate the need for data integration

Big Data promises the ability to process data in its original format, with the schema generated automatically as the data is read. It is believed that this will allow information from the same sources to be analyzed using multiple data models, and that end users will be able to interpret any dataset as they see fit. In reality, most users often need a traditional schema-based approach in which the data is formatted appropriately and there are agreements on the level of integrity of the information and on how it should relate to the use case.

  • There is no point in using data warehouses for complex analytics

Many information management system administrators believe that there is no point in wasting time creating a data warehouse, given that complex analytical systems use new types of data. In fact, many complex analytics systems use information from a data warehouse. In other cases, new data types need to be additionally prepared for analysis in Big Data processing systems; you have to make decisions about the suitability of the data, the principles of aggregation and the required level of quality - such preparation can take place outside the warehouse.

  • Data lakes will replace data warehouses

In reality, vendors are misleading customers by positioning data lakes as storage replacements or as critical analytical infrastructure. The underlying data lake technologies lack the maturity and breadth of functionality inherent in storage. Therefore, data management leaders should wait until the lakes reach the same level of development, according to Gartner.

Accenture: 92% of those who implemented big data systems are happy with the result

Among the main benefits of big data, the respondents named:

  • “Search for new sources of income” (56%),
  • "Improving the customer experience" (51%),
  • "New products and services" (50%) and
  • “The influx of new customers and the retention of the loyalty of old ones” (47%).

Many companies have faced traditional challenges when introducing new technologies. For 51%, security became a stumbling block, for 47% - budget, for 41% - lack of necessary personnel, and for 35% - difficulties in integrating with the existing system. Almost all surveyed companies (about 91%) plan to soon solve the problem with the lack of personnel and hire big data specialists.

Companies are optimistic about the future of big data technologies. 89% believe they will change the business as much as the internet. 79% of respondents noted that companies that do not do big data will lose their competitive edge.

However, the respondents disagreed about what exactly should be considered big data. 65% of respondents believe it is “big data files”, 60% believe it is “advanced analytics and analysis”, and 50% believe that it is “data from visualization tools”.

Madrid spends € 14.7 million on big data management

In July 2014, it became known that Madrid would use big data technologies to manage urban infrastructure. The project costs 14.7 million euros, and the implemented solutions will be based on technologies for analyzing and managing big data. With their help, the city administration will manage the work with each service provider and pay accordingly, depending on the level of service.

This concerns the administration's contractors, who monitor the condition of streets, lighting, irrigation and green spaces, clean up the territory, and remove and recycle waste. During the project, 300 key performance indicators of city services have been developed for specially designated inspectors, on the basis of which 1.5 thousand various checks and measurements will be carried out daily. In addition, the city will start using an innovative technology platform called Madrid iNTeligente (MiNT) - Smarter Madrid.

2013

Experts: The Peak of Big Data Fashion

Without exception, all vendors in the data management market are developing technologies for Big Data management at this time. This new technological trend is also actively discussed by the professional community, both developers and industry analysts and potential consumers of such solutions.

As Datashift found, by January 2013 the wave of discussion around "big data" had exceeded all conceivable dimensions. After analyzing the number of mentions of Big Data in social networks, Datashift calculated that in 2012 the term was used about 2 billion times in posts created by about 1 million different authors around the world. This is equivalent to 260 posts per hour, with a peak of 3,070 mentions per hour.

Gartner: Every second CIO is ready to spend on Big data

After several years of experimenting with Big data technologies and the first implementations in 2013, the adaptation of such solutions will increase significantly, Gartner predicts. Researchers surveyed IT leaders around the world and found that 42% of respondents have already invested in Big data technologies or plan to make such investments within the next year (data as of March 2013).

Companies are forced to spend money on big data processing technologies because the information landscape is changing rapidly and requires new approaches to information processing. Many companies have already realized that big data is critical, and working with it brings benefits that are not available with traditional sources of information and methods of processing it. In addition, the constant hype around the "big data" topic in the media fuels interest in the relevant technologies.

Frank Buytendijk, vice president of Gartner, even urged companies to moderate their fervor, as some are concerned that they are lagging behind competitors in Big Data acquisition.

“There is no need to worry, the possibilities for implementing ideas based on Big Data technologies are virtually endless,” he said.

Gartner predicts that by 2015, 20% of the Global 1000 companies will have a strategic focus on "information infrastructure."

In anticipation of new opportunities that Big Data processing technologies will bring with them, many organizations are already organizing the process of collecting and storing various kinds of information.

For educational and government organizations, as well as industrial companies, the greatest potential for business transformation lies in combining accumulated data with so-called dark data; the latter includes email messages, multimedia and other similar content. According to Gartner, the data race will be won by those who learn to handle the most diverse sources of information.

Cisco Survey: Big Data Will Help Increase IT Budgets

For the Spring 2013 Cisco Connected World Technology Report, conducted in 18 countries by the independent analytics firm InsightExpress, 1,800 college students and a similar number of young professionals aged 18 to 30 were surveyed. The survey aimed to determine how ready IT departments are to implement Big Data projects and to gain insight into the associated challenges, technology gaps and the strategic value of such projects.

Most companies collect, record and analyze data. Nonetheless, the report says, many companies face a range of complex business and information technology challenges in relation to Big Data. For example, 60 percent of those surveyed admit that Big Data solutions can improve decision-making processes and increase competitiveness, but only 28 percent said they already receive real strategic benefits from the accumulated information.

More than half of the IT executives surveyed believe that Big Data projects will help increase IT budgets in their organizations, as there will be increased requirements for technology, personnel and professional skills. At the same time, more than half of respondents expect that such projects will increase IT budgets in their companies as early as 2012. 57 percent are confident that Big Data will increase their budgets over the next three years.

81 percent of respondents said that all (or at least some) Big Data projects will require the use of cloud computing. Thus, the spread of cloud technologies can affect the speed of adoption of Big Data solutions and the value of these solutions for the business.

Companies collect and use data of the most diverse types, both structured and unstructured. The Cisco Connected World Technology Report lists the sources from which survey participants obtain their data.

Nearly half (48 percent) of CIOs predict that the load on their networks will double over the next two years. (This is especially true in China, where 68 percent of those surveyed hold this view, and Germany, 60 percent.) 23 percent of respondents expect network load to triple over the next two years. At the same time, only 40 percent of respondents declared their readiness for an explosive growth in the volume of network traffic.

27 percent of those surveyed admitted that they need better IT policies and information security measures.

21 percent need more bandwidth.

Big Data opens up new opportunities for IT departments to build value and build strong relationships with business units, allowing them to increase revenues and strengthen the company's financial position. Big Data projects make IT departments a strategic partner to business units.

According to 73 percent of respondents, it is the IT department that will become the main driving force behind the Big Data strategy. At the same time, the respondents believe that other departments will also be involved in the implementation of this strategy. First of all, this concerns departments of finance (it was named by 24 percent of respondents), research and development (20 percent), operations (20 percent), engineering (19 percent), as well as marketing (15 percent) and sales (14 percent).

Gartner: Millions of New Jobs Needed to Manage Big Data

Worldwide IT spending will reach $3.7 trillion in 2013, which is 3.8% more than IT spending in 2012 (the year-end forecast is $3.6 trillion). The big data segment will grow at a much faster pace, according to a Gartner report.

By 2015, 4.4 million jobs in information technology will be created to serve big data, of which 1.9 million will be in the United States. Moreover, each such job will entail the creation of three additional jobs outside the IT sector, so that in the United States alone, over the next four years, 6 million people will work to support the information economy.

According to Gartner experts, the main problem is that the industry lacks the talent for this: both the private and public education systems, in the United States for example, are unable to supply the industry with enough qualified personnel. As a result, of the new IT jobs mentioned, only one in three will be filled.

Analysts believe that the role of cultivating qualified IT personnel should be taken directly by companies that are in dire need of them, since such employees will become a gateway for them to the new information economy of the future.

2012

First skepticism about Big Data

Analysts at Ovum and Gartner suggest that for big data, the trendy theme of 2012, it may be time to let go of illusions.

The term "Big Data" at this time usually refers to the ever-growing volume of information coming online from social media, from networks of sensors and other sources, as well as the growing range of tools used to process data and identify important business -trends.

“Because of the hype (or despite it) over the idea of ​​big data, manufacturers in 2012 looked at this trend with great hope,” said Tony Bayer, an analyst at Ovum.

Bayer said DataSift had conducted a retrospective analysis of big data mentions in social media.

In the Russian-speaking environment, both the English term Big Data and its translation, "big data", are used; the Russian phrase is a direct copy of the English term. Big data has no strict definition. It is impossible to draw a clear line: is it 10 terabytes or 10 megabytes? The name itself is highly subjective. The word "big" is like "one, two, many" in primitive tribes.

However, there is an established view that big data is a set of technologies designed to perform three operations. First, to process larger volumes of data than in "standard" scenarios. Second, to work with rapidly arriving data in very large volumes: there is not just a lot of data, more and more of it keeps arriving. Third, to work with structured and poorly structured data in parallel and in different aspects. Big data assumes that algorithms receive a stream of not always structured information as input and that more than one insight can be extracted from it.
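
As a minimal sketch of this "constantly arriving, loosely structured" case (the event feed and its fields are invented), a running aggregate can be maintained over a stream of JSON-like records without ever holding the whole dataset in memory:

```python
# Illustrative streaming sketch: aggregate loosely structured events as they
# arrive, tolerating missing fields.
from collections import defaultdict

def event_stream():
    # stand-in for an endless feed (message queue, log tail, sensors, ...)
    yield {"user": "u1", "action": "click", "page": "/home"}
    yield {"user": "u2", "action": "purchase", "amount": 19.9}
    yield {"user": "u1", "action": "click"}             # some fields may be missing

counts = defaultdict(int)
revenue = 0.0
for event in event_stream():
    counts[event.get("action", "unknown")] += 1         # tolerate missing keys
    revenue += event.get("amount", 0.0)

print(dict(counts), "revenue:", revenue)
```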

A typical example of big data is the information coming from large physical experimental facilities, which continuously produce huge amounts of data. Scientists use this data to solve many problems in parallel.

Big data emerged in the public space because it began to affect almost everyone, not just the scientific community, where such problems had long been solved. Big Data technology entered the public sphere when it began to involve a very specific number: the planet's population. Seven billion people gather on social media and other projects that aggregate people - YouTube, Facebook, VKontakte - where the number of users is measured in billions and the number of operations they perform simultaneously is enormous. The data flow in this case is user actions: for example, data from the same YouTube hosting flowing over the network in both directions. Processing means not only interpretation but also the ability to handle each of these actions correctly, that is, to put it in the right place and make the data quickly available to every user, since social networks do not tolerate waiting.

Much of what concerns big data, the approaches that are used to analyze it, have actually been around for quite some time. For example, processing images from surveillance cameras, when we are talking not about one picture, but about a data stream. Or robot navigation. All this has been around for decades, it's just that now data processing tasks have affected a lot more people and ideas.

Many developers are used to working with static objects and thinking in terms of states. In big data, the paradigm is different. You have to be able to work with an incessant stream of data, and this is an interesting task. It touches more and more areas.

In our life, more and more hardware and software begin to generate large amounts of data - for example, the "Internet of Things".

Things are already generating huge streams of information. The Potok road-police system sends information from all cameras and allows cars to be found using this data. Fitness bracelets, GPS trackers and other devices that serve the needs of people and businesses are becoming more and more widespread.

The Moscow Department of Informatization is recruiting a large number of data analysts, because there are a lot of statistics on people and they are multi-criteria (that is, statistics on a very large number of criteria have been collected about each person, about each group of people). It is necessary to find patterns and tendencies in these data. Such problems require mathematicians with an IT education. Because, ultimately, data is stored in structured DBMS, and you need to be able to access it and get information.

Previously, we did not consider big data a problem for the simple reason that there was nowhere to store it and no networks to transfer it. When these capabilities appeared, the data immediately filled the entire volume provided to it. But no matter how much you expand bandwidth and storage capacity, there will always be sources - physical experiments, experiments on airflow around a wing - that produce more information than we can transmit. According to Moore's law, the performance of modern parallel computing systems is steadily increasing, and the speed of data transmission networks is also growing. However, data also needs to be saved to and retrieved from media (hard drives and other types of memory) quickly, and this is another challenge in big data processing.

