Basics of data mining: technologies, methods and tasks

Working with data is a core problem in programming and in the development of information systems. Before you can analyze a large amount of data and make a decision that guarantees a reliable and objective result, you must first delimit that volume. The task becomes harder when the flow of information grows rapidly and the time for making a decision is limited.

Data and its formalization

Modern information technologies guarantee safe and reliable analysis, presentation and processing of data. Syntactically and formally, this is true. From the point of view of the semantics of the task and the objectivity of the expected solution, however, the result depends on the experience, knowledge and skills of the programmer.

Programming languages have reached the status of reliable and safe tools. The knowledge and skills specialists need to analyze, represent and process data have reached a level of relative universality.

At this level, data processing technologies are almost flawless. The type of a value can be determined at the moment an operation is performed on it, and in case of a mismatch it is automatically converted to the required type.
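In a dynamically typed language such as Python, the type is resolved at the moment of the operation and numeric types are converted automatically; a minimal sketch (the values are arbitrary):

    # The type is resolved at the moment of the operation.
    value = 1 + 2.5        # int is implicitly promoted to float
    print(type(value))     # <class 'float'>

    # Where no safe implicit conversion exists, the mismatch is reported:
    try:
        "1" + 2
    except TypeError as err:
        print(err)         # an explicit conversion, e.g. int("1") + 2, is needed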

Hypertext tools are well developed, and distributed processing of large volumes of data is widely used. At this level:

  • informational tasks can be formalized;
  • data mining needs are being met;
  • the quality of the result depends on the knowledge and professionalism of the programmer.

Enterprise-level information systems programming is characterized by genuinely working products that generate large volumes of data, and with them a problem of a higher order.

Large amounts of data

In the 1980s, when databases grew into database management systems, the reliability of hardware and the quality of programming languages still left much to be desired.

Today, a large number of databases have accumulated, many sources of information have been computerized, and sophisticated systems for collecting all kinds of information (finance, weather, statistics, taxes, real estate, personal data, climate, politics ...) have been developed.

Some data sources exhibit obvious patterns and can be analyzed by mathematical methods. You can even perform basic data mining in Excel: clean the data, build a model, form a hypothesis, compute correlations, and so on.
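For readers who prefer a script to a spreadsheet, here is a minimal sketch of that workflow in Python with pandas; the file sales.csv and its price and demand columns are assumptions invented for the example:

    # Excel-style data mining in a few lines: clean, summarize, correlate.
    # "sales.csv" with "price" and "demand" columns is an illustrative assumption.
    import pandas as pd

    df = pd.read_csv("sales.csv")
    df = df.dropna()                       # clean: drop incomplete rows
    df = df[df["price"] > 0]               # clean: drop implausible values
    print(df.describe())                   # summary statistics for the model
    print(df["price"].corr(df["demand"]))  # a correlation to test as a hypothesis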

In other data and sources, patterns are difficult to detect. In all cases, though, the software and hardware for data processing are reliable and stable. The task of data mining has moved to the forefront in many socio-economic fields.

Leaders of the information industry, Oracle in particular, are focusing their attention on a set of circumstances that characterize data of a new type:

  • huge streams;
  • natural information (even when it was created programmatically);
  • heterogeneous data;
  • the highest standards of responsibility;
  • a wide range of data presentation formats;
  • compatibility between data integrators and their processors.

The main feature of the new type of data is its huge volume and the rate at which that volume grows. Classical algorithms are not applicable to processing data of this type, even taking into account the speed of modern computers and the use of parallel technologies.
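One practical consequence is that such data has to be processed incrementally, in a single pass, rather than loaded whole. A minimal sketch, assuming a hypothetical events.log file too large to fit in memory, with a comma-separated event type at the start of each line:

    # Single-pass, constant-memory aggregation over a data stream.
    # "events.log" and its line format are illustrative assumptions.
    from collections import Counter

    counts = Counter()
    with open("events.log", encoding="utf-8") as stream:
        for line in stream:                 # one record at a time, never the whole file
            event_type = line.split(",", 1)[0]
            counts[event_type] += 1         # a running aggregate instead of a full sort

    print(counts.most_common(10))           # the ten most frequent event types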

From backup to migration and integration

Previously, the pressing task was secure information storage (backup). Today, the pressing problem is the migration of multiple data representations (different formats and encodings) and their integration into a single whole.
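What such a migration involves can be shown with a minimal sketch that re-encodes a legacy file and moves it to a single target format; the file names and the cp1251 source encoding are assumptions invented for the example:

    # Migrate a legacy CSV from an old encoding into one unified format.
    # "legacy.csv", "unified.json" and cp1251 are illustrative assumptions.
    import csv
    import json

    with open("legacy.csv", encoding="cp1251", newline="") as src:
        rows = list(csv.DictReader(src))    # parse under the source encoding

    with open("unified.json", "w", encoding="utf-8") as dst:
        json.dump(rows, dst, ensure_ascii=False, indent=2)  # one common representation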

Without data mining technology, many tasks cannot be solved. The issue is not merely making decisions, determining dependencies, or creating algorithms for sampling data volumes for subsequent processing. Merging heterogeneous data has itself become a problem, and it is not always possible to reduce sources of information to a single formalized basis.

Mining large amounts of data requires delimiting that volume and creating a technology (an algorithm, heuristics, rule sets) that makes it possible to pose the problem and solve it.
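How such a rule set might look in code, as a minimal sketch; the field names and thresholds below are invented for illustration:

    # A tiny rule set that delimits which records are worth mining at all.
    # The field names and thresholds are illustrative assumptions.
    RULES = [
        lambda r: r.get("amount", 0) > 1000,   # heuristic: only large transactions
        lambda r: r.get("region") == "EU",     # heuristic: restrict the scope
    ]

    def select(records):
        # Yield only the records that satisfy every rule.
        for record in records:
            if all(rule(record) for rule in RULES):
                yield record

    sample = [{"amount": 2500, "region": "EU"}, {"amount": 10, "region": "US"}]
    print(list(select(sample)))  # -> [{'amount': 2500, 'region': 'EU'}]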

Data mining: what to dig

The concept of data analysis using intelligent methods began to develop actively in the early 1990s. By that time artificial intelligence had not lived up to expectations, but the need for informed decisions based on the analysis of information was growing rapidly.

Machine learning, data mining, pattern recognition, visualization, database theory, algorithmization, statistics and mathematical methods have together formed the range of tasks of a new, actively developing field of knowledge, known by the English term data mining.

In practice, the new field of knowledge has acquired an interdisciplinary character and is still in its infancy. Thanks to the experience and software products of Oracle, Microsoft, IBM and other leading companies, there is a clear idea of what data mining is, but many questions remain. Suffice it to say that Oracle's line of software products devoted to extremely large volumes of information, their integration, compatibility, migration and processing runs to more than forty items.

What is needed to pose the task of processing big data correctly and obtain an informed decision? Scientists and practitioners agree on a generalized understanding of the phrase "search for hidden patterns". Three positions are combined here:

  • non-obviousness;
  • objectivity;
  • practical utility.

The first position means that the usual methods cannot determine what needs to be found or how to find it. Classic programming is not applicable here. What is needed is, if not artificial intelligence, then at least dedicated data mining programs. The term "intelligent" is no less of a problem than the task of determining a sufficient amount of data for making initial decisions and formulating the initial rules of work.

Objectivity is a kind of guarantee that the chosen technology, the developed "intelligent" technique or the set of "smart" rules gives grounds to consider the results correct not only for their author, but for any other specialist as well.

In its software products, Oracle adds to the concept of objectivity the status of safety: freedom from extraneous negative interference.

Practical utility is the most important criterion for both the result and the algorithm that solves the data mining problem in a particular application.

Data mining: where to dig

Business intelligence (BI) underlies some of the most expensive and sought-after modern software. Business solution providers believe they have found a way to handle the processing of large amounts of data, and that their software products can ensure the safe and rapid development of a business of any size.

As with artificial intelligence, current achievements in the field of data mining tools should not be exaggerated. Everything is only just getting on its feet, but real results cannot be denied either.

Then there is the issue of scope. Data mining algorithms have been developed for the economy, for production, for climate information and for exchange rates. There are intelligent products to protect the enterprise from the negative influence of laid-off employees (a strong topic in psychology and sociology) and from virus attacks.

Many of these developments actually perform the functions their manufacturers declare. In effect, the task of what to do and where to do it has acquired a meaningful and objective context:

  • the narrowest possible scope;
  • the most accurate and clear goal;
  • data sources and data reduced to a single base.

Only the scope and the expected practical usefulness make it possible to formulate the technologies, techniques, rules and foundations of data mining for a particular area and a specific purpose.

Information technology has put in a request for a scientific discipline of its own, and small steps in a new, unknown direction should not be shunned. Having glimpsed the holy of holies, natural intelligence, we cannot demand of ourselves what we are not yet able to do.

Deciding what to do and where to do it is extremely difficult today. In a particular business, in a specific area of human activity, you can outline the amount of information to be investigated and obtain a solution characterized by some degree of reliability and some measure of objectivity.

Data mining: how to dig

Professional programming and one's own highly qualified staff are the only tools for achieving the desired result.

Example 1. The data mining task will not be solved by simply using Oracle Load Testing Controller. This product is positioned as a full-featured and extensible load testing tool, which is an extremely narrow task: only the load, nothing more, no highly intelligent tasks.

However, the tasks for which this product is used can confound not only the tester but also the developer, however decorated, and even the industry leader. In particular, testing demands functional completeness. Where is the guarantee that Oracle Load Testing Controller is "in the know" about what data sets can arrive as input to the application under test, the server, and the hardware-software complex?

Example 2. Oracle Business Intelligence Suite Foundation Edition for Oracle Applications: the developer presents this product as a successful combination of the software itself with expertise in building, developing and supporting a large business.

Undoubtedly, Oracle's experience is vast, but that is not enough to transfer it through a software-plus-expertise product. At a specific enterprise, in a specific region, Oracle Business Intelligence may fail to work because of a tax ruling or an order from the local municipality.

The smart use of modern technology

The only right decision in the field of large volumes of information, data mining, and building a data mining system in a company, a government agency or any socio-economic sphere is a team of specialists.

The knowledge and experience of qualified specialists are the only resource that can give a comprehensive answer to the question:

  • data mining: what to dig, where to dig, and how?

Purchasing leading products for the relevant purpose will not be superfluous, but before doing so, you need to study the scope, formulate an indicative solution and set a preliminary goal.

Only after the subject area is defined and the goal is roughly clear can you start searching for solutions that have already been developed and tested in practice. Most likely, a product will be found that further clarifies the subject area and the purpose.

No program today can cope with the real task. Having lost in the field of artificial intelligence in the early 1980s, rational humanity cannot yet count on being able to write a program that solves intellectual problems.

You should not hope that AI will arrive on its own, or that a program purchased from Oracle, Microsoft or IBM will tell you what to do, how to do it, and which result should be considered correct. The modern world of information technology is making rapid progress, and you can take an effective part in it, strengthen the position of your business, or solve a problem that was hard even to formulate. But you have to participate, not rely on the program.

Programming is static work; its result is a rigid algorithm. A modern "intelligent" rule or heuristic is a hard-coded solution that may fail at the first unforeseen situation.

Modeling and testing

Mining big data is a truly relevant and urgent task. But the application domain lived and developed, however modestly, before this task was discovered.

The need for further business development poses new challenges that conceptually outline the volumes of big data to be processed. This is a natural process in the scientific, technical and intellectual development of an enterprise, a company, a business. The same applies to Internet technologies and to the tasks of parsing information on the Internet.

There are many new tasks and applications that are in demand, that can be posed more or less clearly, and that are characterized by an objective parameter: there is genuine interest in solving them, and an understanding of their likely usefulness.

Modeling is a fairly mature area, equipped with many proven mathematical methods. A model can always be built, given time and desire.

Modeling allows you to concentrate all available knowledge in one system and improve it cyclically on a set of test data. This is a classic development path, also proven in practice.
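A minimal sketch of that cycle; the one-parameter model, the test data and the step size are all invented for illustration:

    # Cyclic improvement of a model against a fixed set of test data.
    # The linear model y = k * x, the data and the step are illustrative assumptions.
    test_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # (input, expected output)

    def error(k):
        # Mean squared error of the model on the test set.
        return sum((k * x - y) ** 2 for x, y in test_data) / len(test_data)

    k, step = 0.0, 0.1
    while error(k + step) < error(k):   # keep improving while the test error falls
        k += step

    print(f"fitted coefficient: {k:.1f}, test error: {error(k):.3f}")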

If you do not build castles in the air but move toward the goal with steady confidence, you can determine the path, the desired solution, and the final goal itself.

Programming and smart methods

It was programming in the early 1980s that pushed public consciousness toward the ideas of artificial intelligence; it was programming that became the ancestor of data mining, and it was with programming that data mining methods began.

In those days, the problem of large amounts of data did not exist. Today there are not only large amounts of data, but also the product of the evolution of database management systems: significant experience with relational relationships as the basis for representing data.

Relational relationships are a part, not the whole. There are also the concepts of systemicity and hierarchy, and much more that natural intelligence possesses but that artificial intelligence, in this case programming, cannot yet realize.

Programming is not intelligence in any sense, but it is the real result of applying intelligence in practice. Therein lies its meaning, and it is precisely this that can be used to achieve the desired goals.

Active knowledge and skills

Any program is static: it is the construction of an algorithm for solving a problem within the framework of a programming language's syntax.

Modern programming languages are a polished result of the 1980s, and this cannot be denied. It should also be noted that modern programming languages make it possible to create free algorithms that step outside their syntax.

If someone can ever write a program that works not by the will of its author but by the will of the knowledge and skills it has itself acquired, the problem of large amounts of data and intelligent decision-making will be closed, and a new round in the development of knowledge will begin.



