What are the big data techniques?

Big data

That the amount of data is growing and will continue to grow is a truism, and word has gotten around in every company by now. However, equating big data with merely mastering the flood of data does not go far enough. The topic has many facets, and this is exactly what makes it anything but trivial for user companies to tackle the big data phenomenon. The following aspects come together:

  1. In addition to the sheer amount of data, the number of data sources that companies need to keep an eye on is growing. It is no longer just the classic transactional systems that pour data into the company; machine data and information from social networks must also be channeled correctly.

  2. With the multitude of data sources, the variety of data also grows. In addition to structured transaction data, which can classically be held in relational database systems, there is poorly structured or unstructured data such as texts, images, and videos. Analyzing, managing, and processing these data types in a meaningful way requires new approaches.

  3. At the same time, data and information must be made accessible to more and more users. This affects not only a company's own employees but the entire value chain, from supplier to customer. So not only is the number of data sources growing, but also the number of data consumers.

  4. Different data sources, different data types and the ever-increasing distribution of information pose new challenges for data protection. In addition, the increasingly complex data infrastructures harbor the risk of errors and manipulation. Therefore, the importance of data integrity and data quality continues to grow.

But the complexity surrounding big data does not stop there. The range of offerings and solutions is just as complex and opaque as the challenges posed by the flood of data. With the spread of the big data concept, a confusing provider landscape has developed, say the analysts of the Experton Group. Both complex packages and individual modules appear on the market as big data solutions, and some providers combine existing third-party products with their own solutions. Keeping an overview is becoming increasingly difficult.

  1. Experience in the use of big data techniques
    It's not that nobody has tackled big data projects yet. There are even some examples of companies that have successfully completed such projects.
  2. Deutsche Welle
    "A clear task definition, a focus on the solution and on the users of this solution (less on the latest information technology) and, last but not least, a feel for the usability and functionality of a reporting/analysis dashboard are also essential for big data projects. Less is mostly more here."
  3. DeutschlandCard GmbH
    "Only a meticulous migration plan with at least one complete dress rehearsal including a fallback test ensures the operational security of such a complex application with its numerous interfaces to external partners."
  4. Schukat Electronic
    "Big data analytics is not just a challenge for large companies. Medium-sized companies, too, increasingly have to deal with this topic in order to hold their own in international competition. The application example illustrates the benefits in sales, but there are also diverse scenarios in other specialist departments, for example in production with sensor data."
  5. Otto Versand
    "We have recognized that our requirements call for a self-learning system that takes constantly changing influencing factors into account, such as address and article ranking or, in the print area, the number of pages and catalog print runs. As a result, our forecast quality is continuously increasing and the forecast sales volumes are becoming more and more precise. We can also prepare for future developments at an early stage."
  7. Macy's
    "The business benefit only becomes apparent when processes that were deliberately restricted due to a lack of options are improved. In this case it is much more frequent price optimization across the entire range, which was previously impossible. Much more up-to-date sales figures can now also be included in the analysis."
  7. Telecom Italia
    "Existing segmentation models can be expanded to include role-based models, which make clear the influence that leaders, followers, and others exert on their social environment. Leaders are considered communication hubs and have a strong influence on decision-making in their environment. Marketing strategies and approaches to customer acquisition can be optimized through social network analysis (SNA). Characteristics of communities, movement between communities, and the identification of participants in boundary areas allow conclusions to be drawn about new customer segments and target groups."
  8. NetApp
    "The system based on Apache Hadoop works securely, reliably, and with high performance. The Java-based platform uses open technologies and can therefore be flexibly expanded. Customers avoid vendor lock-in and keep the total cost of ownership (TCO) low."
  9. Semikron GmbH
    "Big data projects are complex. Companies are often unable to estimate how the data stocks relevant to their planned projects will actually develop in volume. At Semikron, for example, it turned out that a much larger volume of data had been assumed than was actually the case. The proof of concept showed that while typical production processes generate a very large number of individual data records, the resulting data volume is not large."
  10. Vaillant Group
    "Simply converting the system landscape to innovative big data architectures already yields reliable business cases for reducing the TCO from a technical IT perspective. These are clearly exceeded by the added value of the new solutions and options for the specialist departments, combined with the drastic reduction in processing times for the user."
  11. TomTom
    "In order to be able to meet the complete requirements of the customer in big data projects, comprehensive know-how is required, which includes the configuration of hardware and software, tuning and technical consulting."
  12. United Overseas Bank (Singapore)
    "Thinking in terms of business processes is decisive. If only one part is accelerated while the overall process remains untouched, the advantage cannot be realized. Both the upstream data management and the use of the results in real time are decisive factors for the successful use of this new solution."
  13. Xing
    "Within a very short time, positive effects emerged at Xing, especially a significant improvement in the analyses. The new solution enables processes to be developed and ad hoc queries to be answered more quickly. Long workarounds are no longer necessary, and all BI employees use the new system effectively. The complexity and maintenance effort of the system have been significantly reduced. Users showed a steep learning curve with the new solution, and work is noticeably more productive."
  14. On our own behalf:
    With these user quotes we want to whet your appetite for the next issue in our four-part Quadriga series, whose cover story is big data. User examples, visionary concepts, and opinions round off the topic. We will also return to the megatrends of mobility, cloud computing, and social media. Release date: June 10, 2013.

Data for wind turbines

From the analysts' point of view, the matter is further complicated by the fact that many providers base their communication on theoretical application examples. Concrete references are rare in this still-young market, and where they exist they are usually very specific and can hardly be transferred to other companies. One example is IBM's big data showcase project at the Danish wind turbine manufacturer Vestas, which examines up to 160 different factors, and thus data in the petabyte range, in order to choose the right locations for wind turbines.

The same applies to SAP's "Oncolyzer", which is intended to evaluate various medical data at Berlin's Charité hospital in the shortest possible time on the basis of the in-memory database HANA, and thus enable individualized cancer therapy. In view of such individual cases, it remains difficult for other companies to find the right answer to their own big data problem.

The Big Five

The analysts have defined five different topics that users should keep an eye on in their search:

  1. Merging and integration of data from different sources
  2. Data security
  3. Data integrity and quality
  4. Visualization and display of results to many users
  5. Concepts such as linked open data

The technical challenges start with the infrastructure. Three quarters of all IT decision-makers see a need for action in their storage and database systems. In contrast, only half of the respondents see an impact on analyses and reporting.

The DB market is rumbling

On the infrastructure side, the database manufacturers, among others, are challenged. For a long time the situation in this market seemed settled: relational database management systems (RDBMS) were the established standard in user companies, and the market was divided among the three big providers Oracle, IBM, and Microsoft. But for some time now there has been rumbling. In the course of big data, the classic systems are reaching their limits, and discussions about what the future of databases might look like are getting louder. Techniques like NoSQL, in-memory, and Hadoop are attracting more attention.

SQL or NoSQL

With the growing flood of poorly structured data that is difficult to fit into the grid of a relational database, interest in NoSQL systems is growing. The abbreviation stands for "Not only SQL": these systems are not primarily intended as a replacement for relational systems, but rather as a supplement. While conventional databases are based on tables and relations, NoSQL databases can use a variety of data models, such as key-value pairs, documents, or graphs. This also means, however, that one NoSQL database is not like another. The different variants have strengths and weaknesses, so it is important to check carefully whether the respective NoSQL DB fits the individual application scenario.
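How such a different data model looks in practice is easiest to show in code. The following minimal Python sketch, with invented example records and no specific NoSQL product assumed, contrasts a fixed, table-like layout with the schema-flexible document model that document-oriented NoSQL stores use:

    # Relational thinking: every record must fit one fixed table schema.
    # Adding a new attribute means altering the schema for all records.
    customers_table = [
        (1, "Alice", "Berlin"),   # (id, name, city)
        (2, "Bob", "Hamburg"),
    ]

    # Document thinking (as in document-oriented NoSQL stores): each record
    # is a self-describing document, and records may differ in structure.
    customers_docs = [
        {"id": 1, "name": "Alice", "city": "Berlin"},
        {"id": 2, "name": "Bob", "city": "Hamburg",
         "social": {"network": "Xing", "followers": 1500}},  # unplanned extra data
    ]

    # Queries must tolerate missing fields instead of relying on a fixed schema.
    with_social = [doc for doc in customers_docs if "social" in doc]
    print(with_social)

The point of the sketch: the document model tolerates records of differing shape, which is exactly what texts, social media profiles, and other poorly structured data require.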

Node by node

The architecture mostly relies on many interconnected standard servers, and scaling is done simply by adding more compute nodes. A prominent example of this is Hadoop. The framework essentially consists of two parts: the Hadoop Distributed File System (HDFS) distributes the data across the various nodes, where it is processed with the help of the MapReduce algorithm developed by Google. The basic idea behind it: break computing tasks down into many small subtasks and distribute them across the cluster.
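The MapReduce idea itself fits in a few lines. The following Python sketch is a deliberately simplified, single-process illustration; the word-count task and all names are chosen for the example, and a real Hadoop job would distribute the map, shuffle, and reduce phases across the cluster nodes:

    from collections import defaultdict

    def map_phase(document):
        # Emit (word, 1) pairs: the small subtask applied to each input split.
        for word in document.split():
            yield word.lower(), 1

    def shuffle(pairs):
        # Group intermediate values by key, as the framework does between phases.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(key, values):
        # Combine all values for one key into the final result.
        return key, sum(values)

    documents = ["big data is big", "data flows into the company"]
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
    print(counts)  # {'big': 2, 'data': 2, 'is': 1, ...}

In Hadoop, the framework takes over the shuffle step and moves the map computation to the nodes where HDFS stores the data splits.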

This parallelization, and the fact that the data is processed where it is stored, is intended to ensure that results are available much faster. Hadoop seems to be establishing itself more and more in the database industry. Providers such as Cloudera and Intel build their own distributions of the open source stack by adding further tools to the framework. In addition, the large database providers such as Oracle, IBM, and Microsoft now offer connectors to link their systems with Hadoop.

Turbo in-memory

Other buzzwords driving the database scene are in-memory computing and column-oriented databases, techniques that SAP combines in its HANA appliance. In-memory systems are characterized by the fact that the data is kept primarily in main memory, where it can be processed much faster. If the system is also structured in a column-oriented manner, data can be read more quickly still. This makes such systems particularly suitable for analytical applications (online analytical processing, OLAP). If, on the other hand, a lot of data has to be written to the database frequently, as in the environment of transactional systems (online transaction processing, OLTP), row-oriented databases have the advantage.
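Why column orientation speeds up reading can be seen in a small sketch. The following Python lines, with invented table contents, contrast the two layouts for a typical OLAP-style aggregation:

    # Row-oriented layout: each record is stored as one unit (favors OLTP writes).
    rows = [
        {"order_id": 1, "region": "north", "revenue": 120.0},
        {"order_id": 2, "region": "south", "revenue": 80.0},
        {"order_id": 3, "region": "north", "revenue": 200.0},
    ]
    total_from_rows = sum(r["revenue"] for r in rows)  # must visit whole records

    # Column-oriented layout: one array per attribute (favors OLAP scans).
    columns = {
        "order_id": [1, 2, 3],
        "region": ["north", "south", "north"],
        "revenue": [120.0, 80.0, 200.0],
    }
    total_from_columns = sum(columns["revenue"])  # reads only the needed column

    assert total_from_rows == total_from_columns == 400.0

For an aggregate over one attribute, the column layout touches only the values actually needed, while the row layout has to pass over every complete record; for frequent writes of whole records the trade-off reverses, which is why OLTP systems favor rows.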

All-in-one

Despite all the innovations, the proponents of the classic RDBMS do not believe in the end of their systems; the new techniques, they argue, will sooner or later be assimilated. In addition, the established systems already offer functions comparable to those of Hadoop, for example. The old hands in the database business are also currently pushing an appliance approach: preconfigured systems consisting of hardware and software that offer customers a complete solution for data handling. Oracle offers its Exadata machines for this, while IBM has special database appliances in its PureSystems portfolio.

But now fast!