How Data Lake and Big Data Creates a Better Data Landscape

The need to operationalize data is not new, nor is the recognition that data is produced by different sources. The Data Lake can give an organization an advantage by allowing users to leverage any new data or analytics at minimal cost and risk.

What is a Data Lake and what does it mean to you?

EMC defines a data lake by five major principles, which can be easily remembered with the acronym ISASA. ISASA stands for “Ingest, Store, Analyze, Surface, Act”. Without the ability to support these functions, there is no data lake strategy.

Let’s break down each of these principles to understand them better.

Ingest 

This is the ability to collect all the data you care about, making sure your systems can correctly and frequently ingest that data through APIs or batch processes. The more reliably data arrives, the more the rest of the data lake can do with it.
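As a rough illustration of the batch side of ingestion, the sketch below lands raw records in a date-partitioned folder without transforming them. The function names `ingest_batch` and `read_batch`, the `web_logs` source label, and the directory layout are assumptions made for this example, not part of EMC's definition.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def ingest_batch(lake_root, source, records, day):
    """Append raw records to <lake_root>/raw/<source>/<day>.jsonl and return the path."""
    target_dir = Path(lake_root) / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{day.isoformat()}.jsonl"
    # Append mode lets repeated batches for the same day accumulate.
    with target.open("a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return target


def read_batch(path):
    """Read back every record stored in one landed file."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical usage: a temporary directory stands in for the lake.
lake = tempfile.mkdtemp()
landed = ingest_batch(lake, "web_logs", [{"user": "a1", "page": "/home"}], date(2018, 9, 14))
ingest_batch(lake, "web_logs", [{"user": "b2", "page": "/offers"}], date(2018, 9, 14))
stored = read_batch(landed)
```

Keeping the records in their original shape at this stage leaves the interpretation to the later Analyze step.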

Store

Getting all the data in one place and breaking down silos is the first and most important step. Storage is also more useful if it is scalable and offers multi-protocol access to all that data. Some examples are NFS, CIFS, FTP, and newer file systems like HDFS.

Analyze

Matching the correct data points can be a work of art. Having the correct systems and the correct talent is the key to finding the relationships between all the data you’re gathering.

Surface

There needs to be a simple method to display all of the analysis, because the data needs to be understood. The easier it is to see the results of the analysis, the easier it is to take action.

Act

This is explained simply by the four M’s: “Make Me More Money”. A plan has to be put in place to take the results of the data analysis and fit them into the operating business model.

Let’s take a real-world scenario: we’ll study a casino and see how a data lake can benefit the organization. A data lake is useless unless you understand the desired results.

  • Determine the business objective
  • Collect the appropriate data to help achieve the business objective
  • Identify what success looks like

The casino’s business objective is to improve its customer experience. The data lake will help it target the correct customers, and success will be measured by an increase in customer visits. The casino has already started a Big Data initiative and is successfully ingesting various data sets based on that business objective.
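To make "increase in customer visits" concrete, here is a minimal sketch of how that success measure could be computed as a percentage change between two periods. The segment names and visit counts are invented for illustration, and `visit_growth` is a hypothetical helper, not something the casino is known to use.

```python
def visit_growth(visits_before, visits_after):
    """Return the percentage change in visits between two periods."""
    if visits_before <= 0:
        raise ValueError("baseline visits must be positive")
    return (visits_after - visits_before) / visits_before * 100.0


# Hypothetical monthly visit counts per customer segment: (before, after).
segments = {"slots": (1200, 1380), "poker": (400, 380)}
growth = {name: visit_growth(before, after) for name, (before, after) in segments.items()}

# Segments where the initiative coincided with more visits.
improved = sorted(name for name, pct in growth.items() if pct > 0)
```

Agreeing on a number like this up front is what turns "identify what success looks like" into something that can be checked.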

Organizations’ efforts to make better use of their information have resulted in the current enterprise data landscape:

Most of these systems were single-vendor solutions, from application to database and even hardware, which placed limitations on interoperability and created costly upgrade scenarios.

Enterprise Applications 

This includes HR systems, accounting and billing systems, CRM, and supply chain management, among others. These systems form the heart of any modern business. They contain and manage the organization’s most critical business data. Businesses have many options for products in this category, and the products do their jobs quite well. But when it comes to data analytics, they are rigid by design to ensure strict enforcement of established business rules. As a result, they feed data to more analytically inclined systems in order to operationalize it.

Knowledge Management (KM) Systems 

While BI addresses the highly structured data problem, KM addresses the unstructured problem. KM products are oriented around user created data, including email messages and documents. As opposed to making analytical decisions, the data in the KM system is used for information sharing and subjective decision making. There is broad agreement that, in an ideal world, user created data would be used seamlessly alongside structured data to make business decisions, but there is still much work to do to accomplish this goal.

Business Intelligence Systems 

BI technologies were developed to take a step in the direction of analytic flexibility and away from the rigidity of enterprise applications. BI is a powerful capability, but as with enterprise applications, the features that fundamentally make it powerful also limit its use. BI requires a significant amount of planning and knowledge of the underlying structure of the data, and thus proves inflexible when adapting to rapidly changing or transient data and struggles to handle large volumes of data.

Log Management and Analysis 

In the last decade or so other applications – often referred to as “intelligence applications” – have emerged. Similar to BI they are built to help businesses understand their data; however, they do not use the core business data that is in the BI system. Instead, these applications operate with other data relevant to the business, including server logs, web site activity, and social media data. The importance of these systems is growing but they still have the limitation that they are designed around a specific use case.

Evolution of Data

Formerly, data challenges were a frustration. As data volumes grew and data became more complex, and as organizations began to recognize the value of data from new sources, that frustration turned into a real source of pain.

In addition to the limitations described above, businesses have begun to encounter a new problem: Big Data Pain. The challenge of deriving value from existing and new sources of data has been made more complex as these data have increased in scale, frequency, and complexity.

The Data Lake is the solution to this big data pain. It is a complement to existing intelligence applications, business intelligence capabilities, and enterprise applications. The Data Lake is a repository that can store these data sets, regardless of size or complexity, and quickly extract insights, sharing these with any application or user. The Data Lake provides information in two fundamental ways:

Data Discovery: all data in the Data Lake can be searched within seconds and this search capability is provided to users and applications across the enterprise.
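One common way to make mixed data searchable is an inverted index, which maps each term to the records that contain it. The sketch below is a toy, in-memory version under that assumption; the article does not say which search technology a Data Lake uses, and the document ids and texts are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(matches)


# Hypothetical records from three different source systems.
documents = {
    "log-17": "Login failed for user a1",
    "email-3": "User a1 asked about loyalty points",
    "crm-9": "Loyalty tier upgraded",
}
index = build_index(documents)
hits = search(index, "user a1")
```

Because the index is built once and queried many times, a lookup touches only the terms in the query rather than scanning every record.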

Pre-computed Analytics: targeted insights for specific business needs are derived from data in the Data Lake. These analytics are computed in advance for all possible situations so that the Data Lake can instantaneously provide insights such as pattern recognition, anomaly detection, categorization, and recommendation analytics.
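The idea of paying the computation cost ahead of time can be sketched with a simple anomaly check: summary statistics are calculated once per key, and each later lookup only compares a value against the stored profile. The three-standard-deviation threshold, the `table_7` key, and the sample values are assumptions for illustration only.

```python
from statistics import mean, pstdev


def precompute_profiles(history):
    """Compute mean and population standard deviation once per key."""
    return {key: (mean(values), pstdev(values)) for key, values in history.items()}


def is_anomaly(profiles, key, value, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the stored mean."""
    avg, spread = profiles[key]
    if spread == 0:
        # With no variation in the history, any different value stands out.
        return value != avg
    return abs(value - avg) / spread > threshold


# Hypothetical hourly takings for one gaming table.
history = {"table_7": [100, 110, 90, 105, 95]}
profiles = precompute_profiles(history)

normal_reading = is_anomaly(profiles, "table_7", 104)
odd_reading = is_anomaly(profiles, "table_7", 500)
```

The same pattern, heavy work up front and a cheap lookup at query time, applies to the other analytics listed above, although each would need its own model.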

Conclusion

Data exists in silos because of previous technical limitations that drove decisions around what data types were hosted in which repositories. Those same limitations prevent organizations from responding to rapidly changing business needs and from using data from new sources, some of which were unknown a few years ago. The Data Lake eliminates these limitations by providing a new data infrastructure, acting in concert with the organization’s existing data stores and applications while adding support for new data and rapidly changing data types.

Written By

Saikumar is a content writer who is currently working for Mindmajix. He is a technical blogger who likes to write content on emerging technologies in the software industry. In his free time, he enjoys playing football.

1 Comment

  1. Harshali

    September 14, 2018 at 12:40 pm

    Big Data is really revolutionising the IT industry. These data lakes are creating a lot many job vacancies. Only I could wish our college syllabus are updated enough that students can learn at the pace of technology. Though there are many training courses and online tutorials available.

    Do not stop yourself from learning.
    There is everything on internet.
