Posts Tagged ‘data lakes’

If you want to find new sources of revenue, stop over-investing in enterprise applications, because the real value of a business can be found in its under-invested data assets! The single most important capability affecting the growth of top-line revenue and/or bottom-line margin over the next few years will be Data Monetization. Data science, the principal means through which data is and will be monetized, is a multidisciplinary capability designed to extract insights from relatively unrelated and often disparate data sources. It is estimated that data science can generate $2 to $3 of new long-term revenue for every $1 of product/services revenue currently earned by a company that does not use data science. Achieving these multiples, however, will require a fundamental change not only in how we think of our enterprise, but also in who is key to making it happen.

Data is a byproduct of people, applications, and time. It is the historical digital debris that documents the behavioral existence of our customers, clients, partners, and employees. But most data, in and of itself, does not tell us anything useful. It can reaffirm what we know (e.g., how many students we have) and will often help us answer questions about what we know we don’t know (e.g., how many students are failing). But no real new value is captured through these understandings. The real exponential growth in actionable knowledge comes from discovering the secrets in what we don’t know we don’t know (e.g., why students are failing and how to help them succeed), which are locked in the disparate structured and unstructured data found in and out of the enterprise.

Discovering the deep secrets in these vast data sets will require a different way of thinking, behaving, and investing. While most successful organizations spend their time and money on enterprise applications (EA), tomorrow’s business success will come through developing new capabilities in data sciences. This change will impact, and be impacted by:

>> Resources:: Highly skilled resources are needed to convert disparate unstructured and structured data directly into revenue (reselling the insights; knowledge = study of information) or indirectly into secondary products/services (insights that lead to new product/services innovations). These highly skilled resources often come from non-engineering fields such as mathematics, statistics, and physics.

>> Data Lakes (AKA Big Data):: This is the data that sits outside the box. It is structured and unstructured data from all sources, not just those relevant to the apparent business domain of interest. New distributed/federated means of data aggregation are needed, since by definition no single repository can hold the vastness of the data needed for effective data science. Data lakes go beyond traditional data schemas, enterprise data architectures, data marts (which contain data subjects), and data warehouses (which aggregate data subjects and can vary over time). Think of Hadoop for distributed processing of mega-large data sets.

>> Distributed Analytics and Intelligences (DAI):: Insights are only as relevant as one’s ability not only to answer the questions we know we don’t know (second level of knowledge), but also to identify and address the questions that we don’t know we don’t know (third level of knowledge). Getting knowledge out of very large data volumes that are structured and unstructured, as well as rapidly changing (high velocity), requires new analytical tools, systems, and enterprise architectures. Think of Pentaho and Pneuron for distributed analytics operating against mega-large data sets (data lakes).


Breaking out of the business box you are in cannot be done by just defining the box. To be successful, you need the insights that can only come from the things you don’t know you don’t know, which is the world of big data and data sciences.




