Archive for the ‘Field Note’ Category

A common data exploration question came up while talking with a British colleague in the advertising industry on Friday: how many independent subject areas should be investigated (1, 10, 100, …, N) in order to have a statistically significant chance of making a discovery with the least amount of effort? An answer can be found in “The Power of Three (3),” an application of the Knowledge Singularity when N is very small, which defines meaningful returns on knowledge discovery costs.

As I discussed in the field note “FIELD NOTE: What Makes Big Data Big – Some Mathematics Behind Its Quantification,” perfect insight can be gained asymptotically as one systematically approaches the Knowledge Singularity (77 independent subject areas out of an N-dimensional universe where N >> 77). While this convergence on infinite knowledge (insight) is theoretically interesting, it is preceded by a more practical application when N is three (3); that is, when one explores the combinatorial space of only three subjects.

Let insight 1 (I_1) represent the insights implicit in the set of data 1 (Ds_1) and insight 2 (I_2) represent the insights implicit in the set of data 2 (Ds_2), where the intersection of data sets 1 and 2 is empty (Ds_1 ∩ Ds_2 = {}). Further, let insight N (I_N) represent the insights implicit in the set of data N (Ds_N), where the intersection of data set N and all previous data sets is empty (Ds_1 ∩ Ds_2 ∩ … ∩ Ds_N = {}). The total insight implicit in all data sets, 1 through N, therefore, is proportional to the insights gained by exploring the total combinations of all data sets (from the previous field note). That is, 


In order to compute a big data ROI, we need to quantify the cost of knowledge discovery. Using current knowledge exploration techniques, the cost of discovering insights in any data set is proportional to the size of the data:

          Discovery Cost (I_N) = Knowledge Exploration[Size of Data Set N]

Therefore, a big data ROI could be measured by:

         Big Data ROI = Total Insights [Ds_1 … Ds_N] / Total Discovery Cost [Ds_1 … Ds_N]

If we assume the explored data sets to be equal in size (which generally is not the case, but does not matter for this analysis), then:

         Discovery Cost (I_1) = Discovery Cost (I_2) = … = Discovery Cost (I_N)


         Total Discovery Cost [Ds_1 U Ds_2 U … U Ds_N] = N x Discovery Cost [Ds] = Big O(N), or proportional to N, where Ds is any one data set



We can now plot Big Data ROI as a function of N, for small values of N:

[Figure: Big Data ROI as a function of N]

That was fun, but so what? The single biggest ROI in knowledge discovery comes when insights are looked for in and across the very first two combined independent data sets. However, while the total knowledge gained increases exponentially for each additional independent data set added, the return on investment asymptotically approaches a finite limit as N approaches infinity. One can therefore reasonably argue that, given a limited discovery investment (budget), a minimum of two subjects is needed, while three ensures some level of sufficiency.

Take the advertising market (McCann, Tag, Goodby, etc.), for example. Significant insight development can be gained by exploring the necessary combination of enterprise data (campaign-specific data) and social data (how the market reacts), two independent subject areas. However, to gain some level of assurance, or sufficiency, the addition of one more data set, such as IT data (click-throughs, induce hits, etc.), increases the overall ROI without materially increasing the costs.
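To make the combinatorial space concrete, here is a minimal Python sketch that enumerates the combinations of the three advertising data sets mentioned above. It assumes that "combinations" means every non-empty subset of the data sets. The labels and the helper name subject_combinations are illustrative and do not come from the original analysis.

```python
from itertools import combinations

# Labels borrowed from the advertising example; purely illustrative.
data_sets = ["enterprise", "social", "IT"]


def subject_combinations(subjects):
    """Return every non-empty combination of the given subjects.

    Assumption: exploring "combinations" means looking at each data set
    on its own and at every grouping of two or more data sets.
    """
    return [
        combo
        for size in range(1, len(subjects) + 1)
        for combo in combinations(subjects, size)
    ]


# Show how the space to explore grows as data sets are added one at a time.
for n in range(1, len(data_sets) + 1):
    combos = subject_combinations(data_sets[:n])
    print(n, len(combos), combos)
```

Under this assumption, one data set offers a single combination to explore, two offer three, and three offer seven.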

This combination of at least three independent data sets to ensure insightful sufficiency is what is being called “The Power of Three.” While a bit of a mathematical and statistical journey, this should intuitively make sense. Think about the benefits that come from combining subjects like Psychology, Marketing, and Computer Science. While any one or two is great, all three provide the basis for a compelling ability to cause consumer behavior, not just to report on it (computer science) or correlate around it (computer science and market science).



Real-world problems are increasingly being solved through Crowdsourcing, Crowd Funding, and Collective Intelligence. Leveraging the “wisdom of a crowd” is a relatively new concept, dating back to Jeff Howe (2006). Here are a few enterprise examples of companies using crowdsourcing to solve meaningful real-world problems:

::  In September 2009, Netflix awarded a $1 million prize to “BellKor’s Pragmatic Chaos” for developing an algorithm that more accurately predicts a person’s movie enjoyment based on his or her movie preferences (www.netflixprize.com//index). While Netflix may not see a return on this investment, solving this problem internally could have cost them 10x as much over a longer period of time.

::  Evoke (www.urgentevoke.com) is a “10 week crash course in changing the world” — produced by the World Bank Institute, which leads players through challenges such as a water crisis, food security or empowering women. In its first “season,” 20,000 players registered to play Evoke. Check out this intro video on Vimeo.

::  Poptent (www.poptent.com) has one of the largest crowdsourcing-based communities (45,000) of broadcast filmmakers, who are connecting to each other and to companies that want to pay them for their creative talents. Poptent brands are seeking new ways to reach their consumers and create new audiences.

::  Foldit (www.fold.it) is a game that encourages players to fold proteins — produced by University of Washington Computer Science & Engineering. Proteins play a role in curing many diseases. Understanding how proteins fold is one of the hardest problems in biology. Foldit draws on the human ability to intuitively solve puzzles, in this case folding proteins. Foldit is a modeling tool that embodies the rules and constraints of a problem (in this case protein folding) and engages the Web collective to help solve this problem. 

These are just a few examples. Please let me know if you have a favorite.


Dave Feinleib, a Forbes Technology contributor, released a new Big Data landscape point of view. While not all-encompassing (e.g. missing technologies like Pneuron), it is a great start at making the complicated big data landscape understandable.



A colleague asked me to comment on a recent article, “Are you ‘network literate’?,” by Ben Casnocha. The author notes that the information you know “will determine whether you win or lose.” Very true.

Information is changing too quickly (velocity), is too expansive (volume), and is too diverse (variety) for any one person to have a reasonable chance of remembering/recalling enough to be singularly useful in complex decision making. As such, Casnocha notes that to address this issue:

– You need to know your network.
– You need to know who in your network knows what.
– You need to know how to ask questions that elicit helpful information.

In essence, the ability to solve problems and make effective decisions is proportional to the amount of knowledge in your network:

Accessible Knowledge (AK) is proportional to number of networks you have x people/network x question/person x knowledge/question.

A very powerful equation.

But, while I agree with these points, Casnocha does not give concrete examples of how to use modern technology/social tools to realize value from them. For example, LinkedIn is THE network tool used by professionals to source a network. Right? Facebook tends to be used for personal and some professional network sourcing, but does not provide any real relationship information (connections like in LinkedIn).

Casnocha also does not identify what tools can be used to identify who in your network knows what. In LinkedIn, people can use their deductive reasoning to learn a lot about what a person has done as a means of knowing what they know. Are there tools that can do this? But even if there are, it is almost impossible to infer what people are truly capable of answering through inductive reasoning, which is a more important capability in effective decision making. This is a real gap in current social platforms.

In terms of how to ask questions, this is always an issue. English is a very ambiguous language and its interpretation is highly dependent on the educational level of the sender and receiver. Couple that with the highly unstructured nature of written content in social media and you open yourself to a lot of variance in the results of knowledge calculus.

To more effectively manage and leverage Accessible Knowledge, social tools like LinkedIn need to start mining not only the connections, but the relative knowledge as well. It is only then that the artificial social networks that are pervasively replacing our real ones will become useful enough to support the fine-grained decision making necessary for today’s complex problems.

My colleague noted that the AK equation lacks consideration of critical thinking, or maybe it was my analysis that lacked critical thinking. Hmm. Oh well, let’s assume it was the former and that he is absolutely right. People today have lost this critical ability. I even find myself now and then forgetting to think before I act.

Extending and revising the original thought, here is how I would interpret it:

AK = Kp(t) x Tct[number of networks you have x people/network x question/person x knowledge/question]

Tct is the critical thinking transformation function
Kp is an efficiency factor dependent on the person and a point in time

That is, the sum total knowledge of your networks is transformed through the critical thinking process, which is spatially dependent on the individual and temporally dependent on when (mode, season, day/night, etc.) the thinking takes place.
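As a rough illustration of the revised equation, here is a minimal Python sketch. Everything numeric in it is hypothetical: the original gives no units or values, Kp(t) is passed in as a single number already evaluated for a given person and time, and Tct is modeled as an arbitrary function supplied by the caller (defaulting to no transformation).

```python
from typing import Callable


def accessible_knowledge(
    networks: float,
    people_per_network: float,
    questions_per_person: float,
    knowledge_per_question: float,
    kp: float = 1.0,
    tct: Callable[[float], float] = lambda raw: raw,
) -> float:
    """Sketch of AK = Kp(t) x Tct[networks x people x questions x knowledge].

    kp  -- efficiency factor for one person at one point in time (assumed scalar)
    tct -- critical thinking transformation (assumed to map a number to a number)
    """
    raw = networks * people_per_network * questions_per_person * knowledge_per_question
    return kp * tct(raw)


# Original form: no efficiency factor, no critical thinking transformation.
print(accessible_knowledge(2, 50, 3, 1))

# Revised form: hypothetical efficiency of 0.5 and a transformation that
# keeps half of the raw knowledge.
print(accessible_knowledge(2, 50, 3, 1, kp=0.5, tct=lambda raw: raw * 0.5))
```

With these made-up inputs the first call gives 300 and the second gives 75.0, which shows how the efficiency factor and the critical thinking transformation scale the raw product.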


A brief big data touch point. In a conversation, a colleague asked how big data is changing everyday activities. I noted a conclusion made by Alessandro Mantelero in his paper “Masters of Big Data: concentration of power over digital information.” He stated, in the context of big data, that:

Examination of the data flows in the evolving datasets shows trends and information without the need of prior working hypothesis, changing the traditional paradigm of social analysis in which the design of the study sample represents the first step which is than followed by the analysis of raw data. 

The key point is that big data is a catalyst for the exploration of level 3 knowledge: those things we don’t know we don’t know. Traditional discovery methods, used in level 1 (know what we know) and level 2 (know what we don’t know) knowledge acquisition, are limited because they infer, and often require, the relevant aspects of the questions under study. In contrast to the traditional deductive approach to knowledge acquisition, big data is self-explanatory and can be based on inductive knowledge acquisition, which fundamentally requires large amounts of information.

Therefore, if one believes that transformative business insights can be found in understanding more of what we don’t know we don’t know (level 3), then the use of big data is one of the proven means to achieve this.


I thought I’d share this email observation I made earlier today about how Big Data seems to be quadricating into these orthogonal fields:

1. Data (the intrinsic 1/0 property of big data), which can be broken down into subject areas like interaction data, transaction data || structured, unstructured || realtime/streaming, batch/static || etc.

2. MapReduce platforms – AKA divide and conquer – virtual integration capabilities that enable aggregation and management of multiple name-spaced data sources (Hadoop, InfoSphere Streams, Pneuron, etc.)

3. Data Exploration, Data Mining, and Intelligence Platforms – technical capabilities that enable one to derive insights from data (Pentaho, IBM InfoSphere, ListenLogic, MATLAB, Mathematica, Statistica, etc.).

4. Knowledge Worker platform (AKA the human component) – The two most important capabilities come from data scientists (navigate through data) and behavioral scientists (navigate through human behavior, which most important things seem to connect back to).

In essence, Big Data has data, an ability to find it and use it, and an ability to explore and learn from it.

Does this seem right?  Missing anything? Please post or email me.


During a recent conversation, a colleague asked what the purpose of Enterprise Architecture (EA) is. Here are the four points that seem to make sense:

>> Enterprise architecture is the architecture of business capabilities

>> Enterprise architecture provides a common basis for understanding and communicating how systems are structured to meet strategic objectives

>> Instead of allowing a single solution to drive the technology, EA provides a balanced approach to the selection, design, development and deployment of all the solutions to support the enterprise

>> Enterprise architecture allows stakeholders to prioritize and justify often conflicting technology trade-off decisions based on the big picture

The defining characteristic that differentiates Enterprise Architecture from other architectures is simple:

>> Enterprise scope – covers multiple business units – crosses functional boundaries

