Archive for the ‘Crowdsourcing’ Category

I recently re-read “Programming Collective Intelligence: Building Smart Web 2.0 Applications,” by Toby Segaran, as part of my ongoing research into developing a more systematic view of crowdsourcing enterprise architectures. Part of a larger O’Reilly book series (e.g., Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites; Think Complexity: Complexity Science and Computational Modeling; Machine Learning for Hackers; etc.), this text “… takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general — all from information that you and others collect every day.” This book explains:

  • Collaborative filtering techniques that enable online retailers to recommend products or media
  • Methods of clustering to detect groups of similar items in a large dataset
  • Search engine features — crawlers, indexers, query engines, and the PageRank algorithm
  • Optimization algorithms that search millions of possible solutions to a problem and choose the best one
  • Bayesian filtering, used in spam filters for classifying documents based on word types and other features
  • Using decision trees not only to make predictions, but to model the way decisions are made
  • Predicting numerical values rather than classifications to build price models
  • Support vector machines to match people in online dating sites
  • Non-negative matrix factorization to find the independent features in a dataset
  • Evolving intelligence for problem solving — how a computer develops its skill by improving its own code the more it plays a game

I had forgotten just how good this book is at making the complex subject of machine-learning algorithms understandable for crowdsourcing practitioners. Segaran takes complex algorithms and makes them easy to understand, mostly through examples that are directly applicable to web-based social interactions. Even in 2012, this 2008 text is still a must-read for those building commercial-grade crowdsourcing solutions.
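To give a flavor of the first technique on that list, here is a minimal user-based collaborative filtering sketch in Python. It follows the general idea the book popularized (score other users by similarity, then weight their ratings), but the ratings data, the function names, and the choice of Pearson correlation as the similarity measure are my own illustrative assumptions, not code reproduced from the book.

```python
from math import sqrt

# Hypothetical ratings data: user -> {item: rating}
RATINGS = {
    "ana":   {"a": 5.0, "b": 3.0, "c": 4.0},
    "ben":   {"a": 4.0, "b": 2.0, "c": 5.0, "d": 4.0},
    "carla": {"a": 1.0, "b": 5.0, "d": 2.0},
}


def pearson(r1: dict, r2: dict) -> float:
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    shared = [item for item in r1 if item in r2]
    n = len(shared)
    if n < 2:
        return 0.0
    x = [r1[i] for i in shared]
    y = [r2[i] for i in shared]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def recommend(user: str, ratings: dict) -> list:
    """Rank items the user hasn't rated by similarity-weighted average rating."""
    totals, sim_sums = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = pearson(ratings[user], theirs)
        if sim <= 0:  # skip uncorrelated or opposite-taste users
            continue
        for item, score in theirs.items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + sim * score
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = [(totals[i] / sim_sums[i], i) for i in totals]
    return sorted(ranked, reverse=True)
```

Calling `recommend("ana", RATINGS)` suggests item "d": only ben, whose ratings correlate positively with ana's, contributes, while carla's negatively correlated ratings are skipped.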



Can a man-made machine be taught through crowdsourcing to determine the sentiment of a Twitter message about the weather? As it turns out, the answer is yes. Using Google Prediction API, CrowdFlower, Twitter, and Google Maps, Kevin Cocco demonstrates that crowdsourced training data can produce AI capabilities that accurately predict complex human judgements such as sentiment.

This theoretical research is important because it has practical implications for commercial crowdsourcing endeavors. Crowdsourcing effort scales directly with the number of people in the network throughout the value chain, which means cost (decisioning, rewards, etc.) also scales with headcount. From a business perspective, this makes crowdsourcing very hard to perform at large scale and highly dependent on the quality of the contributors. This is where AI automation, especially around decisioning, can play a significant role.

Here is a reposting of the original work. We will explore this more in future discussions.


NOTE: The following post, by Kevin Cocco, was reposted from the Dialogue Earth blog.

Josh Eveleth March 21, 2012


Can a machine be taught to determine the sentiment of a Twitter message about weather? The goal was to use data from over 1 million crowdsourced human judgements to train a predictive model, and then use this machine learning system to make judgements. Below are the highlights from the research and development of a cloud-based machine learning model that predicts the sentiment of text regarding the weather. The major technologies used in this research were Google Prediction API, CrowdFlower, Twitter, and Google Maps.

The only person who can truly determine the sentiment of a tweet is the person who wrote it. When crowd workers judge tweet sentiment, all 5 workers make the same judgement only 44% of the time. CrowdFlower’s crowdsourcing processes are well suited to managing the art and science of sentiment analysis. You can increase the number of CrowdFlower workers per record to improve accuracy, at a correspondingly higher cost.

The results of this study show that when all 5 crowd workers agree on the sentiment of a tweet, the predictive model makes the same judgement 90% of the time. Across all tweets, CrowdFlower and the predictive model return the same judgement 71% of the time. CrowdFlower and Google Prediction complement rather than substitute for each other. As shown in this study, CrowdFlower can successfully be used to build a domain- or niche-specific data set to train a Google Prediction model.

I see real power in integrating machine learning into crowdsourcing systems like CrowdFlower. CrowdFlower users could have the option of automatically training a predictive model as the crowd workers make their judgements. CrowdFlower could continually monitor the model’s accuracy trend and progressively add machine workers to the worker pool. Once the model hits X accuracy, the majority of the data stream could be routed to predictive judgements, while a small percentage continues to go to the crowd to refresh current topics and keep validating accuracy. MTurk HITs may cost only pennies, but Google Prediction ‘hits’ cost even less.
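The crowd-to-machine routing idea above can be sketched in a few lines. This is a hypothetical design, not an existing CrowdFlower or Google feature: the class name, `accuracy_target` (standing in for the "X accuracy"), `audit_rate`, and the rolling-window size are all illustrative assumptions.

```python
import random
from collections import deque


class HybridRouter:
    """Route items to a model or the crowd based on the model's tracked accuracy.

    Hypothetical sketch: all names and thresholds are illustrative.
    """

    def __init__(self, accuracy_target=0.85, audit_rate=0.05, window=1000, seed=0):
        self.accuracy_target = accuracy_target
        self.audit_rate = audit_rate          # share of items always sent to the crowd
        self.recent = deque(maxlen=window)    # 1 = model matched the crowd, 0 = mismatch
        self.rng = random.Random(seed)

    def rolling_accuracy(self) -> float:
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def route(self) -> str:
        """Return 'model' or 'crowd' for the next item."""
        if self.rng.random() < self.audit_rate:
            return "crowd"  # keep validating accuracy and refreshing topics
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and self.rolling_accuracy() >= self.accuracy_target:
            return "model"
        return "crowd"

    def record(self, model_label, crowd_label) -> None:
        """Log whether the model agreed with a crowd judgement."""
        self.recent.append(1 if model_label == crowd_label else 0)
```

Until the window is full and the model's agreement with the crowd clears the target, everything goes to the crowd; after that, most traffic goes to the model, while `audit_rate` keeps a small random sample flowing to human workers.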


Weather Sentiment Prediction Demo Application:

LIVE Demo Link: www.SproutLoop.com/prediction_demo

Note: this demo uses a throttled server-side Twitter feed; retry later if you get no results. Contact me regarding high-volume applications and integrations with the full Twitter firehose.

Match Rate / Accuracy Findings:
Below are the highlighted match rates of CrowdFlower human judgements against Google Prediction machine judgements. A match rate is the percentage of tweets for which the sentiment label from one method agrees with the label from the other:

  • Google Prediction API matching CrowdFlower Sentiment Analysis = 71% match rate
  • Mirroring DialogueEarth Pulse by filtering out the lowest 22% of confidence scores = 79% match rate
  • Tweet sentiment can be confusing for humans and machines alike. Google predictions for only the tweets on which all crowd workers agreed (CrowdFlower confidence score = 1) = 90% match rate
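Each of these figures is the same calculation applied to a different subset of the test data. A minimal sketch, assuming each record carries the CrowdFlower label and confidence plus the Google label and confidence (the `Judgement` type and its field names are hypothetical):

```python
from typing import Iterable, NamedTuple


class Judgement(NamedTuple):
    crowd_label: str    # CrowdFlower aggregated label
    crowd_conf: float   # CrowdFlower confidence, 0..1
    model_label: str    # Google Prediction label
    model_conf: float   # Google Prediction confidence, 0..1


def match_rate(rows: Iterable[Judgement]) -> float:
    """Share of rows where the crowd and model labels agree."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(r.crowd_label == r.model_label for r in rows) / len(rows)


def unanimous_only(rows):
    """Keep tweets on which every crowd worker agreed (confidence == 1)."""
    return [r for r in rows if r.crowd_conf == 1.0]


def drop_lowest_model_conf(rows, fraction):
    """Drop the given fraction of rows with the lowest model confidence."""
    ranked = sorted(rows, key=lambda r: r.model_conf)
    return ranked[int(len(ranked) * fraction):]
```

`match_rate(rows)` corresponds to the first bullet, `match_rate(drop_lowest_model_conf(rows, 0.22))` to the second, and `match_rate(unanimous_only(rows))` to the third.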


About Google Prediction API:
At the May 2011 Google I/O conference, Google released a new version of its Google Prediction API, opening access to Google’s machine learning systems in the cloud. The basic process for creating a predictive model is to upload training data to Google Cloud Storage and then use the Google Prediction API to train a machine learning model from that data set. Once you have a trained model in the cloud, you can write code against the API to submit data for sub-second predictions (0.62 seconds on average).


About The Data:
Much has been written about DialogueEarth.org’s Weather Mood Pulse system. Pulse has collected a dataset of 200k+ tweets, each assigned one of five labels describing the tweet’s sentiment about the weather. This labeling is crowdsourced through the CrowdFlower system, which presents each tweet as a survey for crowd workers to judge. CrowdFlower’s quality control processes present the same tweet to several people in the crowd; DialogueEarth’s crowd jobs were configured so that each tweet was graded by 5 different people. CrowdFlower combines the agreement among these 5 graders with each grader’s CrowdFlower “Gold” score to calculate a confidence score between 0 and 1 for each tweet. About 44% of the tweets have a confidence score of 1, meaning 100% of the graders agreed on the sentiment label, while other tweets have low scores such as 0.321, indicating little agreement plus some influence from each grader’s Gold score. The Pulse system uses only tweets with a CrowdFlower confidence score greater than or equal to 0.6. Dozens of models were built using various ranges of CrowdFlower confidence scores; testing showed that the best model used the full confidence range of CrowdFlower records.
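The post does not give CrowdFlower's exact confidence formula. As a rough illustration only, one plausible approach is a trust-weighted agreement score: each worker's vote is weighted by a trust value (a hypothetical stand-in for the Gold score), and confidence is the winning label's share of the total trust. Unanimous votes then score 1.0, and a Pulse-style cutoff of 0.6 can be applied afterwards.

```python
from collections import defaultdict


def aggregate(votes):
    """votes: list of (label, trust) pairs with trust > 0.

    Returns (winning_label, confidence), where confidence is the winning
    label's share of total trust. Hypothetical approximation, not
    CrowdFlower's actual formula.
    """
    weight = defaultdict(float)
    for label, trust in votes:
        weight[label] += trust
    total = sum(weight.values())
    label = max(weight, key=weight.get)
    return label, weight[label] / total


def keep_for_training(votes, cutoff=0.6):
    """Mirror Pulse's rule of keeping tweets with confidence >= cutoff."""
    _, conf = aggregate(votes)
    return conf >= cutoff
```

With five equally trusted workers split 2/1/1 (and two low-trust votes), the winning label's share drops to 0.5, so the tweet would fall below the 0.6 cutoff.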

Weather Sentiment Tweet Labels From CrowdFlower:
The CrowdFlower-scored tweet data contains the tweet text, the weather sentiment label, and the CrowdFlower confidence score. The data set was randomly split into two segments: 111k rows (~90%) used to train the model and 12k rows (~10%) held out for testing it.


Sentiment | Modeled Tweets (90%) | Test Tweets (10%) | % of Total
TOTAL | 111,651 | 12,389 | 124,040 tweets
negative | 23,976 | 2,578 | 21.4%
positive | 21,688 | 2,384 | 19.4%
not weather related | 34,232 | 3,780 | 30.6%
neutral | 29,333 | 3,340 | 26.3%
cannot tell | 2,422 | 307 | 2.2%
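A randomized 90/10 hold-out split like the one described above can be sketched as follows. The fixed seed and the CSV layout (label in the first column, tweet text in the second) are assumptions for illustration; check the target service's training-data format before relying on this.

```python
import csv
import io
import random


def split_holdout(rows, test_fraction=0.1, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def to_csv(rows) -> str:
    """Serialize (label, text) pairs as CSV, label first (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    for label, text in rows:
        writer.writerow([label, text])
    return buf.getvalue()
```

Seeding the shuffle makes the split reproducible, so the same held-out tweets can be reused when comparing models trained on different subsets.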


CrowdFlower Confidence Scores Correlation to Google Prediction Match Rate / Accuracy:

[Chart: Running Match Rate, ordered by CrowdFlower Confidence, Google Prediction Confidence]


  • The x-axis shows the distribution of CrowdFlower (CF) confidence scores in the 12k-row, 10% random test data set
  • Google is better at predicting tweets that have a higher CrowdFlower confidence score
  • The Google confidence score correlates with accuracy/match rate: on average, higher Google confidence means a higher match rate
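The "running match rate" curves can be reproduced in spirit by sorting the test rows by a confidence score (highest first) and computing the cumulative share of matches at each position. A minimal sketch; the `(confidence, matched)` input format is an assumption.

```python
from itertools import accumulate


def running_match_rate(rows):
    """rows: iterable of (confidence, matched) pairs, matched being a bool.

    Returns the cumulative match rate after sorting by confidence, highest first.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    hits = accumulate(1 if matched else 0 for _, matched in ordered)
    return [h / (i + 1) for i, h in enumerate(hits)]
```

Feeding in CrowdFlower confidence gives a curve like the first chart; feeding in Google confidence gives one like the second.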


Google’s Prediction Confidence Score and Correlation to Match Rate Accuracy
[Chart: Running Match Rate and Google Prediction Score, ordered by Prediction Score, Random]


  • Higher Google confidence scores correlate with higher matching/accuracy rate.
  • Keeping only results with a Google confidence score > 0.8290 yields 80% accuracy, at the cost of filtering out 24.41% of the data.


Accuracy / Match | Google Confidence | % Data Filtered | Rows Kept (out of 12,390)
98.47% | score = 1 | 86.88% | 1,627
90% | score > 0.99537 | 55.16% | 5,543
85% | score > 0.95688 | 38.78% | 7,586
80% | score > 0.82900 | 24.41% | 9,367
78.93% | score > 0.79122 | 21.60% ** | 9,715
75.00% | score > 0.61647 | 11.06% | 11,021
70.91% | score > 0.25495 | 0.0% | 12,390


** Note: 21.6% is the share of data that Pulse currently filters out by excluding CrowdFlower (CF) confidence scores <= 0.6.
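The threshold table answers a practical question: what is the lowest Google confidence cutoff that still reaches a target accuracy, and how much data does it discard? A minimal sketch of that search, assuming `(confidence, matched)` pairs from the hold-out set; it keeps rows with confidence >= cutoff (the table uses >), and it is not necessarily how the original numbers were produced.

```python
def threshold_for_accuracy(rows, target):
    """Find the lowest confidence cutoff whose retained rows meet `target`.

    rows: list of (confidence, matched) pairs; rows with confidence >= cutoff
    are kept. Returns (cutoff, accuracy, fraction_filtered), or None.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    best = None
    hits = 0
    for kept, (conf, matched) in enumerate(ordered, start=1):
        hits += int(matched)
        # Only evaluate at the end of a run of tied confidence values.
        if kept < len(ordered) and ordered[kept][0] == conf:
            continue
        accuracy = hits / kept
        if accuracy >= target:
            best = (conf, accuracy, 1 - kept / len(ordered))
    return best
```

Because accuracy does not always fall monotonically as the cutoff drops, the loop scans every cutoff and keeps the lowest one that still meets the target, which retains the most data.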

Effect of Model Size on Match Rate / Accuracy


  • The larger the training data set, the higher the match accuracy
  • A 95% decrease in training set size (5k of 111k rows) decreases the match rate by 7.1 points (64% vs. 71%)
  • The classificationAccuracy returned by Google’s model build differed from the tested accuracy rate by between 5% (5k model) and 0.06% (111k model)


Formatting Tweet Text for Modeling:
When preparing text for modeling, Google recommends removing all punctuation because “Punctuation rarely add meaning to the training data but are treated as meaningful elements by the learning engine”. The tweet text was lowercased, and all punctuation, special characters, returns, and tabs were replaced by a space to prevent two words from joining. Given the 140-character limit and the frequent use of emoticons, it might be interesting to replace emoticons with words before building the model, for example replacing :’-( with a specific token such as ‘emoticon_i_am_crying’ or a general one such as ‘emoticon_negative’. Here is an example of tweet hygiene:
BEFORE: 83 degrees n Atl @mention:59pm :> I LOOOOVE this …feels like flawda
AFTER: 83 degrees n atl mention 59pm i loooove this feels like flawda
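The hygiene steps can be expressed as a short Python function. The regex approach is my own choice, and the emoticon map is hypothetical (it follows the author's suggestion but was not part of the original pipeline); with emoticon mapping off, the function reproduces the BEFORE/AFTER example.

```python
import re

# Hypothetical emoticon-to-token map; tokens are illustrative only.
EMOTICON_TOKENS = {
    ":'-(": "emoticon_i_am_crying",
    ":’-(": "emoticon_i_am_crying",
    ":-)": "emoticon_positive",
}


def clean_tweet(text: str, map_emoticons: bool = False) -> str:
    """Lowercase a tweet and replace punctuation/special characters with spaces."""
    if map_emoticons:
        for emoticon, token in EMOTICON_TOKENS.items():
            # Pad with spaces so the token never fuses with neighbouring words.
            text = text.replace(emoticon, f" {token} ")
    text = text.lower()
    # Any run of characters other than letters, digits, and underscore
    # (punctuation, symbols, tabs, newlines) becomes a single space.
    text = re.sub(r"[^a-z0-9_]+", " ", text)
    return text.strip()
```

Underscores are kept so that emoticon tokens survive cleaning; note that this pattern also drops accented letters, which may or may not be acceptable for a given data set.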



Training The Google Prediction Model
Here are the basic steps for training a model. Training can take a few hours for a 100k-row (10 MB) training data set, which seems to depend on server load. When Google finishes building the model, the trainedmodels.get method returns a confusion matrix and the model’s “classificationAccuracy” score. Note: Google’s classificationAccuracy score and the test match rate below are statistically the same (0.71 vs. 0.709).

Google Model building confusion matrix with a classificationAccuracy of 0.71


SENTIMENT | negative | positive | not weather related | neutral | cannot tell
negative | 1627 | 191 | 261.5 | 351 | 88
positive | 200.5 | 1631.5 | 220 | 236 | 40.5
not weather related | 239.5 | 163.5 | 268 | 1965.5 | 66
neutral | 262.5 | 163.5 | 268 | 1965.5 | 66
cannot tell | 5 | 3 | 1 | 8.5 | 3.5


Testing the 12k held-out tweets against the model above gives a match rate / accuracy of 0.709
Columns: CrowdFlower actual label; rows: Google predicted label


SENTIMENT | negative | positive | not weather related | neutral | cannot tell
negative | 1824 | 232 | 294 | 413 | 94
positive | 196 | 1732 | 216 | 200 | 61
not weather related | 251 | 200 | 2967 | 460 | 64
neutral | 300 | 215 | 303 | 2263 | 86
cannot tell | 7 | 5 | 0 | 4 | 2
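As a sanity check, the 0.709 match rate can be recomputed from the hold-out matrix: accuracy is the diagonal sum divided by the total count. The column totals equal the test-tweet counts in the earlier data table (for example, 2,578 negative), which is consistent with the columns holding CrowdFlower labels.

```python
# Values copied from the hold-out test matrix.
LABELS = ["negative", "positive", "not weather related", "neutral", "cannot tell"]
MATRIX = [
    [1824, 232, 294, 413, 94],
    [196, 1732, 216, 200, 61],
    [251, 200, 2967, 460, 64],
    [300, 215, 303, 2263, 86],
    [7, 5, 0, 4, 2],
]


def accuracy(m):
    """Diagonal (agreements) divided by all judgements."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total


def column_totals(m):
    return [sum(col) for col in zip(*m)]


def recall(m, k):
    """Share of CrowdFlower-labelled class k that Google also predicted as k."""
    return m[k][k] / sum(row[k] for row in m)


print(f"accuracy = {accuracy(MATRIX):.3f}")  # accuracy = 0.709
```

Per-class recall shows where the model struggles most: "cannot tell" is matched only 2 times out of 307.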




Please do not use or operate heavy equipment while reading this recommended research paper on using artificial intelligence to selectively guide crowdsourced activities, as I would not want to be personally responsible for the outcome (my attempt at a joke). All joking aside, Archer’s research is pivotal for organizations whose business models are predicated on crowdsourcing activities, and it should be considered when looking for innovative enhancements to existing platforms.

Archer demonstrates that filtering (pre-qualifying) a solution space before performing a crowdsourcing activity not only reduces the solutioning time but also increases the final solution quality. While there are many ways to perform this qualification, from manual to automated, the use of artificial intelligence (e.g., neural networks, genetic algorithms, etc.) shows great potential for commercial applications.


Crowdsource AI systems do not have to be overly complex; they can take the form of a helper tool more than an AI being. Think of how Adobe After Effects uses its brainstorming functions to prequalify new solutions based on characteristics you currently like and/or dislike. The primary requirement for a crowdsource AI helper is metadata: the more, the better. Therein lies another innovation along this path: automatically abstracting relevant explicit metadata from implicit characteristics. We’ll hold that discussion for a future post.
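A helper of this kind can start out very simple. As a hypothetical sketch (not Archer's method, and not how After Effects works internally), candidate solutions carrying metadata tags can be scored against the tags a user likes and dislikes, and only the top-ranked candidates passed on to the crowd. The function name and scoring rule are illustrative assumptions.

```python
def prequalify(candidates, liked, disliked, top_k=3):
    """Rank candidates by metadata overlap and keep the best `top_k`.

    candidates: dict of candidate id -> set of metadata tags.
    Score = (#liked tags) - (#disliked tags). Hypothetical scoring rule.
    """
    def score(tags):
        return len(tags & liked) - len(tags & disliked)

    ranked = sorted(candidates, key=lambda cid: score(candidates[cid]), reverse=True)
    return ranked[:top_k]
```

The richer the metadata attached to each candidate, the more discriminating this filter becomes, which is why metadata is the helper's primary requirement.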


Real-world problems are increasingly being solved through Crowdsourcing, Crowd Funding, and Collective Intelligence. Leveraging the “wisdom of a crowd” is a relatively new practice in the enterprise; the term “crowdsourcing” itself was coined by Jeff Howe in 2006. Here are a few enterprise examples of companies using crowdsourcing to solve meaningful real-world problems:

::  In September 2009, Netflix awarded a $1 million prize to “BellKor’s Pragmatic Chaos” for developing an algorithm that more accurately predicts a person’s movie enjoyment based on his or her movie preferences (www.netflixprize.com//index). While Netflix may not see a return on this investment, solving this problem internally could have cost them 10x as much over a longer period of time.

::  Evoke (www.urgentevoke.com) is a “10 week crash course in changing the world” — produced by the World Bank Institute, which leads players through challenges such as a water crisis, food security or empowering women. In its first “season,” 20,000 players registered to play Evoke. Check out this intro video on Vimeo.

::  Poptent (www.poptent.com) has one of the largest crowdsourced communities of broadcast filmmakers (45,000), who connect with each other and with companies that want to pay them for their creative talents. Brands on Poptent are seeking new ways to reach their consumers and create new audiences.

::  Foldit (www.fold.it) is a game that encourages players to fold proteins — produced by University of Washington Computer Science & Engineering. Proteins play a role in curing many diseases. Understanding how proteins fold is one of the hardest problems in biology. Foldit draws on the human ability to intuitively solve puzzles, in this case folding proteins. Foldit is a modeling tool that embodies the rules and constraints of a problem (in this case protein folding) and engages the Web collective to help solve this problem. 

These are just a few examples. Please let me know if you have a favorite.


Crowdsourcing continues to play a subtle but important part in a larger conversation on the role of social networks in big data analyses. While big data has traditionally been concerned with identifying latent knowledge in large, disparate sets of structured, unstructured, and semi-structured data, there is a growing recognition that social networks are a prime reservoir for this data. More importantly, applying big data techniques to social networks can lead to more predictable outcomes from sourced community networks (AKA crowdsources).

But to use crowdsourcing effectively through big data, it is important to understand what motivates online community participants and how rewards or payoffs influence them. In “Understanding Crowdsourcing: Effects of motivation and rewards on participation and performance in voluntary online activities,” the author’s thesis examines the effects of motivation and rewards on the participation and performance of online community members. This is a great read both for those new to the subject and for seasoned practitioners.
