Archive for the ‘Artificial Intelligence’ Category

Can a man-made machine be taught through crowdsourcing to determine the sentiment of a Twitter message regarding the weather? Well, as it turns out, the answer is yes. Using Google Prediction API, CrowdFlower, Twitter, and Google Maps, Kevin Cocco demonstrates that crowdsourced training can produce AI capabilities that accurately predict complex emotions such as sentiment.

This theoretical research is important because it has practical implications for commercial crowdsourcing endeavors. By definition, crowdsourcing effort is directly proportional to the number of people in the network throughout the value chain, which means cost (decisioning, rewards, etc.) is also proportional to people. From a business perspective, this makes it very hard to perform at large scale and leaves it highly dependent on the quality of the resources. This is where AI automation, especially around decisioning, can play a significant role.

Here is a reposting of the original work. We will explore this more in future discussions.


NOTE: The following post, by Kevin Cocco, was reposted from the Dialogue Earth blog.

Josh Eveleth March 21, 2012


Can a machine be taught to determine the sentiment of a Twitter message about weather? With data from over 1 million crowd sourced human judgements, the goal was to train a predictive model and use this machine learning system to make judgements. Below are the highlights from the research and development of a machine learning model in the cloud that predicts the sentiment of text regarding the weather. The major technologies used in this research are Google Prediction API, CrowdFlower, Twitter, and Google Maps.

The only person who can really determine the true sentiment of a tweet is the person who wrote it. When human crowd workers judge tweet sentiment, all 5 make the same judgement only 44% of the time. CrowdFlower’s crowd sourcing processes are great for managing the art and science of sentiment analysis. You can scale up CrowdFlower’s number of crowd workers per record to increase accuracy, of course at a scaled-up cost.

The results of this study show that when all 5 crowd workers agree on the sentiment of a tweet, the predictive model makes the same judgement 90% of the time. Across all tweets, CrowdFlower and the predictive model return the same judgement 71% of the time. CrowdFlower and Google Prediction supplement rather than substitute for each other. As shown in this study, CrowdFlower can successfully be used to build a domain/niche specific data set to train a Google Prediction model. I see the power of integrating machine learning into crowd sourcing systems like CrowdFlower. CrowdFlower users could have the option of automatically training a predictive model as the crowd workers make their judgements. CrowdFlower could continually monitor the model’s trending accuracy and then progressively include machine workers in the worker pool. Once the model hits X accuracy, you could have the majority of the data stream routed to predictive judgements while continuing to feed a small percentage of data to the crowd to refresh current topics and continually validate accuracy. MTurk hits may only be pennies, but Google Prediction ‘hits’ cost even less.
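The routing idea in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `route_record`, the 0.85 accuracy threshold standing in for "X", and the 10% crowd share are all hypothetical, not values from the study or features of CrowdFlower.

```python
import random


def route_record(model_accuracy, accuracy_threshold=0.85, crowd_share=0.10, rng=random):
    """Decide whether one incoming record goes to the crowd or to the model.

    accuracy_threshold (the "X" in the text) and crowd_share are
    hypothetical values chosen for illustration only.
    """
    # Until the model has proven itself, every record goes to human workers.
    if model_accuracy < accuracy_threshold:
        return "crowd"
    # Afterwards, keep a small random share flowing to the crowd so the
    # training data stays fresh and accuracy can be re-validated.
    return "crowd" if rng.random() < crowd_share else "machine"


if __name__ == "__main__":
    rng = random.Random(0)
    routes = [route_record(0.90, rng=rng) for _ in range(1000)]
    print("share sent to crowd:", routes.count("crowd") / len(routes))
```

With a trending accuracy below the threshold, every record stays with the crowd. Above it, roughly `crowd_share` of the stream still reaches human workers.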


Weather Sentiment Prediction Demo Application:

LIVE Demo Link: www.SproutLoop.com/prediction_demo

Note: this demo uses a server-side Twitter feed that is throttled; retry later if you get no results. Contact me regarding high-volume applications and integrations with the full Twitter firehose.

Match Rate / Accuracy Findings:
Below are the highlighted match rates of CrowdFlower human judgements to Google Prediction machine judgements. A match rate compares the predicted sentiment labels from one method against those from another:

  • Google Prediction API matching CrowdFlower Sentiment Analysis = 71% match rate
  • Mirroring DialogueEarth Pulse filtering of the lowest 22% of confidence scores = 79% match rate
  • Tweet sentiment can be confusing for humans and machines. Google predictions of only the tweets on which all the crowd workers agreed (CrowdFlower confidence score = 1) = 90% match rate


About Google Prediction API:
During the May 2011 Google I/O conference, Google released a new version of their Google Prediction API with open access to Google’s machine learning systems in the cloud. The basic process for creating predictive models is to upload training data to Google Cloud Storage and then use Google Prediction API to train a machine learning model from the training data set. Once you have a trained model in the cloud, you can write code with their API to submit data for sub-second predictions (avg 0.62 sec each).


About The Data:
Much has been written about DialogueEarth.org’s Weather Mood Pulse system. Pulse has collected a data set of 200k+ tweets that have each been assigned one of five labels regarding the tweet’s sentiment related to the weather. This labeling of tweets is crowd sourced with the CrowdFlower system, which presents each tweet with a survey for the workers in the crowd to judge. CrowdFlower has quality control processes in place to present the same tweet to several people in the crowd. DialogueEarth’s crowd jobs were configured so that each tweet was graded by 5 different people. CrowdFlower uses this 5-person matching and each person’s CrowdFlower “Gold” score to calculate a confidence score between 0 and 1 for each tweet. About 44% of the tweets have a confidence score of 1, meaning 100% of the graders agreed on the sentiment label for the tweet, while some of the other tweets have low scores like 0.321, meaning very little agreement on tweet sentiment plus some influence from each of the graders’ Gold scores. The Pulse system has chosen to use only the tweets that have a CrowdFlower confidence score equal to or greater than 0.6. Dozens of models were built using various segments of CrowdFlower confidence score ranges. Testing showed that the best model used the full confidence range of CrowdFlower records.
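To make the confidence score more concrete, here is a minimal Python sketch of one plausible reading of the description above: each grader's vote is weighted by a trust ("Gold") score, and the confidence is the winning label's share of the total weight. The exact CrowdFlower formula is not given in this post, so the function `confidence_score` and the trust values below are assumptions for illustration only.

```python
from collections import defaultdict


def confidence_score(judgements):
    """Return (winning_label, confidence) for one tweet.

    judgements: list of (label, trust) pairs, one per grader, where trust
    is a hypothetical per-grader "Gold" accuracy between 0 and 1.
    This trust-weighted agreement is an assumed formula, not CrowdFlower's
    documented one.
    """
    if not judgements:
        raise ValueError("at least one judgement is required")
    weight_by_label = defaultdict(float)
    for label, trust in judgements:
        weight_by_label[label] += trust
    total = sum(weight_by_label.values())
    top_label = max(weight_by_label, key=weight_by_label.get)
    return top_label, weight_by_label[top_label] / total


if __name__ == "__main__":
    unanimous = [("positive", 0.9)] * 5
    split = [("positive", 0.9), ("positive", 0.8), ("neutral", 0.7),
             ("negative", 0.6), ("neutral", 0.5)]
    print(confidence_score(unanimous))
    print(confidence_score(split))
```

Under this reading, five graders who all agree always produce a confidence of 1, and a three-way split falls well below the 0.6 cutoff that Pulse applies.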

Weather Sentiment Tweet Labels From CrowdFlower:
The CrowdFlower-scored tweet data contains the tweet text, the weather sentiment label and the CrowdFlower confidence score. The tweet data set was randomized into two segments: 111k (~90%) rows used to train the model and 12k rows held out for testing the model.


Sentiment              Modeled Tweets (90%)   Test Tweets (10%)   % of Total
TOTAL                  111,651                12,389              124,040
negative               23,976                 2,578               21.4%
positive               21,688                 2,384               19.4%
not weather related    34,232                 3,780               30.6%
neutral                29,333                 3,340               26.3%
cannot tell            2,422                  307                 2.2%
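The randomized 90/10 split described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `split_rows` and the fixed seed are assumptions.

```python
import random


def split_rows(rows, train_fraction=0.9, seed=42):
    """Shuffle rows and split them into (training, held-out test) lists.

    The seed is a hypothetical choice so the split is repeatable.
    """
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # randomize the row order
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    rows = [f"tweet {i}" for i in range(1000)]
    train, test = split_rows(rows)
    print(len(train), "training rows,", len(test), "test rows")
```

Shuffling before cutting matters: if the rows are ordered by date or by label, a straight cut would leave the held-out tweets unrepresentative of the training tweets.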


CrowdFlower Confidence Scores Correlation to Google Prediction Match Rate / Accuracy:

Running Match Rate – Ordered by CrowdFlower Confidence, Google Prediction Confidence


  • The X-axis shows the distribution of CF confidence scores in the 10% random test data set (12k rows)
  • Google is better at predicting Tweets that have a higher CrowdFlower confidence score
  • The Google confidence score correlates with accuracy/match rate, on average higher Google confidence = higher accuracy of matching


Google’s Prediction Confidence Score and Correlation to Match Rate Accuracy
Running Match Rate and Google Prediction Score – Ordered by Prediction Score, Random


  • Higher Google confidence scores correlate with higher matching/accuracy rate.
  • Filtering results at Google conf score > 0.8290 will result in 80% accuracy and filtering/loss of 24.41% of data.


Accuracy/Match   Google Confidence   % Data Filtered   Rows out of 12390
98.47%           score = 1           86.88%            1627
90%              score > 0.99537     55.16%            5543
85%              score > 0.95688     38.78%            7586
80%              score > 0.82900     24.41%            9367
78.93%           score > 0.79122     21.60% **         9715
75.00%           score > 0.61647     11.06%            11021
70.91%           score > 0.25495     0.0%              12390


** Note, 21.6% is the current % of data that Pulse filters by excluding CF conf. scores <= 0.6
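The trade-off in the table above, between a higher match rate and more data thrown away, can be reproduced with a small filter. The sketch below is illustrative only: `filter_by_confidence` is a hypothetical helper, and the five sample rows are made-up toy data, not rows from the study.

```python
def filter_by_confidence(results, threshold):
    """Return (match_rate, share_filtered) for predictions above a threshold.

    results: list of (predicted_label, crowd_label, model_confidence).
    Rows with confidence <= threshold are dropped, mirroring the
    "score > x" rows of the table.
    """
    kept = [(pred, actual) for pred, actual, conf in results if conf > threshold]
    share_filtered = 1 - len(kept) / len(results)
    if not kept:
        return None, share_filtered
    match_rate = sum(pred == actual for pred, actual in kept) / len(kept)
    return match_rate, share_filtered


if __name__ == "__main__":
    # Toy data invented for illustration.
    toy = [
        ("positive", "positive", 0.99),
        ("negative", "negative", 0.95),
        ("neutral", "positive", 0.40),
        ("neutral", "neutral", 0.85),
        ("negative", "neutral", 0.30),
    ]
    for threshold in (0.0, 0.829):
        print(threshold, filter_by_confidence(toy, threshold))
```

Raising the threshold drops the low-confidence rows first, which is why the match rate climbs as more data is filtered out.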

Effect of Model Size on Match Rate / Accuracy


  • The larger the training data set, the higher the match accuracy
  • A 95% decrease in data set size (5k of 111k rows) decreases the match rate by 7.1% (from 71% to 64%)
  • The classificationAccuracy returned from Google’s model build differed from the tested accuracy rate by between 5% (5k model) and 0.06% (111k model).


Formatting Tweet Text for modeling:
When preparing text for modeling, Google recommends removing all punctuation because “Punctuation rarely add meaning to the training data but are treated as meaningful elements by the learning engine”. The tweet text was lowercased and stripped of all punctuation, special characters, returns, tabs, etc. These were replaced by a space to prevent two words from joining. With the unique 140-character limit and the use of emoticons, it might be interesting to replace emoticons with words before building the model, for example replacing :’-( with a specific replacement like ‘ emoticon_i_am_crying ‘ or a general one like ‘emoticon_negative’. Here is an example of tweet hygiene:
BEFORE: 83 degrees n Atl @mention:59pm :> I LOOOOVE this …feels like flawda
AFTER: 83 degrees n atl mention 59pm i loooove this feels like flawda
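The hygiene steps above can be approximated with a short regular expression. This is a minimal sketch, not the original cleaning code: the function name `clean_tweet` is hypothetical, and treating every character outside a-z and 0-9 as punctuation is an assumption that would also drop accented letters.

```python
import re


def clean_tweet(text):
    """Lowercase a tweet and replace punctuation/special characters with spaces."""
    lowered = text.lower()
    # Any run of characters that is not a-z or 0-9 (punctuation, tabs,
    # returns, symbols) becomes one space so neighbouring words never join.
    spaced = re.sub(r"[^a-z0-9]+", " ", lowered)
    return spaced.strip()


if __name__ == "__main__":
    before = "83 degrees n Atl @mention:59pm :> I LOOOOVE this \u2026feels like flawda"
    print(clean_tweet(before))
    # prints: 83 degrees n atl mention 59pm i loooove this feels like flawda
```

Applied to the BEFORE tweet, this reproduces the AFTER text shown above. An emoticon-to-word replacement would have to run before this step, since the regular expression removes the emoticon characters.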



Training The Google Prediction Model
Here are the basic steps for training a model. Training can take a few hours for a 100k-row/10MB training data set; this seems to depend on the server load. When Google finishes building the model, the trainedmodels.get method will return a confusion matrix and also the model’s “classificationAccuracy” score. Note, Google’s classificationAccuracy score and the testing match rate accuracy score below are statistically the same (0.71 vs 0.709).

Google Model building confusion matrix with a classificationAccuracy of 0.71


SENTIMENT              negative   positive   not weather related   neutral   cannot tell
negative               1627       191        261.5                 351       88
positive               200.5      1631.5     220                   236       40.5
not weather related    239.5      163.5      268                   1965.5    66
neutral                262.5      163.5      268                   1965.5    66
cannot tell            5          3          1                     8.5       3.5


Testing 12k hold out tweets against the model above with a match rate / accuracy of 0.709
CrowdFlower Actual (columns)


SENTIMENT              negative   positive   not weather related   neutral   cannot tell
negative               1824       232        294                   413       94
positive               196        1732       216                   200       61
not weather related    251        200        2967                  460       64
neutral                300        215        303                   2263      86
cannot tell            7          5          0                     4         2
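The 0.709 match rate can be recomputed directly from the hold-out confusion matrix above: it is the diagonal (cases where both methods chose the same label) divided by all test tweets. The sketch below is illustrative, with hypothetical helper names. It assumes columns are the CrowdFlower labels and rows are the model's labels, which is consistent with the column sums matching the test counts in the data table (2,578 negative, 2,384 positive, and so on).

```python
LABELS = ["negative", "positive", "not weather related", "neutral", "cannot tell"]

# Hold-out test confusion matrix from the post.
# Assumed orientation: rows = model label, columns = CrowdFlower label.
CONFUSION = [
    [1824, 232, 294, 413, 94],
    [196, 1732, 216, 200, 61],
    [251, 200, 2967, 460, 64],
    [300, 215, 303, 2263, 86],
    [7, 5, 0, 4, 2],
]


def match_rate(matrix):
    """Share of tweets where both methods picked the same label."""
    agreed = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return agreed / total


def per_label_match(matrix, labels):
    """For each CrowdFlower label (column), the share the model matched."""
    rates = {}
    for j, label in enumerate(labels):
        column_total = sum(row[j] for row in matrix)
        rates[label] = matrix[j][j] / column_total
    return rates


if __name__ == "__main__":
    print(f"overall match rate: {match_rate(CONFUSION):.3f}")
    for label, rate in per_label_match(CONFUSION, LABELS).items():
        print(f"{label}: {rate:.3f}")
```

The per-label breakdown shows where the model struggles most: the "cannot tell" column is matched only 2 times out of 307.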




Please do not use or operate heavy equipment while reading this recommended research paper on using artificial intelligence to selectively guide crowdsourced activities, as I would not want to be personally responsible for the outcome (my attempt at a joke). All joking aside, Archer’s research is pivotal to those organizations whose business models are predicated on crowdsourcing activities and should be considered when looking for innovative enhancements to existing platforms.

Archer demonstrates that filtering (pre-qualifying) a solution space prior to performing a crowdsourcing activity not only reduces the solutioning time but increases the final solution quality as well. While there are many means through which this qualification is possible, from manual to automated, the use of artificial intelligence (e.g., neural networks, genetic algorithms, etc.) demonstrates strong potential for commercial applications.


The crowdsource AI systems do not have to be overly complex, taking on the form of a helper tool more than an AI-being. Think of how Adobe After Effects uses its brainstorming functions to prequalify new solutions based on characteristics you currently like and/or dislike. The primary requirement of a Crowdsource AI Helper is metadata: the more the better. Therein lies another innovation along this path: being able to automatically abstract relevant explicit metadata from implicit characteristics. We’ll hold this discussion for a future blog.


There are a lot of great questions coming out of eBiz and the latest I’d like to address is, “Is Dirty Data Becoming a Showstopper for SOA?”

Dirty data is one of the many reasons why service-oriented architectures (SOA) are so powerful. Gartner studies over the last decade have demonstrated that dirty data “leads to significant costs, such as higher customer turnover, excessive expenses from customer contact processes like mail-outs and missed sales opportunities.” In this day and age, there can be no doubt that the ones and zeros sitting in your databases are corrupted. But what do you do about it?

Many have suggested that this is an IT issue. The fact that data assets are inconsistent, incomplete, and inaccurate is somehow the responsibility of those responsible for administering the technology systems that power our enterprises. Their solution seems to further suggest that the only real way to solve the problem is with a “reset” of the data supply chain: retool the data supply chain, reconfigure the databases, do a one-time scrub of ALL data assets, and set up new rules that somehow prohibit corruption activities. At best, this has been shown to be a multi-million dollar, multi-year activity for Fortune 2000 class companies. But at worst, it is a mere pipe dream with no chance of ever occurring.

A more practical solution can be found in SOA, specifically Dirty Data Modernization Services (DDMS). These are highly tailored temporal services designed around the specific Digital Signatures of the dirty data in question. For example, Dirty Data Identification Services use artificial intelligence to identify and target corrupt data sources. Dirty Data Transformation Services use ontological web-based algorithms to transform bad data into better data (not correct data). Other services like Accuracy and Relevance Services can be used on an ongoing basis to aid in mitigating the inclusion of bad or dirty data.

Human beings, by our nature, do not like change. We often look to rationalize away doing the hard things in life, rather than justifying the discomfort that comes through meaningful change. Dirty data is just one of those reasons one can use if you truly don’t want to get on with a different, often better, solution paradigm. So, rather than treat dirty data as a showstopper, look to it as a catalyst for real, meaningful enterprise change.



In Michael Fauscette’s blog on how to integrate Web 2.0 into Enterprise apps, he notes the ever-increasing use of social media tools, specifically Twitter. For example, JetBlue started with 1 representative following Twitter feeds, but is now up to 10. While Twitter is an effective social media tool, it truly begs the Glenn Gruber productivity question of the day: How do you scale beyond bodies?

Well Glenn, excellent question and one that can be effectively addressed through the use of bots, specifically Chatter Bots. Chatter Bots are a type of conversational software agent designed to work with humans through natural language syntax/semantics. Two of the most famous chatter bots are Alice and Jabberwacky, and their most infamous conversation was captured in 2007.

Chatter bots can now be trained to aid in Twitter-based conversations, recognizing both the context of the discussion and the potential need to tweet. Couple this capability with access to a virtually unlimited backend source of data and you have a very powerful social media tool. In essence, they are the peanut butter and chocolate, the Peanut Butter Cups, of the social media world.

Twitter is clearly an important company tool, but it is not scalable in its current human-based form. As an industry, it would be interesting to take social media to v2.0 by combining the proven value streams of its channels with the maturity of artificial intelligence. This is all part of the ongoing maturation of enterprise information systems through the development of social media contextual capabilities.
