g0blin Bringing together art & technology


Extracting and making use of ANEW tables

ANEW (Affective Norms for English Words) is a dataset that assigns a valence, or mood, to individual words. Using this data (albeit from 1999), we can derive the mood of a body of text, such as a Tweet. First things first, though: let's get the data into a usable form and learn how we can make use of it.

First of all, we need a few scripts from a package called begin_anew that will allow us to extract the ANEW tables from the PDF published in 1999. If you follow the instructions on the begin_anew site, you should have three CSV files (all.csv, female.csv and male.csv). We'll be using all.csv, as we cannot be 100% sure that we will receive a gender with every Tweet. If you cannot follow the instructions on the begin_anew site, here is a copy of all.csv.
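To give a feel for working with all.csv, here is a minimal PHP sketch that reads it into a lookup array keyed by lowercase word. The function name loadValenceTable is hypothetical, and the column names 'Description' and 'Valence Mean' are assumptions about the CSV header row, so check them against your own file and pass different names if needed.

```php
<?php
// Hypothetical loader. The default column names are assumptions about
// the header row of all.csv; override them if your file differs.
function loadValenceTable(
    string $path,
    string $wordColumn = 'Description',
    string $valenceColumn = 'Valence Mean'
): array {
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read $path");
    }
    $handle = fopen($path, 'r');

    // The first row is assumed to be a header naming each column.
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    if (!is_array($header)) {
        fclose($handle);
        throw new RuntimeException('CSV file is empty');
    }
    $wordIndex = array_search($wordColumn, $header, true);
    $valenceIndex = array_search($valenceColumn, $header, true);
    if ($wordIndex === false || $valenceIndex === false) {
        fclose($handle);
        throw new RuntimeException('Expected columns not found in header');
    }

    $table = [];
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // Skip blank lines and rows that are missing either column.
        if (!isset($row[$wordIndex], $row[$valenceIndex])) {
            continue;
        }
        $word = strtolower(trim($row[$wordIndex]));
        if ($word === '') {
            continue;
        }
        // Each entry holds the word's valence and a hit counter starting at 0.
        $table[$word] = [
            'valence' => (float) $row[$valenceIndex],
            'count'   => 0,
        ];
    }
    fclose($handle);

    return $table;
}
```

With the file in the working directory, calling loadValenceTable('all.csv') would return the lookup array. Keying by lowercase word keeps later lookups case-insensitive.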

Now that we have our list of valence ratings for each word, we can get down to work. How we can use these values is detailed in this article; however, for those who would like to get down and dirty with the minimum of reading, here's a breakdown of what we will be doing.

  1. Read our CSV file into an array using a PHP script for reference. Each entry in this array, keyed by word, will be another array consisting of a valence value and a count value (initially 0)
  2. Take a body of text and, using a regular expression, break it into its component words
  3. Loop through these words, checking to see if there is a key in the valence array for the lowercase of the word. If there is, increase that word's count by 1 in the valence array
  4. Calculate the total valence by looping through your valence array and adding each value of (count*valence) to a variable - let's call this totalValence
  5. Calculate the total valence hit count by looping through your valence array and adding each valence count to a variable - let's call this totalHits
  6. Calculate the valence of the body of text by performing the following operation - (1/totalHits)*totalValence - let's call this endValence. If totalHits is 0, the text contains no rated words and no valence can be calculated
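The steps above can be sketched in PHP roughly as follows. This is a minimal illustration rather than a definitive implementation: the function name textValence is hypothetical, the three ratings in $valenceTable are made-up values standing in for the contents of all.csv, and the word-splitting pattern is just one reasonable choice.

```php
<?php
// Hypothetical ratings for illustration only; real values come from all.csv.
$valenceTable = [
    'happy' => ['valence' => 8.0, 'count' => 0],
    'sad'   => ['valence' => 2.0, 'count' => 0],
    'table' => ['valence' => 5.0, 'count' => 0],
];

// Returns the valence of $text, or null if it contains no rated words.
// The table is passed by value, so the caller's counts stay at zero.
function textValence(string $text, array $valenceTable): ?float
{
    // Step 2: break the lowercased text into words with a regular expression.
    preg_match_all("/[a-z']+/", strtolower($text), $matches);

    // Step 3: count every word that has a key in the valence array.
    foreach ($matches[0] as $word) {
        if (isset($valenceTable[$word])) {
            $valenceTable[$word]['count']++;
        }
    }

    // Steps 4 and 5: accumulate count*valence and the number of hits.
    $totalValence = 0.0;
    $totalHits = 0;
    foreach ($valenceTable as $entry) {
        $totalValence += $entry['count'] * $entry['valence'];
        $totalHits += $entry['count'];
    }

    // Step 6: guard against dividing by zero when nothing matched.
    if ($totalHits === 0) {
        return null;
    }

    // Equivalent to (1/totalHits)*totalValence.
    return $totalValence / $totalHits;
}

$endValence = textValence('Happy happy days, but a sad ending.', $valenceTable);
echo $endValence, PHP_EOL; // 6 with the hypothetical ratings above
```

Here "happy" is counted twice and "sad" once, so totalValence is 18, totalHits is 3, and endValence is 6. Returning null for text with no rated words keeps "no data" distinct from a genuinely low score.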

Once you have completed the above steps, your variable endValence will hold the valence value for the body of text. Because it is a weighted average of ANEW valence ratings, which are given on a scale of 1 to 9, it will fall between 1 and 9, depending on the mood of the text: the lower the valence, the greater the negativity of the text.

There you have it: we can now determine the mood of a body of text using the ANEW tables! Below is some example PHP source that shows the above techniques in action.

And to finish things off nicely, here is a link to a demo.
