Separating nltk.FreqDist Words Into Two Lists?

I have a series of texts that are instances of a custom WebText class. Each text is an object that has a rating (-10 to +10) and a word count (nltk.FreqDist) associated with it.

Solution 1:

Ok, let's say you start with this for the purposes of testing:

import nltk

class Rated(object):
  def __init__(self, rating, freq_dist):
    self.rating = rating
    self.freq_dist = freq_dist

a = Rated(5, nltk.FreqDist('the boy sees the dog'.split()))
b = Rated(8, nltk.FreqDist('the cat sees the mouse'.split()))
c = Rated(-3, nltk.FreqDist('some boy likes nothing'.split()))

trainingTexts = [a,b,c]

Then your code would look like:

from collections import defaultdict
from operator import itemgetter

# dictionaries for keeping track of the counts
pos_dict = defaultdict(int)
neg_dict = defaultdict(int)

for r in trainingTexts:
  rating = r.rating
  freq = r.freq_dist

  # choose the appropriate counts dict
  if rating > 0:
    partition = pos_dict
  elif rating < 0:
    partition = neg_dict
  else:
    continue

  # add the information to the correct counts dict
  for word, count in freq.items():
    partition[word] += count

# Turn the counts dictionaries into lists of descending-frequency words,
# keeping only the words that never appear in the other dictionary
def only_list(counts, filtered):
  return sorted([(w, c) for w, c in counts.items() if w not in filtered],
                key=itemgetter(1),
                reverse=True)

only_positive_words = only_list(pos_dict, neg_dict)
only_negative_words = only_list(neg_dict, pos_dict)

And the result is (words with equal counts may appear in a different order):

>>> only_positive_words
[('the', 4), ('sees', 2), ('dog', 1), ('cat', 1), ('mouse', 1)]
>>> only_negative_words
[('nothing', 1), ('some', 1), ('likes', 1)]
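
For comparison, here is a minimal sketch of the same partitioning idea that runs without NLTK installed. It assumes collections.Counter is an acceptable stand-in for nltk.FreqDist (in NLTK 3, FreqDist subclasses Counter), and the names texts and exclusive are hypothetical, chosen only for this illustration:

```python
from collections import Counter

# Stand-in data: (rating, word counts) pairs mirroring the three test texts.
# Counter replaces nltk.FreqDist here so the sketch has no third-party
# dependency; that substitution is an assumption, not the original setup.
texts = [
    (5, Counter('the boy sees the dog'.split())),
    (8, Counter('the cat sees the mouse'.split())),
    (-3, Counter('some boy likes nothing'.split())),
]

pos_counts = Counter()
neg_counts = Counter()

for rating, freq in texts:
    if rating > 0:
        pos_counts.update(freq)   # update() adds counts rather than replacing them
    elif rating < 0:
        neg_counts.update(freq)
    # texts rated exactly 0 are ignored

def exclusive(counts, other):
    """Return (word, count) pairs from counts whose word never occurs in other,
    most frequent first."""
    return [(w, c) for w, c in counts.most_common() if w not in other]

only_positive_words = exclusive(pos_counts, neg_counts)
only_negative_words = exclusive(neg_counts, pos_counts)

print(only_positive_words)
print(only_negative_words)
```

Because 'boy' occurs in both a positively and a negatively rated text, it is dropped from both lists. Counter.most_common() does the descending sort, so no explicit key function is needed.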