Skip to content Skip to sidebar Skip to footer

How To Get The Most Represented Object From An Array

I have an array with some objects, and there are several objects that are alike. E.g: fruit = [apple, orange, apple, banana, banana, orange, apple, apple] What is the most efficien

Solution 1:

Don't reinvent the wheel. In Python 2.7+ you can use the Counter class:

import collections
fruit=['apple', 'orange', 'apple', 'banana', 'banana', 'orange', 'apple', 'apple']
c=collections.Counter(fruit)
print(c.most_common(1))
# [('apple', 4)]

If you are using an older version of Python, then you can download Counter here.

While it's good to know how to implement something like this yourself, it's also a good idea to get used to using Counter, since it is (or going to be) part of the standard library.


Solution 2:

If the objects are hashable then you can use a dict to store the counts:

results = {}
for item in somelist:
  if item not in results:
    results[item] = 1
  else
    results[item] += 1

print max(results.iteritems(), key=operator.itemgetter(1))

Solution 3:

Keep a dictionary of how often each object appears.

Walk through the list once, building this table. As you go, keep track of which object has appeared the most often so far.

This code is untested.

from collections import defaultdict

def mode(objects):
    h = defaultdict(int)
    max_f = 0
    max_obj = None
    for o in objects:
        f = h[o] = h[o] + 1
        if f > max_f:
            max_f = f
            max_obj = o
    return max_obj

If the objects are not hashable, you can instead hash some unique feature of them, such as id(o).


Solution 4:

You want an efficient method. Clearly it's possible in O(n) time, so any method that requires sorting the list is out as that would be O(n log(n)). It's not possible to do it faster than O(n) because even if you check the first n/2-1 elements, and they are all "apple", you don't know that the rest of the elements won't be bananas.

So given that we're looking for O(n), you must iterate over the list and keep a count of how many items of each type you've seen.

A defaultdict would be a simple way to implement this in practice.

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in ['apple', 'banana', 'apple']:
...    d[i] += 1
...
>>> d
defaultdict(<type 'int'>, {'apple': 2, 'banana': 1})

Solution 5:

The best time you can hope to achieve here is O(n) - you'll always need to walk the entire array at least once. The easiest way will certainly be to build a histogram. If your dictionary structure (map of some kind) offers O(1) insert and retrieve then this is as easy as (groovy-ish pseudocode):

def histogram = new HashMap()
def maxObj = null
def maxObjCount = 0
objectList.each {
    if(histogram.contains(it)) histogram.put(it, histogram.get(it)+1)
    else histogram.put(it, 1)

    if(histogram.get(it) > maxObjCount) {
        maxObj = it
        maxObjCount = histogram.get(it)
    }
}

Post a Comment for "How To Get The Most Represented Object From An Array"