Sept. 13, 2020

Remove duplicate list elements

Make a list unique by removing duplicated items. Set the order of the elements to first or last appearance and get the counts if duplicated.


Page contents

Fast unique elements, unordered

Python set elements are unique, but the order is undetermined.

Py3: Unique with set

nums = [1,3,7,2,4,1,2,7,3,4,9,3]

#   convert list -> set -> list
#   alternate syntax:
#   unq = [*{*nums}]
unq = list(set(nums))
#= [1,2,3,4,7,9]

Notes: Magic of sets

Python sets take iterable sequences and create unique key representations. Each element within a set is unique. Converting a list to a set removes redundancies. A set can be converted back to a list.

The whole process can scramble the order of the original elements. However it is very fast. Another drawback is the presence of nested mutable objects with the list. Set fails to hash mutable objects and raises an error.

Unique elements first appearance

When order is important. Unique on first appearance.

Py3: First come using dictionary

nums = [1,3,7,2,4,1,2,7,3,4,9,3]

#   python 3.6+ dict are ordered
#   alternate syntax:
#   unq = [*{}.fromkeys(nums)]
unq = list(dict.fromkeys( nums ))
#= [1,3,7,2,4,9]

Notes: First appearance using dictionary

A dictionary, like a set has unique keys. So the keys of a dictionary can be used as a register to check for repetition. An unique list can be constructed by creating dictionary keys from the list. These are then converted back to a list. Like sets, mutable nested objects will cause the code execution to be interrupted. Python 3.6 and later makes dictionaries ordered. So the first occurrence is the recorded order. Prior to version 3.6, an ordered dictionary from collections module can be used.

Unique elements last appearance

Py3: Ordered dictionary, reverse order

from collections import OrderedDict

nums = [1,3,7,2,4,1,2,7,3,4,9,3]

#   reverse list before unique
unq = list( OrderedDict.fromkeys(reversed(nums)) )
unq.reverse()
#= [1, 2, 7, 4, 9, 3]

Notes: Reversed ordered dictionary

To get the last unique element, the original list is reversed and passed into an ordered dictionary. This now proceeds to detect the first appearance for the inverted list and convert them into keys. Ordered dictionary from collections module preserves order of insertion. Previous to Python 3.6, the built-in dictionary object did not preserve insertion order. The final unique list has to be reversed inplace to reflect the original list order.

First appearance unique without mutable restrictions

Solution for mutable elements. Unique elements, first appearance generalized using list comprehension.

Py3: Mutables allowed

main_list = [1,3,[7,2],4,1,2,[7,2],4,9,3]

#   order preserved first appearance
register = []
[register.append(x) for x in main_list if x not in register]
print(register)
#= [1, 3, 7, 2, 4, 9]

Notes: List comprehension for generalized uniqueness

List comprehension can check for nested elements without having to create a hash. Since mutable lists are unhashable, set and dictionary based methods fail to create unique keys.

A list comprehension with a register is the solution to take each element, and compare it to elements within the register. If the item is new, it is added to the register, else it is discarded. The ordering is inherently first appearance.

Dictionary of unique elements with counts

Get summary of unique elements along with a count of how many times it occurs.

Py3: Count and unique for hashable elements

nums = [1,3,7,2,4,1,2,7,3,4,9,3]

unq = dict()
#   loop over elements
for item in nums:
    if item in unq:
        #   if key exists, increment value
        unq[item] += 1
    else:
        #   if new key, set value to 1
        unq[item] = 1
print(unq)
#= {1:2, 3:3, 7:2, 2:2, 4:2, 9:1}

Unique counts when mutable elements like nested lists are present.

Py3: Count and unique for mutable elements

#   containing mutable elements
nums = [1,3,[7,2],4,1,2,[7,2],4,9,3]

#   unique list as (key, value) tuples
unq = []
for item in nums:
    #   temporary register
    reg = []
    hit = False
    for key, value in unq:
        if item == key:
            #   in already in unique, increment value
            reg.append((key, value + 1))
            hit = True
        else:
            #   keep other unique tuples intact
            reg.append((key, value))
    if not hit:
        #   if not a hit, add new element
        reg.append((item, 1))
    #   done with element, update unique and move
    #   on to next element
    unq = reg.copy()
print(unq)
#= [(1,2), (3,2), ([7,2],2), (4,2), (2,1), (9,1)]
# (key element, occurrence value) list

Notes: Dictionary of unique elements

Unique elements in a list can be repeated. To create a summary of the unique elements along with counts can be achieved using a dictionary. A loop tests each element for membership as key within this dictionary. The first time it is encountered, it is added to the dictionary with an occurrence value of 1. Every next match increments this counter.

The final result is a dictionary of unique elements with counts. This method has obvious drawback for mutable nested objects. To create summary with unhashable elements in list, a dictionary like structure is needed. This can be achieved with a tuple with (key, value) elements. If a new item is encountered, it is set as (item, 1) tuple inside the unique list. Every time a new element matches with key item, the value in incremented.

Unique elements dictionary using collections.counter

Quick summary using Counter from collections module.

Py3: Counter from collections

from collections import Counter

nums = [1,3,7,2,4,1,2,7,3,4,9,3]

#   using Counter function
unq = dict( Counter(nums) )
print(unq)
#= {1:2, 3:3, 7:2, 2:2, 4:2, 9:1}

Notes: Using Counter

A counter is an efficient way to make a sequence of objects unique, and get summary of counts. The final results can be read off by converting into a dictionary. The elements must be hashable, or in other words mutable objects cannot be a part of the list.

Unique count and position using collections.defaultdict

Summary using defaultdict, a subclass of dictionary.

Py3: Defaultdict count and positions

from collections import defaultdict

nums = [1,3,7,2,4,1,2,7,3,4,9,3]

#   count items
dfd = defaultdict(int)
for i in nums:
    dfd[i] += 1
unq = dict(dfd)
#= {1:2, 3:3, 7:2, 2:2, 4:2, 9:1}


#   position indices
dfd = defaultdict(tuple)
for e, i in enumerate(nums):
    dfd[i] += (e,)
unq = dict(dfd)
#= {1:(0,5), 3:(1,8,11), 7:(2,7), 2:(3,6), 4:(4,9), 9:(10,)}

Notes: Using defaultdict

Defaultdict is useful to concatenate dictionaries. Using integers as the concatenation object, it is possible to implement a flexible counting scheme. Instead of having the unique set of items with counts, it can be configured to record all the index positions of occurrence within the main list.

Popular content

python programming

Read and write lists with Pickle, Array and Pandas

python programming

Flatten nested list or generate blocks of nested lists

python programming

For loop and control statements

python programming

Clear list using inplace and standard methods

python programming

List comprehension with nested conditions

python programming

Concatenate list elements using add, append, extend

python programming

Enumerate and custom counters like skip and loop

python programming

Count number of elements, and memory allocated

python programming

Remove duplicate list elements

python programming

Statistics with numeric lists of integers, fractions, and decimals.

New content

python programming

Read and write lists with Pickle, Array and Pandas

python programming

Is element in list?

python programming

Dictionary merge common key groups

python programming

Packaging loops with zip

python programming

Concatenate list elements using add, append, extend

python programming

List comprehension with nested conditions

python programming

Flatten nested list or generate blocks of nested lists

python programming

Enumerate and custom counters like skip and loop

python programming

Range integer sequences

python programming

For loop and control statements