Kodeclik Blog
Python numpy’s np.unique() function
Numpy’s np.unique() is a handy function that returns all the unique values in a numpy array in a specified set of dimensions. Here is how that works:
import numpy as np
temperatures = [85.6, 75.4, 81.3, 75.4, 81.3]
print(temperatures)
print(np.unique(temperatures))
In the above code, we import the numpy library as “np”, and initialize an array of temperature values. Then we print the temperatures before and after applying the np.unique() function:
The output is:
[85.6, 75.4, 81.3, 75.4, 81.3]
[75.4 81.3 85.6]
Note that the output has removed the duplicate values and further has sorted them! Thu np.unique() is a very handy function to de-duplicate values in your numpy arrays.
We can also try it with two dimensional arrays:
import numpy as np
temperatures = [[81.3, 85.6, 75.4],
[81.3, 75.4, 81.3],
[81.3, 85.6, 75.4]]
print(temperatures)
print(np.unique(temperatures))
The output is:
[[81.3, 85.6, 75.4], [81.3, 75.4, 81.3], [81.3, 85.6, 75.4]]
[75.4 81.3 85.6]
Note that np.unique() has done a unique operation on all elements of all the input arrays, resorted them and rendered it as a one dimensional array even though you began with a two-dimensional array.
You can actually specify that it does the unique() operation on rows at a time by specifying an axis parameter:
import numpy as np
temperatures = [[81.3, 85.6, 75.4],
[81.3, 75.4, 81.3],
[81.3, 85.6, 75.4]]
print(temperatures)
print(np.unique(temperatures,axis=0))
Note that in the above program we have specified “axis=0” which means it takes each row as an element and removes duplicate elements. The output will be:
[[81.3, 85.6, 75.4], [81.3, 75.4, 81.3], [81.3, 85.6, 75.4]]
[[81.3 75.4 81.3]
[81.3 85.6 75.4]]
Note two features of the output. The “sorting” is now done at the full row level, which is why the arrays do not begin with “75.4”, for instance. Second, this full row is used for duplicate calculations, so the duplicate array [81.3, 85.6, 75.4] is weeded out and there is only one of these arrays printed in the output.
Reconstructing your original array using np.unique()
One of the nice features of np.unique() is that in addition to removing the duplicate values, it gives you indices that you can use in case you want to, for some reason, reconstruct the original array. Let us go back to our one dimensional example:
import numpy as np
temperatures = [85.6, 75.4, 81.3, 75.4, 81.3]
print(temperatures)
newtemps, indices = np.unique(temperatures,return_inverse=True)
print(newtemps)
print(indices)
print(newtemps[indices])
Here we are using a “return_inverse=True” argument to our np.unique() call. This returns two values: newtemps, indices. The “newtemps” variable is the same as before. The indices gives a mapping of old values new indices so that when you use these new indices in the final print() statement we get back the original array. Here is the output:
[85.6, 75.4, 81.3, 75.4, 81.3]
[75.4 81.3 85.6]
[2 0 1 0 1]
[85.6 75.4 81.3 75.4 81.3]
Note that the final output line is the same as the first line, i.e., the input array. Thus, So “return_inverse=True” is a very convenient feature that not only finds the unique values but also gives you a mapping of how the original, non-de-duplicated, values were laid out so that you can reconstruct the array if/as necessary.
Hope you enjoyed this blogpost about np.unique() in Python. What will you use it for?
If you liked removing elements from arrays, checkout our blogpost on how to remove None from Python lists.
For more Python content, checkout the math.ceil() and math.floor() functions! Also
learn about the math domain error in Python and how to fix it!
Interested in more things Python? Checkout our post on Python queues. Also see our blogpost on Python's enumerate() capability. Also if you like Python+math content, see our blogpost on Magic Squares. Finally, master the Python print function!
Want to learn Python with us? Sign up for 1:1 or small group classes.