- Learn Python by Building Data Science Applications
- Philipp Kats David Katz
- 2021-06-24 13:06:06
Sets
Sets are, in a way, dictionaries without values. They use the same curly brackets, and their members cannot be duplicated, just like dictionary keys. Because of that, they are handy for deduplication and membership tests. On top of that, sets have built-in mathematical operations: unions, intersections, differences, and symmetric differences:
>>> names = set(['Sam', 'John', 'James', 'Sam'])
>>> names
{'James', 'John', 'Sam'}
>>> other_names = {'James', 'Nikolai', 'Iliah'}
>>> names.difference(other_names)
{'John', 'Sam'}
>>> names.symmetric_difference(other_names)
{'Iliah', 'John', 'Nikolai', 'Sam'}
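The session above shows only differences and symmetric differences. As a minimal sketch of the other two operations, using the same names as above, the union and intersection can be computed as follows. The variable names all_names and shared are illustrative, and the results are sorted only to make the printed order predictable:

```python
names = {'Sam', 'John', 'James'}
other_names = {'James', 'Nikolai', 'Iliah'}

# Union: every name that appears in at least one of the sets.
all_names = names.union(other_names)        # same as names | other_names

# Intersection: only the names that appear in both sets.
shared = names.intersection(other_names)    # same as names & other_names

print(sorted(all_names))  # ['Iliah', 'James', 'John', 'Nikolai', 'Sam']
print(sorted(shared))     # ['James']

# The operators - and ^ are shorthand for the two methods shown earlier.
print(names - other_names == names.difference(other_names))            # True
print(names ^ other_names == names.symmetric_difference(other_names))  # True
```

The method form accepts any iterable as its argument, while the operator form requires both operands to be sets.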
Sets are unordered: unlike dictionaries, which preserve insertion order (as of Python 3.7), they do not guarantee that the order of representation or retrieval will match the order of insertion.
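This matters for deduplication. The sketch below, with illustrative variable names, contrasts deduplicating through a set, which may return the names in any order, with dict.fromkeys, one common way to drop duplicates while keeping the first-seen order:

```python
names = ['Sam', 'John', 'James', 'Sam', 'John']

# Through a set: duplicates are gone, but the resulting order is not guaranteed.
unique_unordered = list(set(names))

# Through dictionary keys: duplicates are gone and first-seen order is kept
# (relies on dictionaries preserving insertion order, Python 3.7+).
unique_ordered = list(dict.fromkeys(names))

print(sorted(unique_unordered))  # ['James', 'John', 'Sam']
print(unique_ordered)            # ['Sam', 'John', 'James']
```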
Because sets are based on hash tables, checking for membership is much faster with a set than with a list, especially when there are many elements. Let's use Jupyter's magic commands to compare the performance. Using %timeit for a single line, or %%timeit for a whole cell, estimates how long the code takes to run on your machine:
>>> l = ['apple', 'banana', 'orange', 'grapefruit', 'plum', 'grape', 'pear']
>>> s = set(l)
>>> %timeit 'pear' in l
84.9 ns ± 1.99 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit 'pear' in s
31.6 ns ± 1.21 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
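Outside Jupyter, a similar comparison can be sketched with the standard-library timeit module. The helper time_membership and the 10,000-element list are assumptions made for illustration, not part of the measurement above, and the absolute numbers will differ from machine to machine:

```python
import timeit

def time_membership(container, item, number=1000):
    """Return the average time in seconds of one `item in container` check."""
    total = timeit.timeit(lambda: item in container, number=number)
    return total / number

fruits = ['apple', 'banana', 'orange', 'grapefruit', 'plum', 'grape', 'pear']
big_list = list(range(10_000))  # hypothetical larger collection

# In both cases we look up the last element, the worst case for a list scan.
for label, data, needle in [('7 items', fruits, 'pear'),
                            ('10,000 items', big_list, 9_999)]:
    list_time = time_membership(data, needle)
    set_time = time_membership(set(data), needle)
    print(f"{label}: list {list_time * 1e9:.0f} ns, set {set_time * 1e9:.0f} ns")
```

The lambda adds a small constant overhead to every measurement, so the figures are best read as a comparison between the two containers rather than as exact lookup costs.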
As you can see, even for the short seven-element list, the membership test on the set is more than twice as fast. The gap only widens as the collection grows, because a list is scanned element by element, while a set lookup takes roughly constant time on average. Next, let's move on to learning about data structures in more depth.