Sunday, November 8, 2015

Explaining Python Concepts: Iterable, Iterator, and Generator

A few days ago, some colleagues at work brought up a discussion about several closely-related concepts in Python: Iterable, Iterator, Generator. And I thought it might be a good blog post to explain them all, down to the words in the PEPs, and just not hand-waving over them.

Duck typing

Before we all go deep, let's quickly remember that Python is a dynamically typed language. Many people use the term duck typing to refer to Python's type system. Colloquially speaking, if it walks like a duck, quacks like a duck, it must be a duck. That is to say an object is considered a particular type if it behaves in accordance to the defined behavior of that type.

In other words, we are usually talking about interface rather than implementation in Python. The mentioned topics are in fact concepts, not any concrete implementation, as we shall see later.

Iterable

Iterable is the easiest one to explain. If an object has either __iter__ or __getitem__ method, it's an iterable.

The main purpose of an iterable object is it can be used in a for loop. Let's take a look at how for loop works.

>>> import dis
>>> def f():
...   for x in o:
...     pass
... 
>>> dis.dis(f)
  2           0 SETUP_LOOP              14 (to 17)
              3 LOAD_GLOBAL              0 (o)
              6 GET_ITER            
        >>    7 FOR_ITER                 6 (to 16)
             10 STORE_FAST               0 (x)

  3          13 JUMP_ABSOLUTE            7
        >>   16 POP_BLOCK           
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE

As you can see, the for loop itself is converted into two main instructions: a GET_ITER followed by a FOR_ITER. The first instruction obtains an iterator from an iterable object. The next instruction grabs the next item from that iterator.

You can also tell that I have simplified things a little bit up there to not get into another term that we're getting to next

Iterator

An iterator object has a next method (or __next__ in Python 3). This method returns the next item in a stream, or raises StopIteration when there is no more item.

Now, if there is only next method, the iterator will not be usable in a for loop. In order for "code receiving an iterator can use a for-loop over the iterator", "iterators are currently required to support both protocols.". The other "protocol" referred in that direct quote is iterable concept. That is, iterators are required to have an __iter__ method returning itself. The two concepts, however, are distinct.

So, by now, I hope that it is clear that iterators are iterable, and iterable objects produce iterators.

To illustrate the point, let's consider this example:

>>> seq = [1, 2, 3, 4, 5]
>>> iter1 = seq.__iter__()
>>> for x in iter1:
...   print x
...   if x == 3: break
... 
1
2
3
>>> iter2 = seq.__iter__()
>>> for x in iter1:
...   print x
... 
4
5
>>> for x in iter2:
...   print x
... 
1
2
3
4
5

We have a seq sequence. This sequence is iterable because it has __iter__ method. We obtain the first iterator iter1. We then run a for loop over iter1, breaking out at value 3. The fact that we can run for loop over iter1 confirms that iter1 is iterable. But unlike seq, iterating over iter1 again continues from where it left off, not from the beginning. We also see that iter2 (which is created at the same time we restarts iteration over iter1) is totally different from iter1.

I hope the example makes sense. When iterating over a sequence, you would want to iterate over all its items. When iterating over an iterator, you want to iterate over the items left in that iterator, not items that have been consumed. Put it differently, the __iter__ method defines how the items are iterated over while the next method simply returns the next item in that order. Therefore, an iterator is almost always assumed to be used once only, and an iterable is often iterated over many times.

So we have seen the relationship between iterator and iterable, let's get to the last topic.

Generator

Quoting PEP 255: Python generator is a kind of Python iterator, but of an especially powerful kind.

Generator is more powerful because it has these other methods on top of next: send, throw, and close. Therefore, it should be clear that if an object does not accept any of these methods, it is not a generator. Conversely, an object that provides all of these methods can be treated as a generator.

Due to duck typing, it is wrong to use isinstance(x, types.GeneratorType) to check if x exhibits generator behavior. That type is only defined for object returned from a generator function or generator expression.

Speaking of generator function and generator expression, they should not be confused with generator. Generator function is any function that has the yield keyword somewhere in its body. Invoking this function does not run its body, but returns a generator object which PEP 255 calls generator-iterator. Calling next on the returned generator will execute the function body. Generator expression is similar in concept but with a more familiar list comprehension syntax.

The easiest way to exercise the advanced features of generators is to take advantage of yield from keyword introduced in PEP 380 -- Syntax for Delegating to a Subgenerator, and available since CPython 3.3.

>>> class MyGenerator(object):
...     def next(self):
...         return 42
...     __next__ = next
...     def send(self, *args):
...         print('Received', args)
...     def throw(self, *args):
...         pass
...     def close(self, *args):
...         pass
... 
>>> class MyIterable(object):
...     def __iter__(self):
...         return MyGenerator()
... 
>>> def test_my_generator():
...     yield from MyIterable()
... 
>>> def test_list():
...     yield from [1, 2, 3]
... 
>>> def test_genexp():
...     x = (i for i in (1, 2, 3))
...     yield from x
... 
>>> g = test_genexp()
>>> next(g)
1
>>> g.send('Yup, genexp is a generator.')
2
>>> 
>>> g = test_my_generator()
>>> next(g)
42
>>> g.send('Surprised! This is also a generator.')
Received ('Surprised! This is also a generator.',)
>>> 
>>> g = test_list()
>>> next(g)
1
>>> g.send('Oops, not a generator')
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 2, in test_list
AttributeError: 'list_iterator' object has no attribute 'send'

The above code shows that the return value from a generator expression and generator function is, as expected, a proper generator. The code also shows that MyGenerator is indeed a generator because it fulfills the generator concept, it can be used as a subgenerator in a yield from statement. Lastly, it shows that a regular iterator is not a generator.

On a side note, the example makes it clear that iterable and iterator/generator are two distinct concepts. MyIterable does not produce next item, and MyGenerator is not iterable, but when combined they can be used in for loop.

Summary

To summarize it all up, iterable object produces iterators and therefore control how the items are iterated over. Iterator produces next item in a defined order and usually consumed once. For iterator to be used in a for loop (no pun intended), iterator is required to define an __iter__ method that returns itself. Generator is an enhanced iterator with additional semantics such as send, throw, and close.

Practically speaking, if an object can be fully iterated over with for loop more than once, it's an iterable. If an object has next (or __next__ in Python 3), it is an iterator. And if a value can be sent to an iterator, it is a generator.

No comments:

Post a Comment