Tuesday, January 10, 2017

The Python "with" statement and Context Managers

Introduction

The Python "with" statement helps when you want to refactor code that follows a particular pattern, roughly along these lines:

# set things up
try:
    # do something
finally:
    # tear things down

When do you need this sort of code? When there’s some resource you need access to, but don’t have to (and should not) hold on to forever. The two most common cases, I believe, are file descriptors (open files) and thread synchronization (mutex locks). A bit of googling turns up lots of articles and forum posts with titles like "Python whatsit leaks file descriptors" or "Python subprocess runs out of file descriptors". Managing file descriptor usage can be quite tricky if opened files are used all over the place; if that’s the situation, perhaps some other refactoring is also in order. Locks are a little simpler. They exist to protect something that could have concurrent access problems, and programmers know the locked section should be as short as possible, so lock usage doesn’t tend to scatter all over the place. It’s still easy to get into trouble, though, as an example below will show.

Here’s a simple snippet using a try/finally block to manage a file open. A decade or more ago, this was a common idiom for working with an external file.

File Open Wrapper
outfile = open("foo.txt", "w")
try:
    outfile.write('foo')
finally:
    outfile.close()

To be pedantic, the open itself could fail, for example if the file already exists and isn’t writable, so possibly the open should also be wrapped in a try block of its own.
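A minimal sketch of what that might look like, assuming we just want to report the failure and carry on. The write_text helper is hypothetical, written only for illustration. The open gets its own try/except, and the try/finally starts only once there is actually a file to close:

```python
import os
import tempfile


def write_text(path, text):
    """Hypothetical helper: write text to path, return False if open() fails."""
    try:
        outfile = open(path, "w")
    except IOError as exc:  # IOError is an alias of OSError in Python 3
        print("could not open %s: %s" % (path, exc))
        return False
    # Only reached when open() succeeded, so there is a file to close.
    try:
        outfile.write(text)
    finally:
        outfile.close()
    return True


tmpdir = tempfile.mkdtemp()
good_path = os.path.join(tmpdir, "foo.txt")
bad_path = os.path.join(tmpdir, "no-such-dir", "foo.txt")

print(write_text(good_path, "foo"))  # True
print(write_text(bad_path, "foo"))   # prints an error message, then False
```

Moving the open inside the try/finally instead would not work: if open() raised, the finally clause would try to close a name that was never bound.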

Anyway, if you don’t make sure every file gets closed, and there are lots of files to open, you will eventually run out of file descriptors, which are a limited system resource.

Here’s a snippet using locks:

Lock Wrapper
from threading import Lock

lock = Lock()
my_list = []  # shared data the lock protects
item = 'foo'
lock.acquire()
try:
    my_list.append(item)
finally:
    lock.release()

Handling locks carefully is really important. Here’s a different (very artificial, and probably not too realistic) snippet that shows how to get into trouble:

Lock Deadlock
def some_critical_section(my_list, item):
    lock.acquire()
    my_list.append(item)
    return 'some kind of error here'
    lock.release()

In other words, "something happens" somewhere between the acquire and the release, and we drop out of the function without the lock ever being released. Future calls to this function will block trying to acquire a lock they’ll never get. Once code complexity rises, you have to write carefully to avoid such deadlocks: did you anticipate all the possible error exits, and release the lock in each of them?
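One way to make that function safe, sketched with the same try/finally pattern from the beginning: the release goes in a finally clause, so it runs on every exit path, whether an early return or an exception. The None check is a hypothetical error condition added only to exercise the exception path.

```python
from threading import Lock

lock = Lock()


def some_critical_section(my_list, item):
    lock.acquire()
    try:
        if item is None:  # hypothetical error condition, for illustration
            raise ValueError("item must not be None")
        my_list.append(item)
        return 'some kind of error here'
    finally:
        # Runs on the early return and when an exception propagates.
        lock.release()


shared = []
print(some_critical_section(shared, 'foo'))  # some kind of error here
try:
    some_critical_section(shared, None)
except ValueError:
    pass
print(lock.locked())  # False: the lock was released on both exit paths
```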

The "with" statement

Both of the file open and thread lock examples follow the pattern shown at the beginning. In Python 2.5 (as described by PEP 343), a bit of new syntax was introduced to help write things using this pattern more concisely, and thus hopefully more clearly. The new keyword "with" was introduced, and it can take a companion "as" clause. Using "with" sets up a context that Python keeps track of, wrapping it in appropriate begin/end logic.

Using it is pretty simple:

Lock Wrapper Using "with"
from threading import Lock

lock = Lock()
my_list = []  # shared data the lock protects
item = 'foo'
with lock:
    my_list.append(item)

If you need a handle to the resource being acquired, which is usually the case, you can bind it to a name by adding an "as" clause, like this:

Open Wrapper Using "with"
with open("foo.txt", "w") as outfile:
    outfile.write('foo')

The block following the "with" statement is the context, and Python takes care of wrapping the beginning and ending steps around it. These are the modern idioms for dealing with these kinds of resources, and most people who learned Python from 2.6 on probably know them, but perhaps don’t understand why they’re preferred. (I said earlier that "with" was introduced in Python 2.5, but there it was only available as a "future" feature, enabled with from __future__ import with_statement; 2.6 is where it became mainstream.) The why is a cleaner syntax and a cleaner concept of what the "context" section is.
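A quick way to convince yourself that the ending steps really do run, even when the block fails, is to raise an exception inside each context and then inspect the resources afterwards. This sketch writes to a temporary file; using tempfile for the path is an assumption of this example, not something from the snippets above.

```python
import os
import tempfile
from threading import Lock

lock = Lock()
path = os.path.join(tempfile.mkdtemp(), "foo.txt")

try:
    with open(path, "w") as outfile:
        outfile.write('foo')
        raise RuntimeError("something went wrong mid-block")
except RuntimeError:
    pass

try:
    with lock:
        raise RuntimeError("something went wrong while holding the lock")
except RuntimeError:
    pass

print(outfile.closed)  # True: the file was closed on the way out
print(lock.locked())   # False: the lock was released on the way out
```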

Context Managers

Notice in the "with" versions of the examples there appear to be details missing: in the lock example, lock.acquire() and lock.release() are not mentioned; in the file open example, outfile.close() is not present. That leads to the question "how does Python know what to do here?". It turns out that the "with" statement requires help from something called a Context Manager: an object whose class follows the context management protocol. The Python documentation describes how that works.

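The tl;dr version (but do go read the documentation to understand more!) is that a context manager provides the methods __enter__ and __exit__, and it is these that do the work mentioned above: __enter__ runs before the block (its return value is what the "as" clause binds), and __exit__ runs after it, even if the block raised an exception. File and thread-lock objects in Python already come as context managers, and so do a number of others in the standard library.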
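To make the division of labor concrete, here is a rough, simplified sketch of what a "with" statement does with those two methods, loosely following the expansion given in PEP 343. The run_with function and the Tracer class are hypothetical names used only for illustration; the real statement is implemented by the interpreter, not by a function like this.

```python
import sys


def run_with(mgr, block):
    """Roughly what 'with mgr as value: block(value)' does (simplified)."""
    exit_method = type(mgr).__exit__
    value = type(mgr).__enter__(mgr)  # this is what "as" would bind
    try:
        block(value)
    except BaseException:
        # The block failed: hand the exception details to __exit__.
        # A true return value means "handled, don't re-raise".
        if not exit_method(mgr, *sys.exc_info()):
            raise
    else:
        # The block finished normally: __exit__ gets three Nones.
        exit_method(mgr, None, None, None)


class Tracer(object):
    """Hypothetical context manager that records when its methods run."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit with %s' % (exc_type and exc_type.__name__))
        return False  # don't suppress exceptions


t = Tracer()
run_with(t, lambda value: value.events.append('block'))
print(t.events)  # ['enter', 'block', 'exit with None']
```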

We could write our own example of how to handle file opens (not needed since the Python file object is already a context manager) just to show what a context manager looks like:

Context Manager Example
class File():
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode

    def __enter__(self):
        self.open_file = open(self.filename, self.mode)
        return self.open_file

    def __exit__(self, *args):
        self.open_file.close()

with File("foo.txt", "w") as outfile:
    outfile.write('foo')
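The *args in __exit__ above hides what Python actually passes: the exception type, value and traceback if the block raised, or three Nones if it didn't. Returning a true value from __exit__ tells Python the exception has been dealt with and should not propagate. Here is a sketch of a variation on the File class that uses this; the SuppressingFile name and its suppress parameter are hypothetical, chosen for illustration rather than as a recommended design.

```python
import os
import tempfile


class SuppressingFile(object):
    """Hypothetical variation on File with __exit__'s arguments spelled out."""

    def __init__(self, filename, mode, suppress=()):
        self.filename = filename
        self.mode = mode
        self.suppress = suppress  # exception types __exit__ will swallow
        self.last_error = None

    def __enter__(self):
        self.open_file = open(self.filename, self.mode)
        return self.open_file

    def __exit__(self, exc_type, exc_value, traceback):
        # All three arguments are None if the block raised nothing.
        self.open_file.close()
        self.last_error = exc_value
        # Returning True suppresses the exception; False lets it propagate.
        return exc_type is not None and issubclass(exc_type, self.suppress)


path = os.path.join(tempfile.mkdtemp(), "foo.txt")
mgr = SuppressingFile(path, "w", suppress=(KeyError,))
with mgr as outfile:
    outfile.write('foo')
    raise KeyError('swallowed by __exit__')

print(outfile.closed)                 # True
print(type(mgr.last_error).__name__)  # KeyError
```

Silently swallowing exceptions is rarely what you want; most context managers, like the original File class, return None (false) and let errors propagate.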

Decorated Generators as Context Managers

We can of course write context managers in the style just shown, but often it’s easier to write a generator function and decorate it so that Python turns it into a context manager. The decorator is @contextlib.contextmanager (you can shorten that depending on how you import it), and in effect the code before the "yield" statement becomes the __enter__ step, while the code after it becomes the __exit__ step.
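One detail worth knowing: if the with block raises, contextlib re-raises that exception inside the generator at the yield, so teardown code after a bare yield is skipped. Wrapping the yield in try/finally keeps the teardown reliable. A sketch of the File example above in generator form (the name opened is just for illustration):

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def opened(filename, mode):
    open_file = open(filename, mode)  # setup: the __enter__ step
    try:
        yield open_file  # this value is what "as" binds
    finally:
        open_file.close()  # teardown: the __exit__ step, runs even on errors


path = os.path.join(tempfile.mkdtemp(), "foo.txt")
try:
    with opened(path, "w") as outfile:
        outfile.write('foo')
        raise RuntimeError("block failed")
except RuntimeError:
    pass

print(outfile.closed)  # True
```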

Let’s show how this works with a somewhat practical example: timing an operation via a context manager. Python already provides a very nice timing module (timeit), but wrapping its timer in a context manager, as this example does, makes (IMHO) for nicely readable code. The "wrapping" behavior of a context manager doesn’t have to be limited to critical code sections. Timing code fits the model too: the "setup" is capturing a timestamp before the context block runs; the "teardown" is capturing a timestamp after it has completed and computing the difference (in the example we also print out the result).

Here is a timing context manager in class form, plus some code to do something we can time (fetching a URL with the third-party requests package):

Context Manager Class for Timing
from timeit import default_timer
import requests

class Timer(object):
    def __init__(self):
        self.timer = default_timer

    def __enter__(self):
        self.start = self.timer()
        return self

    def __exit__(self, *args):
        end = self.timer()
        self.elapsed_secs = end - self.start
        self.elapsed = self.elapsed_secs * 1000 # millisecs
        print('elapsed time: %f ms' % self.elapsed)

url = 'https://github.com/timeline.json'
with Timer():
    r = requests.get(url)

Running this, you might get something like:

elapsed time: 375.089169 ms
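Because __enter__ returns self, the class version can also be bound with "as", so the measurement is still available after the block. This sketch times a local computation instead of a network request, so it runs without requests or network access (the summation is just a stand-in workload). The Timer class is repeated from above with print written as a function call:

```python
from timeit import default_timer


class Timer(object):
    def __init__(self):
        self.timer = default_timer

    def __enter__(self):
        self.start = self.timer()
        return self  # lets "with Timer() as t" bind the timer itself

    def __exit__(self, *args):
        end = self.timer()
        self.elapsed_secs = end - self.start
        self.elapsed = self.elapsed_secs * 1000  # millisecs
        print('elapsed time: %f ms' % self.elapsed)


with Timer() as t:
    total = sum(range(1000000))

# The measurement outlives the block, e.g. for logging or assertions.
print(t.elapsed_secs >= 0)  # True
```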

Rewriting it into decorated-generator form:

Context Manager Decorated Generator
from timeit import default_timer
import requests
from contextlib import contextmanager

@contextmanager
def Timer():
    start = default_timer()
    yield
    elapsed_secs = default_timer() - start
    elapsed = elapsed_secs * 1000 # millisecs
    print('elapsed time: %f ms' % elapsed)

url = 'https://github.com/timeline.json'
with Timer():
    r = requests.get(url)

and this version works the same way as the previous one.
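There is one small difference: the generator yields nothing, so "with Timer() as t" would bind t to None. If you want the measurement afterwards, one option is to yield a mutable object and fill it in during teardown. A sketch, assuming a plain dict is good enough as the holder, with try/finally so the time is recorded even if the block raises:

```python
from contextlib import contextmanager
from timeit import default_timer


@contextmanager
def Timer():
    result = {}  # holder for the measurement, filled in after the block
    start = default_timer()
    try:
        yield result
    finally:
        result['elapsed_secs'] = default_timer() - start
        result['elapsed'] = result['elapsed_secs'] * 1000  # millisecs
        print('elapsed time: %f ms' % result['elapsed'])


with Timer() as t:
    total = sum(range(1000000))

print(sorted(t))  # ['elapsed', 'elapsed_secs']
```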

Context managers have very appealing applications in testing, where there may be many test cases that each need lots of setup and teardown. It’s usually important that individual tests are isolated, so that running one test does not affect the results of a later one, and a teardown phase that runs reliably even if the test case went badly wrong helps guarantee that. Since Python 2.7 and 3.1, a single "with" statement can also combine several context managers, so you can chain multiple setup and teardown steps, and later ones can even use what earlier ones produced, like:

with a(x, y) as A, b(A) as C:
    pass  # the block can use both A and C
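A small self-contained sketch of that form, using a hypothetical step context manager that records its setup and teardown. The second manager receives the value produced by the first, and the teardowns run in reverse order, innermost first:

```python
from contextlib import contextmanager

events = []


@contextmanager
def step(name, upstream=None):
    """Hypothetical manager: records setup/teardown, yields a derived value."""
    value = name if upstream is None else upstream + "/" + name
    events.append("setup " + name)
    try:
        yield value
    finally:
        events.append("teardown " + name)


with step("a") as A, step("b", A) as C:
    events.append("body sees " + C)

print(events)
# ['setup a', 'setup b', 'body sees a/b', 'teardown b', 'teardown a']
```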

Hopefully this post has shown some of the uses of the "with" statement. As always, there are more goodies; you only need to do a little more digging!


Sunday, January 8, 2017

What this blog is about

Write Better Python? Maybe the title sounds a little pretentious when there are a number of justifiably well-known folks in the Python community writing and presenting on the same topic. So, a few words of explanation: at the turn of the century, I was a devoted Python enthusiast. It was basically the early days of Python 2. I wrote lots of code, developed a Python course for a well-known training company, served as a tech editor for the second edition of an iconic Python book, and had a draft of my own Python book in the works. And then I took a very busy job that was not at all Python-focused, and after about 11 years, a second one. Python for the odd tool here and there? Sure. But due to time pressures, Python was no longer a focus for me.

I taught the last instance of my Python course in January 2002. Fifteen years later, in January 2017, I've dug back into the language and found that a lot of what I'd done was pretty simplistic: maybe not exactly "writing C in Python", but on that kind of level, not really expressing things in a modern, Pythonic way.

The purpose of the blog is to write about interesting aspects of Python as I discover what has happened during the decade and a half when I wasn't able to pay much attention. This is much more about my own voyage of discovery than trying to outdo any of the luminaries. I'll be looking at what it means to be "Pythonic", where some refactoring can make code more readable, when to apply EAFP (Easier to Ask for Forgiveness than Permission - yes, it's even been given a name), and more. Even though my main objective is to understand things better by writing them up, I hope this material will prove useful to others too!