HOW CLOSURES INTERACT WITH VARIABLE SCOPE IN PYTHON

548
SHARES
2.5k
VIEWS

Let’s say you want to sort a list of numbers but prioritize one group of numbers to come first. This pattern is useful when you’re rendering a user interface and want important messages or exceptional events to be displayed before everything else.

A common way to do this is to pass a helper function as the key argument to a list’s sort method. The helper’s return value will be used as the value for sorting each item in the list. The helper can check whether the given item is in the important group and can vary the sorting value accordingly:

def sort_priority(values, group):
    def helper(x):
        if x in group:
            return (0, x)
        return (1, x)
    values.sort(key=helper)

This function works for simple inputs:

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
print(numbers)

>>>
[2, 3, 5, 7, 1, 4, 6, 8]

There are three reasons this function operates as expected:

  • Python supports closures—that is, functions that refer to variables from the scope in which they were defined. This is why the helper function is able to access the group argument for sort_priority.
  • Functions are first-class objects in Python, which means you can refer to them directly, assign them to variables, pass them as arguments to other functions, compare them in expressions and if statements, and so on. This is how the sort method can accept a closure function as the key argument.
  • Python has specific rules for comparing sequences (including tuples). It first compares items at index zero; then, if those are equal, it compares items at index one; if they are still equal, it compares items at index two, and so on. This is why the return value from the helper closure causes the sort order to have two distinct groups.

It’d be nice if this function returned whether higher-priority items were seen at all so the user interface code can act accordingly. Adding such behavior seems straightforward. There’s already a closure function for deciding which group each number is in. Why not also use the closure to flip a flag when high-priority items are seen? Then, the function can return the flag value after it’s been modified by the closure.

def sort_priority2(numbers, group):
    found = False
    def helper(x):
        if x in group:
            found = True # Seems simple
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found

I can run the function on the same inputs as before:

found = sort_priority2(numbers, group)
print('Found:', found)
print(numbers)

>>>
Found: False
[2, 3, 5, 7, 1, 4, 6, 8]

The sorted results are correct, which means items from group were definitely found in numbers. Yet the found result returned by the function is False when it should be True. How could this happen?

When you reference a variable in an expression, the Python interpreter traverses the scope to resolve the reference in this order:

  1. The current function’s scope.
  2. Any enclosing scopes (such as other containing functions).
  3. The scope of the module that contains the code (also called the global scope).
  4. The built-in scope (that contains functions like len and str).

If none of these places has defined a variable with the referenced name, then a NameError exception is raised:

foo = does_not_exist * 5

>>>
Traceback ...
NameError: name 'does_not_exist' is not defined

Assigning a value to a variable works differently. If the variable is already defined in the current scope, it will just take on the new value. If the variable doesn’t exist in the current scope, Python treats the assignment as a variable definition. Critically, the scope of the newly defined variable is the function that contains the assignment.

This assignment behavior explains the wrong return value of the sort_priority2 function. The found variable is assigned to True in the helper closure. The closure’s assignment is treated as a new variable definition within helper, not as an assignment within sort_priority2:

def sort_priority2(numbers, group):
    found = False        # Scope: 'sort_priority2'
    def helper(x):
        if x in group:
            found = True # Scope: 'helper' -- Bad!
            return (0, x)
        return (1, x)

    numbers.sort(key=helper)
    return found

This problem is sometimes called the scoping bug because it can be so surprising to newbies. But this behavior is the intended result: It prevents local variables in a function from polluting the containing module. Otherwise, every assignment within a function would put garbage into the global module scope. Not only would that be noise, but the interplay of the resulting global variables could cause obscure bugs.

In Python, there is special syntax for getting data out of a closure. The nonlocal statement is used to indicate that scope traversal should happen upon assignment for a specific variable name. The only limit is that nonlocal won’t traverse up to the module-level scope (to avoid polluting globals).

Here, I define the same function again, now using nonlocal:

def sort_priority3(numbers, group):
    found = False
    def helper(x):
        nonlocal found # Added
        if x in group:
            found = True
            return (0, x)
        return (1, x)
    numbers.sort(key=helper)
    return found

The nonlocal statement makes it clear when data is being assigned out of a closure and into another scope. It’s complementary to the global statement, which indicates that a variable’s assignment should go directly into the module scope.

However, much as with the anti-pattern of global variables, I’d caution against using nonlocal for anything beyond simple functions. The side effects of nonlocal can be hard to follow. It’s especially hard to understand in long functions where the nonlocal statements and assignments to associated variables are far apart.

When your usage of nonlocal starts getting complicated, it’s better to wrap your state in a helper class. Here, I define a class that achieves the same result as the nonlocal approach; it’s a little longer but much easier to read (see Item 38: “Accept Functions Instead of Classes for Simple Interfaces” for details on the __call__ special method):

Leave a Reply

Your email address will not be published. Required fields are marked *

Trending