An introspective python decorator using stack frames and the inspect module

Table of Contents

Fluent Python is the best resource to learn to use and love python. Some days ago I was reading a section of the chapter 7: “Function Decorators and Closures”. This chapter has a lot of interesting and cool examples. Here I’ll discuss one of them and how I tried to put more shiny stuff in it.

A book that every python programmer should read.

Gaining a deeper understanding about the execution context of a function

The Fluent Python Book example

Ramalho’s book presents us with a @clock decorator that can be used to decorate a method, measure the time it takes to execute, and print in a human-readable format the arguments and name of the method. The example is shown below:

import time

DEFAULT_FMT = '[{elapsed:0.8f}s] {name}({args}) -> {result}'

def clock(fmt=DEFAULT_FMT):
    def decorate(func):
        def clocked(*_args):
            t0 = time.time()
            _result = func(*_args)
            elapsed = time.time() - t0
            name = func.__name__
            args = ', '.join(repr(arg) for arg in _args)
            result = repr(_result)
            log_string = fmt.format(**locals())
            # send to somewhere 
            # csv, ELK, etc
            print(log_string)
            return result
        return clocked
    return decorate

@clock('[{elapsed:0.8f}s] {name}({args})')
def snooze(seconds):
    time.sleep(seconds)
    return time.time()

for _ in range(3):
    snooze(.123)

[0.12315798s] snooze(0.123)
[0.12315822s] snooze(0.123)
[0.12317085s] snooze(0.123)

If you don’t understand something in the above code I recommend that you take some time searching and reading about each aspect. There are many cool things being used there, for example:

repr which is a function that returns a string representation of an object.
- This is essential because the DEFAULT_FMT is a string, not a f-string, we can’t just put a generic object to be printed in DEFAULT_FMT.
log_string = fmt.format(**locals()): instead of creating a repetitive code like fmt.format(**{"result":result, "args":args, ...}) we can just use the locals() which is a dictionary that contains all the local variables of the current scope.

When I study something I always like to create a fresh problem with the stuff that I’ve learned and try to solve it. Sometimes there is no solution. But even if there is no solution, we still learn other stuff.

I’ve started by creating the following example:

import numpy as np

@clock('[{elapsed:0.8f}s] {name}({args})')
def snooze_and_snore(seconds, snore_loud, min_prob_to_snore=0.4):
    time.sleep(seconds)
    to_snore = np.random.uniform() > min_prob_to_snore
    if to_snore:
        if snore_loud:
            pass
            # r.requets(wake_up_everyone)
        pass
    return time.time()

for _ in range(3):
    snooze_and_snore(.4, True, .1)
    snooze_and_snore(.4, False, .1)

[0.40229130s] snooze_and_snore(0.4, True, 0.1)
[0.40049720s] snooze_and_snore(0.4, False, 0.1)
[0.40058565s] snooze_and_snore(0.4, True, 0.1)
[0.40013075s] snooze_and_snore(0.4, False, 0.1)
[0.40052223s] snooze_and_snore(0.4, True, 0.1)
[0.40057564s] snooze_and_snore(0.4, False, 0.1)

Ok, what are the problems/issues/limitations that the above code showed me?

Current issues and limitations

We don’t have information about the names of the arguments passed to the method.
- If the list of arguments is long, trying to understand what is happening becomes a hard task. Because we are increasing the amount of stuff that we must keep in our mind. We are increasing the cognitive load in the terms presented in the excelsior book: A Philosophy of Software Design.
- A person who is not familiar with the codebase cannot understand what is happening by analyzing the outputs of the decorator. If these outputs are being stored in the ELK stack, this will be unproductive.
We have the locals() information from the decorator which is fed by the result of the decorated method. However, we can’t get any information about the locals() of the decorated method. Why is this bad?
- The final internal state of the method is commonly used to understand the execution of a method.
- Sometimes a method depends on random variables defined in the local context. Thus, the same set of arguments can give different executions. Until now, we don’t have a way to get the locals() of the decorated method. For example, in the snooze_and_snore we can’t know if the person snored or not.

We will attack the first issue using the inspect module. As I’ll show you, we can do cool things with this module.

If you know about sys.trace, call stack and inspect.signatures I recommend you go directly to the section Let’s solve our problem

Creating an introspective code with the inspect module

The inspect module is a Python standard library that provides several tools to help you to introspect and consequently learn about live objects like functions, modules, classes, instances, frame objects (I’ll talk about frames later in this post), etc. Well, what can you do with this? Really, a lot of things. You can use it to automatically create documentation, parse the docstrings, manipulate the AST, etc.

A decorator that validates arguments using mathematical notation

In the last years, we have seen the development of the typing module and the mypy static analysis tool for python. This module and tool can be very useful sometimes. However, it doesn’t provide some features that are essential for proper validation. But at least in my experience creating code for my Ph.D., I usually don’t need so much sophisticated type theory and validation to be able to write a good code for a mathematical modeling tool. Most of the mathematical validation that I need is just checking if an argument still satisfies some constraints or lives in a proper subspace. If not, I need to raise an exception or perform some kind of regularization.

Let’s create a decorator that will validate arguments using simple mathematical notation.

We will create a dictionary that will contain the annotation as a key and the value will be a human-readable description of the annotation and a method responsible for check if everything is right.

import inspect

MATH_SPACES = {
    "\in R": {"name" : "real space", "validator": lambda x: isinstance(x, (int, float))},
    "\in R_+": {"name": "space of real numbers greater than zero", "validator": lambda x: isinstance(x, (int, float)) and x > 0},
}

We will use the inspect.signature to get the annotations of each argument of the decorated method. For example, if the decorated method is def foo(a: '\in R', b) the inspect.signature(foo) will return an object which we can use to extract an ordered dictionary with the arguments and the annotations. Like this

def foo(a: "\in R", b, c:int, d= 2):
    pass
for k, v in inspect.signature(foo).parameters.items():
    print(k, v, type(v._annotation), v.default)

a a: '\\in R' <class 'str'> <class 'inspect._empty'>
b b <class 'type'> <class 'inspect._empty'>
c c: int <class 'type'> <class 'inspect._empty'>
d d=2 <class 'type'> 2

Let’s create our decorator. It should be really simple. Just check if we should verify the argument and if so, check if the value respects the annotated mathematical space.

def math_validator():
    def decorate(func):
        def decorated(*_args):
            sig = inspect.signature(func)
            # sig parameters is an ordered dict
            for i, (k, v) in enumerate(sig.parameters.items()):
                annotation = v._annotation
                if not isinstance(annotation, str):
                    continue
                if not annotation in MATH_SPACES:
                    print(f"{annotation} is not implemented in Math Spaces")
                    continue # skip if we didn't implement this space validation

                if not MATH_SPACES[annotation]["validator"](_args[i]):
                    raise ValueError(f"{k} doesn't belong to the {MATH_SPACES[annotation]['name']}")
            result = func(*_args)
            print(f"{func.__name__}({_args}) -> {result}")
            return result 
        return decorated
    return decorate

@math_validator()
def simple_method(x: "\in R", y: "\in R_+", z: float = 2) -> float:
    """Simple method to add two numbers together and 
    divide by the last number

    Args:
        x: The first number to add.
        y: The second number to add.
        z: it is a float number that will be the power of the result.
            This will not be checked for math spaces.
    Returns:
        float: result
    
    """
    result = (x+y)/y
    return result**z

simple_method(1, 2)
simple_method(1, 0)

simple_method((1, 2)) -> 1.5
---> 19 simple_method(1, 0)
...
<locals>.decorate.<locals>.decorated(*_args)
     11         continue
     13     if not MATH_SPACES[annotation]["validator"](_args[i]):
---> 14         raise ValueError(f"{k} doesn't belong to the {MATH_SPACES[annotation]['name']}")
     15 result = func(*_args)
     16 print(f"{func.__name__}({_args}) -> {result}")

ValueError: y doesn't belong to the space of real numbers greater than zero

Our decorator is quite simple but does the job. You can go deeper into this and use a more sophisticated mathematical notation, printing using latex, etc. But now, let’s go back to the Python Fluent example because the inspect.signature already provides us with a way to solve the first limitation!

Going back to the Fluent python example

Let’s remember one thing that I’ve pointed out:

A person who is not familiar with the code base will not be able to understand what is happening just by analyzing the outputs of the decorator.

It’s obvious that we can overcome this issue by using the inspect module. Let’s create a more elaborated example using monkeys and a zookeeper that must record and report the information about how the life of the monkeys are going.

NUM_MONKEYS = 20
def feed_monkeys(n_bananas, n_apples=0): 
    monkeys = {
        f"monkey_{i}": {"bananas": 0, "apples": 0}
        for i in range(NUM_MONKEYS)
    }
    while n_bananas > 0 and n_apples > 0:
        if np.random.uniform() < 0.4:
            continue
        monkey = monkey[np.random.choice(list(monkeys.keys()))]
        if n_bananas > 0:
            monkey["bananas"] += 1
            n_bananas -= 1
        if n_apples > 0:
            monkey["apples"] += 1
            n_apples -= 1
        if n_apples == 0 and n_bananas == 0:
            break

My solution is the @report decorator presented below.

def report(fmt=DEFAULT_FMT):
    def decorate(func):
        def  decorated(*_args):
            sig = inspect.signature(func)
            named_args = {}
            num_args = len(_args)
            for i, (k, v) in enumerate(sig.parameters.items()):
                if i < num_args:
                    named_args[k] = repr(_args[i])
                else:
                    named_args[k] = repr(v.default)
            t0 = time.time()
            _result = func(*_args)
            elapsed = time.time() - t0
            name = func.__name__
            result = repr(_result)
            args_dict = {
            **locals(),
            **named_args}
            del args_dict['_args']
            print(fmt.format(**args_dict))
            # store the information in some place
            return result
        return decorated 
    return decorate

What is important here are the following statements:

sig = inspect.signature(func)
named_args = {}
num_args = len(_args)
for i, (k, v) in enumerate(sig.parameters.items()):
    if i < num_args:
        named_args[k] = repr(_args[i])
    else:
        named_args[k] = repr(v.default)

We are iterating over the signature parameters and checking if it passed the value to func. If not, we extract the default value from the signature.

Using the @report decorator in the feed_monkeys we have this output:

NUM_MONKEYS = 20
@report('The zookeeper feeds the monkeys with {n_bananas} bananas and {n_apples} apples. Time to feed: {elapsed:0.4f}s')
def feed_monkeys(n_bananas, n_apples=0): 
    monkeys = {
        f"monkey_{i}": {"bananas": 0, "apples": 0}
        for i in range(NUM_MONKEYS)
    }
    while n_bananas > 0 and n_apples > 0:
        if np.random.uniform() < 0.4:
            continue
        monkey = monkeys[np.random.choice(list(monkeys.keys()))]

        if n_bananas > 0:
            monkey["bananas"] += 1
            n_bananas -= 1
        if n_apples > 0:
            monkey["apples"] += 1
            n_apples -= 1
        if n_apples == 0 and n_bananas == 0:
            break

for _ in range(3):
    feed_monkeys(np.random.randint(10, 100))
    feed_monkeys(np.random.randint(10, 100), 10)

The zookeeper feeds the monkeys with 69 bananas and 0 apples. Time to feed: 0.0000s
The zookeeper feeds the monkeys with 92 bananas and 10 apples. Time to feed: 0.0011s
The zookeeper feeds the monkeys with 58 bananas and 0 apples. Time to feed: 0.0000s
The zookeeper feeds the monkeys with 53 bananas and 10 apples. Time to feed: 0.0048s
The zookeeper feeds the monkeys with 42 bananas and 0 apples. Time to feed: 0.0000s
The zookeeper feeds the monkeys with 51 bananas and 10 apples. Time to feed: 0.0025s

First issue solved! But our decorator is still not useful to the zookeeper and managers. We can’t know how good any monkey is doing or if there is any monkey that eats too much. You could already know that somehow we must have a way to access the monkeys' dictionary inside our def decorated method. Unfortunately, this is not a trivial task in python because it lacks namespaces decorators. But we also can overcome this with a little trick using a trace tool.

How to expose the locals() inside of a decorator?

Now we just need to access the local variables of the decorated method. Let’s think more deeply about this:

After the execution of the decorated method, all the information about the local variables is lost. Fortunately, we don’t want irrelevant information occupying our system memory.
The decorator will call the decorated method and will receive the return value. Thus, there is no way to extract the local variables because now there are no more local variables!

How to solve it? Well, think first about where the local variables have been stored before being erased.

Call stack and frames in python

If you came from a non-CS background, maybe you don’t know about an important concept called the call stack. A call stack is a data structure that stores information related to living things in our program.

If you call a function in python, a new block of information (frame) is pushed to the top of the call stack. After the function returns the value, this block of information is popped off the call stack. This comprehension can give insights into how to do things in python and how to create good or strange behaviors.

Well, you can think. If the elements of the call stack are always added on the top if a function (inner) is called by another function (outer) can I access the values of the local variables from the outer function inside of the inner? Yes, you can! Obviously, this is not always a good idea, but it’s good to understand this concept. Because this approach can be useful to deal with rigid frameworks like Django.

%%writefile test_stack.py
import inspect

N_BANANAS = 12

def outer_call(n_bananas):
    var_inside_outer_call = 2
    n_bananas += 1
    inner_call(n_bananas)

def inner_call(n_bananas):
    var_inside_inner_call = {"monkey": 0}
    frame_infos = inspect.stack()
    n_frames = len(frame_infos)
    frames_var_values = {
        f.function: [(k, v) for k, v in f.frame.f_locals.items()] for f in frame_infos
    }
    
    for i, (function, frame_local) in enumerate(frames_var_values.items()):
        print(f'\n\t {function} stack position: {n_frames - i}')
        for var_name, value in frame_local:
            print(f'\t\t Name: {var_name:25s}Type: {type(value)}')
            if var_name in ('n_bananas', 'N_BANANAS', 'var_inside_outer_call'):
                print(f'\t\t\t Value: {value}')

print("\n Before outer_call() call")
outer_call(N_BANANAS)

print("\n After outer_call() call")

frames = [
        [(k, v) for k, v in f.frame.f_locals.items()] 
        for f in inspect.stack()
]
for frame_local in frames:
    for var_name, value in frame_local:
        print(f'\t\t Name: {var_name:25s}Type: {type(value)}')

Overwriting test_stack.py

!python test_stack.py

 Before outer_call() call

	 inner_call stack position: 3
		 Name: n_bananas                Type: <class 'int'>
			 Value: 13
		 Name: var_inside_inner_call    Type: <class 'dict'>
		 Name: frame_infos              Type: <class 'list'>
		 Name: n_frames                 Type: <class 'int'>

	 outer_call stack position: 2
		 Name: n_bananas                Type: <class 'int'>
			 Value: 13
		 Name: var_inside_outer_call    Type: <class 'int'>
			 Value: 2

	 <module> stack position: 1
		 Name: __name__                 Type: <class 'str'>
		 Name: __doc__                  Type: <class 'NoneType'>
		 Name: __package__              Type: <class 'NoneType'>
		 Name: __loader__               Type: <class '_frozen_importlib_external.SourceFileLoader'>
		 Name: __spec__                 Type: <class 'NoneType'>
		 Name: __annotations__          Type: <class 'dict'>
		 Name: __builtins__             Type: <class 'module'>
		 Name: __file__                 Type: <class 'str'>
		 Name: __cached__               Type: <class 'NoneType'>
		 Name: inspect                  Type: <class 'module'>
		 Name: N_BANANAS                Type: <class 'int'>
			 Value: 12
		 Name: outer_call               Type: <class 'function'>
		 Name: inner_call               Type: <class 'function'>

 After outer_call() call
		 Name: __name__                 Type: <class 'str'>
		 Name: __doc__                  Type: <class 'NoneType'>
		 Name: __package__              Type: <class 'NoneType'>
		 Name: __loader__               Type: <class '_frozen_importlib_external.SourceFileLoader'>
		 Name: __spec__                 Type: <class 'NoneType'>
		 Name: __annotations__          Type: <class 'dict'>
		 Name: __builtins__             Type: <class 'module'>
		 Name: __file__                 Type: <class 'str'>
		 Name: __cached__               Type: <class 'NoneType'>
		 Name: inspect                  Type: <class 'module'>
		 Name: N_BANANAS                Type: <class 'int'>
		 Name: outer_call               Type: <class 'function'>
		 Name: inner_call               Type: <class 'function'>

First, draw your attention here

outer_call stack position: 2
		 Name: n_bananas                Type: <class 'int'>
			 Value: 13
		 Name: var_inside_outer_call    Type: <class 'int'>
			 Value: 2

Even if we don’t pass a variable as an argument to the inner_call function, this variable can be accessed because still lives in the call stack! As I’ve told you, after the execution of outer_call the call stack doesn’t have any information about what happened inside our functions. This discussion will help us to understand the limitations of our solution. Because our solution is just to watch the call stack and keep the frame before being popped off!

Using sys.trace to track our frames

Some time ago I’ve talked about how to dissect a process using lsof and strace: Dissecting processes and failures in Linux with lsof and strace. The strace is a tracing tool that intercepts and records in someplace any system call made by a process. Python has a built-in tool to do this kind of stuff. Thus, let’s use it to track our frames.

Let’s solve our problem

We will ask our code to monitor any call made with the decorated function. To do so, we will create a new function that will do this and release the trace after the execution of the decorated function.

import sys

def call_and_extract_frame(func, *args, **kwargs):
    frame_var = None
    trace = sys.gettrace()
    def update_frame_var(stack_frame, event_name, arg_frame):
        """
        Args:
            stack_frame: (frame)
                The current stack frame.
            event_name: (str)
                The name of the event that triggered the call. 
                Can be 'call', 'line', 'return' and 'exception'.
            arg_frame: 
                Depends on the event. Can be a None type
        """
        nonlocal frame_var # nonlocal is a keyword which allows us to change the variable in the outer scope
        if event_name != 'call':
            return trace
        frame_var = stack_frame
        sys.settrace(trace)
        return trace
    sys.settrace(update_frame_var)
    try:
        func_result = func(*args, **kwargs)
    finally:
        sys.settrace(trace)
    return frame_var, func_result

Now to use this trick, we just need to call the above function in our @report decorator. Like this:

def report(formater):
    def decorate(func):
        def decorated(*_args):
            sig = inspect.signature(func)
            named_args = {}
            num_args = len(_args)
            for i, (k, v) in enumerate(sig.parameters.items()):
                if i < num_args:
                    named_args[k] = repr(_args[i])
                else:
                    named_args[k] = repr(v.default)
            
            ### Our modifications
            frame_func, _result = call_and_extract_frame(func, *_args)
            name = func.__name__
            result = repr(_result)
            args_dict = {
                **named_args, 
                **locals(),
                **frame_func.f_locals,
            }
            ###

            print(formater.format(**args_dict))
            # do other stuff here
            return _result 
        return decorated
    return decorate

Let’s see the results:

@report(' Monkey {gluttonous_monkey} ate too much bananas.  Num monkeys {num_monkeys}')
def feed_monkeys(n_bananas): 
    num_monkeys = 3
    monkeys = {
        f"monkey_{i}": {"bananas": 0}
        for i in range(num_monkeys)
    }
    while n_bananas > 0:
        if np.random.uniform() < 0.4:
            continue
        monkey = monkeys[np.random.choice(list(monkeys.keys()))]
        if n_bananas > 0:
            monkey["bananas"] += 1
            n_bananas -= 1
    gluttonous_monkey = max(monkeys, key=lambda k: monkeys[k]["bananas"]) 

for _ in range(3):
    feed_monkeys(np.random.randint(10, 100))

 The monkey monkey_0 eat too much bananas.  Num monkeys 3
 The monkey monkey_1 eat too much bananas.  Num monkeys 3
 The monkey monkey_2 eat too much bananas.  Num monkeys 3

Conclusion and next steps

“…it depends”

Nice! It worked. But should you use it?

We have drawbacks in our approach:
- a tracing always creates a cost. Thus, is expected that we will reduce the performance of our system. If you use this just for debugging purposes, it’s ok.
- can have conflicts with other tools and libs that also trying to use the trace tool
- it seems dirty!

The next step: we don’t need a trace! We can do better using AST manipulation

Using the inspect module to get the argument names it’s ok but I’ve told you the trace tool can be problematic. But we can replace the trace with another approach. Although, it’s more conceptually complex don’t require dirty tricks and I believe it’s far more beautiful. The next post it’s about this!

SimpleNamespace for dict.key instead of dict[“key]

We have a minor issue and point of improvement. If you’re an cautious developer, probably you notice a flaw here

args_dict = {
    **named_args, 
    **locals(),
    **frame_func.f_locals,
}

if any of the dicts have common keys, one of them will overwrite the other. This is not what we want. You can use a simple solution like this:

args_dict = {
    "args": **named_args, 
    **locals(),
    "func_locals": **frame_func.f_locals,
}

But this is still annoying because we can do this with a format string:

@report(fmt="{args['n_bananas']} ...")

Well, how to solve it? Just use a SimpleNamespace to construct an object!

from types import SimpleNamespace


def report(formater):
    def decorate(func):
        def decorated(*_args):
            sig = inspect.signature(func)
            named_args = {}
            num_args = len(_args)
            for i, (k, v) in enumerate(sig.parameters.items()):
                if i < num_args:
                    named_args[k] = repr(_args[i])
                else:
                    named_args[k] = repr(v.default)
            
            ### Our modifications
            frame_func, _result = call_and_extract_frame(func, *_args)
            name = func.__name__
            result = repr(_result)
            args_dict = {
                "args": SimpleNamespace(**named_args), 
                "args_repr": repr(SimpleNamespace(**named_args)),
                **locals(),
                **frame_func.f_locals,
            }
            ###

            print(formater.format(**args_dict))
            # do other stuff here
            return _result 
        return decorated
    return decorate


@report(
    "".join((
        'The zookeeper feeds the monkeys with {args.n_bananas},', 
        'bananas. We loost {n_bananas} bananas. Args {args_repr}'
    ))
)
def feed_monkeys(n_bananas): 
    num_monkeys = 3
    monkeys = {
        f"monkey_{i}": {"bananas": 0}
        for i in range(num_monkeys)
    }
    while n_bananas > 0:
        if np.random.uniform() > .8:
            # "bananas rotted . Monkeys will not eat any banana any more")
            break
        if np.random.uniform() < 0.4:
            continue
        monkey = monkeys[np.random.choice(list(monkeys.keys()))]
        if n_bananas > 0:
            monkey["bananas"] += 1
            n_bananas -= 1
    gluttonous_monkey = max(monkeys, key=lambda k: monkeys[k]["bananas"]) 

for _ in range(3):
    feed_monkeys(np.random.randint(10, 100))

The zookeeper feeds the monkeys with 15,bananas. We loost 15 bananas. Args namespace(n_bananas='15')
The zookeeper feeds the monkeys with 80,bananas. We loost 77 bananas. Args namespace(n_bananas='80')
The zookeeper feeds the monkeys with 95,bananas. We loost 92 bananas. Args namespace(n_bananas='95')

Want to know more about call stack , inspect and trace?

Call stack and frames: Reza Bagheri explained here how to add a tail-call optimization in python using python stack frames.
Fluent Python book by Luciano Ramalho
Python documentation: tracebak, inspect and stack.
Stackoverflow discussion

python decorators inspect call stack debugging SimpleNamespace