Table of Contents
Fluent Python is the best resource to learn to use and love python. Some days ago I was reading a section of the chapter 7: “Function Decorators and Closures”. This chapter has a lot of interesting and cool examples. Here I’ll discuss one of them and how I tried to put more shiny stuff in it.
Gaining a deeper understanding about the execution context of a function
The Fluent Python Book example
Ramalho’s book presents us with a @clock
decorator that can be used to decorate a method, measure the time it takes to execute, and print in a human-readable format the arguments and name of the method. The example is shown below:
import time
DEFAULT_FMT = '[{elapsed:0.8f}s] {name}({args}) -> {result}'
def clock(fmt=DEFAULT_FMT):
def decorate(func):
def clocked(*_args):
t0 = time.time()
_result = func(*_args)
elapsed = time.time() - t0
name = func.__name__
args = ', '.join(repr(arg) for arg in _args)
result = repr(_result)
log_string = fmt.format(**locals())
# send to somewhere
# csv, ELK, etc
print(log_string)
return result
return clocked
return decorate
@clock('[{elapsed:0.8f}s] {name}({args})')
def snooze(seconds):
time.sleep(seconds)
return time.time()
for _ in range(3):
snooze(.123)
[0.12315798s] snooze(0.123)
[0.12315822s] snooze(0.123)
[0.12317085s] snooze(0.123)
If you don’t understand something in the above code I recommend that you take some time searching and reading about each aspect. There are many cool things being used there, for example:
repr
which is a function that returns a string representation of an object.- This is essential because the
DEFAULT_FMT
is a string, not af-string
, we can’t just put a generic object to be printed inDEFAULT_FMT
.
- This is essential because the
log_string = fmt.format(**locals())
: instead of creating a repetitive code likefmt.format(**{"result":result, "args":args, ...})
we can just use thelocals()
which is a dictionary that contains all the local variables of the current scope.
When I study something I always like to create a fresh problem with the stuff that I’ve learned and try to solve it. Sometimes there is no solution. But even if there is no solution, we still learn other stuff.
I’ve started by creating the following example:
import numpy as np
@clock('[{elapsed:0.8f}s] {name}({args})')
def snooze_and_snore(seconds, snore_loud, min_prob_to_snore=0.4):
time.sleep(seconds)
to_snore = np.random.uniform() > min_prob_to_snore
if to_snore:
if snore_loud:
pass
# r.requets(wake_up_everyone)
pass
return time.time()
for _ in range(3):
snooze_and_snore(.4, True, .1)
snooze_and_snore(.4, False, .1)
[0.40229130s] snooze_and_snore(0.4, True, 0.1)
[0.40049720s] snooze_and_snore(0.4, False, 0.1)
[0.40058565s] snooze_and_snore(0.4, True, 0.1)
[0.40013075s] snooze_and_snore(0.4, False, 0.1)
[0.40052223s] snooze_and_snore(0.4, True, 0.1)
[0.40057564s] snooze_and_snore(0.4, False, 0.1)
Ok, what are the problems/issues/limitations that the above code showed me?
Current issues and limitations
- We don’t have information about the names of the arguments passed to the method.
- If the list of arguments is long, trying to understand what is happening becomes a hard task. Because we are increasing the amount of stuff that we must keep in our mind. We are increasing the cognitive load in the terms presented in the excelsior book: A Philosophy of Software Design.
- A person who is not familiar with the codebase cannot understand what is happening by analyzing the outputs of the decorator. If these outputs are being stored in the ELK stack, this will be unproductive.
- We have the
locals()
information from the decorator which is fed by the result of the decorated method. However, we can’t get any information about thelocals()
of the decorated method. Why is this bad?- The final internal state of the method is commonly used to understand the execution of a method.
- Sometimes a method depends on random variables defined in the local context. Thus, the same set of arguments can give different executions. Until now, we don’t have a way to get the
locals()
of the decorated method. For example, in thesnooze_and_snore
we can’t know if the person snored or not.
We will attack the first issue using the inspect module. As I’ll show you, we can do cool things with this module.
sys.trace
, call stack
and inspect.signatures
I recommend
you go directly to the section Let’s solve our problem
Creating an introspective code with the inspect module
The inspect module is a Python standard library that provides several tools to help you to introspect and consequently learn about live objects like functions, modules, classes, instances, frame objects (I’ll talk about frames later in this post), etc. Well, what can you do with this? Really, a lot of things. You can use it to automatically create documentation, parse the docstrings, manipulate the AST, etc.
A decorator that validates arguments using mathematical notation
In the last years, we have seen the development of the typing
module and the mypy
static analysis tool for python. This module and tool can be very useful sometimes. However, it doesn’t provide some features that are essential for proper validation. But at least in my experience creating code for my Ph.D., I usually don’t need so much sophisticated type theory and validation to be able to write a good code for a mathematical modeling tool. Most of the mathematical validation that I need is just checking if an argument still satisfies some constraints or lives in a proper subspace. If not, I need to raise an exception or perform some kind of regularization.
Let’s create a decorator that will validate arguments using simple mathematical notation.
We will create a dictionary that will contain the annotation as a key and the value will be a human-readable description of the annotation and a method responsible for check if everything is right.
import inspect
MATH_SPACES = {
"\in R": {"name" : "real space", "validator": lambda x: isinstance(x, (int, float))},
"\in R_+": {"name": "space of real numbers greater than zero", "validator": lambda x: isinstance(x, (int, float)) and x > 0},
}
We will use the inspect.signature
to get the annotations of each argument of the decorated method.
For example, if the decorated method is def foo(a: '\in R', b)
the inspect.signature(foo)
will return an object which we can use to extract an ordered dictionary with the arguments and the annotations. Like this
def foo(a: "\in R", b, c:int, d= 2):
pass
for k, v in inspect.signature(foo).parameters.items():
print(k, v, type(v._annotation), v.default)
a a: '\\in R' <class 'str'> <class 'inspect._empty'>
b b <class 'type'> <class 'inspect._empty'>
c c: int <class 'type'> <class 'inspect._empty'>
d d=2 <class 'type'> 2
Let’s create our decorator. It should be really simple. Just check if we should verify the argument and if so, check if the value respects the annotated mathematical space.
def math_validator():
def decorate(func):
def decorated(*_args):
sig = inspect.signature(func)
# sig parameters is an ordered dict
for i, (k, v) in enumerate(sig.parameters.items()):
annotation = v._annotation
if not isinstance(annotation, str):
continue
if not annotation in MATH_SPACES:
print(f"{annotation} is not implemented in Math Spaces")
continue # skip if we didn't implement this space validation
if not MATH_SPACES[annotation]["validator"](_args[i]):
raise ValueError(f"{k} doesn't belong to the {MATH_SPACES[annotation]['name']}")
result = func(*_args)
print(f"{func.__name__}({_args}) -> {result}")
return result
return decorated
return decorate
@math_validator()
def simple_method(x: "\in R", y: "\in R_+", z: float = 2) -> float:
"""Simple method to add two numbers together and
divide by the last number
Args:
x: The first number to add.
y: The second number to add.
z: it is a float number that will be the power of the result.
This will not be checked for math spaces.
Returns:
float: result
"""
result = (x+y)/y
return result**z
simple_method(1, 2)
simple_method(1, 0)
simple_method((1, 2)) -> 1.5
---> 19 simple_method(1, 0)
...
<locals>.decorate.<locals>.decorated(*_args)
11 continue
13 if not MATH_SPACES[annotation]["validator"](_args[i]):
---> 14 raise ValueError(f"{k} doesn't belong to the {MATH_SPACES[annotation]['name']}")
15 result = func(*_args)
16 print(f"{func.__name__}({_args}) -> {result}")
ValueError: y doesn't belong to the space of real numbers greater than zero
Our decorator is quite simple but does the job. You can go deeper into this and use a more sophisticated mathematical notation, printing using latex, etc. But now, let’s go back to the Python Fluent example because the inspect.signature
already provides us with a way to solve the first limitation!
Going back to the Fluent python example
Let’s remember one thing that I’ve pointed out:
A person who is not familiar with the code base will not be able to understand what is happening just by analyzing the outputs of the decorator.
It’s obvious that we can overcome this issue by using the inspect
module. Let’s create a more elaborated example using monkeys and a zookeeper that must record and report the information about how the life of the monkeys are going.
NUM_MONKEYS = 20
def feed_monkeys(n_bananas, n_apples=0):
monkeys = {
f"monkey_{i}": {"bananas": 0, "apples": 0}
for i in range(NUM_MONKEYS)
}
while n_bananas > 0 and n_apples > 0:
if np.random.uniform() < 0.4:
continue
monkey = monkey[np.random.choice(list(monkeys.keys()))]
if n_bananas > 0:
monkey["bananas"] += 1
n_bananas -= 1
if n_apples > 0:
monkey["apples"] += 1
n_apples -= 1
if n_apples == 0 and n_bananas == 0:
break
My solution is the @report
decorator presented below.
def report(fmt=DEFAULT_FMT):
def decorate(func):
def decorated(*_args):
sig = inspect.signature(func)
named_args = {}
num_args = len(_args)
for i, (k, v) in enumerate(sig.parameters.items()):
if i < num_args:
named_args[k] = repr(_args[i])
else:
named_args[k] = repr(v.default)
t0 = time.time()
_result = func(*_args)
elapsed = time.time() - t0
name = func.__name__
result = repr(_result)
args_dict = {
**locals(),
**named_args}
del args_dict['_args']
print(fmt.format(**args_dict))
# store the information in some place
return result
return decorated
return decorate
What is important here are the following statements:
sig = inspect.signature(func)
named_args = {}
num_args = len(_args)
for i, (k, v) in enumerate(sig.parameters.items()):
if i < num_args:
named_args[k] = repr(_args[i])
else:
named_args[k] = repr(v.default)
We are iterating over the signature parameters and checking if it passed the value to func
. If not, we extract the default value from the signature.
Using the @report
decorator in the feed_monkeys
we have this output:
NUM_MONKEYS = 20
@report('The zookeeper feeds the monkeys with {n_bananas} bananas and {n_apples} apples. Time to feed: {elapsed:0.4f}s')
def feed_monkeys(n_bananas, n_apples=0):
monkeys = {
f"monkey_{i}": {"bananas": 0, "apples": 0}
for i in range(NUM_MONKEYS)
}
while n_bananas > 0 and n_apples > 0:
if np.random.uniform() < 0.4:
continue
monkey = monkeys[np.random.choice(list(monkeys.keys()))]
if n_bananas > 0:
monkey["bananas"] += 1
n_bananas -= 1
if n_apples > 0:
monkey["apples"] += 1
n_apples -= 1
if n_apples == 0 and n_bananas == 0:
break
for _ in range(3):
feed_monkeys(np.random.randint(10, 100))
feed_monkeys(np.random.randint(10, 100), 10)
The zookeeper feeds the monkeys with 69 bananas and 0 apples. Time to feed: 0.0000s
The zookeeper feeds the monkeys with 92 bananas and 10 apples. Time to feed: 0.0011s
The zookeeper feeds the monkeys with 58 bananas and 0 apples. Time to feed: 0.0000s
The zookeeper feeds the monkeys with 53 bananas and 10 apples. Time to feed: 0.0048s
The zookeeper feeds the monkeys with 42 bananas and 0 apples. Time to feed: 0.0000s
The zookeeper feeds the monkeys with 51 bananas and 10 apples. Time to feed: 0.0025s
First issue solved! But our decorator is still not useful to the zookeeper and managers. We can’t know how good any monkey is doing or if there is any monkey that eats too much. You could already know that somehow we must have a way to access the monkeys' dictionary inside our def decorated
method. Unfortunately, this is not a trivial task in python because it lacks namespaces decorators. But we also can overcome this with a little trick using a trace tool.
How to expose the locals() inside of a decorator?
Now we just need to access the local variables of the decorated method. Let’s think more deeply about this:
- After the execution of the decorated method, all the information about the local variables is lost. Fortunately, we don’t want irrelevant information occupying our system memory.
- The decorator will call the decorated method and will receive the return value. Thus, there is no way to extract the local variables because now there are no more local variables!
How to solve it? Well, think first about where the local variables have been stored before being erased.
Call stack and frames in python
If you came from a non-CS background, maybe you don’t know about an important concept called the call stack. A call stack is a data structure that stores information related to living things in our program.
If you call a function in python, a new block of information (frame) is pushed to the top of the call stack. After the function returns the value, this block of information is popped off the call stack. This comprehension can give insights into how to do things in python and how to create good or strange behaviors.
Well, you can think. If the elements of the call stack are always added on the top if a function (inner) is called by another function (outer) can I access the values of the local variables from the outer function inside of the inner? Yes, you can! Obviously, this is not always a good idea, but it’s good to understand this concept. Because this approach can be useful to deal with rigid frameworks like Django.
%%writefile test_stack.py
import inspect
N_BANANAS = 12
def outer_call(n_bananas):
var_inside_outer_call = 2
n_bananas += 1
inner_call(n_bananas)
def inner_call(n_bananas):
var_inside_inner_call = {"monkey": 0}
frame_infos = inspect.stack()
n_frames = len(frame_infos)
frames_var_values = {
f.function: [(k, v) for k, v in f.frame.f_locals.items()] for f in frame_infos
}
for i, (function, frame_local) in enumerate(frames_var_values.items()):
print(f'\n\t {function} stack position: {n_frames - i}')
for var_name, value in frame_local:
print(f'\t\t Name: {var_name:25s}Type: {type(value)}')
if var_name in ('n_bananas', 'N_BANANAS', 'var_inside_outer_call'):
print(f'\t\t\t Value: {value}')
print("\n Before outer_call() call")
outer_call(N_BANANAS)
print("\n After outer_call() call")
frames = [
[(k, v) for k, v in f.frame.f_locals.items()]
for f in inspect.stack()
]
for frame_local in frames:
for var_name, value in frame_local:
print(f'\t\t Name: {var_name:25s}Type: {type(value)}')
Overwriting test_stack.py
!python test_stack.py
Before outer_call() call
inner_call stack position: 3
Name: n_bananas Type: <class 'int'>
Value: 13
Name: var_inside_inner_call Type: <class 'dict'>
Name: frame_infos Type: <class 'list'>
Name: n_frames Type: <class 'int'>
outer_call stack position: 2
Name: n_bananas Type: <class 'int'>
Value: 13
Name: var_inside_outer_call Type: <class 'int'>
Value: 2
<module> stack position: 1
Name: __name__ Type: <class 'str'>
Name: __doc__ Type: <class 'NoneType'>
Name: __package__ Type: <class 'NoneType'>
Name: __loader__ Type: <class '_frozen_importlib_external.SourceFileLoader'>
Name: __spec__ Type: <class 'NoneType'>
Name: __annotations__ Type: <class 'dict'>
Name: __builtins__ Type: <class 'module'>
Name: __file__ Type: <class 'str'>
Name: __cached__ Type: <class 'NoneType'>
Name: inspect Type: <class 'module'>
Name: N_BANANAS Type: <class 'int'>
Value: 12
Name: outer_call Type: <class 'function'>
Name: inner_call Type: <class 'function'>
After outer_call() call
Name: __name__ Type: <class 'str'>
Name: __doc__ Type: <class 'NoneType'>
Name: __package__ Type: <class 'NoneType'>
Name: __loader__ Type: <class '_frozen_importlib_external.SourceFileLoader'>
Name: __spec__ Type: <class 'NoneType'>
Name: __annotations__ Type: <class 'dict'>
Name: __builtins__ Type: <class 'module'>
Name: __file__ Type: <class 'str'>
Name: __cached__ Type: <class 'NoneType'>
Name: inspect Type: <class 'module'>
Name: N_BANANAS Type: <class 'int'>
Name: outer_call Type: <class 'function'>
Name: inner_call Type: <class 'function'>
First, draw your attention here
outer_call stack position: 2
Name: n_bananas Type: <class 'int'>
Value: 13
Name: var_inside_outer_call Type: <class 'int'>
Value: 2
Even if we don’t pass a variable as an argument to the inner_call
function, this variable can be accessed because still lives in the call stack! As I’ve told you, after the execution of outer_call
the call stack doesn’t have any information about what happened inside our functions. This discussion will help us to understand the limitations of our solution. Because our solution is just to watch the call stack and keep the frame before being popped off!
Using sys.trace to track our frames
Some time ago I’ve talked about how to dissect a process using lsof
and strace
:
Dissecting processes and failures in Linux with lsof and strace. The strace
is a tracing tool that intercepts and records in someplace any system call made by a process. Python has a built-in tool to do this kind of stuff. Thus, let’s use it to track our frames.
Let’s solve our problem
We will ask our code to monitor any call made with the decorated function. To do so, we will create a new function that will do this and release the trace after the execution of the decorated function.
import sys
def call_and_extract_frame(func, *args, **kwargs):
frame_var = None
trace = sys.gettrace()
def update_frame_var(stack_frame, event_name, arg_frame):
"""
Args:
stack_frame: (frame)
The current stack frame.
event_name: (str)
The name of the event that triggered the call.
Can be 'call', 'line', 'return' and 'exception'.
arg_frame:
Depends on the event. Can be a None type
"""
nonlocal frame_var # nonlocal is a keyword which allows us to change the variable in the outer scope
if event_name != 'call':
return trace
frame_var = stack_frame
sys.settrace(trace)
return trace
sys.settrace(update_frame_var)
try:
func_result = func(*args, **kwargs)
finally:
sys.settrace(trace)
return frame_var, func_result
Now to use this trick, we just need to call the above function in our @report
decorator. Like this:
def report(formater):
def decorate(func):
def decorated(*_args):
sig = inspect.signature(func)
named_args = {}
num_args = len(_args)
for i, (k, v) in enumerate(sig.parameters.items()):
if i < num_args:
named_args[k] = repr(_args[i])
else:
named_args[k] = repr(v.default)
### Our modifications
frame_func, _result = call_and_extract_frame(func, *_args)
name = func.__name__
result = repr(_result)
args_dict = {
**named_args,
**locals(),
**frame_func.f_locals,
}
###
print(formater.format(**args_dict))
# do other stuff here
return _result
return decorated
return decorate
Let’s see the results:
@report(' Monkey {gluttonous_monkey} ate too much bananas. Num monkeys {num_monkeys}')
def feed_monkeys(n_bananas):
num_monkeys = 3
monkeys = {
f"monkey_{i}": {"bananas": 0}
for i in range(num_monkeys)
}
while n_bananas > 0:
if np.random.uniform() < 0.4:
continue
monkey = monkeys[np.random.choice(list(monkeys.keys()))]
if n_bananas > 0:
monkey["bananas"] += 1
n_bananas -= 1
gluttonous_monkey = max(monkeys, key=lambda k: monkeys[k]["bananas"])
for _ in range(3):
feed_monkeys(np.random.randint(10, 100))
The monkey monkey_0 eat too much bananas. Num monkeys 3
The monkey monkey_1 eat too much bananas. Num monkeys 3
The monkey monkey_2 eat too much bananas. Num monkeys 3
Conclusion and next steps
“…it depends”
Nice! It worked. But should you use it?
- We have drawbacks in our approach:
- a tracing always creates a cost. Thus, is expected that we will reduce the performance of our system. If you use this just for debugging purposes, it’s ok.
- can have conflicts with other tools and libs that also trying to use the trace tool
- it seems dirty!
The next step: we don’t need a trace! We can do better using AST manipulation
- Using the inspect module to get the argument names it’s ok but I’ve told you the trace tool can be problematic. But we can replace the trace with another approach. Although, it’s more conceptually complex don’t require dirty tricks and I believe it’s far more beautiful. The next post it’s about this!
SimpleNamespace for dict.key instead of dict[“key]
We have a minor issue and point of improvement. If you’re an cautious developer, probably you notice a flaw here
args_dict = {
**named_args,
**locals(),
**frame_func.f_locals,
}
if any of the dicts have common keys, one of them will overwrite the other. This is not what we want. You can use a simple solution like this:
args_dict = {
"args": **named_args,
**locals(),
"func_locals": **frame_func.f_locals,
}
But this is still annoying because we can do this with a format string:
@report(fmt="{args['n_bananas']} ...")
Well, how to solve it? Just use a SimpleNamespace to construct an object!
from types import SimpleNamespace
def report(formater):
def decorate(func):
def decorated(*_args):
sig = inspect.signature(func)
named_args = {}
num_args = len(_args)
for i, (k, v) in enumerate(sig.parameters.items()):
if i < num_args:
named_args[k] = repr(_args[i])
else:
named_args[k] = repr(v.default)
### Our modifications
frame_func, _result = call_and_extract_frame(func, *_args)
name = func.__name__
result = repr(_result)
args_dict = {
"args": SimpleNamespace(**named_args),
"args_repr": repr(SimpleNamespace(**named_args)),
**locals(),
**frame_func.f_locals,
}
###
print(formater.format(**args_dict))
# do other stuff here
return _result
return decorated
return decorate
@report(
"".join((
'The zookeeper feeds the monkeys with {args.n_bananas},',
'bananas. We loost {n_bananas} bananas. Args {args_repr}'
))
)
def feed_monkeys(n_bananas):
num_monkeys = 3
monkeys = {
f"monkey_{i}": {"bananas": 0}
for i in range(num_monkeys)
}
while n_bananas > 0:
if np.random.uniform() > .8:
# "bananas rotted . Monkeys will not eat any banana any more")
break
if np.random.uniform() < 0.4:
continue
monkey = monkeys[np.random.choice(list(monkeys.keys()))]
if n_bananas > 0:
monkey["bananas"] += 1
n_bananas -= 1
gluttonous_monkey = max(monkeys, key=lambda k: monkeys[k]["bananas"])
for _ in range(3):
feed_monkeys(np.random.randint(10, 100))
The zookeeper feeds the monkeys with 15,bananas. We loost 15 bananas. Args namespace(n_bananas='15')
The zookeeper feeds the monkeys with 80,bananas. We loost 77 bananas. Args namespace(n_bananas='80')
The zookeeper feeds the monkeys with 95,bananas. We loost 92 bananas. Args namespace(n_bananas='95')
Want to know more about call stack , inspect and trace?
- Call stack and frames: Reza Bagheri explained here how to add a tail-call optimization in python using python stack frames.
- Fluent Python book by Luciano Ramalho
- Python documentation: tracebak, inspect and stack.
- Stackoverflow discussion