Pitfalls of Callback-Based APIs

Callbacks are a great tool for abstraction and modularity. They are truly the refactorer's best friend. You want multiple logging backends? No problem! Dynamically handle clicks without modifying your GUI Button implementation? You got it! Universally respond to database updates without hard-coding all mutation sites? You got it again! It almost seems like God himself must have invented callbacks.

Then everything goes straight to hell. Suddenly, after a seemingly unrelated change, your code starts to loop infinitely, or there's some sort of runtime exception while iterating over a shared data structure, or mutexes start to deadlock. You fix it, but then next week another seemingly unrelated change causes the same bout of issues. Now repeat indefinitely for the lifetime of your project.

This article is about API design. Specifically, this article is about the tendency to chronically misuse APIs that invoke user-supplied callbacks.

If this sounds familiar to you, then you've probably come up with your own set of ad-hoc guidelines for the correct way to implement callback-based APIs. For example, "Invoke callbacks at the end of a function" or "Use deferred callbacks when manipulating shared resources." My goal is that by the end of this article you will understand why these guidelines arise independently, how they are all related, and why callback-based APIs are so error-prone in the first place.

An Example

Let's elaborate on one of the examples mentioned above to illustrate the problem more concretely. Let's say you have implemented an extensible logging API for your application:

loggers = []
def add_logger(logger):
    loggers.append(logger)

def log(message):
    for l in loggers:
        l(message)

This is nice. Now you can use multiple backends for your logger without changing the code that does the logging:

def console_logger(message):
    print(message)

add_logger(console_logger)
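Since log() just iterates over whatever is in loggers, any number of backends can sit side by side without touching the code that calls log(). A minimal sketch, assuming a hypothetical list_logger that keeps messages in memory (handy in tests) registered alongside console_logger:

```python
loggers = []

def add_logger(logger):
    loggers.append(logger)

def log(message):
    # Fan the message out to every registered backend.
    for l in loggers:
        l(message)

def console_logger(message):
    print(message)

captured = []

def list_logger(message):
    # Hypothetical backend: keep messages in memory instead of printing them.
    captured.append(message)

add_logger(console_logger)
add_logger(list_logger)
log("application started")  # printed once and also appended to captured
```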

Your application also has a little database implementation:

database_handle = None
def get_database_handle():
    global database_handle

    if database_handle is None:
        log("Opening database")
        database_handle = open_database()

    return database_handle

During the course of developing your application, you realize that log messages are useful not just during development but also for debugging errors that users run into out in the field. You decide to create a database logger that saves logs so you can better debug those errors.

def database_logger(message):
    db = get_database_handle()
    db.save_document({"type" : "log", "message": message})

add_logger(database_logger)

This is awesome! You barely had to write any code to implement this functionality and now your QA is going to improve drastically.

When you fire up your application, you find that it no longer starts up and seems to be using 100% CPU.

What's happening is that the log() call in get_database_handle() is causing an infinite loop. Here's one possible stack trace:

log() -> database_logger() -> get_database_handle() -> log()
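The cycle is easy to reproduce in isolation. The sketch below is self-contained and assumes a hypothetical FakeDatabase returned by a stub open_database(), plus a call_chain list that records each function as it is entered. In CPython specifically, this unbounded recursion eventually hits the interpreter's recursion limit and raises RecursionError, so the sketch catches that and prints the start of the recorded chain.

```python
loggers = []
call_chain = []  # hypothetical: records each function entered, to expose the cycle

def add_logger(logger):
    loggers.append(logger)

def log(message):
    call_chain.append("log")
    for l in loggers:
        l(message)

class FakeDatabase:
    # Hypothetical in-memory stand-in for the real database handle.
    def __init__(self):
        self.documents = []

    def save_document(self, doc):
        self.documents.append(doc)

def open_database():
    return FakeDatabase()

database_handle = None

def get_database_handle():
    global database_handle
    call_chain.append("get_database_handle")
    if database_handle is None:
        log("Opening database")  # re-enters log() before the handle exists
        database_handle = open_database()
    return database_handle

def database_logger(message):
    call_chain.append("database_logger")
    db = get_database_handle()
    db.save_document({"type": "log", "message": message})

add_logger(database_logger)

try:
    log("hello")
except RecursionError:
    # CPython caps recursion depth, so the endless cycle surfaces as an exception.
    print("cycle:", " -> ".join(call_chain[:4]))
```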

That's kind of annoying but also kind of scary too. Adding a logger should definitely not cause an infinite loop in your application. You briefly ponder how many other infinite loops might be lurking deep within your application.

You think for a while and then come up with this fix:

is_logging = False
def log(message):
    global is_logging
    if is_logging: return
    is_logging = True
    try:
        for l in loggers:
            l(message)
    finally:
        is_logging = False

Sure, you may miss some logs, but at least you've eliminated the possibility of another infinite loop due to logging in your database code.
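To see exactly which logs go missing, here is a self-contained sketch of the guarded log() together with the database logger. It assumes a hypothetical in-memory FakeDatabase returned by a stub open_database(), and uses a plain list's append method in place of console_logger. The first message causes the database to open, and the nested "Opening database" message is dropped by every backend, not just the database one.

```python
loggers = []
is_logging = False

def add_logger(logger):
    loggers.append(logger)

def log(message):
    # Same re-entrancy guard as in the text: nested calls are silently dropped.
    global is_logging
    if is_logging:
        return
    is_logging = True
    try:
        for l in loggers:
            l(message)
    finally:
        is_logging = False

class FakeDatabase:
    # Hypothetical in-memory stand-in for the real database.
    def __init__(self):
        self.documents = []

    def save_document(self, doc):
        self.documents.append(doc)

def open_database():
    return FakeDatabase()

database_handle = None

def get_database_handle():
    global database_handle
    if database_handle is None:
        log("Opening database")  # dropped when reached from inside log()
        database_handle = open_database()
    return database_handle

def database_logger(message):
    db = get_database_handle()
    db.save_document({"type": "log", "message": message})

console_lines = []
add_logger(console_lines.append)  # stands in for console_logger
add_logger(database_logger)

log("hello")
print(console_lines)  # "Opening database" never reaches any backend
```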

Time passes and your application becomes multi-threaded. In particular, your database handle can now be used in multiple threads. Even though the handle itself is thread-safe, you need to change the singleton initialization in get_database_handle() to handle the case where two threads access it at the same time:

import threading

database_handle = None
database_handle_lock = threading.Lock()
def get_database_handle():
    global database_handle

    with database_handle_lock:
        if database_handle is None:
            log("Opening database")
            database_handle = open_database()

        return database_handle

Now you won't accidentally open two database handles. When you fire up your application, to your surprise, it now hangs while using no CPU at all.

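What's happening this time is that the log() call causes the same thread to acquire database_handle_lock twice. threading.Lock is not reentrant, so the second acquisition waits forever for a lock that its own thread already holds: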

get_database_handle() -> log() -> database_logger() -> get_database_handle()
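You can watch this self-deadlock happen without hanging your interpreter. The sketch below uses a hypothetical helper, try_reacquire(), that holds a lock and then tries to take it again from the same thread, mirroring the call chain above. It passes a timeout to acquire() so the failure shows up as a False return value instead of a hang, and contrasts threading.Lock with threading.RLock.

```python
import threading

def try_reacquire(lock):
    # Hold the lock, then try to take it again from the same thread, the way
    # get_database_handle() -> log() -> database_logger() -> get_database_handle()
    # does. A plain acquire() on a Lock would block forever here; the timeout
    # lets us observe the failure instead.
    with lock:
        acquired = lock.acquire(timeout=0.1)
        if acquired:
            lock.release()  # balance the extra acquire
        return acquired

plain = try_reacquire(threading.Lock())       # the thread is waiting on itself
reentrant = try_reacquire(threading.RLock())  # an RLock counts nested acquires
print(plain, reentrant)
```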

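You also notice that, even before you added the lock, database_handle was being opened twice whenever get_database_handle() was called before log() was: the nested call from database_logger() opened the database, and then the outer call opened it again, overwriting the first handle.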
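The double initialization is easier to believe once you count it. This self-contained sketch uses the guarded log() from earlier, a hypothetical stub open_database() that only counts how often it runs, and a database_logger() that skips the actual save. Calling get_database_handle() before any log() opens the database twice.

```python
loggers = []
is_logging = False
open_count = 0  # hypothetical: how many times open_database() ran

def log(message):
    global is_logging
    if is_logging:
        return
    is_logging = True
    try:
        for l in loggers:
            l(message)
    finally:
        is_logging = False

def open_database():
    # Hypothetical stub: just counts calls and returns a dummy handle.
    global open_count
    open_count += 1
    return object()

database_handle = None

def get_database_handle():
    global database_handle
    if database_handle is None:
        log("Opening database")  # the logger opens the database in here...
        database_handle = open_database()  # ...and then we open it again
    return database_handle

def database_logger(message):
    get_database_handle()  # saving the document is omitted in this sketch

loggers.append(database_logger)

get_database_handle()  # called first, before any log()
print(open_count)
```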

Something seems horribly wrong here. Your simple database logger just shouldn't be causing all of these errors! You have several ways to solve this problem:

Remove the log() call entirely from get_database_handle() and just require that loggers don't call log().
Make database_handle_lock a reentrant lock, i.e. threading.RLock, and check if database_handle was initialized after the call to log() in get_database_handle().
Make database_logger() use a separate database_handle. This way it doesn't call get_database_handle().
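Here is a sketch of the second option, for concreteness. It assumes a hypothetical stub open_database() that only counts calls and a database_logger() that skips the actual save. With an RLock the nested call can re-enter get_database_handle(), and the second None check stops the outer call from opening the database again. Note that it still relies on the global is_logging flag, which is shared by all threads, so one thread's logging can suppress another thread's messages; a real multi-threaded logger would need a per-thread flag (for example via threading.local).

```python
import threading

loggers = []
is_logging = False
open_count = 0  # hypothetical: how many times open_database() ran

def log(message):
    # Re-entrancy guard from earlier (a single global flag, not thread-aware).
    global is_logging
    if is_logging:
        return
    is_logging = True
    try:
        for l in loggers:
            l(message)
    finally:
        is_logging = False

def open_database():
    # Hypothetical stub: just counts calls and returns a dummy handle.
    global open_count
    open_count += 1
    return object()

database_handle = None
database_handle_lock = threading.RLock()  # the same thread may re-enter

def get_database_handle():
    global database_handle
    with database_handle_lock:
        if database_handle is None:
            log("Opening database")
            # log() may have re-entered this function and opened the
            # database already, so check again before opening.
            if database_handle is None:
                database_handle = open_database()
        return database_handle

def database_logger(message):
    get_database_handle()  # saving the document is omitted in this sketch

loggers.append(database_logger)

handle = get_database_handle()
print(open_count)
```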

Even just enumerating those possible solutions took a lot of thinking. What was once simple has now spiraled into a complicated mess! You even start to question the legitimacy of your earlier fix to log().

What's the right thing to do here? Is there a solution that will prevent all further issues with our logging system?

What's Really Going on Here

In the example above, it's disturbing that simple and intuitively harmless changes introduced such drastic errors. Experientially, it's kind of like executing print(2 + 2) and seeing 5. What was it about that logging API that turned us into such bad programmers?

The specification of a typical callback-based API is simple: pass in any function and it will get called when the relevant event happens. Unfortunately, this simple specification doesn't reflect reality. You can't just call any function at any time and expect your program to be correct. The context in which a function is called matters. The context information required for correctness isn't usually part of a callback-based API's formal specification.

An important implication of not including this context information as part of the API's specification is that when an error occurs, it's not clear if the caller or callee is at fault. This can lead to inelegant and/or incomplete solutions.

Another characteristic of callback-based APIs is that the natural evolution of the encapsulating system (due to refactoring and enhancement) causes the context in which callbacks are invoked to change. This in turn causes the specification of the API to change, even if the API's apparent interface or internal implementation doesn't change. Callback-based APIs are inherently unstable as long as their specification depends on external conditions that are subject to change.

A Documentation Problem?

Now, you may say this is a documentation problem. "Just document how and where the callback is called with the API." With good documentation, using a callback API incorrectly isn't really much different from using any API incorrectly. Ultimately, this is just a programming error, right?

I would argue that robust systems must be resistant to programming error despite the existence of good documentation. At the very least, a robust system should be able to fail-fast when a programming error occurs, lest you inflict data corruption and/or security vulnerabilities on your users.

Unfortunately, with callback-based APIs, we don't even have the ability to detect and fail-fast on this kind of programming error. We have type-safe systems, and we have the ability to perform runtime assertions on strings and integers, but the available mainstream tools don't provide the ability to perform any sort of high-level assertions on functions themselves.

Additionally, we have seen that the specifications for callback-based APIs change due to their dependence on external conditions. Even if a user correctly follows the documentation, a change to an unrelated component in the system may cause that usage to retroactively become incorrect. Again, there is no automatic safeguard against this.

Without the ability to assert required properties on argument callbacks, and also because the required properties are constantly changing due to external conditions, callback-based APIs encourage the introduction of programming errors into your application.
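Ordinary runtime checks can still catch a symptom after the fact. As a hypothetical variant of the is_logging fix, log() can raise instead of silently returning when it is re-entered. This is much weaker than what we actually want: it fires only when the bad call path actually executes (possibly in production), not when the callback is registered. The names ReentrantLogError and chatty_logger are illustrative.

```python
class ReentrantLogError(RuntimeError):
    """Hypothetical error raised when a logger calls back into log()."""

loggers = []
is_logging = False

def log(message):
    global is_logging
    if is_logging:
        # Fail fast: a logger (or something it called) re-entered log().
        raise ReentrantLogError("log() called from inside a logger: %r" % (message,))
    is_logging = True
    try:
        for l in loggers:
            l(message)
    finally:
        is_logging = False

def chatty_logger(message):
    log("chatty_logger saw: " + message)  # the programming error we want to catch

loggers.append(chatty_logger)

caught = None
try:
    log("hello")
except ReentrantLogError as e:
    caught = e
print("caught:", caught)
```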

An Ideal Solution

What we want is a way to take the specifications expressed in the documentation and turn them into assertions that our compiler or runtime system can check against. This would allow our API to fail-fast when a user adds an incorrect callback or when the structure of the system changes significantly.

For the example above, sample documentation would be "Do not add a callback that calls log() or acquires any locks held while log() is called." In this case we'd have a magic function that could check that for us:

def add_logger(logger):
    assert takes_no_held_locks_and_does_not_call_log(logger)
    loggers.append(logger)

Still, this is not enough. That assertion is static. As we've seen, in the future we may add other dangerous contexts in which log() will be called.

To truly robustly fail-fast on invalid callbacks, we'd need a magic function that automatically knew all the possible contexts in which the callback would be invoked and could check whether or not the callback is safe to invoke in those contexts.

With this ability, refactoring could be done with total confidence and much less effort.

Are magic functions like these possible to implement? I don't believe it's possible in general to prove arbitrary properties about arbitrary program code (Rice's theorem says that any nontrivial semantic property of programs is undecidable), though I suspect there are many specific, useful cases where it is possible. Unfortunately, I am not familiar with any integrated code analysis tools that could help with this.

A More Practical Solution

We've seen that what makes callbacks dangerous is the unforeseen contexts under which they are invoked. Instead of having to assert properties to ensure correctness, what if we eliminated the need to assert anything? We can do this by eliminating all context when invoking a callback:

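import collections

callbacks = collections.deque()

def queue_callback(cb):
    callbacks.append(cb)

def main_callback():
    print("hello world!")

def main():
    queue_callback(main_callback)
    # Run queued callbacks one at a time from the top of the stack,
    # where no caller is holding locks or is midway through an update.
    while callbacks:
        cb = callbacks.popleft()
        cb()
    return 0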

In this code sample we don't invoke callbacks inline. We add them to a queue of callbacks and only invoke them when we aren't in the middle of any running context. You might recognize this as a "deferred" or "asynchronous" callback.
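Applied to the logging example, the same idea means log() only enqueues messages, and a separate step delivers them later. A sketch, assuming a hypothetical drain_logs() that your main loop calls between events and a stub open_database() that returns an in-memory list in place of a real handle. The database opens exactly once and nothing recurses, because the "Opening database" message is just queued behind the message that triggered it.

```python
import collections

loggers = []
pending = collections.deque()  # log messages not yet delivered

def log(message):
    # Deferred: just enqueue. No logger runs inside the caller's context.
    pending.append(message)

def drain_logs():
    # Hypothetical hook, meant to be called from the top of the event loop,
    # where no caller holds locks or is partway through initialization.
    while pending:
        message = pending.popleft()
        for l in loggers:
            l(message)

open_count = 0
saved = []  # stands in for the documents written to the database

def open_database():
    # Hypothetical stub that counts how often the database is opened.
    global open_count
    open_count += 1
    return saved

database_handle = None

def get_database_handle():
    global database_handle
    if database_handle is None:
        log("Opening database")  # safe now: this only enqueues
        database_handle = open_database()
    return database_handle

def database_logger(message):
    get_database_handle().append({"type": "log", "message": message})

loggers.append(database_logger)

log("hello")
drain_logs()
print([d["message"] for d in saved])
```

One side effect of deferral is visible in the output order: "Opening database" is delivered after the message that caused the database to open.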

This works, but it doesn't apply in all cases. Sometimes you need to invoke callbacks synchronously with their callers.

Another drawback with this method is that not all programs are structured in this way (i.e. one big event loop) and it's a significant design choice to choose this structure for your program.

A Pure Solution

Instead of eliminating all context in which a callback is invoked, you can go the opposite route and instead only allow callbacks that have no impact on any potential context.

Functions like this are called "pure" functions and they are well known in the functional programming world. From Wikipedia:

The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices.
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.

Pure functions have no impact on any runtime state, and that makes them always safe to invoke as callbacks. The definition of purity is relatively simple, so it's easy both to document the requirement and to verify a function's purity by hand.
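Python's own sorted() is a familiar example of an API that effectively expects a pure callback: its key function should depend only on its argument. For the logging example, a hypothetical add_formatter() hook could accept only pure message transformations, which are safe to run in whatever context log() happens to be called from. The names formatters, add_formatter and with_level are illustrative.

```python
loggers = []
formatters = []  # hypothetical hook: pure functions from str to str

def add_formatter(formatter):
    formatters.append(formatter)

def log(message):
    # Formatters are pure, so running them here cannot re-enter log(),
    # take locks, or touch the database, whatever context log() runs in.
    for f in formatters:
        message = f(message)
    for l in loggers:
        l(message)

def with_level(message):
    # Pure: the result depends only on the argument and has no side effects.
    return "[INFO] " + message

add_formatter(str.strip)
add_formatter(with_level)

received = []
loggers.append(received.append)
log("  hello  ")
print(received)
```

The loggers themselves are still impure; the point is that this particular hook can be called from anywhere without reasoning about context.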

Again, this works, but it doesn't apply in all cases. Sometimes you need to invoke impure callbacks. Also, without an automatic way to verify the purity property of a function, this method is not as robust as the event-loop solution.

The Truth

This article isn't really about callbacks and it isn't even really about APIs. Before you react, hear me out. This article is about building robust systems. The truth is that, even without callbacks, the systems we build are utterly susceptible to programming error.

This is a serious issue. Too many of the security issues in the last decade were due solely to programmer error. To this day we literally run large mission-critical systems with little to no proof of their correctness.

This is a solvable problem, and I think better code analysis tools are the solution. We need to think beyond proving type safety and memory safety. We may not always be able to prove whether our programs halt, but I'd guess that there are a bunch of useful domain-specific properties (including callback correctness) that we can prove.

I believe readily available and user-extensible code analysis tools will not only drastically improve our software quality but will fundamentally change how we write software and what we expect from software.

Conclusion

I hope you enjoyed reading this article. Most importantly, I hope I was able to get you to second-guess how you use callback-based APIs and your approach to building robust systems. Nothing in this article is science so I may very well be wrong about everything. Please let me know @timeserena, on Hacker News, or in the comments below!

Rian Hunter, 12/23/14

Thanks to Aston Motes, Brian Smith, Will Stockwell, Kannan Goundan and Albert Ni for feedback on an earlier draft of this article.