Global singleton objects vs class instances for libraries
Introduction
Let's assume we are working with an HTTP API SDK. Here is the first snippet from Stripe's documentation:
import stripe
stripe.api_key = "sk_test_..."
# list customers
customers = stripe.Customer.list()
I would say this is pretty straightforward: there is a global object that you import, you set its configuration for authentication, and then you access its attributes to interact with the API.
Let's try one from PyGithub:
from github import Github
# using username and password
g = Github("user", "password")
# or using an access token
g = Github("access_token")
# Github Enterprise with custom hostname
g = Github(base_url="https://{hostname}/api/v3", login_or_token="access_token")
for repo in g.get_user().get_repos():
    # ...
So, now it seems we are creating custom instances that represent connections to the GitHub API, which we configure during initialization. Again, pretty straightforward, but why are there two approaches? Which one is better?
My solution
I have struggled with this dilemma in the past and my final choice is… (drumroll) … both.
Here’s how it works:
# package/core.py
class Foo:
    def __init__(self, **kwargs):
        self.host = "https://api.foo.com"
        self.username = None
        self.password = None
        self.api_token = None
        self.setup(**kwargs)

    def setup(self, host=None, username=None, password=None, api_token=None):
        if host is not None:
            self.host = host
        if username is not None:
            self.username = username
        if password is not None:
            self.password = password
        if api_token is not None:
            self.api_token = api_token
We set default values in __init__, override them in setup, and have __init__ conclude by delegating to setup.
What this does is make the following two snippets equivalent:
kwargs = ...
f = Foo(**kwargs)

kwargs = ...
f = Foo()
f.setup(**kwargs)
So, let's assume we are writing an SDK for the foo.com service. We would do:
# foo/core.py
class Foo:
    # ...

# foo/__init__.py
from .core import Foo

foo = Foo()
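Since Python caches imported modules, every from foo import foo across an application resolves to the same instance, which is what makes the shared, global-object style work. A quick sketch (module_a and module_b are hypothetical application modules, not part of the SDK):

# module_a.py
from foo import foo

foo.setup(api_token="...")

# module_b.py
from foo import foo

# This is the very same object configured in module_a, because Python caches
# the foo module after its first import.
assert foo.api_token is not None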
Now, someone who wants to use the foo SDK can either:
- Use the global object:

# my_app.py
from foo import foo

foo.setup(username="...", password="...")
foo.do_something()

- Or create a custom one (or more):

# my_app.py
from foo import Foo

foo1 = Foo(username="...", password="...")
foo2 = Foo(api_token="...")

foo1.do_something()
foo2.do_something()
Objections
How do we ensure that an object is properly set up? Previously we could raise an exception if the user didn't supply enough arguments during initialization. Now we can't.
It’s true. Since __init__
must work even without any arguments, it’s
possible to make an instance that is not properly configured. My first answer
to this is: “Python is an
easier-to-ask-for-forgiveness-than-permission language”. Or to put it
in other words: “Hey buddy, I told you how to initialize the object in the
docs; it’s not my fault you didn’t supply all the arguments”.
So, yeah, do nothing. Eventually the missing parameters will lead to an error. That’s fine.
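To illustrate, here is a sketch of what that failure might look like, assuming the SDK talks to the API with requests and has a do_something method along these lines (both are assumptions, not part of the code above):

import requests  # assumption: the SDK uses requests under the hood

class Foo:
    # __init__, setup, ... as above

    def do_something(self):
        # Nothing complains at initialization time; only when a request is
        # actually made does the missing credential surface as an error
        # (typically an unauthorized response turned into an exception below).
        response = requests.get(
            f"{self.host}/something",
            auth=(self.username, self.password),
        )
        response.raise_for_status()
        return response.json()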
However, you might not be entirely satisfied with this approach. There are ways to make things better that don’t sacrifice a lot of the elegance:
class Foo:
    # __init__, setup, ...

    @property
    def configured(self):
        return ((self.username is not None and self.password is not None) or
                self.api_token is not None)
In this example, the foo object is considered “fully configured” if it either
has a username/password pair or an API token. The configured
property lets
the user protect against using the object when it’s not configured properly.
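On the application side, for example, that guard could look like this (the reaction chosen here is just one option):

from foo import foo

foo.setup(api_token="...")

if not foo.configured:
    # React however makes sense for the application: prompt for credentials,
    # fall back to read-only behaviour, or bail out early like this.
    raise SystemExit("The foo SDK is not configured; supply an API token or username/password")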
On the library side, you can add to this:
class Foo:
    # __init__, setup, configured, ...

    def require_configured(self):
        if not self.configured:
            raise ValueError("Foo object is not properly configured")

    def do_something(self, ...):
        self.require_configured()
        # ...
The exception will be raised when you attempt to use the foo object, not
during initialization. But you can add a call to require_configured
after
your initialization to get what you want. It only takes one extra line of code:
from foo import foo
foo.setup(...)
foo.require_configured() # <--
foo.do_something(...)
What if the parameters have more complex types than simple strings? What if they are instances of classes that also need to be configured?
To explain what this question is about, let's use an example. Say that the foo.com service serves a lot of data and we want our SDK to cache that data so that it doesn't have to retrieve it every time. We also want to provide a few cache implementations. Finally, let's assume that all implementations accept the same parameter. We want to allow users to choose a cache implementation to their liking and to be able to configure it with that parameter. We also want users to be able to pass that parameter to the base foo object's configuration, to keep things simple.
In short, we want all of these to be possible:
from foo import foo
from foo.cache import MemoryCache, DiskCache
foo.setup(cache=MemoryCache())
foo.setup(cache=DiskCache())
foo.setup(cache=MemoryCache(ttl=30))
foo.setup(cache=DiskCache(ttl=30))
foo.setup(cache_ttl=30)
foo.setup(cache=MemoryCache(), cache_ttl=30)
foo.setup(cache=DiskCache(), cache_ttl=30)
The catch here is that we want the TTL option to be available either as a kwarg to foo's setup method or as a kwarg to the cache's initializer. We may want this because most users will probably stick with the default cache implementation and simply want to change the TTL value.
How could we go about this? My answer would be:
- Implement the cache classes like the Foo class:

# foo/cache.py
class CacheBase:
    def __init__(self, **kwargs):
        self.ttl = 10
        self.setup(**kwargs)

    def setup(self, ttl=None):
        if ttl is not None:
            self.ttl = ttl

class MemoryCache(CacheBase):
    # get, set, ...

class DiskCache(CacheBase):
    # get, set, ...
- At the end of Foo.setup, delegate to cache.setup:

# foo/core.py
from .cache import MemoryCache

class Foo:
    def __init__(self, **kwargs):
        self.cache = MemoryCache()  # Default value
        self.setup(**kwargs)

    def setup(self, cache=None, cache_ttl=None):
        if cache is not None:
            self.cache = cache
        self.cache.setup(ttl=cache_ttl)
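To see how the delegation plays out, here is a small sketch using the classes above (assuming their elided get/set bodies are filled in):

f = Foo()                      # default MemoryCache with ttl=10
f.setup(cache_ttl=30)          # only the TTL changes; the cache instance is kept
assert isinstance(f.cache, MemoryCache) and f.cache.ttl == 30

f.setup(cache=DiskCache())     # swap implementations; no cache_ttl given, so ttl stays at the default 10
assert isinstance(f.cache, DiskCache) and f.cache.ttl == 10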
So, what if you do both, i.e. foo.setup(DiskCache(ttl=20), cache_ttl=30)?
Well, (hey buddy) I don't feel like I have to protect you from that. If you are curious, you can always figure it out from the code.
This type of nesting lends itself to nice implementations of the configured
property we mentioned before:
class Foo:
    # __init__, setup, ...

    @property
    def configured(self):
        return (self.username is not None and
                self.password is not None and
                self.cache.configured)
assuming the cache classes also have a property called configured.
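A minimal sketch of what that could look like on the cache side (the ttl check is only a stand-in, since the cache above has no required parameters):

# foo/cache.py (sketch)
class CacheBase:
    # __init__, setup, ...

    @property
    def configured(self):
        # A stand-in check: with the defaults above this is always satisfied,
        # but a cache with required parameters would validate them here.
        return self.ttl is not None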
Conclusion
This is a bike-shedding problem. You can choose any solution and it will work fine. While working on your SDK, there are going to be far more difficult problems to solve. So, why did I write this?
Well, in terms of features, it offers a "hybrid" approach: you can offer the simpler global-object style for most cases, but if your users need multiple instances, nothing stops them from creating their own, without sacrificing any of the elegance of the code. It also tackles the nested configuration issue, which can otherwise get messy and lead to convoluted code.
The main reason, however, is that it is a solid, one-size-fits-all solution. As I said, this is a bike-shedding problem, which means that you can spend a lot of time on it even though it is not a difficult problem to solve. Having a solution ready can save a lot of frustration and lost time.