Monday, February 20, 2017

Can Python Overload Methods?

Can Python Overload Methods?

The title of the post is a question that seems to come up quite often in beginner forums, you can find a lot of hits in a web search, on Stackoverflow, etc. Not surprisingly, it usually comes from people who have worked in languages where method/function overloading is a big part of the language design.

According to Wikipedia,

…function overloading or method overloading is the ability to create multiple methods of the same name with different implementations. Calls to an overloaded function will run a specific implementation of that function appropriate to the context of the call, allowing one function call to perform different tasks depending on context.

Sometimes this question ends up dismissed as stupid in the Python context, “because you don’t need that in Python”. I’d like to take some time to talk about that, because the ideas behind it aren’t worthless.

Arguments of Differing Types

As a simplistic example, consider a class which should work whether an argument to a class constructor is an integer or a double-precision

class Foo {
    public:
        Foo(int x);
        Foo(double x);
        ...
};

Foo::Foo(int x)
{
    ...
}

Foo::Foo(double x)
{
    ...
}

Foo a = new(Foo(12));
Foo b = new(Foo(18.0));

So, two different constructor functions both named after the class, as that is the C++ syntax. C++ needs that information up front, or the compiler will throw type errors on compilation, as it’s an error to pass an argument of one type to a function expecting a different type. Overloading is often discussed in terms of constructors, though it is not limited to those methods.

Python syntax does not allow multiple functions/methods to take the same name within the same scope - just like with variables. People are sometimes a little surprised at this, but a Python function definition is an executable statement, in which two things happen: a function object is created from the body of the definition, and a reference to that object is then assigned to a variable with the name of the function - in other words, a def statement is variable assignment. So assigning two different functions to the same name just means the second one replaces the first. It’s "override" rather than "overload". In the case of Python, the method we are interested in is __init__. It is not exactly a constructor, but as the method called automatically after construction, the behavior that people associate with constructors (“set things up”) happens here.

We can show this sequence with a bit of interactive Python use:

>>> a=2
>>> a=4
>>> print(a)
4
>>> def a():
...     print "Hello"
...
>>> print(a)
<function a at 0x7f7fa29b4c80>
>>> a()
Hello
>>> def a(x):
...     print "Hi, argument was", x
...
>>> print(a)
<function a at 0x7f7fa29b4cf8>   # <1>
>>> a()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a() takes exactly 1 argument (0 given)
>>> a(8)
Hi, argument was 8
>>>
  1. note different address from previous function object

As an aside there does turn out to be a place where you can apparently have multiple method definitions of the same name, and that is when defining properties (getter/setter) - these are written decorated and are transformed by Python so they’re not really the same name, but they look that way in your source code. That’s a topic for another time, though.

So while you can not define multiple functions/methods of the same name, since Python is not a strongly typed language, that is not necessarily any problem. Consider the following snippet:

class Foo:
    def __init__(self, x):
        self.x = x

a = Foo(12)
b = Foo(18.0)
c - Foo("text")

That is, we’ve instantiated with three different types of arguments, but the initializer doesn’t look any different. Of course, you have to make sure the method you write in Python can actually handle arguments of different types. Python will let you check the type of an object, so you could go ahead and carefully check your arguments and make sure:

>>> x = 12
>>> isinstance(x, int)
True
>>> x = 18.0
>>> isinstance(x, int)
False
>>> isinstance(x, float)
True

But there’s a concept often cited in Python circles that it’s "Easier to Ask Forgiveness than Permission" (or EAFP), which suggests that you just go ahead and try something, and clean up the mess afterwards if it didn’t work. That’s antithetical to some software development theory which says to check everything first ("Look Before You Leap"), but as long as you can properly deal with the fallout from failures there’s not any real problem. That idea isn’t unique to Python, though it may be expressed in different ways when talking about different languages. Exceptions are an example of this way of thinking, and they exist in many languages. This isn’t just a case of dogma, consider that many LBYL sequences have proven to have timing holes that lead to security issues - for example if you write code to validate a filename path, then proceed to open it after the check completes, the time difference between the two, however minimal, is a window where someone external to the program can link the pathname elsewhere and cause the program to open something it didn’t intend to. This class of security issues is called TOCTOU (Time Of Check, Time Of Use).

And it turns out Python can often handle differing types without any problem - it’s not an error, for example, to perform arithmetic operations between an integer and a float. That gets into another concept often discussed around Python, "duck typing". That is a term which is intentionally a little silly:

In other words, don’t check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behavior you need to play your language-games with.

comp.lang.python
— "Alex Martelli"

In this simple case, if your objective is to perform arithmetic on your arguments, then integers and floats do both have methods like add, subtract, multiply, etc. So there is not a compelling reason to treat them in different ways unless you actually run into a case where behavior is incompatible with your expectations.

Differing Numbers of Arguments

Another case for overloading in static languages is if the method may need to take different numbers of arguments. This can come up in a few different ways, to list a couple of examples:

  • You want to offer different ways to instantiate a class, as in a hypothetical employee database where a new employee can be added by a (Firstname, Lastname, Salary) triple, or by a string encoding all three as "Firstname-Lastname-Salary".

  • API evolution: say you’ve implemented a class, and then later find out you need to make some extensions to your API which involves passing an additional parameter. If you just change the constructor, then all the code instantiating that class must now change. But by overload through adding a new constructor plus leaving the old one and adjusting its behavior so it has a sensible default if the added argument from the new constructor is not passed old and new code can both be supported.

API Evolves, Arguments Added

Of the two examples, the "we added an argument but don’t want to break backwards compatibility" case seems fairly easy to handle in Python. A combination of keyword arguments and/or default arguments normally does the trick. So we can go from:

class Foo:
    def __init__(self, x):
        self.x = x

a = Foo(12)

to:

class Foo:
    def __init__(self, x, y=None):
        self.x = x
        self.y = y   # <1>

a = Foo(12)   # <2>
b = Foo(12, 18.0)   # <3>
  1. Even if y was not passed, this is okay since it is set to default to something (None in this case).

  2. Old way, one argument, still works

  3. New way, two arguments

Differing Class Instantiations

The other example case has some more nuances. It suggests we’re intending, up front, to allow the class to instantiated in quite different ways (although this change could of course also happen as part of an evolution)

One way to approach this case is to use Python’s keyword argument passing. Rather than trying to put this in words, here’s an example:

class Employee:
    num_of_emps = 0

    def __init__(self, **kwargs):
        if "emp_str" in kwargs:
            first, last, pay = kwargs["emp_str"].split('-')
        elif "first" in kwargs and "last" in kwargs and "pay" in kwargs:
            first = kwargs["first"]
            last = kwargs["last"]
            pay = kwargs["pay"]
        else:
            print("invalid initializer:", kwargs)
            return
        self.first = first
        self.last = last
        self.pay = pay
        Employee.num_of_emps += 1

    def __str__(self):
        return "Name: {} {}, Pay: {}".format(self.first, self.last, self.pay)

emp_1 = Employee(first="John", last="Public", pay=50000)   # <1>
emp_2 = Employee(emp_str="Test-Employee-60000")            # <2>

print(emp_1)
print(emp_2)
print("Employees:", Employee.num_of_emps)
  1. Pass a tuple of values

  2. Pass a string encoding all the values

We have managed to instantiate an Employee two ways: by passing a tuple of values, or by passing an encoded string. In the initializer, we try to work out which way we were called by digging around in the dictionary that is given to us as kwargs, then fishing the actual values out of that dict, and saving them in instance variables. So this is certainly a form of "overloading", though it feels kind of clunky.

We might as well try using default values instead:

class Employee:
    num_of_emps = 0

    def __init__(self, pay=None, last=None, first=None, emp_str=None):
        if emp_str:
            first, last, pay = emp_str.split('-')
        elif not (first and last and pay):
            print("invalid initializer")
            return

        self.first = first
        self.last = last
        self.pay = pay
        Employee.num_of_emps += 1

    def __str__(self):
        return "Name: {} {}, Pay: {}".format(self.first, self.last, self.pay)

emp_1 = Employee(first="John", last="Public", pay=50000)   # <1>
emp_2 = Employee(emp_str="Test-Employee-60000")            # <2>

print(emp_1)
print(emp_2)
print("Employees:", Employee.num_of_emps)
  1. Pass a tuple of values

  2. Pass a string encoding all the values

Notice the caller side of this is identical - the "API" is the same. This is a little simpler looking, but it still feels awkward because of making assumptions in the __init__ function, based on possibly not terribly reliable information - in the first example we looked for the presence of key names in a dictionary, in this one we’re looking for non-default values of named arguments: if the string value is present we use it, else we check that we have all three of the expected arguments in the other form, and go from there.

There is another way to tackle this, which gets back to my objective in writing these posts - learning things added to Python since the "early days" of Python 2, and seeing how they can be used to make code nicer looking, and that is to use class methods. Class methods are not really new Python, they appeared in 2.2 and the decorator form was added in 2.4. Still, it’s not something I had learned about in those early Python 2 days.

To know what’s going on here, when a method is defined inside a class definition, it is by default what is called an instance method. That means it receives an implicit first argument which is a reference to the instance object. By convention this argument is named self, though the name itself is not anything magical. For a class method, this implicit argument is instead a reference to the class object, and is by convention named cls. The simple way to set this up is to decorate the method definition with @classmethod. There is another kind of method known as a static method, which does not receive either an instance or class argument.

class Employee:
    num_of_emps = 0

    def __init__(self, first, last, pay):
        self.first = first
        self.last = last
        self.pay = pay
        Employee.num_of_emps += 1

    @classmethod
    def from_string(cls, emp_str):
        first, last, pay = emp_str.split('-')
        return cls(first, last, pay)

    def __str__(self):
        return "Name: {} {}, Pay: {}".format(self.first, self.last, self.pay)

emp_1 = Employee(first="John", last="Public", pay=50000)      # <1>
emp_2 = Employee.from_string(emp_str="Test-Employee-60000")   # <2>

print(emp_1)
print(emp_2)
print(Employee.num_of_emps)
  1. Pass a tuple of values

  2. Pass a string containing all the values, using the from_string classmethod

This leaves something nice and clean looking. I will admit for those who come from the "method overloading" point of view, it the way the string form is instantiated is different so it doesn’t look quite like classical overloading any more. Also note for symmetry, the tuple form could also be written as a class method, with both then calling to the initializer by calling through the class. Then at least the invocation methods would look more similar, as in:

emp_1 = Employee.from_tuple(first="John", last="Public", pay=50000)
emp_2 = Employee.from_string(emp_str="Test-Employee-60000")