Python Classes: inherit, override, extend
Here is a question that comes up from time to time when people start writing classes in Python: “I defined a derived class but it is not getting the data from the base class. What’s going on?”
Here is a bit of example code that shows this problem:
    class A(object):
        def __init__(self):
            self.dataA = 'instance of class A'

    class B(A):
        def __init__(self):
            self.dataB = 'instance of class B'

    b = B()
    print("b.dataA:", b.dataA)
But this apparently trivial code fails:
Traceback (most recent call last):
File "inherit.py", line 34, in <module>
print("b.dataA:", b.dataA)
AttributeError: 'B' object has no attribute 'dataA'
Wait, what? I thought the derived class inherited from the base class,
yet the variable dataA set in the base class does not exist? The short
answer is that this is how classes work in Python, but the details
need a bit of explanation.
Viewing Class Inheritance Details
Before getting to the explanation, it is worth pointing out that you do
not have to take an expert’s word for these things. It is easy to add
some diagnostic output to your script and examine what is happening.
In this post I am going to be lazy and just use print(), though it is
usually better to use Python’s logging module, which gives you far more
control over the output and where it goes. But logging is its own topic.
So here we will insert some diagnostics into the simple program:
    class A(object):
        print('Setting class variables in A')
        clsdataA = 'class A'

        def __init__(self):
            print('Initializing instance of', self.__class__.__name__)
            self.dataA = 'instance of class A'

        def methA(self):
            return "A method from class A"

    class B(A):
        print('Setting class variables in B')
        clsdataB = 'class B'

        def __init__(self):
            print('Initializing instance of', self.__class__.__name__)
            self.dataB = 'instance of class B'

    print("Begin examination...")
    print("Data from classes:")
    print("A.clsdataA:", A.clsdataA)
    print("B.clsdataA:", B.clsdataA)
    print("Instantiating A as a:")
    a = A()
    print("Data from instance a:")
    print("a.clsdataA:", a.clsdataA)
    print("a.dataA:", a.dataA)
    print("call methA directly from a:", a.methA())
    print("Instantiating B as b:")
    b = B()
    print("Data from instance b:")
    print("b.clsdataB:", b.clsdataB)
    print("b.dataB:", b.dataB)
    print("b.clsdataA:", b.clsdataA)
    print("call methA from b:", b.methA())
    print("b.dataA:", b.dataA)
When running this, we see:
Setting class variables in A <1>
Setting class variables in B
Begin examination...
Data from classes:
A.clsdataA: class A <2>
B.clsdataA: class A
Instantiating A as a:
Initializing instance of A <3>
Data from instance a:
a.clsdataA: class A <4>
a.dataA: instance of class A
call methA directly from a: A method from class A
Instantiating B as b:
Initializing instance of B <5>
Data from instance b:
b.clsdataB: class B
b.dataB: instance of class B
b.clsdataA: class A <6>
call methA from b: A method from class A
Traceback (most recent call last):
File "./override.py", line 41, in <module>
print("b.dataA:", b.dataA)
AttributeError: 'B' object has no attribute 'dataA'
<1> The two assignments of class data happen before anything: a class definition is an executable statement, executed when reached in the file.
<2> You can access the class data before any instances are created, as per the previous note.
<3> The class A initializer is called when A is instantiated.
<4> Data from an instance of the base class is as expected.
<5> The class B initializer (only) is called when B is instantiated.
<6> The class data from the base class and the method from the base class are inherited as expected.
So now we have a good idea of what happened. Instantiating the derived
class never called the __init__ method in the base class. That is
because inheritance is child-oriented in Python. If the base class
defines something and the derived class does not, the reference resolves
to the one in the base class. But here we have defined an __init__ in
the derived/child class, so the reference resolves to that. Think of
a given class instance as a dictionary, where there’s one (and only one)
mapping of each name in self to the object that will correspond to
that name.
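The dictionary picture can be checked directly. The sketch below reuses the two classes from the failing example and uses the built-in vars() to show which names are actually stored on the instance. It is an illustration only, not a complete account of attribute lookup.

```python
class A(object):
    def __init__(self):
        self.dataA = 'instance of class A'

class B(A):
    def __init__(self):
        # A.__init__ is never called, so dataA is never stored on the instance
        self.dataB = 'instance of class B'

b = B()

# The instance dictionary holds only what B's initializer put there.
print(vars(b))                    # {'dataB': 'instance of class B'}

# B defines its own __init__, so lookup stops at B and never reaches A.
print('__init__' in vars(B))      # True
print(B.__init__ is A.__init__)   # False
```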
Python has a term for the order in which things are looked up: the Method
Resolution Order (MRO). The MRO documentation is quite detailed because it
also deals with complex multiple inheritance questions (something many OO
languages avoid by not allowing multiple inheritance at all), but
the situation here is simple: just walk up the tree until you
find a match. If a method is not found in any of the explicit definitions,
the walk ends at the object class, which is the base of
all so-called New-Style Classes and so is there implicitly even if not
listed. If object does not define the name either, the lookup fails with
an AttributeError. This "walk up" concept is very important and we’ll
come back to it.
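The walk can be made visible with the __mro__ attribute that every class carries. In the sketch below, defining_class is a hypothetical helper written for this illustration (it is not part of Python); it simply returns the first class in the MRO whose own dictionary contains the name.

```python
class A(object):
    def methA(self):
        return "A method from class A"

class B(A):
    pass

# The lookup order for B: B itself, then A, then the implicit object base.
print([cls.__name__ for cls in B.__mro__])   # ['B', 'A', 'object']

def defining_class(cls, name):
    """Hypothetical helper: first class in the MRO that defines name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None

print(defining_class(B, 'methA').__name__)     # A
print(defining_class(B, '__repr__').__name__)  # object
print(defining_class(B, 'no_such_name'))       # None
```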
So after a lot of words, we can see how to "fix" our initial problem: make a call to the base class initializer if we think we need that initialization to happen:
    class A(object):
        def __init__(self):
            self.dataA = 'instance of class A'

    class B(A):
        def __init__(self):
            A.__init__(self)
            self.dataB = 'instance of class B'

    b = B()
    print(b.dataA)
After this simple addition, the output is as we might originally have expected, instead of the previous error:
instance of class A
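The order of the two statements matters when both initializers touch
the same attribute: whichever assignment runs last wins. It is worth
repeating the experiment with the initializer call and the assignment
swapped, and with both classes setting the same attribute, to see this.
It means we can perform steps both before and after the call to the
parent’s method, if we need to.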
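A small sketch of why the position of the base class call matters. The names Before, After and label are made up for this illustration; the point is only that when both initializers assign the same attribute, the last assignment is the one that sticks.

```python
class A(object):
    def __init__(self):
        self.label = 'set by A'

class Before(A):
    def __init__(self):
        self.label = 'set by Before'
        A.__init__(self)              # runs last, so A's value wins

class After(A):
    def __init__(self):
        A.__init__(self)
        self.label = 'set by After'   # runs last, so this value wins

print(Before().label)   # set by A
print(After().label)    # set by After
```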
There is a subtlety here: because we are calling the method through
class A by name, rather than through an instance, the __init__ method
of A does not get automatically supplied with an instance reference, and
you would get an error if you did not supply it (specifically,
TypeError: __init__() missing 1 required positional argument: 'self').
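The error is easy to provoke on purpose. This sketch catches it so the script keeps running; the error attribute is an invention for the example, and the exact wording of the message varies a little between Python versions.

```python
class A(object):
    def __init__(self):
        self.dataA = 'instance of class A'

class B(A):
    def __init__(self):
        try:
            A.__init__()          # wrong: no instance passed
        except TypeError as exc:
            self.error = str(exc)
        A.__init__(self)          # right: instance passed explicitly

b = B()
print(b.error)
print(b.dataA)   # instance of class A
```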
This behavior is not limited to the __init__ function. Any method
of the base class can be called this way, which means the derived class
has the flexibility to tailor the behavior it wants: inherit from the
base class without doing anything, override the base class, or "extend"
the base class by doing some local work before or after calling the
base class method. You can even extend methods from builtin classes -
the facility is by no means limited to your own classes.
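The three choices side by side, as a minimal sketch. All class and method names here (Base, Derived, LoggedList, greet, describe, steps) are invented for the illustration; LoggedList shows the same "extend" pattern applied to the builtin list.

```python
class Base(object):
    def greet(self):
        return 'hello from Base'

    def describe(self):
        return 'Base'

    def steps(self):
        return ['base work']

class Derived(Base):
    # greet: not defined here, so it is inherited unchanged

    def describe(self):               # override: Base.describe is never called
        return 'Derived'

    def steps(self):                  # extend: local work around the base call
        return ['before'] + Base.steps(self) + ['after']

class LoggedList(list):
    """Extends a method of the builtin list class."""
    def append(self, item):
        print('appending', item)
        list.append(self, item)

d = Derived()
print(d.greet())      # hello from Base
print(d.describe())   # Derived
print(d.steps())      # ['before', 'base work', 'after']

items = LoggedList()
items.append(3)       # prints: appending 3
print(items)          # [3]
```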
As noted earlier, this was a simple case with no surprises, so the order is simple.
The super Function
All of the above is pretty standard stuff about how Python classes work. There can always be surprises when you come from a familiar language to a new one and things look kind of similar but something is subtly different, but that is just part of learning.
Calling a base class by name, however, may or may not be a good
idea. It is very clear what you mean, but it is not very flexible.
You hardcode a name; if you later change the derived class to inherit
from some other class, you have to update every call that names the
previous base class. In fact, in Python 3 a class which does not state
a base class still has one: object; it is not terribly clear to a
reader if you call a method by name in a base class but the base class
does not visibly define it. Once you start playing with multiple
inheritance, you have to do some work to figure out where the call is
going - see the MRO sidebar. This is a potential code maintainability
issue.
There is a tool that handles this cleanly: the built-in super
function. In Python 3 super is easy to use in simple cases. For our
modified short example, replace the explicit call to the initializer
for class A with this:
super().__init__()
super uses data from the context of the call to figure out where to
delegate it to: the class where the call appears, and the ancestry
tree of the instance. So we do not need to supply the self argument
in this case, unlike the hardwired call to a class’s __init__ method.
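Put back into the short example, the whole thing looks like the following sketch (Python 3 only, since it relies on the zero-argument form of super):

```python
class A(object):
    def __init__(self):
        self.dataA = 'instance of class A'

class B(A):
    def __init__(self):
        super().__init__()    # no class name and no explicit self
        self.dataB = 'instance of class B'

b = B()
print(b.dataA)   # instance of class A
print(b.dataB)   # instance of class B
```

If B is later changed to derive from a different class, the super call does not need to be touched.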
super in Python 2 is a little different: the class and the instance
have to be passed explicitly, as in super(B, self).__init__(). Look it
up if you need to use it there.
The "walk up from the bottom" nature of super means you have to give
some thought to the design of your classes. From the viewpoint of a
given method, calling super does not mean "call my immediate parent’s
method"; it means "continue the walk up from self, past the class where
this call appears, until you find it". Remember that self written in a
method does not mean "me"; it means the instance object I was called
with, which could very well be an instance of a class that derived from
me. So be nice to others who may want to derive from you: don’t build
in assumptions that may not work for derived classes.
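A diamond-shaped hierarchy makes this concrete. In the sketch below (all names invented for the illustration), the super call inside Left reaches Base when self is a plain Left, but reaches Right when self is an instance of Both, even though Left knows nothing about Right.

```python
class Base(object):
    def who(self):
        return ['Base']

class Left(Base):
    def who(self):
        return ['Left'] + super().who()

class Right(Base):
    def who(self):
        return ['Right'] + super().who()

class Both(Left, Right):
    def who(self):
        return ['Both'] + super().who()

print([c.__name__ for c in Both.__mro__])
# ['Both', 'Left', 'Right', 'Base', 'object']

print(Left().who())   # ['Left', 'Base']
print(Both().who())   # ['Both', 'Left', 'Right', 'Base']
```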
Somewhat surprisingly, super is a bit controversial in the Python
community: there have been blog posts and flame wars on whether it is a
good thing or a bad thing. Most of that centers around complex
inheritance trees, especially multiple inheritance. I see no reason not
to use it as described in this post. If things get more complicated, you
do need to make sure the arguments line up between caller and callee,
and because super doesn’t tell you what the callee is, that is work you
have to take care of manually.
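One common convention for keeping the arguments lined up, sketched here under the assumption that every class in the tree cooperates: each initializer takes the keyword arguments it needs and passes the rest along through **kwargs. The classes (Base, Sized, Colored, Widget) are hypothetical, and this is one possible approach rather than the only one.

```python
class Base(object):
    def __init__(self, name):
        self.name = name

class Sized(Base):
    def __init__(self, size, **kwargs):
        super().__init__(**kwargs)   # pass along what we do not consume
        self.size = size

class Colored(Base):
    def __init__(self, color, **kwargs):
        super().__init__(**kwargs)
        self.color = color

class Widget(Sized, Colored):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

w = Widget(name='knob', size=3, color='red')
print(w.name, w.size, w.color)   # knob 3 red
```

Here Sized cannot know that its super call will land in Colored, so it makes no assumption about which keywords the next initializer accepts.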
Summing Up
If you’ve read this far, it may have occurred to you that Python classes are very flexible. This may be unlike the rather rigid constraints of more classical object-oriented languages. Here is a concept to think about:
Classes in Python do not so much inherit, override, extend as they delegate and reuse. You derive from another class in order to reuse its code, which in practice means your dictionary of methods and data is pre-populated with references from the class(es) you are "deriving" from. But you are completely in charge of this relationship. If you define a name that is also defined above you in your inheritance tree, your definition is the one that goes in the dictionary, and you are stating you want to do that work yourself, not delegate it. You can choose, in your implementation, to call out to another implementation for the same name from a base class - and that is not so much extending as augmenting your version with code found elsewhere. And if you do not define it, you are reusing the code for that name as defined elsewhere.