What is really going on here

General Idea

Django already includes a mechanism where fields for one model are stored in more than one table – Multi Table Inheritance. That’s what happens when we do “normal” inheritance of models, without specifying anything special in the Meta of either of the models.

If we have:

from django.db import models

class Parent(models.Model):
    parental = models.IntegerField()

class Child(Parent):
    childish = models.BooleanField()

Then we can use the parental field on the Child class as if it was defined there. Multiple inheritance is also supported, and the following almost works:

from django.db import models

class Mother(models.Model):
    motherly = models.IntegerField()

class Father(models.Model):
    fatherly = models.IntegerField()

# This DOES NOT WORK, just almost
class Child(Mother, Father):
    locale = models.ForeignKey("localization.Locale")

So – if we fix the little bump (details below), then we can break our large model into many small pieces. We can throw any field that’s currently on the large model into its own model (and its own table); the large model will then subclass all of them. In principle, no other code will have to change.

Of course, that is a little too good to be true. Let us consider the…

Problems (and solutions)

Field clash

The first problem is that, as noted above, the models described above don’t actually work. both Mother and Father have a field named id (the automatically generated PK), and the child cannot have two of them.

So, we just need to define explicitly the primary key fields on the parent tables:

from django.db import models

class Mother(models.Model):
    mother_id = models.IntegerField(primary_key=True)
    motherly = models.IntegerField()

class Father(models.Model):
    father_id = models.IntegerField(primary_key=True)
    fatherly = models.IntegerField()

# Now this does work
class Child(Mother, Father):
    locale = models.ForeignKey("localization.Locale")

Implicit Joins

The above already allows us to reduce the size of the large table, which we assume is the biggest problem. But still, with this, by default, queries on the large model would join in all of the parts (as if we called select_related() with all of them); in most use-cases, this is redundant and wasteful.

The solution is to limit the fields, by default, to the ones on the actual child model, by using the model _meta API to figure out which fields we want, and the QuerySet only() method. A special manager class for broken-down models has a get_queryset() which sets this up.

Make accessed fields fetch their whole parent

With the above scheme, fields coming from parents all become deferred. This means that, when such a field is accessed for the first time, a database query is made to fetch its value. We’d prefer that, if a query is already made, we’ll get all the fields from the relevant parent.

The way this query for the deferred field is done (internally in Django) is by calling the model method refresh_from_db(); that method can take an argument that tells it exactly which fields to fetch. Usually, when getting the value of a deferred field, the function is called with the name of that field only. We override it and make sure that whenever it is given names, we complement the list of names to include all the fields of relevant parent models.

Messed up id fields

On one hand: With Mutli Table Inheritance, for each of the parents, the child gets a parent_ptr one-to-one field – which means, there’s also a parent_ptr_id column in the table (and field in the model, which we care a lot less about).

On the other hand, the pointer-field to the first parent is also taken as the Child’s primary key – by default, Child has no id field.

We can make our own primary-key id field, that’s easy; but with the kind of use we have in mind, we’d want all these ..._ptr_id fields to also have just the same value as the id field. In fact, we don’t want them at all – we’d much prefer if the original id field is used instead. To achieve this, we need to define these fields more-or-less explicitly, and set them to all point to the same database column. This requires some messing with internals (Django isn’t really built to have columns shared between fields this way).

The solution involves a special type Foreign-Key field “family” – VirtualForeignKey, VirtualOneToOneField and VirtualParentLink; the former does the heavy lifting, and the latter two put a friendlier face on it. Making them work also requires some changes in the Django model _meta implementation – we define a subclass of the relevant Django class (django.db.model.options.Options) and plug it into the model.