eat some code

Why use factories in Django

Best way to generate data for dev and testing

May 2016 #fixtures  #factories  #admin  #yaml  #tests  #data  #Django 

From the very beginning of a project, you need some data: in your development database and for your automated tests. The instinctive solution is to enter a set of data manually via the Django admin. The official way is to load data from Django fixture files. Factories make this easier and better; here is why.

Let's be lazy

As a developer, you must be lazy. With factories, creating a thousand blog posts takes 2 lines of code:

for _ in range(1000):
    ArticleFactory.create()

Focusing on what's important

A typical article will have a title, a slug, a blurb, some content, some tags, a category, a preview image, etc. You probably don't care about the content of most of these fields when developing. Your customer probably doesn't care about most of them during UAT either. Usually, only 1 or 2 fields really matter.

For example, if the page you're developing displays articles filtered by category, you probably want to show "real" categories. In that case, you don't want the factories to generate random categories; you want to specify the real ones. On the other hand, you're happy for the blurbs to be "random". Factories are great for this, as they let you specify the values of some fields while taking care of the rest of the content. Let's write fixtures with explicit categories:

innovation = CategoryFactory.create(name="Innovation")
environment = CategoryFactory.create(name="Environment")

for i in range(0, 500):
    ArticleFactory.create(category=innovation)
    ArticleFactory.create(category=environment)

The above code creates 2 categories with 500 blog articles each: Innovation and Environment.
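
To make the idea concrete without depending on Django or Factory Boy, here is a minimal hand-rolled sketch of what a factory does: it fills every field with a generated default and lets the caller override only the fields that matter. The `Article` dataclass and the `make_article` helper are hypothetical names invented for this illustration. This is not Factory Boy's API, which is richer and works differently.

```python
from dataclasses import dataclass
from itertools import count


# Hypothetical stand-in for a Django model; not part of the original project.
@dataclass
class Article:
    title: str
    slug: str
    blurb: str
    category: str


# Shared counter so that every generated article gets a unique number.
_sequence = count(1)


def make_article(**overrides):
    """Build an Article from generated defaults, replaced by any keyword overrides."""
    n = next(_sequence)
    fields = {
        "title": f"Article {n}",
        "slug": f"article-{n}",
        "blurb": f"Placeholder blurb {n}",
        "category": "Uncategorised",
    }
    fields.update(overrides)
    return Article(**fields)


# Only the category is spelled out; every other field is filled in.
articles = [make_article(category="Innovation") for _ in range(3)]
```

Each call yields a distinct title and slug, while the category is exactly the one that was asked for. That split between "generated" and "explicit" fields is what keeps the fixtures above short.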

Speeding up the project setup

When a new developer comes along, you want their project setup to be as fast as possible. Having a set of fixtures ready in the code base avoids the need to exchange SQL files. In other words, you shouldn't rely on the Django admin for this. The next option is to use Django fixture files. With a YAML file for blog posts, it would look like this:

- model: blog.article
  pk: 1
  fields:
    category: 1
    title: "Article 1"
    slug: "article-1"
    body: "Some content"

- model: blog.article
  pk: 2
  fields:
    category: 2
    title: "Article 2"
    slug: "article-2"
    body: "Some content"

There are 2 obvious issues with that file:

  • what's important isn't explicit
  • the category is referenced by primary key value, which isn't explicit (what's category 2?)

The best option is to use Python code with factories instead; this way, fixtures are shorter and more explicit. As the project grows, these fixtures become part of the documentation, showing the different use cases.

Handling schema update gracefully

The design changed and articles no longer link to one category but to several (the foreign key becomes a many-to-many relationship). With both YAML files and the Django admin, this would involve a lot of manual work; just thinking about it is painful! With factories, all you have to do is update the Article factory, add parentheses and commas in your fixtures, and reset your database:

innovation = CategoryFactory.create(name="Innovation")
environment = CategoryFactory.create(name="Environment")

for i in range(0, 500):
    ArticleFactory.create(categories=(innovation,))
    ArticleFactory.create(categories=(environment, innovation))

Handling dates gracefully

In some applications, handling dates properly is critical. For example, you might show upcoming events on your homepage. Today's events must show first, while yesterday's events must not show at all. In that situation, updating events via the admin or in YAML files would be a nightmare, because hard-coded dates go stale. You have to use some Python code. You could use Event.objects.create or an EventFactory. As previously explained, the factory lets you focus on what's important (in this example, the event start date matters):

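from datetime import timedelta
from django.utils.timezone import now

# Dates are computed relative to the current time, so they never go stale
yesterday = now() - timedelta(days=1)
tomorrow = now() + timedelta(days=1)

EventFactory.create(title="Yesterday event", start=yesterday)
EventFactory.create(title="Tomorrow event", start=tomorrow)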
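
Why relative dates matter can be shown with plain Python. The sketch below uses a hypothetical `Event` dataclass and an `upcoming` helper, both invented here to stand in for the model and the homepage query. The dates are computed from the current time, so the data stays valid whenever it is generated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


# Hypothetical stand-in for an Event model.
@dataclass
class Event:
    title: str
    start: datetime


def upcoming(events, moment):
    """Return the events starting at or after `moment`, soonest first."""
    return sorted((e for e in events if e.start >= moment), key=lambda e: e.start)


current = datetime.now(timezone.utc)
events = [
    Event("Next week event", current + timedelta(days=7)),
    Event("Yesterday event", current - timedelta(days=1)),
    Event("Tomorrow event", current + timedelta(days=1)),
]

# Yesterday's event is filtered out; the rest are ordered by start date.
titles = [e.title for e in upcoming(events, current)]
```

With a hard-coded start date, the same data would silently drop out of the "upcoming" list once that date has passed.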

Explicit and fast tests

Last but not least, using factories makes your tests much better. Using YAML files for testing makes your tests run very slowly and forces you to look in different places to understand them. With factories, tests are fast and explicit:

# Methods of a test case class; self.app is assumed to be its test client
def test_yesterday_event_not_showing_on_homepage(self):
    event = EventFactory.create(start=now() - timedelta(days=1))
    response = self.app.get("/")
    self.assertNotContains(response, event.get_absolute_url())

def test_tomorrow_event_showing_on_homepage(self):
    event = EventFactory.create(start=now() + timedelta(days=1))
    response = self.app.get("/")
    self.assertContains(response, event.get_absolute_url())

Sum-up

Using the Admin & SQL files

  • not reusable for tests
  • not included in the code - slows down project setup
  • schema updates could be hard to handle
  • not practical for dates

Using YAML fixtures

  • slow tests
  • good for project setup
  • dependencies between tests
  • entity relations hard to handle / not explicit
  • schema update => must update fixtures
  • hard coded dates

Using factories

  • fast tests - creating only what's required
  • good for project setup
  • no dependencies between tests
  • explicit tests - specifying what's important only
  • entity relations easy to handle
  • schema updates easy to handle
  • great to manage dates

Conclusion

I hope you're now convinced that using factories is better than using YAML files. I'll show examples of factories in a future article. Please refer to the Factory Boy documentation to get started.

Image Credits

The wheels of Industry - by Lotus Head via Free Images