Microblog: A very long article Wikipedia article on the orientation of toilet paper [Jun 7th, 22:52] [R]
Talk Like a Pirate Day

Sunday, September 15th, 2013

Instantiating Many Objects in Python

Categories: [ IT ]

I have a list of files in a text file, and I want to load this list into some kind of data structure. The list is quite long, and requires to instantiate 100,000 objects in Python, all of the same type. I found out that depending on what kind of object is used, the time it takes to instantiate all these can vary greatly. Essentially, each line of the file is composed of tab-separated fields, which are split into a list with Python's str.split() method. The question therefore is: what should I do with that list?

The object must hold a few values, so basically a list or a tuple would be enough. However, I need to perform various operations on those values, so additional methods would be handy and justify the use of a more complex object.

The Contenders

These are the objects I compared:

A simple list, as returned by str.split(). It is not very handy, but will serve as a reference.

A simple tuple, no more handy than the list, but it may exhibit better performance (or not).

A class named List that inherits from list:
class List(list):
  def a(self): return self[0]
  def b(self): return self[1]
  def c(self): return self[2]
A class named Tuple that inherits from tuple:
class Tuple(tuple):
  def a(self): return self[0]
  def b(self): return self[1]
  def c(self): return self[2]
A class named ListCustomInitList that inherits from List and adds a custom __init__() method:
class ListCustomInitList(List):
  def __init__(self, *args): List.__init__(self, args)
A class named TupleCustomInitTuple that inherits from Tuple and adds a custom __init__() method:
class TupleCustomInitTuple(Tuple):
  def __init__(self, *args): Tuple.__init__(self)
A class named ListCustomInit that inherits from the list basic type but has the same features as ListCustomInitList instead of inheriting them from the custom List:
class ListCustomInit(list):
  def __init__(self, *args): list.__init__(self, args)
  def a(self): return self[0]
  def b(self): return self[1]
  def c(self): return self[2]
A class named TupleCustomInit that inherits from tuple basic type but has the same features as TupleCustomInitTuple instead of inheriting them from the custom Tuple:
class TupleCustomInit(tuple):
  def __init__(self, *args): tuple.__init__(self)
  def a(self): return self[0]
  def b(self): return self[1]
  def c(self): return self[2]
A class named NamedTuple that is made from the namedtuple type in the collections module:
NamedTuple = namedtuple("NamedTuple", ("a", "b", "c"))
A very basic class named Class and that inherits from object:
class Class(object):
  def __init__(self, args):
    self.a = args[0]
    self.b = args[1]
    self.c = args[2]
A variant of the previous that uses the __slots__ feature:
class Slots(object):
  __slots__ = ("a", "b", "c")
  def __init__(self, args):
    self.a = args[0]
    self.b = args[1]
    self.c = args[2]
A old-style class, named OldClass, that does not inherit from object:
class OldClass:
  def __init__(self, args):
    self.a = args[0]
    self.b = args[1]
    self.c = args[2]

The Benchmark

Each class is instantiated 100,000 times in a loop, with the same, constant input data: ["a", "b", "c"]; the newly created object is then appended to a list. This process it timed by calling time.clock() before and after it and retaining the difference between the two values. The time.clock() method has quite a poor resolution, but is immune to the process being set to sleep by the operating systems's scheduler.

This is then repeated 10 times, and the smallest of these 10 values is retained as the performance of the process.

The Results

The results from the benchmark are shown relatively the speed of using a simple list. As expected, the use of a simple list is the fastest, since it requires not additional object instantiation. Below are the results:

  • 1.000 list
  • 2.455 tuple
  • 3.273 Tuple
  • 3.455 List
  • 4.636 Slots
  • 5.818 NamedTuple
  • 6.364 OldClass
  • 6.455 Class
  • 6.909 TupleCustomInit
  • 7.091 TupleCustomInitTuple
  • 7.545 ListCustomInit
  • 7.818 ListCustomInitList

Conclusion

One can draw several conclusions from this experiment:

  • Not instantiating anything is much faster, even instantiating a simple tuple out of the original list increases the run time by 150%
  • The slots feature makes object instantiation 28% faster compared to a regular class
  • Deriving a class from a basic type and adding a custom __init__() method that calls the parent's __init__() adds a lot of overhead (instantiation is 7 to 8 times slower)

[ Posted on September 15th, 2013 at 15:13 | no comment | ]