I have a list of files in a text file, and I want to load this list into some
kind of data structure. The list is quite long, and loading it means instantiating
100,000 objects in Python, all of the same type. I found out that depending on
what kind of object is used, the time it takes to instantiate all these can
vary greatly. Essentially, each line of the file is composed of tab-separated
fields, which are split into a list with Python's str.split() method. The
question therefore is: what should I do with that list?
The object must hold a few values, so basically a list or a tuple would be enough.
However, I need to perform various operations on those values, so additional
methods would be handy and justify the use of a more complex object.
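To make the setup concrete, here is a minimal sketch of the loading step. The sample data, the three-field layout and the load_records() helper are assumptions made for illustration; the real file is read from disk and is much longer.

```python
import io

# Hypothetical sample standing in for the real text file:
# one record per line, fields separated by tabs.
SAMPLE = "a\tb\tc\nd\te\tf\n"


def load_records(lines, factory=None):
    """Split each line on tabs; optionally wrap the fields with `factory`."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # With no factory, keep the list exactly as str.split() returned it.
        records.append(fields if factory is None else factory(fields))
    return records


as_lists = load_records(io.StringIO(SAMPLE))
as_tuples = load_records(io.StringIO(SAMPLE), factory=tuple)
```

The factory argument is where each of the object types compared in this article would plug in.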
The Contenders
These are the objects I compared:
A simple list, as returned by str.split(). It is not very handy, but will
serve as a reference.
A simple tuple, no more handy than the list, but it may exhibit better
performance (or not).
A class named List that inherits from list:
    class List(list):
        def a(self): return self[0]
        def b(self): return self[1]
        def c(self): return self[2]
A class named Tuple that inherits from tuple:
    class Tuple(tuple):
        def a(self): return self[0]
        def b(self): return self[1]
        def c(self): return self[2]
A class named ListCustomInitList that inherits from List and adds a custom
__init__() method:
    class ListCustomInitList(List):
        def __init__(self, *args): List.__init__(self, args)
A class named TupleCustomInitTuple that inherits from Tuple and adds a
custom __init__() method:
    class TupleCustomInitTuple(Tuple):
        def __init__(self, *args): Tuple.__init__(self)
A class named ListCustomInit that inherits directly from the basic list type
but has the same features as ListCustomInitList, instead of inheriting them
from the custom List:
    class ListCustomInit(list):
        def __init__(self, *args): list.__init__(self, args)
        def a(self): return self[0]
        def b(self): return self[1]
        def c(self): return self[2]
A class named TupleCustomInit that inherits directly from the basic tuple
type but has the same features as TupleCustomInitTuple, instead of inheriting
them from the custom Tuple:
    class TupleCustomInit(tuple):
        def __init__(self, *args): tuple.__init__(self)
        def a(self): return self[0]
        def b(self): return self[1]
        def c(self): return self[2]
A class named NamedTuple that is made from the namedtuple type in the
collections module:
    from collections import namedtuple
    NamedTuple = namedtuple("NamedTuple", ("a", "b", "c"))
A very basic class named Class that inherits from object:
    class Class(object):
        def __init__(self, args):
            self.a = args[0]
            self.b = args[1]
            self.c = args[2]
A variant of the previous that uses the __slots__ feature:
    class Slots(object):
        __slots__ = ("a", "b", "c")
        def __init__(self, args):
            self.a = args[0]
            self.b = args[1]
            self.c = args[2]
An old-style class, named OldClass, that does not inherit from object:
    class OldClass:
        def __init__(self, args):
            self.a = args[0]
            self.b = args[1]
            self.c = args[2]
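The contenders do not all take their data in the same way, and the article does not show how each one is called. The sketch below restates a few of them and shows one plausible way to build each from the same three fields; the calling conventions are my reading of the definitions above, not something taken from the benchmark code. It is written for Python 3, where every class is new-style, so OldClass is left out.

```python
from collections import namedtuple

fields = ["a", "b", "c"]


class List(list):
    def a(self): return self[0]


class Tuple(tuple):
    def a(self): return self[0]


class ListCustomInit(list):
    # *args collects the positional arguments into a tuple,
    # so the fields have to be unpacked at the call site.
    def __init__(self, *args): list.__init__(self, args)
    def a(self): return self[0]


class TupleCustomInit(tuple):
    # A tuple is filled in by tuple.__new__ before __init__ runs,
    # so __init__ has nothing left to pass on.
    def __init__(self, *args): tuple.__init__(self)
    def a(self): return self[0]


NamedTuple = namedtuple("NamedTuple", ("a", "b", "c"))


class Slots(object):
    __slots__ = ("a", "b", "c")

    def __init__(self, args):
        self.a, self.b, self.c = args[0], args[1], args[2]


objects = [
    List(fields),             # one iterable argument
    Tuple(fields),            # one iterable argument
    ListCustomInit(*fields),  # unpacked: __init__ receives ("a", "b", "c")
    TupleCustomInit(fields),  # tuple.__new__ accepts a single iterable only
    NamedTuple(*fields),      # one positional argument per field
    Slots(fields),            # one indexable argument
]

# list/tuple subclasses expose a() as a method; the others use attributes.
first_values = [o.a() if callable(o.a) else o.a for o in objects]
```

Passing the list without unpacking to ListCustomInit would produce a one-element list containing the original list, so the two custom-init families cannot share a single call style.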
The Benchmark
Each class is instantiated 100,000 times in a loop, with the same, constant
input data: ["a", "b", "c"]; the newly created object is then appended to a
list. This process is timed by calling time.clock() before and after it and
retaining the difference between the two values. The time.clock() function has
quite a poor resolution, but since (on Unix) it measures CPU time rather than
wall-clock time, it is immune to the process being put to sleep by the
operating system's scheduler.
This is then repeated 10 times, and the smallest of these 10 values is
retained as the performance of the process.
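The procedure can be sketched roughly as follows. This is not the original benchmark script: time.clock() was removed in Python 3.8, so the sketch uses time.process_time(), which also reports CPU time, and the best_time() and identity() helpers are names introduced here for illustration.

```python
import time

N = 100000     # instantiations per run, as in the article
REPEATS = 10   # runs per contender; the smallest time is kept
DATA = ["a", "b", "c"]


def best_time(factory, n=N, repeats=REPEATS):
    """Return the smallest CPU time over `repeats` runs of `n` instantiations."""
    timings = []
    for _run in range(repeats):
        objects = []
        start = time.process_time()
        for _i in range(n):
            objects.append(factory(DATA))
        timings.append(time.process_time() - start)
    return min(timings)


def identity(fields):
    # Baseline (assumption): keep the list as is. The extra function call
    # adds a small cost that a hand-written baseline loop would not have.
    return fields


timings = {
    "list": best_time(identity),
    "tuple": best_time(tuple),
}
```

Dividing each entry by timings["list"] gives relative figures of the kind reported in the results. Keeping the minimum rather than the mean discards runs that were disturbed by other activity on the machine.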
The Results
The results from the benchmark are shown relative to the time taken with a
simple list (1.000), so larger values mean slower instantiation. As expected,
using a simple list is the fastest, since it requires no additional object
instantiation. Below are the results:
- 1.000 list
- 2.455 tuple
- 3.273 Tuple
- 3.455 List
- 4.636 Slots
- 5.818 NamedTuple
- 6.364 OldClass
- 6.455 Class
- 6.909 TupleCustomInit
- 7.091 TupleCustomInitTuple
- 7.545 ListCustomInit
- 7.818 ListCustomInitList
Conclusion
One can draw several conclusions from this experiment:
- Not instantiating anything is much faster: even instantiating a simple tuple
out of the original list increases the run time by about 145%
- The __slots__ feature reduces instantiation time by about 28% compared to a
regular class (4.636 versus 6.455)
- Deriving a class from a basic type and adding a custom __init__() method
that calls the parent's __init__() adds a lot of overhead (instantiation is
7 to 8 times slower than with a plain list, and roughly twice as slow as the
same subclass without a custom __init__())
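These percentages can be rechecked from the results list with a few lines of arithmetic. The numbers below are copied from that list; the variable names and the percent_increase() helper are only illustrative.

```python
# Relative timings copied from the results list (1.000 = plain list).
results = {
    "list": 1.000, "tuple": 2.455, "Tuple": 3.273, "List": 3.455,
    "Slots": 4.636, "NamedTuple": 5.818, "OldClass": 6.364, "Class": 6.455,
    "TupleCustomInit": 6.909, "TupleCustomInitTuple": 7.091,
    "ListCustomInit": 7.545, "ListCustomInitList": 7.818,
}


def percent_increase(slow, fast):
    """Extra run time of `slow` over `fast`, as a percentage."""
    return (results[slow] / results[fast] - 1.0) * 100.0


# Cost of turning the list into a tuple: about 145.5 percent.
tuple_overhead = percent_increase("tuple", "list")

# Time saved by __slots__ relative to a regular class: about 28.2 percent.
slots_saving = (1.0 - results["Slots"] / results["Class"]) * 100.0

# Cost of the custom __init__() on top of the plain subclasses:
# about 2.11 times for tuple, about 2.18 times for list.
tuple_init_cost = results["TupleCustomInit"] / results["Tuple"]
list_init_cost = results["ListCustomInit"] / results["List"]
```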