Code optimization is very important, especially for dynamic language like python. Recently I finished a python program of Approximate Dynamic Programming. One part of the program was to use dijkstra algorithm to calculate the shortest path distance in the network. Since the network is really large(about 1 million node), the first version of code was really slow which took about 2000 seconds each run. After code optimization(it took me about 6 hours), each run just takes 0.2 second. This is incredible improvement.
I used the the following tools during the process, which I highly recommend:
- python cprofile module –> this is a builtin profile module
- runsnakerun –> this is a viewer of the cProfile output
- line_profiler –> this is line by line profiler
The usage of runsnakerun is quite easy, just use the following two commands
$ python -m cProfile -o <outputfilename> <script-name> <options>
$ runsnake <outputfilename>
line_profil\ is a line by line python profiler and its webpage is:
add profile decorator to the function you want to optimize
$kernprof.py [-l/–line-by-line] script_to_profile.py
$ python -m line_profiler script_to_profile.py.lprof
Here are some tips to improve python performance:
1. in operator is really slow when the list is large. You may often use
if a in S:
When S is very large, thousands, for instance, in operator will become a bottleneck.
A good way to optimize this is have list with binary value to indicate whether a element is in the set or not. Then the code will be:
do some other thing
Of course, it will use more memory.
2. Call the library function as much as possbile.
Most of the library functions are highly optimized, or implemented in C++. So if you can call library functions, don’t implement it by you self in python. The for loop in python is slow. You can use scipy vectors to accelerate the program. You should try your best to reduce the for loop in python.
3 Don’t check each time for rare case.
you may have list with 100 elemnent, you will receive a index may or may not within range(0, 100). If it is in the range return the corresponding value, otherwise return -1. but 99.99% of the time the index is in the range. The bad is:
if idx < 1000:
B = A[idx]
B = -1
A better approach is:
B = A[idx]
B = 1
Be careful that don’t use broad Exception, use specific type of Exception.