Improving DynaMIT runtime performance

In my previous post, I mentioned that I had been working on the case studies lately. Doing profiling experiments could be tedious. One needs to run many scenarios with different binaries and possibly different sets of parameters; results also need to be processed and archived. I would usually write a shell script and let the machines do the batch jobs, and free myself to other works.

 While waiting for some results just now, I went over a log file of my previous profiling experiments. It was about DynaMIT runtime on a single processor machine. Then I created a figure, and found some interesting observations. The figure is so simple that I hope the following “text” version would do the same trick. Here it is:

Date      |                          Runtime     (seconds)
Dec, 2006 | ************************************
(1067)
Mar, 2007 | ********************                
(592)
Oct, 2007 | *************                       
(384)
Jan, 2008 | *********                           
(265)   

 
 
For a sophisticated system like DynaMIT, usually one can improve its efficiency in two directions.

  •  One is the “scientific” way, which is to find better models and algorithms. This part is often difficult, but would bring dramatic changes to the runtime if we manage to do it. In my research, two major advances were introduced: utilizing sparse matrices in OD estimation, and use smart algorithms to get the most out of parallel simulation. Since I have not yet finished all my case studies, this post is not about it.
  • The other is the “engineering” way -- use whatever practical to solve the problem. For example, do profiling studies to find bottlenecks before doing any optimization, avoid frequently allocating and de-allocating memory, avoid re-calculation, save intermediate results for future use, and use hash judiciously. The improvement shown in the figure is primarily due to this part.

The figure indicates a few surprising facts. Back in the end of 2006, it took about 18 minutes to run this scenario, but at the beginning of 2008, it dropped to four and a half minutes, or about 1/4 of the original. Another interesting observation is also unexpected: I have been able to reduce the runtime by roughly 1/3 of the original almost every four months for the past year, taking into account the fact that I did not work on it during the whole summer of 2007.

 Frankly, I do not expect such a trend would continue for long.  Most of the improvements reflected in those tests were achieved by better “engineering” approaches. I spent a lot of efforts to revise the bad designs or inefficient implementations inherited from the old version I took over. There are still rooms for further improvement, but my hunch is it would probably give us no more than 20% gain unless somebody spends tremendous effort on it.

That’s actually one of the reasons why I chose to work on the scalability issues. Single CPU configurations do not scale well. Parallel processing may be the way to go. I already have some promising results. Now the problem is where I can find a cluster with 20 or more machines to finish my case study.


Here is some background about the figure, only for those who are interested:

  • The figure here shows how much time DynaMIT needs to run a simulation of the Los Angeles for a six-hour morning peak period on a typical day.
  • All tests were performed on our server with an Intel Pentium 4 CPU of 3.6 GHz. The server has 2 GB main memory, but it is not critical here because the simulation would need less than 200 MB for this network.
  • The server runs on Fedora Core 3 (Heidelberg). In all cases DynaMIT was compiled by GNU g++ 3.4.4 with “-O3” flag and the debugging and profiling code was included.
  • The runtime was collected from the “user CPU time” returned by the “time” command. (The difference between the elapsed real time and the user CPU time was generally between 20~25 seconds.)
  • In each test the run were replicated for at least five times and the averaged runtime was used. (The deviation was not significant, though.)  
  • While DynaMIT is a stochastic system and random numbers are used in the simulation, all these tests have the same fixed seed for the random number generator. This makes the results replicable.
  • The scenario we used in those tests was the simulation on a calibrated network of the south park area in downtown LA, California. Archival real sensor data from the field was used in a simulate environment to mimic the real-time data feed.
  • DynaMIT operates in a rolling-horizon mode for real-time (on-line) applications. In the runtime tests we chose to use 15-minute estimation intervals, which were deemed appropriate for this network. Every 15 minutes, sensor data for the past 15 minutes was made available to DynaMIT, which fused it with historical information and tried to generate an unbiased estimate of the state of the whole network. While in a real case study we often use three or more iterations for this state estimation stage, in this run-time tests we only used one estimation iteration per horizon. For each horizon, when the state estimation finished, two 30-minute state prediction iterations were run to generate predictive travel time information and route guidance. Consequently, for each 15-minute interval in our simulation period, we would need to simulate 15+30*2=75 minutes (combining the estimation and prediction stages).
  • My wiki has more details about the LA network, and a flash demo made a long time ago (for other purposes).

posted on Sunday, March 09, 2008 11:26 PM by wenyang

Comments

# The DynaMIT runtime figure

For background information about this figure, please see the previous post "Improving DynaMIT runtime...
Tuesday, March 11, 2008 4:29 PM by Tech blog of WEN Yang