In my previous post, I mentioned that I had been working on
the case studies lately. Doing profiling experiments could be tedious. One needs
to run many scenarios with different binaries and possibly different sets of parameters;
results also need to be processed and archived. I would usually write a shell
script and let the machines do the batch jobs, and free myself to other works.
While waiting for some results just now, I went over a log
file of my previous profiling experiments. It was about DynaMIT runtime on a
single processor machine. Then I created a figure, and found some interesting
observations. The figure is so simple that I hope the following “text” version would do
the same trick. Here it is:
Date | Runtime (seconds)
Dec, 2006 | ************************************ (1067)
Mar, 2007 | ******************** (592)
Oct, 2007 | ************* (384)
Jan, 2008 | ********* (265)
For a sophisticated system like DynaMIT, usually one can
improve its efficiency in two directions.
- One is the “scientific” way, which is to find better models and
algorithms. This part is often difficult, but would bring dramatic changes to
the runtime if we manage to do it. In my research, two major advances were
introduced: utilizing sparse matrices in OD estimation, and use smart algorithms
to get the most out of parallel simulation. Since I have not yet finished all
my case studies, this post is not about it.
- The other is the “engineering” way -- use whatever practical
to solve the problem. For example, do profiling studies to find bottlenecks
before doing any optimization, avoid frequently allocating and de-allocating
memory, avoid re-calculation, save intermediate results for future use, and use
hash judiciously. The improvement shown in the figure is primarily due to this part.
The figure indicates a few surprising facts. Back in the end
of 2006, it took about 18 minutes to run this scenario, but at the beginning of
2008, it dropped to four and a half minutes, or about 1/4 of the original.
Another interesting observation is also unexpected: I have been able to reduce
the runtime by roughly 1/3 of the original almost every four months for the
past year, taking into account the fact that I did not work on it during the
whole summer of 2007.
Frankly, I do not expect such a trend would continue for
long. Most of the improvements reflected in those tests were achieved by better
“engineering” approaches. I spent a lot of efforts to revise the bad designs or
inefficient implementations inherited from the old version I took over. There
are still rooms for further improvement, but my hunch is it would probably give
us no more than 20% gain unless somebody spends tremendous effort on it.
That’s actually one of the reasons why I chose to work on
the scalability issues. Single CPU configurations do not scale well. Parallel
processing may be the way to go. I already have some promising results. Now the
problem is where I can find a cluster with 20 or more machines to finish my
case study.
Here is some background about the figure, only for those who are interested:
- The figure here shows how much time DynaMIT needs to run a
simulation of the Los Angeles
for a six-hour morning peak period on a typical day.
- All tests were performed on our server with an Intel Pentium
4 CPU of 3.6 GHz. The server has 2 GB main memory, but it is
not critical here because the simulation would need less than 200 MB for this
network.
- The server runs on Fedora Core 3 (Heidelberg). In all cases DynaMIT was
compiled by GNU g++ 3.4.4 with “-O3” flag and the debugging and profiling code
was included.
- The runtime was collected from the “user CPU time” returned
by the “time” command. (The difference between the
elapsed real time and the user CPU time was generally between 20~25 seconds.)
- In each test the run were replicated for at least five times
and the averaged runtime was used. (The deviation was not significant,
though.)
- While DynaMIT is a stochastic system and random numbers are
used in the simulation, all these tests have the same fixed seed for the random
number generator. This makes the results replicable.
- The scenario we used in those tests was the simulation on a
calibrated network of the south park area in downtown LA, California. Archival real sensor data from
the field was used in a simulate environment to mimic the real-time data feed.
- DynaMIT operates in a rolling-horizon mode for real-time
(on-line) applications. In the runtime tests we chose to use 15-minute
estimation intervals, which were deemed appropriate for this network. Every 15
minutes, sensor data for the past 15 minutes was made available to DynaMIT,
which fused it with historical information and tried to generate an unbiased
estimate of the state of the whole network. While in a real case study we often
use three or more iterations for this state estimation stage, in this run-time
tests we only used one estimation iteration per horizon. For each horizon, when
the state estimation finished, two 30-minute state prediction iterations were
run to generate predictive travel time information and route guidance. Consequently,
for each 15-minute interval in our simulation period, we would need to simulate
15+30*2=75 minutes (combining the estimation and prediction stages).
- My wiki has more details about the LA network, and a flash demo made a long time ago (for other purposes).