GCC/DynaMIT/MITSIM (RSS)

GCC/DynaMIT/MITSIM

Parallel DynaMIT runs on Athena

For a long time I've been searching for more machines to do experiments on parallel simulation for my case studies. Our lab does not have that much PCs, and some of them are heavily used by other researchers. I tried to contact MIT academic computing for help,  but my emails have been ignored by far. I keep looking, and finally here comes a chance: the Athena clusters.

Athena clusters have many unused machines during this week, the spring break. On Monday, after some trial-and-error efforts, I managed to make my parallel DynaMIT run on Athena. If the results are good, I might be able to finish my case studies in time.   It is not as convenient as  doing experiments in the lab -- one cannot do this remotely; I need to keep an eye on the machines while DynaMIT's running. Anyway, this is better than nothing.

posted by wenyang with 0 Comments

Google and transportation information

Yesterday I read about two articles on Google's impact on transportation. One was "Google's Online Maps Gets New Jersey Commuters On Time", the other, "Observer Sees Google's Future In Transportation Routing". Both were available on InformationWeek

The NJ TRANSIT story is somewhat expected. With the power of Google Maps and Google Earth, Google's providing information on transportation systems is of no surprise to me.  If Android becomes more acceptable to handset manufacturers and the users (which I believe is just a matter of time), more and more travelers might be able to get (near) real-time information from Google or maybe other travel information providers (such as TrafficGauge, Inrix, Traffic.com, SmarTraveler and traditional media companies) as well.

Nowadays getting near real-time data is much easier than before. GPS-equipped probe vehicles, for example, have already been used in many places in the US, China, and Europe, etc. Such data is very useful under general conditions. When sufficient historical data is available, it is relatively easy to optimize routing for a normal situation. The chanllange remains, however, when an (unexpected) incident occurs and it changes the traffic pattern significantly. In that case, having real-time information could still be very helpful. But what if most people on the road have access to such information? If all (or a significant fraction) of them change their behavior (e.g., route choice), then the situation might be even worse, and even real-time information may not be able to tell what is the best route because everything is so dynamic -- the road condition at the time one has to make a decision may be different from the condition he/she will eventually experience. In a non-recurrent situation like this, purely statistical methods might not be able to provide accurate and relevant information for travelers. A suitable model (such as DynaMIT) that can generate consistent anticipatory information would be more useful. It has built-in capability to handle drivers' "overaction" to the travel information in this situation.

I would say the transportation routing patent by Google is somewhat unexpected. After quickly skimming through the patent, I had the impression that the patent is more about how to process the data and provide it to users via wireless devices. How they obtain the data and what kind of data could be used is not detailed. Anyway, this reminded me a meeting last week with Dr. Ramachandran Ramjee, who is currently a Senior Researcher at Microsoft Research, India. He has some interesting ideas on how to collect traffic data from Smart Phones.

I believe in maybe a few years we will be able to collect pretty good (and huge amount of) traffic data from cell phones and use it for travler  information.  If I know some company is working on this now, I would be very interested to work for them. :-)


posted by wenyang with 0 Comments

The DynaMIT runtime figure

For background information about this figure, please see the previous post "Improving DynaMIT runtime performance".
 DynaMIT runtime  on the Los Angeles network (single CPU)
posted by wenyang with 0 Comments

Improving DynaMIT runtime performance

In my previous post, I mentioned that I had been working on the case studies lately. Doing profiling experiments could be tedious. One needs to run many scenarios with different binaries and possibly different sets of parameters; results also need to be processed and archived. I would usually write a shell script and let the machines do the batch jobs, and free myself to other works.

 While waiting for some results just now, I went over a log file of my previous profiling experiments. It was about DynaMIT runtime on a single processor machine. Then I created a figure, and found some interesting observations. The figure is so simple that I hope the following “text” version would do the same trick. Here it is:

Date      |                          Runtime     (seconds)
Dec, 2006 | ************************************
(1067)
Mar, 2007 | ********************                
(592)
Oct, 2007 | *************                       
(384)
Jan, 2008 | *********                           
(265)   

 
 
For a sophisticated system like DynaMIT, usually one can improve its efficiency in two directions.

  •  One is the “scientific” way, which is to find better models and algorithms. This part is often difficult, but would bring dramatic changes to the runtime if we manage to do it. In my research, two major advances were introduced: utilizing sparse matrices in OD estimation, and use smart algorithms to get the most out of parallel simulation. Since I have not yet finished all my case studies, this post is not about it.
  • The other is the “engineering” way -- use whatever practical to solve the problem. For example, do profiling studies to find bottlenecks before doing any optimization, avoid frequently allocating and de-allocating memory, avoid re-calculation, save intermediate results for future use, and use hash judiciously. The improvement shown in the figure is primarily due to this part.

The figure indicates a few surprising facts. Back in the end of 2006, it took about 18 minutes to run this scenario, but at the beginning of 2008, it dropped to four and a half minutes, or about 1/4 of the original. Another interesting observation is also unexpected: I have been able to reduce the runtime by roughly 1/3 of the original almost every four months for the past year, taking into account the fact that I did not work on it during the whole summer of 2007.

 Frankly, I do not expect such a trend would continue for long.  Most of the improvements reflected in those tests were achieved by better “engineering” approaches. I spent a lot of efforts to revise the bad designs or inefficient implementations inherited from the old version I took over. There are still rooms for further improvement, but my hunch is it would probably give us no more than 20% gain unless somebody spends tremendous effort on it.

That’s actually one of the reasons why I chose to work on the scalability issues. Single CPU configurations do not scale well. Parallel processing may be the way to go. I already have some promising results. Now the problem is where I can find a cluster with 20 or more machines to finish my case study.


Here is some background about the figure, only for those who are interested:

  • The figure here shows how much time DynaMIT needs to run a simulation of the Los Angeles for a six-hour morning peak period on a typical day.
  • All tests were performed on our server with an Intel Pentium 4 CPU of 3.6 GHz. The server has 2 GB main memory, but it is not critical here because the simulation would need less than 200 MB for this network.
  • The server runs on Fedora Core 3 (Heidelberg). In all cases DynaMIT was compiled by GNU g++ 3.4.4 with “-O3” flag and the debugging and profiling code was included.
  • The runtime was collected from the “user CPU time” returned by the “time” command. (The difference between the elapsed real time and the user CPU time was generally between 20~25 seconds.)
  • In each test the run were replicated for at least five times and the averaged runtime was used. (The deviation was not significant, though.)  
  • While DynaMIT is a stochastic system and random numbers are used in the simulation, all these tests have the same fixed seed for the random number generator. This makes the results replicable.
  • The scenario we used in those tests was the simulation on a calibrated network of the south park area in downtown LA, California. Archival real sensor data from the field was used in a simulate environment to mimic the real-time data feed.
  • DynaMIT operates in a rolling-horizon mode for real-time (on-line) applications. In the runtime tests we chose to use 15-minute estimation intervals, which were deemed appropriate for this network. Every 15 minutes, sensor data for the past 15 minutes was made available to DynaMIT, which fused it with historical information and tried to generate an unbiased estimate of the state of the whole network. While in a real case study we often use three or more iterations for this state estimation stage, in this run-time tests we only used one estimation iteration per horizon. For each horizon, when the state estimation finished, two 30-minute state prediction iterations were run to generate predictive travel time information and route guidance. Consequently, for each 15-minute interval in our simulation period, we would need to simulate 15+30*2=75 minutes (combining the estimation and prediction stages).
  • My wiki has more details about the LA network, and a flash demo made a long time ago (for other purposes).

posted by wenyang with 1 Comments

First build of DynaMIT (exe) on Windows

After spending quite a few efforts on bison++ and flex++, I've come to a successful build of DynaMIT on WinXP, finally.
Build with g++ 3.4.2 on mingw. The size of executable is about 22 M (after using "strip", I reduce its size to about 2.8M).

I do not have time to test it for now, though.

===============================================================
Reading specs from D:/MinGW/bin/../lib/gcc/mingw32/3.4.2/specs
Configured with: ../gcc/configure --with-gcc --with-gnu-ld --with-gnu-as --host=mingw32 --target=mingw32 --prefix=/mingw
--enable-threads --disable-nls --enable-languages=c,c++,f77,ada,objc,java --disable-win32-registry --disable-shared
--enable-sjlj-exceptions --enable-libgcj --disable-java-awt --without-x --enable-java-gc=boehm --disable-libgcj-debug
--enable-interpreter --enable-hash-synchronization --enable-libstdcxx-debug
Thread model: win32
gcc version 3.4.2 (mingw-special)
===============================================================

posted by wenyang with 0 Comments

Using SED and AWK to find total demand from an origin for a certain time interval

Example1:

# sed -n '/21600 1 4/, /^}/ {/^{/p;}' demand.dat > test1

# awk '$2 == "11220" { x += $4 } END { print "total flow per interval: " x "\ntotal flow per hour: " x*4}' test1

Note:

  • In the first line, 21600 is the time stamp, 1 is the type, 4 is the scale.
  • In the second line, 11220 is the Node ID.

Example2:

# awk -f count_orig.awk orig=11220 Demand/demand_newpath_Xhat_0600.dat

Use script count_orig.awk, whose contain is the following:


$2 == orig { x += $4; print $2 "\t" $3 "\t" $4 }

END { print "total flow per interval: " x "\ntotal flow per hour: " x*4}

Note: "Demand/demand_newpath_Xhat_0600.dat" is the demand file for 6am.

posted by wenyang with 0 Comments

Run matlab in batch mode

Use option "-r MATLAB_command" to run matlab in batch mode.

Automatically run the specified MATLAB M-file, either commands supplied with MATLAB or your own M-files, immediately after MATLAB starts. This is also referred to as calling MATLAB in batch mode. Separate multiple commands with commas or semicolons (;). M-files must be on the MATLAB path or in the MATLAB startup directory. Do not include the pathname or a file extension.

Example: matlab -nojvm -nosplash -r MyCommand

Reference: MATLAB Documentation - Desktop Tools and Development Environment - Startup Options

We can now use "system" (need to include < cstdlib >) to call a shell-script, which in turn calls matlab to do something.

posted by wenyang with 0 Comments

[Script] run DynaMIT, check runtime

#!/bin/bash
echo '=== Running CorbaFreeDynaMIT from current directory ==='
STARTTime=`date`
echo 'Preparing to delete __* files in current directory and all sub directory'
ls __e*
if [ $? = 0 ]
then 
    echo -n "Really want to remove these files? (y/n):"
    read yesORno
    if [ $yesORno = y -o $yesORno = Y ]
        then
        echo 'find . -maxdepth 1 -name "__*" -exec rm -v {} \;'
        find . -maxdepth 1 -name "__e*" -exec rm -v {} \; && echo "Files removed."
    else
        echo "You did not choose 'y'. Continue without deleting"
    fi
fi
time ~/CorbaFreeDynaMIT/DynaMIT/DynaMIT dtaparam.dat
echo 'Script' $0 'started at: ' $STARTTime
echo -n 'Script' $0 'finished at: '&& date
echo 'Done.'

posted by wenyang with 0 Comments

nohup - run a command immune to hangups, with output to a non-tty

In bash, use

nohup ~/CorbaFreeDynaMIT/DynaMIT/DynaMIT dtaparam.dat 1>log_stdout 2>log_stderr &

 

posted by wenyang with 0 Comments

DynaMIT demo steps (By rama, 3/15)

1. Login as demo

2. Cd to 'irvine'.

3. Create another xterm for his directory:

xterm &

4. In one window, run:

./DynaMIT dtaparam.dat

[notice the ./]

5. Wait until DynaMIT starts reading socio-eco file. Then, in the other window, type:

./xdta -f xdta.ini

6. Click clock button first, in the control panel. Then choose density for color and thickness in the network view window. Expand/zoom the time horizon.

7. Make control panel window the active window (click on the top window bar).

8. Demo should run now on its own. Explain the basic DynaMIT functions and capabilities.

posted by wenyang with 0 Comments

reopen fstreams, bug?

Re-open a fstream may fail... call clear() between close() and open()!

http://gcc.gnu.org/onlinedocs/libstdc++/faq/index.html#4_4_iostreamclear

posted by wenyang with 0 Comments

Suggestions about Improving Efficiency (gcc)

During the migration of CorbaFreeDynaMIT, we have noticed that some previous code may more or less decrease the efficiency. We are not talking about the algorithm at this time, though it should be more critical and significant.

Anyway, here are a few "rules of thumb" to avoid defect.

  • Remove unnecessary code from loop. For example, making a function call to check the size of a list during the loop can be taken out of the loop if the size is not going to change during the loop.
  • Use header files properly.To minimize the time we have to wait on the compiler, it's good to only include the headers we really need. Many people simply include iostream when they don't need to -- and that can penalize the runtime as well. (http://gcc.gnu.org/onlinedocs/libstdc++/27_io/howto.html#10)
  • Use "\n" in output stream other than std::endl whenever possible. std::endl not only sends out a new line, but also flush the stream, sending out the output immediately, and the buffer is lost. The next time we add something to the stream, we have to re-allocate the buffer, which is time-consuming and ususally unnecessary.
posted by wenyang with 0 Comments

Notes to Migration of CorbaFreeDynaMIT (6)

Iterators

Iterators are a generalization of pointers: they are objects that point to other objects. In most situations we could instead use a pointer when an iterator is required, and even under some circumstance, the compiler will actually optimize iterators into pointers when creating the object codes. Iterators, however, are not necessarily implemented by pointers. For example, vector::iterator is not T*; string::iterator is not char*. Hence converting iterators to pointers is considered dangerous, and is likely to make our code broken.

The STL offers several predefined iterators, with different behavior. For example, the input iterators only guarantee read access. Selecting the correct type of iterators is one of the tasks we must face when migrating CorbaFreeDynaMIT.

Where does it occur? Most classes that use STL containers (vectors, pair, etc.) For example, in the getODID member function of class dtaMappingOD, using a writable iterator will cause an error: "invalid conversion from 'const idlOrigDestID* const' to 'idlOrigDestID*'."

Another example is we should not test whether a iterator is null pointer or not. The following code is invalid:

std::vector::iterator jIterator;

for( jIterator= iIterator->begin(); jIterator != iIterator->end(); jIterator++ )

{

if( jIterator ) { // i.e., if (jIterator != NULL )

// do something...

}

}
How to fix: Use const_iterator instead of the default iterator, and avoid unnecessary type conversion. Make sure we have include the proper header file, i.e.,

posted by wenyang with 0 Comments

Notes to Migration of CorbaFreeDynaMIT (5)

2.4 Stream formatting
The formating of stream is controled by so-called "flags". In the new c++ library, the data type for flags, defined as "__fmtflags" or "std::_Ios_Fmtflags" by the typedef statement, is in fact unsigned long integer.

For many member functions of the stream classes in STL, such as setf and flags, their arguments and/or return values are consequently of unsigned long type.

If we want to pass other types of data to those functions, we will have to cast (force type conversion) the data to unsigned long. In some situation, that might cause a problem.

Moreover, some functions in the stream classes change their definition. For instance, istream::unsetf no longer returns any value. We need to modify our code to adapt these changes.

Where does it occur? class Reader.

How to fix: We can either stick to our original data type, which is long, and use cast back and forth wherever it is needed, or change the function definition of our classes.

Changing public member function definitions for a class is error-prone and seldom suggested because it will change the interface of the class and thus might potentially affect all its clients, which make use of those public member functions.

Therefore currently we choose the first approach. We admit that conversion between unsigned long to long integers may cause some problem. In the long run we might still consider using the second approach.

posted by wenyang with 0 Comments

Notes to Migration of CorbaFreeDynaMIT (3)

2.2 Using std namespce

Since "std" is now a real namespace, required for many objects in GNU Standard C++ Library v3, or libstdc++-v3.

Most class or type identifiers and functions in the Standard Template Library (STL) are affected.

For example:

  • string, vector, list, map, multimap, etc.
  • i/o objects, e.g. cout, cerr, endl.
  • i/o formatting flags, eg hex, dec

Where does it occur? Most headers, source files. (*.h, *.cc, *.y)

How to fix: In most cases, we prefer adding the "std::" to wherever it is required. For example:

  • std::setiosflags( std::ios::showpoint | std::ios::fixed );
  • std::setprecision( 4 );
  • std::cout << std::flush;

This is the most explicit way. Also we need include the headers properly. For example, is required for std::cout, std::endl, etc.

We could also say "using namespace std;" somewhere before the call. This is the quick-butdirty fix, because it brings the whole of the std:: namespace into scope. Thus we should never do this in a header file, as every user of our header file will be affected by this decision.

posted by wenyang with 0 Comments

Notes to Migration of CorbaFreeDynaMIT (4)

2.3 From strstream to stringstream

The benefit of using stringstream include (a) we don't have to explicitly append ends to terminate the C-style character array, and (b) we don't have to mess with "freezing" functions or manage the memory ourselves. (We saw this in the original code of template std::string ToString( const TYPE &val ) from the file dtaSimulatedFlow.cc.)


Note: the stringstream.str() builds a string, not a char[] as was the case with strstream.str().
Thus we cannot to delete the buffer after use.

Where does it occur? Most classes that handles I/O, e.g. class Reader.

How to fix: std::strstream (similarly for ostrstream, istrstream) is replaced by std::stringstream (ostringstream and istringstream). Stringstreams are defined in the header .

In many cases, we simply use stringstream for output, thus we prefer std::ostringstream.

posted by wenyang with 0 Comments

Notes to Migration of CorbaFreeDynaMIT (2)

2.1 Deprecated headers

Table 1 shows some of the headers that may need to make changes. Although it is still possible to include those deprecated headers in g++ 3, it is strongly recommended that we use the new ones. Hence we switch to the new versions.

Deprecated headers are all located at the path /usr/include/c++/3.3.2/backward (at guitar).

Note that some headers listed in Table reftable:t1 do not belong to that path. This means that both versions should work fine at least in gcc 3.3.2, but still we prefer the new naming convention.

Where does it occur? Almost every header file (*.h) in CorbaFreeDynaMIT; many source files (*.cc and *.y).

Table 1: Deprecated headers and their recommended replacements

How to fix: Replace with the corresponding new headers.

posted by wenyang with 0 Comments

Notes to Migration of CorbaFreeDynaMIT (1)

Critical Changes of GCC (g++) from 2.96 to 3.3.2

Because of the following changes during the upgrade of gcc (from 2.x to 3.x), many source codes need modification:

  • The standard library is much more conformant. Codes or usages that do not follow the latest c++ standard may be unsupported in the new version of g++.
  • The std:: namespace is now a real namespace, not an alias for ::.
  • The standard header files for the c library don't end with .h, but begin with c (i.e. <CSTDLIB>rather than <STDLIB.H>). The .h names are still available, but are deprecated.
  • For ostream formatting, use the std::ios::fmtflags. The Integers are no longer valid.
  • <STRSTREAM>is deprecated, use <SSTREAM> instead. The strstream class is replaced by stringstream, which is more powerful and safer.
posted by wenyang with 0 Comments