September 2003 - Posts

Java is the SUV of Programming Tools

Phil Greenspun's "Java is the SUV of Programming Tools" is funny as always.

Weekly Rundown 9/17-9/24

Ching-Huei makes a good point, "we’ve learnt from ESLC that manual tagging is tedious and not scalable". On the other hand even basic web services are still an experiment in industry. Given the slow rate of adoption, what chance do the more experimental techniques have?

Don Box has been making an argument against shared abstractions. He grew up in the COM/CORBA world so you can see the motivation. However, I think there is a lot to be learned by studying past distributed object technologies. Somehow every proposed technology will need to support transactions, security, and a type system. On the latter, there seems to be a rising level of dissatisfaction towards XSD as a type system. Most architects feel the specification is unnecessarily complicated. On schema Don has said, "XML schema makes COM look like VB".

This week I’ve been reading about RelaxNG and Schematron as alternatives to XSD. Both specifications are light and easy to learn. RelaxNG, especially, makes a lot of sense and you immediately feel comfortable with it. The problem with both alternatives is the poor tool support.

There is a good talk by Tim Ewald, “Rebuilding MSDN with Angle Brackets”, that details some of the XML and schema issues they have encountered. I think there are some parallels with ESLC.

Circling back to my initial topic: for any “semantic” solution to succeed, the programming model must be very simple. HTTP and basic web services come to mind. As most of you know, this is very hard. It takes a great deal of effort, iteration, implementation, and redesign to simplify and distill a design. Which technique(s) has the best chance? Not sure, but I think Clay Shirky’s “In Praise of Evolvable Systems” points in the right direction.

---------------------------------

One of the issues I am considering is serializing controls to/from web services. In this scenario every service would have a GET and POST method analogous to HTTP. The GET method would return the form elements to the user and the POST method would process the form.

Rendering a portal page would truly be just the execution of web services. This idea is sort of crazy and I am just playing out scenarios. However, for this to work, even as a prototype, I need to serialize the controls. The trick is that controls do not serialize by default.

I considered two workarounds: writing my own controls, or passing only the HTML to render. In both cases there was a loss of functionality that I would have had to rebuild somehow. I also worried that the programming model would be difficult for developers. In the end I decided to scratch the idea, since to make it work I needed to handle HTML myself and that seemed a step back.

I still think web services are the way to go. However, with the current model for ASP.NET, passing controls around is difficult. If anyone sees an easier way, I would love to hear about it.
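A minimal sketch of the GET/POST shape described above, assuming a plain serializable form description stands in for the ASP.NET controls (which do not serialize by default). The names FormField, FormDescription, and GreetingService are hypothetical and are not taken from any prototype; the only library call relied on is XmlSerializer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical, serializable stand-in for a single form control.
public class FormField
{
    public string Name;
    public string Label;
    public string Value;
}

// Hypothetical description of a whole form: a title plus its fields.
public class FormDescription
{
    public string Title;
    [XmlElement("Field")] public List<FormField> Fields = new List<FormField>();
}

// Hypothetical service exposing the GET/POST pair.
public class GreetingService
{
    // GET: return the form elements to the caller as XML.
    public string Get()
    {
        var form = new FormDescription { Title = "Greeting" };
        form.Fields.Add(new FormField { Name = "name", Label = "Your name", Value = "" });
        var ser = new XmlSerializer(typeof(FormDescription));
        using (var writer = new StringWriter())
        {
            ser.Serialize(writer, form);
            return writer.ToString();
        }
    }

    // POST: accept the filled-in form as XML and process it.
    public string Post(string formXml)
    {
        var ser = new XmlSerializer(typeof(FormDescription));
        using (var reader = new StringReader(formXml))
        {
            var form = (FormDescription)ser.Deserialize(reader);
            return "Hello, " + form.Fields[0].Value;
        }
    }
}
```

The portal would call Get(), render the described fields however it likes, and send the filled-in description back through Post(). The cost is the one noted above: the rendering and behavior that real controls provide would have to be rebuilt on the portal side.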

---------------------------------

I also put together a DevShell module for Athena Kerberos authentication – see previous entry.

Athena authentication .Net module for DevShell

I spent Saturday night building a DevShell component for Athena Kerberos authentication. The component is pretty simple. It only checks whether your username/password value pair is correct. I did not care about tickets or other parts of the Kerberos protocol. I just wanted to give DevShell users the choice of using their Athena credentials. Everything would work as before except the password verification would go to Athena.

The trickiest part was loading the Kerberos library provided by MIT. As you can imagine the Kerberos team does not use COM or .Net. After trying lots of combinations and looking for help online I found “DllImport” did the trick.

Since I did not want to compromise users’ passwords, the next step was to set up SSL on IIS. At first I wanted to set up my own certificate authority and issue certificates to myself. I tried it on Windows 2000 and 2003 and ran into problems on both, most of them because I did not want to set up Active Directory. At this point I decided to try Verisign instead.

Verisign is the way to go. You simply create a certificate request on your server with your server's private key. You then take the certificate request to Verisign and submit it on a web form at their site. The SSL certificate is mailed to you instantly and you can install it on your server.

After you install the certificate you can change any web application to require encryption; it is a check box in the web application's properties.

The module is generic; you can use it in any application to do Athena Kerberos authentication.

Word letter order

Looks like we only need the first and last letter in a word in the correct order to understand it. Via Osherove

"aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe."

OCW communities

I’ve been thinking about OCW. Everyone gets excited about the altruistic effort but so much more is needed to complete a tutorial cycle: 1) Present information related to goals; 2) Elicit student action toward goals; 3) Assess student action; 4) Provide feedback to student; 5) Offer strategic guidance to student; 6) Manage and motivate the process.

Where will the additional steps come from? The internal buzz is “online community efforts”. Considering the content is open and anyone can start their own community, what will this landscape look like? One risk is that hundreds of flavors will appear, the “reinvention of the wheel” problem will be prevalent, and there will be no consistency from community to community (site to site).

Would it not be more exciting to have a learning federation instead? At the same time, given the complete freedom of the participants, the protocols must be weak, light, flexible, evolvable, and bottom-up. As any designer knows, that is much tougher than mammoth top-down design.

Several people have commented that OCW is even more of a hit outside the US. Would a parallel translation engine be a good experiment in moving towards a learning federation? I am thinking of an open source model where the community drives the translation and we provide the infrastructure. Sort of like a SourceForge for translation.

There are some interesting insights from the community effort to translate Harry Potter to German.

Generally, would an XML portal factory be a good platform for globally dispersed communities?

Self-Publishing

Scott Mitchell goes through the steps of self-publishing. If you contract a printing company, you can print a 400-page book for $5.72 per copy on a 1,000-book order. Amazon or a brick-n-mortar store will take ~50% of the sale price.

Autoupdate

The “autoupdate” setting on Windows Update does not seem to do what the name implies on all operating systems. I have been setting several servers to autoupdate, yet the Windows 2000 machines do not update (last night, for example). Does anyone know any tricks to make this setting work? Has anyone else had problems with these settings?

I would like to set the patching on auto and not have to worry about it anymore.

Tools, tools, tools

One of the best tool collections I have seen is Scott Hanselman's Tools. He also keeps a nice list of presentation tools and presentation tips.

Basic Typed InfoSet using XmlSerializer (Via Don Box)

using System;
using System.Xml.Serialization;
// Binds the Content element in the document's namespace to a CLR type.
[XmlRoot(Namespace="http://iesl.mit.edu/doval/content")]
[XmlType(Namespace="http://iesl.mit.edu/doval/content")]
public class Content
{
    public DateTime CreationDate;
    public string Title;
    public string Author;
    // Element names are case-sensitive; this must match <C:Keyword> in the sample file.
    [XmlElement("Keyword")] public string[] Keywords;
    public string ContentType;
}
public class App
{
    static void Main()
    {
        // Read one Content document from standard input and print two of its fields.
        XmlSerializer ser = new XmlSerializer(typeof(Content));
        Content c = (Content)ser.Deserialize(Console.In);
        Console.WriteLine("C:{0}\nT:{1}", c.CreationDate, c.Title);
    }
}

Sample XML data file

<C:Content xmlns:C="http://iesl.mit.edu/doval/content">
     <C:CreationDate>2003-04-15T23:59:00</C:CreationDate>
     <C:Title>Tax Day</C:Title>
     <C:Author>Don Box</C:Author>
     <C:Keyword>IRS</C:Keyword>
     <C:Keyword>Government</C:Keyword>
     <C:ContentText>Government</C:ContentText>
</C:Content>

 

Tools

CodeSmith, a code generation tool, looks interesting. Ditto for “Snippet Compiler”.

TechEd talks online

I feel like I am blogging up a storm today. The talks for TechEd have been released to the public at http://microsoft.sitestream.com/. The “web services” track has talks from Don Box and other good guys.

Writing advice

Thinking about writing? Writing advice from Mike Gunderloy:
Advice for Writers Part 1
Advice for Writers Part 2
Advice for Writers Part 3
Advice for Writers Part 4
Advice for Writers Part 5
Advice for Writers Part 6
Advice for Writers Part 7
Advice for Writers Part 8

Research Software

Follow-up to my previous entry. Is the cost of sharing abstractions the barrier to leveraging research software? Does the barrier lie in the cost of propagating the knowledge needed to consume each abstraction?

If this is true, does open source make the situation worse?

Listen to Don Box?

I just listened to Don Box (part 1 and part 2), impressive as usual. He is talking about Service Oriented Programming and Architecture (SOAP). Below are my impressions of Don’s talk. The ideas are his, but I am sure my filter and flavoring diverge from the master.

I would be interested in everyone’s impressions on Don’s ideas. Have a listen and we can discuss the talk in our next lab meeting.

ICs

OOP components are analogous to integrated circuits (ICs). However, today’s integrated circuits are no longer replaced; they are soldered into place. The same has happened to today’s OOP components. That is, although the original intent was for components to be replaced in the field, in practice that has not happened. The high cost of replacement emanates from the developers’ abstractions that must be shared.

Abstractions are very expensive

If distributed computing is to be broadly realized, shared abstractions must be very few, if any. This premise makes me question some of the semantic web efforts. The field puts forth new abstractions (concepts/technologies) at breakneck speed.

Not only do we want very few fixed abstractions, we also want "no user abstractions". Users can create abstractions within their boundaries but not share them. Sounds fishy? Well, part of the answer is that when objects are serialized and go on the wire they are no longer objects, and that your schema is just “your” schema. More on this issue later.

Another interesting idea is approaching the sharing of abstractions the same way we approach the sharing of system resources. Don says that fixed abstractions need to be controlled in the same way other system resources are controlled (DB, file sharing, etc.).

Interestingly, Don hints that the heart of the distributed computing problem is the human. It is very expensive to get people to agree. It is very hard to have a shared understanding of what you are actually designing when you collaborate across organizations (Don gives the IBM-MS example). Service orientation is a step in the right direction as it expects low-fidelity, high-latency exchanges.

Schema is not a type system

The notion of walking up to an XML schema and expecting all the constructs to come into my code is a dangerous road to take. Don presents a fundamental shift; he says XML schema is not the authoritative description of XML but just my way of making it work.

Schemas are relative; the truth is what is on the wire, says Don. XML schema is a relative interpretation of that truth which may help me, but everyone will need different help. XML schema is not a type system. You can use what is on the wire to build a type, but it is important to note it is not a type.

The goal is to not share types because types are abstractions. The goal is to share machine-readable validation instructions (schema) that tell us what is legal. Everything else will not work over time due to the cost of abstractions.

Submitting Papers

Conference: The 2004 Frontiers in Education Conference (FIE 2004),

  • October 20-23, 2004
  • Savannah, Georgia
  • Abstract Deadline: 1/5/04.

http://www.fie-conference.org/04/

Conference: The Thirteenth International World Wide Web Conference

  • May 17-22, 2004
  • New York, NY
  • Paper deadline: November 14, 2003.

http://www2004.org/

Journal: IEEE Internet Computing. Calls for papers:

  • Internationalizing the Web, deadline 22 September 2003
  • Information Dissemination on the Web, 3 October 2003
  • Wireless Grids, 2 December 2003

http://computer.org/internet/call4ppr.htm
http://computer.org/internet/about.htm

Journal: IEEE Transactions on Education
link: http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?puNumber=13
Authors: http://www.ewh.ieee.org/soc/es/transactionspolicy.html

IEEE Transactions on Parallel and Distributed Systems, http://www.computer.org/tpds/about.htm

IEEE Transactions on Knowledge and Data Engineering, http://www.computer.org/tkde/about.htm

Conference: Mobile Systems, Applications, and Services. Deadline October 31, 2003. http://www.sigmobile.org/mobisys/2004/

Conference on World Wide Web Applications, http://www.udw.ac.za/www2003/callforpapers.htm

Event blogging site, fundamental shift?

An interesting new development on the blogging front. I just saw a web site created for the sole purpose of blogging the PDC - http://pdcbloggers.net/.

The PDC blogging site is interesting because they are not hosting blogs. The site simply picks up the feeds of anyone who registers their blog. Technically, they do not even need an RSS reader on the server side! All they need to do is pick up the RSS feeds and pass them on. In fact, they do not even need to do that much: they can just publish the live collection of bloggers as an RSS feed and your aggregator will do the rest. Any changes to the collection would be picked up on the next refresh.

If successful, the site will bring a tremendous amount of content (feeds) together with very little infrastructure. Is the authoring, publishing, and distribution of this site a fundamental shift from current web hosting and email solutions?

There are a number of questions in my mind: how the feeds will be structured and ranked, how long the site will live, and whether the good content will be visible. Looks like a good experiment though.
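To make the "very little infrastructure" point concrete, here is a minimal sketch of publishing the collection of registered bloggers as a single RSS 2.0 feed. This is a guess at the shape, not how pdcbloggers.net actually works: RegisteredBlog and BloggerCollectionFeed are hypothetical names, and the one-item-per-blog layout is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical record for one registered blog.
public class RegisteredBlog
{
    public string Title;
    public string FeedUrl;
}

public static class BloggerCollectionFeed
{
    // Builds an RSS 2.0 document with one <item> per registered blog.
    // The site never fetches or stores posts; it only lists the feeds.
    public static XDocument Build(string siteTitle, string siteUrl,
                                  IEnumerable<RegisteredBlog> blogs)
    {
        return new XDocument(
            new XElement("rss", new XAttribute("version", "2.0"),
                new XElement("channel",
                    new XElement("title", siteTitle),
                    new XElement("link", siteUrl),
                    new XElement("description", "Registered blogs"),
                    blogs.Select(b => new XElement("item",
                        new XElement("title", b.Title),
                        new XElement("link", b.FeedUrl))))));
    }
}
```

Registering or removing a blog only changes the list passed to Build, so subscribers would see the change on their aggregator's next refresh.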