I've updated my personal web page at
http://web.mit.edu/tsou/index.html
You can find my resume, current research and a few other things there.
I've uploaded the SVM.NET assembly, a demo application with source code, and the documentation to the portalFactory site.
Three basic kernels are built in (linear, polynomial, and radial basis function), and user-defined kernels can be added through reflection (see the documentation for details).
I am working on developing some new kernels for time series analysis.
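For readers who want to see what the three built-in kernels compute, here is a minimal Python sketch of their standard textbook forms. The parameter names (`gamma`, `coef0`, `degree`) and the `KERNELS` registry are assumptions for illustration, not SVM.NET's API; the registry only mimics, in Python, the idea of plugging in a user-defined kernel by name, which SVM.NET does through .NET reflection.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def linear(x: Vector, y: Vector) -> float:
    # K(x, y) = x . y
    return dot(x, y)


def polynomial(x: Vector, y: Vector, gamma=1.0, coef0=1.0, degree=3) -> float:
    # K(x, y) = (gamma * x . y + coef0) ^ degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf(x: Vector, y: Vector, gamma=0.5) -> float:
    # Gaussian / radial basis function: K(x, y) = exp(-gamma * ||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical name -> kernel registry; a user kernel is just another callable.
KERNELS: Dict[str, Callable[..., float]] = {
    "linear": linear,
    "polynomial": polynomial,
    "rbf": rbf,
}


def register_kernel(name: str, fn: Callable[..., float]) -> None:
    KERNELS[name] = fn


def gram_matrix(points: List[Vector], kernel="rbf", **params) -> List[List[float]]:
    # The SVM optimizer only ever sees the data through this matrix of K(p, q) values.
    k = KERNELS[kernel]
    return [[k(p, q, **params) for q in points] for p in points]


if __name__ == "__main__":
    print(linear([1, 2], [3, 4]))  # 11
    # A user-defined kernel (here the histogram-intersection kernel) added at run time.
    register_kernel("intersection", lambda x, y: sum(min(a, b) for a, b in zip(x, y)))
    print(gram_matrix([[1, 2], [3, 1]], kernel="intersection"))
```

Because training and prediction only call the kernel through a common signature, a new kernel (for example one designed for time series) can be added without touching the solver.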
Supports both SVM classification (C and nu formulations) and SVM regression (epsilon and nu formulations).
Supports multi-class classification.
Written in C#.
I/O through a standard XML format (trained models can be saved and loaded).
Linear, polynomial, and Gaussian (RBF) kernels are built in.
User-defined kernels can be loaded at run time via reflection (code examples are provided).
Provides basic graphics functions (2-D plots, contours, support-vector highlighting).
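To illustrate what saving and loading a trained model as XML involves, here is a small Python sketch. The element and attribute names (`SvmModel`, `Param`, `SV`, `coef`, `bias`) form a hypothetical schema, not the format SVM.NET actually writes; the point is that a trained kernel SVM can be rebuilt from just its kernel settings, its support vectors, their coefficients, and the bias.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SvmModel:
    kernel: str
    params: Dict[str, float]
    support_vectors: List[List[float]]
    coefficients: List[float]  # alpha_i * y_i for each support vector
    bias: float


def to_xml(model: SvmModel) -> str:
    # repr() of a float round-trips exactly, so nothing is lost on reload.
    root = ET.Element("SvmModel", kernel=model.kernel, bias=repr(model.bias))
    params = ET.SubElement(root, "Parameters")
    for name, value in model.params.items():
        ET.SubElement(params, "Param", name=name, value=repr(value))
    svs = ET.SubElement(root, "SupportVectors")
    for vec, coef in zip(model.support_vectors, model.coefficients):
        sv = ET.SubElement(svs, "SV", coef=repr(coef))
        sv.text = " ".join(repr(v) for v in vec)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> SvmModel:
    root = ET.fromstring(text)
    params = {p.get("name"): float(p.get("value")) for p in root.iter("Param")}
    vectors, coefs = [], []
    for sv in root.iter("SV"):
        vectors.append([float(t) for t in (sv.text or "").split()])
        coefs.append(float(sv.get("coef")))
    return SvmModel(root.get("kernel"), params, vectors, coefs, float(root.get("bias")))
```

A round trip such as `from_xml(to_xml(model))` should give back an equal model, which is the property that matters for saving a model after training and loading it later for prediction.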
All abstracts of ESLC content are now also available in RSS format.
If you have an RSS reader, you can subscribe to
http://i2i.mit.edu/devshell/rss0.xml
Every time new content is uploaded to ESLC, you'll get a new item in your RSS reader as well.
This is just a preliminary version, and I am working on many other features.
For example, if you subscribe to the Blog, you currently get ALL ESLC content.
However, we should give users the ability to set up their own criteria and subscribe only to the content they are interested in.
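One simple way to give users their own criteria is to filter feed items before they reach the reader. The Python sketch below assumes an RSS 0.91-style feed with `title`, `link`, and `description` inside each `item`, and uses plain keyword matching as a stand-in for real subscription criteria; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List


def parse_items(rss_text: str) -> List[Dict[str, str]]:
    # Collect the fields of every <item>; channel-level <title> is ignored.
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def matches(item: Dict[str, str], keywords: Iterable[str]) -> bool:
    # Case-insensitive substring match against title and abstract.
    text = (item["title"] + " " + item["description"]).lower()
    return any(k.lower() in text for k in keywords)


def filter_feed(rss_text: str, keywords: Iterable[str]) -> List[Dict[str, str]]:
    keywords = list(keywords)
    return [item for item in parse_items(rss_text) if matches(item, keywords)]
```

Doing this filtering on the server (one personalized feed URL per user) would keep it working with any RSS reader; richer criteria, such as the ESLC taxonomy categories, could replace the keyword test.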
Several of my previous blog posts point toward a possible research direction. Knowledge representation, document clustering/classification, and auto-annotation are classical research topics in AI, machine learning, and data mining. Because of the WWW, these CS-oriented research fields are becoming more and more important and practical. Many of our current projects deal with these issues implicitly, but no comprehensive approach has been proposed yet.
For example, we know metadata is extremely useful in document retrieval. However, we’ve learnt from ESLC that manual tagging is tedious and does not scale. Can we use some degree of auto-annotation to help with this? ESLC already provides a taxonomy (the categories you see when browsing), and many manually tagged documents are already there. Can we feed that information into a program so that it learns to tag similar documents, at least to some extent (machine learning, auto-annotation)? Can new documents be clustered or classified into related topics automatically (supervised learning, semi-supervised learning, vector-space approach)? More generally, can this approach be applied to mining blogs, Web Services (tModels in UDDI), or even the WWW?
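The vector-space idea can be made concrete with a small sketch: represent each already-tagged document as a TF-IDF vector, find the k documents most similar to a new one by cosine similarity, and let them vote on tags (a k-nearest-neighbour classifier). This is one standard supervised approach chosen for illustration, not an existing ESLC feature; the tokenizer, the smoothed IDF formula, and `suggest_tags` are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(docs: List[str]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (+1) so terms that occur in every document keep some weight.
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def vectorize(text: str, idf: Dict[str, float]) -> Dict[str, float]:
    # Words never seen in the tagged collection are ignored.
    return {t: tf * idf[t] for t, tf in Counter(tokenize(text)).items() if t in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return num / (na * nb) if na and nb else 0.0


def suggest_tags(new_text: str, tagged_docs: List[Tuple[str, List[str]]], k: int = 3) -> List[str]:
    texts = [text for text, _ in tagged_docs]
    vectors, idf = tf_idf_vectors(texts)
    query = vectorize(new_text, idf)
    sims = [cosine(query, v) for v in vectors]
    nearest = sorted(range(len(texts)), key=lambda i: sims[i], reverse=True)[:k]
    votes: Counter = Counter()
    for i in nearest:
        for tag in tagged_docs[i][1]:
            votes[tag] += sims[i]  # similarity-weighted vote
    return [tag for tag, score in votes.most_common() if score > 0]
```

Here `tagged_docs` plays the role of ESLC's manually tagged documents, and the returned list is a ranked set of suggested tags for a human to confirm, rather than a final annotation.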
Most traditional work in this area focuses either on a weaker semantic level, i.e., taxonomies and thesauri, or on a very high-level ontology such as Cyc (http://www.opencyc.org). However, now that RDF/S and OWL are being developed to bridge the gap, there are plenty of interesting things we can do.
I am currently looking into traditional data mining techniques, machine learning (a branch of AI; I’ll probably take the class this semester), and OWL (particularly OWL DL). More specific ideas, and probably some demo tools, are coming soon.
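To make the step from taxonomy to RDF/S concrete: if a set of categories were written as RDF Schema triples, a program could derive facts nobody tagged explicitly. The Python sketch below applies just two RDFS entailment rules, transitivity of `rdfs:subClassOf` and type inheritance through it, over triples stored as plain tuples. The `ex:` and `doc:` names are hypothetical, and a real system would use an RDF library and a proper reasoner, especially for OWL DL.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS rules until no new triple can be derived:
    (A subClassOf B) + (B subClassOf C) => (A subClassOf C)
    (x type B)       + (B subClassOf C) => (x type C)
    """
    inferred = set(triples)
    while True:
        new = set()
        for (a, p1, b) in inferred:
            for (c, p2, d) in inferred:
                if b != c or p2 != SUBCLASS:
                    continue
                if p1 == SUBCLASS:
                    new.add((a, SUBCLASS, d))  # subclass transitivity
                elif p1 == TYPE:
                    new.add((a, TYPE, d))  # instance inherits superclass type
        if new <= inferred:
            return inferred  # fixpoint reached
        inferred |= new
```

For example, a document tagged only with a narrow category is automatically found when someone browses a broader one, which a flat list of tags cannot do on its own.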
Weak Semantics
--> Taxonomy (Relational Model)
--> Thesaurus (ER Model, WordNet)
--> Conceptual Model (RDF/S, XTM)
--> Local Domain Theory (OWL, Description Logic)
--> Strong Semantics
Anyone interested only in OWL's use cases can take a look at
http://www.w3.org/TR/webont-req/
The details of the language are separated into four documents:
Overview http://www.w3.org/TR/owl-features/
Guide http://www.w3.org/TR/owl-guide/
Reference http://www.w3.org/TR/owl-ref/
Semantics and Abstract Syntax http://www.w3.org/TR/owl-semantics/
WordNet, an electronic lexical database, is considered to be the most important resource available to researchers in computational linguistics, text analysis, and many related areas. Its design is inspired by current psycholinguistic and computational theories of human lexical memory. English nouns, verbs, adjectives, and adverbs are organized into synonym sets, each representing one underlying lexicalized concept. Different relations link the synonym sets.
It is most useful if you want to do word sense identification, information retrieval, or automated document classification.
It is freely available at
http://www.cogsci.princeton.edu/~wn/
and you can also find a C# API for accessing the database.
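As a taste of word sense identification, here is the simplified Lesk algorithm: pick the sense whose gloss (definition) shares the most words with the surrounding sentence. To keep the sketch self-contained, the two "bank" synsets below are toy, hand-written stand-ins for WordNet entries, not data read from the database or any real C# or Python WordNet API.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Synset:
    name: str
    lemmas: List[str]  # the words that share this sense
    gloss: str


# Toy entries for illustration only; not taken from WordNet itself.
SYNSETS: Dict[str, Synset] = {
    "bank.n.01": Synset("bank.n.01", ["bank"],
                        "sloping land beside a body of water such as a river"),
    "bank.n.02": Synset("bank.n.02", ["bank", "depository financial institution"],
                        "a financial institution that accepts deposits and lends money"),
}

STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "to", "in", "on", "at"}


def words(text: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def senses(word: str) -> List[Synset]:
    return [s for s in SYNSETS.values() if word in s.lemmas]


def simplified_lesk(word: str, sentence: str) -> Optional[Synset]:
    # Choose the sense whose gloss overlaps most with the context words.
    context = words(sentence)
    best, best_overlap = None, -1
    for sense in senses(word):
        overlap = len(context & words(sense.gloss))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best
```

With real WordNet data, the glosses of related synsets (for example hypernyms) are often added to each sense's signature to reduce ties.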
If you cut and paste something from MS Word into Blogs, make sure you click the “WordClean” button (the rightmost button on the toolbar) before you post it. Otherwise you are likely to get a “Post Operation Failed” message, because Blogs cannot correctly parse some hidden characters that Word generates.
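What exactly WordClean strips is not described here, so the following Python sketch only illustrates the kind of cleanup involved. It assumes the troublesome hidden characters are Word's typographic substitutions (curly quotes, dashes, ellipses), non-breaking spaces, soft hyphens, zero-width spaces, and stray control characters; `word_clean` is a hypothetical name, not the Blogs tool's code.

```python
import re

# Assumed mapping of typical Word "smart" characters to plain equivalents.
# Written as escapes so the invisible characters are visible in the source.
_TABLE = str.maketrans({
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
    "\u2026": "...",                 # ellipsis
    "\u00a0": " ",                   # non-breaking space
    "\u00ad": None,                  # soft hyphen (invisible)
    "\u200b": None,                  # zero-width space
})

# C0 control characters except tab (\x09), newline (\x0a), and carriage return (\x0d).
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def word_clean(text: str) -> str:
    text = text.replace("\r\n", "\n")  # normalize Windows line endings
    text = text.translate(_TABLE)
    return _CONTROL.sub("", text)
```

Mapping characters to ASCII rather than deleting them keeps the visible text intact while removing what a strict parser may reject.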
Today Alexis sent out some screenshots showing the look-and-feel, taxonomy, and languages to be used on the LAI website.
This morning’s discussion didn’t touch on the “work space” part.
The ESLC website is scheduled to be announced to the ESD community on Tuesday, September 2nd.