Automatic Classification of Documents

Grid Computing Applications


Technique for classification of documents

Look at this concept of classification; I think it gives us a clue about the direction we need to take:

The classification is based on the content of the documents to be classified, and is done in a two-step process:

  • retrieval of keywords in the documents;
  • classification of documents using a hierarchy of concepts.

The keywords of a document may be retrieved by counting the absolute and relative frequencies of a large number of character bigrams, and extracting the ones that best characterize the document under consideration. This step is thus typical of vector space representations.
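To make the first step concrete, here is a minimal Python sketch of bigram counting. The function names are hypothetical, and the scoring rule (a bigram's relative frequency in the document minus its relative frequency in the whole collection) is only my assumption about what "best characterization" could mean; the description above does not say how the bigrams are actually selected.

```python
from collections import Counter


def bigram_counts(text):
    """Absolute frequencies of character bigrams in the lower-cased text."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def relative_frequencies(counts):
    """Turn absolute counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}


def characteristic_bigrams(doc, corpus, top_n=5):
    """Rank the bigrams of `doc` by how far their relative frequency
    exceeds the relative frequency over the whole corpus.

    Assumption: this difference score is an illustrative choice,
    not the scoring used by the technique described in the post.
    """
    doc_rel = relative_frequencies(bigram_counts(doc))
    corpus_counts = Counter()
    for other in corpus:
        corpus_counts.update(bigram_counts(other))
    corpus_rel = relative_frequencies(corpus_counts)
    scores = {bg: f - corpus_rel.get(bg, 0.0) for bg, f in doc_rel.items()}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda bg: (-scores[bg], bg))[:top_n]
```

Each document then becomes a vector of bigram frequencies, which is why this step fits the vector space view: comparing or ranking documents reduces to cheap arithmetic on those vectors.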

The second step, however, uses a semantic hierarchy on the keywords to obtain a hierarchical classification of the set of documents itself. This step is therefore typical of structured concept representations.
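The second step could be sketched roughly as follows. The concept hierarchy here is made up for illustration, and placing a document at the deepest concept shared by all of its keywords is just one simple placement rule that I am assuming; the description above does not specify the rule.

```python
# Hypothetical concept hierarchy: each concept maps to its parent,
# and the root maps to None. A real system would load a pre-existing one.
PARENT = {
    "science": None,
    "computing": "science",
    "biology": "science",
    "grid": "computing",
    "cluster": "computing",
    "genome": "biology",
}


def path_to_root(concept, parent=PARENT):
    """Return the list of concepts from the root down to `concept`."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = parent[concept]
    return path[::-1]


def classify(keywords, parent=PARENT):
    """Place a document at the deepest concept shared by all its keywords.

    Keywords missing from the hierarchy are ignored. The result is the
    path from the root to that shared concept, or [] if nothing matches.
    """
    paths = [path_to_root(k, parent) for k in keywords if k in parent]
    if not paths:
        return []
    common = []
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        common.append(level[0])
    return common
```

For example, a document with keywords "grid" and "cluster" lands under science > computing, while one with "grid" and "genome" can only be placed under science.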

Combining these two approaches to classify a textual database by the semantic content of its documents has two advantages: the vector representation makes it possible to use computationally efficient tools, and the pre-existing hierarchy of keywords brings in a good deal of semantic information.

Regards

~Vishal

posted on Sunday, October 26, 2003 8:34 PM by sapna