Effective Keyword and Similarity Thresholds for the Discovery of Themes from the User Web Access Patterns
Many intelligent software agents use clustering techniques to group similar access patterns of Web users into high-level themes that express users' intentions and interests. However, such techniques have mostly focused on a single salient feature of the Web documents a user visits, namely the extracted keywords, and their main aim has been to find an optimal threshold for the number of keywords needed to produce more focused themes. In this paper we consider both the keyword threshold and the similarity threshold in order to generate more concentrated themes, and hence build a sounder model of user behavior. The purpose of this paper is twofold: to use distance-based clustering methods to recognize overall themes from the proxy log file, and to suggest efficient cut-off levels for the keyword and similarity thresholds that tend to produce better-focused clusters of efficient size.
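To make the roles of the two thresholds concrete, the following is a minimal sketch rather than the method evaluated in the paper. It assumes that each visited page is reduced to its most frequent words (the keyword threshold) and that a page joins an existing theme when its cosine similarity to that theme's first page reaches the similarity threshold. The function names, the crude tokenizer, and the single-pass "leader" clustering scheme are illustrative assumptions.

```python
import math
from collections import Counter


def top_keywords(text, keyword_threshold):
    """Keep only the `keyword_threshold` most frequent words of a page.

    Assumption: a crude tokenizer that lowercases, splits on whitespace,
    and drops words of three characters or fewer.
    """
    words = [w for w in text.lower().split() if len(w) > 3]
    return dict(Counter(words).most_common(keyword_threshold))


def cosine(a, b):
    """Cosine similarity between two sparse keyword-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_pages(pages, keyword_threshold, similarity_threshold):
    """Single-pass (leader) clustering, an assumed distance-based scheme.

    A page joins the first cluster whose leader (its first member) is at
    least `similarity_threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of page indices.
    """
    vectors = [top_keywords(p, keyword_threshold) for p in pages]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= similarity_threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical page texts standing in for documents fetched via a proxy log.
sample_pages = [
    "stock market shares trading stock",
    "market trading stock prices",
    "football match league football goals",
]
themes_loose = cluster_pages(sample_pages, keyword_threshold=3, similarity_threshold=0.5)
themes_strict = cluster_pages(sample_pages, keyword_threshold=3, similarity_threshold=0.9)
```

With these sample pages, the looser similarity threshold merges the two finance pages into one theme, while the stricter one leaves every page in its own cluster. This is the trade-off between cluster focus and cluster size that the thresholds control.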