Excellence in Research and Innovation for Humanity

Haider A Ramadhan

Publications

1

Effective Keyword and Similarity Thresholds for the Discovery of Themes from the User Web Access Patterns
Abstract:

Many intelligent software agents use clustering techniques to group similar access patterns of Web users into high-level themes that express the users' intentions and interests. However, such techniques have mostly focused on one salient feature of the Web documents visited by the user, namely the extracted keywords. Their main aim is to find an optimal threshold for the number of keywords needed to produce more focused themes. In this paper we consider both the keyword threshold and the similarity threshold in order to generate themes with a more concentrated focus, and hence build a sounder model of user behavior. The purpose of this paper is twofold: to use distance-based clustering methods to recognize overall themes from the proxy log file, and to suggest efficient cut-off levels for the keyword and similarity thresholds that tend to produce better clusters with tighter focus and efficient size.
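
The interplay of the two thresholds can be illustrated with a minimal sketch. It is not the method of the paper, whose details are not given in this abstract. It assumes a simple single-pass (leader) clustering with Jaccard similarity over keyword sets. The names top_keywords, jaccard, cluster_pages, keyword_threshold and similarity_threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter


def top_keywords(text, k):
    """Keep the k most frequent words of a page (assumed keyword threshold)."""
    words = [w for w in text.lower().split() if len(w) > 2]
    counts = Counter(words)
    # Rank by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, keyword_threshold, similarity_threshold):
    """Single-pass leader clustering of (url, text) pairs into themes.

    A page joins the most similar existing theme whose leader keywords reach
    similarity_threshold; otherwise it starts a new theme.
    """
    clusters = []  # each theme: {"keywords": set, "members": [url, ...]}
    for url, text in pages:
        keywords = top_keywords(text, keyword_threshold)
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = jaccard(keywords, cluster["keywords"])
            if sim >= similarity_threshold and sim > best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"keywords": set(keywords), "members": [url]})
        else:
            best["members"].append(url)
    return clusters
```

In this sketch, raising keyword_threshold keeps more words per page, which tends to lower the overlap between pages, while raising similarity_threshold splits the log into more, smaller themes. Suitable cut-off levels for either would have to be determined empirically.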

Keywords:
Data mining, knowledge discovery, clustering, data analysis, Web log analysis, theme-based searching.