Web Users Session Analysis using DBSCAN and Two Phase Utility Mining Algorithms
G. Sunil Kumar1, C.V.K. Sirisha2, Kanaka Durga R.3, A. Devi4
1G. Sunil Kumar, Department of Computer Applications, Maris Stella College, Vijayawada, A.P., India.
2C.V.K. Sirisha, Department of Computer Applications, Maris Stella College, Vijayawada, A.P., India.
3Kanaka Durga R., Department of Computer Applications, Maris Stella College, Vijayawada, A.P., India.
4A. Devi, Department of Computer Applications, Maris Stella College, Vijayawada, A.P., India.
Manuscript received on December 06, 2011. | Revised Manuscript received on December 25, 2011. | Manuscript published on January 05, 2012. | PP: 396-401 | Volume-1 Issue-6, January 2012. | Retrieval Number: F0351121611/2012©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: One of the important issues in data mining is the interestingness problem. In a typical data mining process, the number of discovered patterns can easily exceed a human user's capacity to identify the interesting ones. To address this problem, utility measures have been used to prune patterns before they are presented to the user. A frequent itemset reflects only the statistical correlation between items; it does not reflect their semantic significance. The proposed approach uses utility-based itemset mining to overcome this limitation. The proposed system first applies the DBSCAN clustering algorithm to identify users' page-visit behavior and the order in which visits occur. After clustering, the Two-Phase utility mining algorithm is applied to find itemsets that contribute high utility. Mining web access sequences can discover useful knowledge from web logs, with broad applications, and mining useful web path traversal patterns is an important research issue in web technologies. Knowledge of frequent web path traversal patterns helps identify the most interesting websites traversed by users. However, if only the binary (presence/absence) occurrence of websites in traversal paths is considered, real-world scenarios may not be reflected. Therefore, if the time each user spends on a website is treated as that website's utility value, the proposed two-phase algorithm can discover more interesting web traversal paths. User page visits are sequential in nature. In this paper, the MSNBC web navigation dataset is used to compare efficiency and performance in web usage mining, whose aim here is to find groups of users who share common interests.
General Terms: Web session mining, log analysis.
Keywords: Web Usage Mining, Itemset, DBSCAN, Association Rules.
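The pipeline outlined in the abstract (cluster sessions with DBSCAN, then run Two-Phase utility mining with time spent as the utility) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions: the toy sessions in `SESSIONS` are hypothetical dictionaries mapping a page category to seconds spent; session similarity is Jaccard distance over the set of visited pages, which ignores visit order even though the abstract mentions order; `min_util` is an absolute threshold applied separately within each cluster; and the helper names `jaccard_distance`, `dbscan`, `two_phase` and `mine_clusters` are hypothetical. Phase 1 keeps candidates whose transaction-weighted utility (TWU, the summed session totals of all sessions containing the itemset) meets the threshold. Because TWU can only shrink as an itemset grows, Apriori-style pruning on it is safe. Phase 2 rescans the sessions to compute exact utilities.

```python
from itertools import combinations

NOISE = -1  # DBSCAN label for sessions that belong to no dense cluster

# Hypothetical toy sessions: page category -> seconds spent (the utility value).
SESSIONS = [
    {"frontpage": 30, "news": 120},
    {"frontpage": 20, "news": 90, "tech": 60},
    {"frontpage": 25, "news": 100},
    {"sports": 200, "weather": 40},
    {"sports": 180, "weather": 30},
    {"tech": 15},
]


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; an assumed session distance that ignores visit order."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def dbscan(points, eps, min_pts, dist):
    """Plain O(n^2) DBSCAN; min_pts counts the point itself."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1  # incremented before the first cluster is assigned

    def region(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def two_phase(sessions, min_util):
    """Two-Phase high-utility itemset mining with an assumed absolute min_util."""
    tu = [sum(s.values()) for s in sessions]  # transaction utility per session

    def twu(itemset):
        # Transaction-weighted utility: an upper bound on the utility of
        # this itemset and of every superset, so pruning on it is safe.
        return sum(t for s, t in zip(sessions, tu) if itemset.issubset(s))

    # Phase 1: level-wise (Apriori-style) candidate generation on TWU.
    items = {p for s in sessions for p in s}
    level = [frozenset([p]) for p in items if twu(frozenset([p])) >= min_util]
    candidates = list(level)
    k = 1
    while level:
        k += 1
        previous = set(level)
        joined = {a | b for a in level for b in level if len(a | b) == k}
        level = [
            c for c in joined
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
            and twu(c) >= min_util
        ]
        candidates.extend(level)

    # Phase 2: one more scan to compute the exact utility of each candidate.
    result = {}
    for c in candidates:
        utility = sum(sum(s[p] for p in c) for s in sessions if c.issubset(s))
        if utility >= min_util:
            result[c] = utility
    return result


def mine_clusters(sessions, labels, min_util):
    """Run two_phase separately on each DBSCAN cluster, skipping noise."""
    groups = {}
    for session, label in zip(sessions, labels):
        if label != NOISE:
            groups.setdefault(label, []).append(session)
    return {label: two_phase(group, min_util) for label, group in groups.items()}


if __name__ == "__main__":
    labels = dbscan([frozenset(s) for s in SESSIONS], 0.4, 2, jaccard_distance)
    print(labels, mine_clusters(SESSIONS, labels, 200))
```

With `eps=0.4` and `min_pts=2`, the three front-page/news sessions form one cluster, the two sports sessions form another, and the lone tech session is labelled noise. In the first cluster, `tech` has a TWU of only 170 (it occurs in one session totalling 170 seconds), so Phase 1 discards it without computing its exact utility, while `{news}` (310) and `{frontpage, news}` (385) survive Phase 2.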