Real-time user clickstream behavior analysis based on Apache Storm streaming
Name:
Pal2021_Article_Real-timeUserC ...
Size:
3.246Mb
Format:
PDF
Description:
final published version
Issue Date
2021-12-22
Subjects
clickstream behavior analysis
Apache Storm
Subject Categories::G440 Human-computer Interaction
Metadata
Abstract
This paper presents an approach to analyzing consumers’ e-commerce site usage and browsing motifs through pattern mining and surfing behavior. User-generated clickstream data is first stored in a client-side browser. We build an ingestion pipeline to capture the high-velocity data stream from the client-side browser through Apache Storm, Kafka, and Cassandra. Given the consumer’s usage pattern, we uncover the user’s browsing intent through n-grams and collocation methods. An innovative clustering technique is constructed through the Expectation-Maximization algorithm with a Gaussian Mixture Model. We discuss a framework for predicting a user’s clicks based on past click sequences through higher-order Markov chains. We developed our model on top of a big data Lambda Architecture, which combines a high-throughput Hadoop batch setup with a low-latency real-time framework over a large distributed cluster. Based on this approach, we developed an experimental setup for an optimized Storm topology and improved Cassandra database latency to achieve real-time responses. The theoretical claims are corroborated with several evaluations in a Microsoft Azure HDInsight Apache Storm deployment and in the DataStax distribution of Cassandra. The paper demonstrates that the proposed techniques help with user experience optimization, building recently viewed product lists, market-driven analyses, and allocation of website resources.
Citation
Pal G, Atkinson K, Li G (2021) 'Real-time user clickstream behavior analysis based on Apache Storm streaming', Electronic Commerce Research, 23, pp. 1829-1859.
Publisher
Springer
Journal
Electronic Commerce Research
Additional Links
https://link.springer.com/article/10.1007%2Fs10660-021-09518-4
Type
Article
Language
en
ISSN
1389-5753
EISSN
1572-9362
Sponsors
This research was funded by Accenture Technology Labs, Beijing, China. Grant number RDF 15–02–35.
DOI
10.1007/s10660-021-09518-4
The following license files are associated with this item:
- Creative Commons
Except where otherwise noted, this item's license is described as Green - can archive pre-print and post-print or publisher's version/PDF