• Entity-aware capsule network for multi-class classification of big data: a deep learning approach

      Jaiswal, Amit Kumar; Tiwari, Prayag; Garg, Sahil; Hossain, M. Shamim; University of Bedfordshire; University of Padova; École de technologie supérieure, Montréal; King Saud University (Elsevier B.V., 2020-11-20)
      Named entity recognition (NER) is one of the most challenging natural language processing (NLP) tasks, as its performance is related to constantly evolving languages and dependency on expert (human) annotation. The diverse and dynamic content on the web significantly raises the need for a more generalized approach—one that is capable of correctly classifying terms in a corpus and feeding subsequent NLP tasks, such as machine translation, query expansion, and many other applications. Although extensively researched in recent times, the variety of public corpora available nowadays provides room for new and more accurate methods to tackle the NER problem. This paper presents a novel method that uses deep learning techniques based on the capsule network architecture for predicting entities in a corpus. This type of network groups neurons into so-called capsules to detect specific features of an object without reducing the original input unlike convolutional neural networks and their ‘max-pooling’ strategy. Our extensive evaluation on several benchmarked datasets demonstrates how competitive our method is in comparison with state-of-the-art techniques and how the usage of the proposed architecture may represent a significant benefit to further NLP tasks, especially in cases where experts are needed. Also, we explore NER using a theoretical framework that leverages big data for security. For the sake of reproducibility, we make the codebase open-source.