
dc.contributor.author: Maslowski, Przemyslaw
dc.identifier.citation: Maslowski, P. (2020) 'Data Pre-processing Techniques and Tools for Predictive Modelling Using Unstructured Inputs'. MSc by Research thesis. University of Bedfordshire. [en_US]
dc.description: A thesis submitted to the University of Bedfordshire in fulfilment of the requirements for the degree of MSc by Research [en_US]
dc.description.abstract: Data is a crucial factor in machine learning, as most neural networks and machine learning models are data-driven. A trained neural network can be used to make predictions on new data that the model has not seen, provided that the data follows the patterns learned during training. The performance of a predictive model can vary depending on the data used for training. Multiple metrics are produced after a model is trained to evaluate its performance. However, it is difficult to obtain an intuitive measurement that indicates whether the data pre-processing for a model has improved. Therefore, a constructive performance indicator tool that intuitively measures the performance of pre-processing mechanisms for a given model has been developed through multiple experiments with 32 datasets. In these experiments, multiple unstructured datasets are collected, converted into structured datasets, and then evaluated by their modelling performance. The experiment results are used to evaluate the importance of each metric and to assign priorities through weights, contextualising the pre-processing experience within the constructivist paradigm. Furthermore, a set of tools has been developed throughout the project to improve the efficiency of machine learning experiments. These tools are part of the main software, named the pre-processing assistant. The pre-processing assistant has been released publicly and can be used for preparing, processing, and analysing data. The software tools allow users to manipulate datasets and generate Python scripts to train a predictive model. The TensorFlow framework and its machine learning algorithms have also been used to develop Python scripts for training models and making predictions on datasets. The software has been used to carry out the experiments efficiently, and these experiments helped to configure the performance indicator tool.
In the end, the most important metrics were identified through various experiments. The experiments consist of training the model with and without data pre-processing techniques. The increase in each metric was used to identify the significant metrics: metrics that improve more frequently are considered more critical and have been assigned a higher weight. The performance indicator has been configured based on the final experiment results, and it can be used by others to measure the performance of a predictive model. [en_US]
dc.publisher: University of Bedfordshire [en_US]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.subject: data pre-processing [en_US]
dc.subject: machine learning [en_US]
dc.subject: supervised learning [en_US]
dc.subject: deep learning [en_US]
dc.subject: data analysis [en_US]
dc.subject: Subject Categories::G760 Machine Learning [en_US]
dc.title: Data pre-processing techniques and tools for predictive modelling using unstructured inputs [en_US]
dc.type: Thesis or dissertation [en_US]
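The abstract describes the weighting scheme only at a high level: models are trained with and without pre-processing, metrics that improve more often receive higher weights, and those weights configure the performance indicator. The Python sketch below shows one possible reading of that idea, not the thesis's actual implementation. It assumes that every metric is "higher is better", that a metric's weight is its share of all observed improvements, and that the indicator is a weighted sum of per-metric changes. The function names `improvement_weights` and `performance_indicator` and the example scores are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-dataset results: metric name -> score.
# Assumption: every metric is "higher is better" (e.g. accuracy, F1).
MetricScores = Dict[str, float]


def improvement_weights(baseline: List[MetricScores],
                        preprocessed: List[MetricScores]) -> Dict[str, float]:
    """Weight each metric by how often it improved after pre-processing.

    baseline[i] and preprocessed[i] hold scores for the same dataset,
    trained without and with pre-processing respectively.
    """
    if not baseline or len(baseline) != len(preprocessed):
        raise ValueError("need matching, non-empty result lists")
    counts = {name: 0 for name in baseline[0]}
    for before, after in zip(baseline, preprocessed):
        for name in counts:
            if after[name] > before[name]:
                counts[name] += 1
    total = sum(counts.values())
    if total == 0:
        # No metric ever improved: fall back to equal weights.
        return {name: 1.0 / len(counts) for name in counts}
    # Metrics that improved more often get a larger share of the weight.
    return {name: c / total for name, c in counts.items()}


def performance_indicator(before: MetricScores, after: MetricScores,
                          weights: Dict[str, float]) -> float:
    """Weighted sum of metric changes; a positive value means pre-processing helped."""
    return sum(w * (after[name] - before[name]) for name, w in weights.items())


if __name__ == "__main__":
    # Illustrative (made-up) scores for three datasets.
    baseline = [{"accuracy": 0.80, "f1": 0.70},
                {"accuracy": 0.60, "f1": 0.55},
                {"accuracy": 0.90, "f1": 0.85}]
    processed = [{"accuracy": 0.85, "f1": 0.68},
                 {"accuracy": 0.65, "f1": 0.60},
                 {"accuracy": 0.92, "f1": 0.84}]
    weights = improvement_weights(baseline, processed)
    print(weights)  # {'accuracy': 0.75, 'f1': 0.25}
    print(performance_indicator(baseline[0], processed[0], weights))
```

In this toy data, accuracy improves on all three datasets and F1 on only one, so accuracy receives three quarters of the weight. Normalising by the total number of improvements keeps the weights summing to one, which makes indicator values comparable across experiments; other schemes, such as weighting by the size of each improvement, would be equally plausible readings of the abstract.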

Files in this item

MSC Predictive Modelling Thesis.pdf



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International