Comparison of the Prediction Accuracy through Artificial Neural Networks with Respect to Multiple Linear Regression Using R
In this paper, the prediction accuracy of a response variable given a set of predictors was compared using statistical and artificial intelligence methods implemented in the R language. The compared methods were multiple linear regression fitted by least squares and the backpropagation (BP) neural network. The goal was to decrease the reducible error when predicting the output variable and to select a model, an indispensable step when developing a prediction model. The methodology consisted of two validation strategies. The first strategy measured only the training error rate, using 100% of the data. The second strategy used a validation set approach, dividing the observations into two parts: 50% formed a training set used to fit the models, and the remaining 50% formed a validation set used to test the fitted models. This methodology made it possible to compare the training error rate with the test error rate. The measures used to evaluate efficiency were the sum of squared errors (SSE) and the coefficient of determination (R2). The results showed that the BP network can significantly decrease the reducible error, improving prediction accuracy. What matters most is the prediction accuracy on new or unseen observations not used during training, rather than how well the models fit the training data.
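The validation set approach described above can be sketched in R. The following is a minimal illustration on synthetic data (the variable names and the data-generating model are assumptions for the example); only the least-squares linear model is shown here, using base R's `lm()`, while the paper's BP network would additionally require a neural network package such as `neuralnet`:

```r
set.seed(1)

# Synthetic data: one response and two predictors (illustrative only)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + 2 * x1 - 1.5 * x2 + rnorm(n, sd = 0.5)
d  <- data.frame(y, x1, x2)

# Validation set approach: 50% of observations for training, 50% for testing
idx   <- sample(seq_len(n), size = n / 2)
train <- d[idx, ]
test  <- d[-idx, ]

# Fit multiple linear regression by least squares on the training set
fit <- lm(y ~ x1 + x2, data = train)

# Evaluate on the held-out validation set
pred <- predict(fit, newdata = test)
sse  <- sum((test$y - pred)^2)                    # sum of squared errors
r2   <- 1 - sse / sum((test$y - mean(test$y))^2)  # coefficient of determination

cat("Test SSE:", sse, " Test R2:", r2, "\n")
```

Computing SSE and R2 on the held-out half, rather than on the training data, is what allows the test error rate to be compared against the training error rate as in the second strategy.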