Neighbourhood Component Regression Approach for Housing Unit Price Prediction
Predicting housing unit price (HUP) helps potential buyers and investors make informed decisions. This study proposes a novel HUP prediction model based on neighbourhood component regression (NCR). The proposed NCR model was compared with four competing state-of-the-art methods: principal component regression (PCR), multiple linear regression (MLR), partial least squares regression (PLSR), and the generalised linear model (GLM). When tested on real datasets, NCR outperformed all four methods on every evaluation metric used: Mean Absolute Percentage Error (MAPE), Correlation Coefficient (R), Scatter Index (SI), and Percentage Root Mean Square Error (PRMSE). The NCR model achieved the lowest MAPE (0.0977), SI (0.0011), and PRMSE (0.1130) and the highest R (0.9999) among the investigated methods. These results indicate that the proposed NCR method is efficient and reliable for HUP prediction.
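For readers who want to reproduce this kind of comparison, the four evaluation metrics named in the abstract can be sketched as follows. The abstract does not state the formulas, so this sketch assumes commonly used definitions: MAPE as the mean of absolute relative errors expressed in percent, R as the Pearson correlation between observed and predicted prices, SI as the root mean square error divided by the mean observed price, and PRMSE as SI expressed in percent. The function name `evaluation_metrics` and the sample prices are hypothetical and are not taken from the study.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return MAPE, R, SI and PRMSE under assumed, commonly used definitions."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    pairs = list(zip(observed, predicted))
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # MAPE (assumed form): mean absolute relative error, in percent.
    mape = 100.0 / n * sum(abs((o - p) / o) for o, p in pairs)

    # R: Pearson correlation coefficient between observed and predicted values.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in pairs)
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / math.sqrt(ss_obs * ss_pred)

    # RMSE is the building block for both SI and PRMSE.
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in pairs) / n)

    # SI (assumed form): RMSE normalised by the mean observed value.
    si = rmse / mean_obs

    # PRMSE (assumed form): the same ratio expressed in percent.
    prmse = 100.0 * si

    return {"MAPE": mape, "R": r, "SI": si, "PRMSE": prmse}


# Hypothetical prices, for illustration only (not data from the study).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

for name, value in evaluation_metrics(observed, predicted).items():
    print(f"{name}: {value:.4f}")
```

Under these assumed definitions, lower MAPE, SI, and PRMSE and a higher R indicate better agreement between predicted and observed prices, which is the direction of comparison used in the abstract.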