Academic Journal of Computing & Information Science, 2025, 8(6); doi: 10.25236/AJCIS.2025.080612.
Peiyi Gao
School of Chemistry and Life Sciences, Nanjing University of Posts and Telecommunications, Nanjing, China, 210023
This study applies a random forest fusion algorithm to prediction on the Boston dataset, with the aim of exploring the application of statistics in biomedical research. Using data visualization and a variety of statistical methods, it analyzes the characteristics of the dataset and the relationships among its variables. The study first introduces the importance of statistics in the biomedical field, including the application of descriptive statistics, inferential statistics, Bayesian statistics, probability theory, regression analysis, multivariate analysis, and survival analysis. It then describes the Random Forest algorithm and constructs a hybrid model that fuses it with a gradient boosted decision tree (GBDT) model to improve prediction accuracy. Experimental results show that the fusion model reduces prediction error and improves model stability; in the house price prediction task, it outperforms each single model in terms of mean squared error (MSE) and coefficient of determination (R²). Analysis of the Boston house price dataset shows that the average number of rooms per dwelling (RM), the percentage of low-income people (LSTAT), and air quality, measured as nitric oxide concentration (NOX), have particularly significant effects on house prices. The study both verifies the validity of the fusion model and provides a scientific basis for urban planning and policy making. Future research can further optimize the model, incorporate deep learning techniques, expand the dataset and variables, and deepen interdisciplinary applications to improve the generality and practicality of the results.
Random Forest Algorithm, Gradient Boosting Tree, Data Visualization, House Price Prediction, Model Fusion
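The abstract compares single models with the fusion model using MSE and R². As a minimal sketch of that comparison, the following code computes both metrics and fuses two sets of predictions by weighted averaging. The abstract does not state the fusion rule, so weighted averaging is an assumption made here for illustration; the function names (`mse`, `r2`, `fuse`) and all numeric values are hypothetical and are not results from the study.

```python
from typing import List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fuse(pred_a: Sequence[float], pred_b: Sequence[float],
         weight_a: float = 0.5) -> List[float]:
    """Weighted average of two models' predictions (assumed fusion rule)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Made-up toy values for illustration only; not data or results from the study.
y_true = [24.0, 21.0, 35.0, 18.0]
rf_pred = [26.0, 20.0, 33.0, 19.0]    # hypothetical random forest predictions
gbdt_pred = [23.0, 23.0, 36.0, 16.0]  # hypothetical GBDT predictions
fused_pred = fuse(rf_pred, gbdt_pred)

for name, pred in [("RF", rf_pred), ("GBDT", gbdt_pred), ("Fused", fused_pred)]:
    print(f"{name}: MSE={mse(y_true, pred):.3f}, R2={r2(y_true, pred):.3f}")
```

The toy values were chosen so that the two models err in opposite directions, which is the situation in which averaging helps most; when the errors of the two models are strongly correlated, fusion may give little or no improvement.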
Peiyi Gao. Model Prediction Study of Boston Dataset Based on Random Forest Fusion Algorithm. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 6: 98-106. https://doi.org/10.25236/AJCIS.2025.080612.