SEEPAGE PROXY MODEL AND PRODUCTION FORECAST METHOD BASED ON MULTIVARIATE AND SMALL SAMPLE

Cao Chong; Cheng Linsong; Zhang Xiangyang; Jia Pin; Shi Junjie

doi:10.6052/0459-1879-21-155

Cao Chong, Cheng Linsong, Zhang Xiangyang, Jia Pin, Shi Junjie. Seepage proxy model and production forecast method based on multivariate and small sample. Chinese Journal of Theoretical and Applied Mechanics, 2021, 53(8): 2345-2354. DOI: 10.6052/0459-1879-21-155

Abstract

Seepage in porous media involves many multi-scale, multi-variable, multi-physics coupled nonlinear flow problems. These pose a major challenge both for characterizing the complex flow mechanisms in porous media and for solving the governing mathematical models analytically. Mathematical models that capture the key mechanics of fluid flow in porous media are complex, and solving them requires a trade-off between computational cost and accuracy. In recent years, seepage proxy models built on various types of oilfield data have offered possible alternatives for solving multi-variable nonlinear flow problems efficiently. However, their application in oilfields is limited by small sample sizes, which result from incomplete records and improper operation. This paper proposes a data-driven proxy model that predicts cumulative oil production from multivariate, small-sample oilfield data. A database for oil production forecasting is built through a series of data preprocessing steps, including filling in missing values, one-hot encoding of categorical data, and data standardization. The whole database is randomly split into training and test sets, and ten-fold cross-validation is applied to evaluate the error and accuracy of three data-driven models: random forest, extreme gradient boosting (XGBoost), and artificial neural networks.
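The preprocessing chain named above (filling missing values, one-hot encoding of categorical data, standardization) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' code: the well-record columns (`perm_md`, `thickness_m`, `completion`) are hypothetical, missing numeric values are filled with the column mean and missing categorical values with the most frequent level (the abstract does not say which fill rule was used), and standardization is taken to be the z-score (x - mean) / std using the population standard deviation.

```python
from statistics import mean, mode, pstdev

# Hypothetical well records (illustrative columns, not from the paper);
# None marks a missing value.
records = [
    {"perm_md": 12.0, "thickness_m": 8.5, "completion": "fracture"},
    {"perm_md": None, "thickness_m": 6.0, "completion": "acid"},
    {"perm_md": 20.0, "thickness_m": None, "completion": "fracture"},
    {"perm_md": 16.0, "thickness_m": 7.5, "completion": None},
]
NUMERIC = ["perm_md", "thickness_m"]
CATEGORICAL = ["completion"]


def fill_missing(rows):
    """Fill numeric gaps with the column mean, categorical gaps with the mode."""
    filled = [dict(r) for r in rows]
    for col in NUMERIC + CATEGORICAL:
        present = [r[col] for r in rows if r[col] is not None]
        value = mean(present) if col in NUMERIC else mode(present)
        for r in filled:
            if r[col] is None:
                r[col] = value
    return filled


def preprocess(rows):
    """Return (feature_names, matrix): z-scored numerics plus one-hot categoricals."""
    rows = fill_missing(rows)
    # Sorted category levels give a stable one-hot column order.
    levels = {c: sorted({r[c] for r in rows}) for c in CATEGORICAL}
    # Per-column mean and population standard deviation for the z-score.
    stats = {}
    for c in NUMERIC:
        values = [r[c] for r in rows]
        stats[c] = (mean(values), pstdev(values) or 1.0)  # guard constant columns
    names = NUMERIC + [f"{c}={lv}" for c in CATEGORICAL for lv in levels[c]]
    matrix = []
    for r in rows:
        row = [(r[c] - stats[c][0]) / stats[c][1] for c in NUMERIC]
        row += [1.0 if r[c] == lv else 0.0 for c in CATEGORICAL for lv in levels[c]]
        matrix.append(row)
    return names, matrix


names, X = preprocess(records)
print(names)
for row in X:
    print([round(v, 3) for v in row])
```

In a real workflow the fill values, category levels, and standardization statistics would normally be computed on the training split only and then reused on the test split, so that no test information leaks into training; scikit-learn's `SimpleImputer`, `OneHotEncoder`, and `StandardScaler` implement the same three steps.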
The results show that the coefficients of determination of all three data-driven models exceed 0.8 and that their predictions agree well with the actual data. In addition, for small multivariate oilfield samples, data preprocessing has a significant impact on the accuracy of cumulative oil production prediction. Moreover, after data standardization, the random forest algorithm performs best (mean square error 0.12, coefficient of determination 0.87), making it better suited than the other two models to small-sample multivariate production forecasting.
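Ten-fold cross-validation and the two reported metrics, mean square error (MSE) and the coefficient of determination R^2 = 1 - SS_res / SS_tot, can be illustrated with the sketch below. Several assumptions are flagged here and in the code: the data are synthetic, the random train/test split uses a hypothetical 80/20 ratio, the ten folds are drawn from the training set only, and a one-variable least-squares line (`fit_line`, a hypothetical helper) stands in for the paper's random forest, XGBoost, and neural network models so that the example runs without third-party libraries.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean square error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def random_split(n, test_frac, seed):
    """Shuffle indices 0..n-1 and return (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def k_folds(indices, k):
    """Partition indices into k disjoint folds of nearly equal size."""
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Hypothetical stand-in model: 1-D ordinary least squares.
    It is NOT random forest, XGBoost, or an ANN."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def cross_validate(xs, ys, train_idx, k=10):
    """Mean held-out MSE over k folds drawn from the training indices."""
    fold_mse = []
    for fold in k_folds(train_idx, k):
        held_out = set(fold)
        fit_idx = [i for i in train_idx if i not in held_out]
        model = fit_line([xs[i] for i in fit_idx], [ys[i] for i in fit_idx])
        fold_mse.append(mse([ys[i] for i in fold], [model(xs[i]) for i in fold]))
    return mean(fold_mse)


# Synthetic data (assumption): y = 2x + 1 plus Gaussian noise with sd 0.3.
rng = random.Random(0)
xs = [i / 10 for i in range(60)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.3) for x in xs]

train_idx, test_idx = random_split(len(xs), test_frac=0.2, seed=42)  # assumed 80/20
cv_mse = cross_validate(xs, ys, train_idx, k=10)

model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
test_true = [ys[i] for i in test_idx]
test_pred = [model(xs[i]) for i in test_idx]
print(f"10-fold CV MSE: {cv_mse:.3f}")
print(f"test MSE: {mse(test_true, test_pred):.3f}  test R^2: {r2(test_true, test_pred):.3f}")
```

Any regressor with a fit/predict step, for example scikit-learn's `RandomForestRegressor`, could replace `fit_line`. The numbers this toy produces say nothing about the paper's reported MSE of 0.12 and R^2 of 0.87.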