Indexed in EI and Scopus
Chinese Core Journal

SEEPAGE PROXY MODEL AND PRODUCTION FORECAST METHOD BASED ON MULTIVARIATE AND SMALL SAMPLE

    Abstract: Flow in porous media involves multi-scale, multi-variable, multi-physics coupled nonlinear seepage problems, which pose a major challenge for characterizing complex seepage mechanisms and for solving the associated mathematical models. Seepage models that account for the key mechanical problems in coupled subsurface flow usually have to trade computational efficiency against accuracy. In recent years, seepage proxy models built on multiple types of oilfield data have offered a way to solve multi-variable nonlinear seepage problems efficiently. In practice, however, their application in oilfields is often limited by small sample sizes that result from incomplete records, improper operation, and similar factors. To address this problem, this paper proposes a data-driven seepage proxy model that predicts cumulative oil production from multi-variable, small-sample oilfield data covering geological, reservoir, and engineering parameters. An oilfield production forecast database is built through a series of preprocessing steps: filling in missing values, one-hot encoding of categorical data, log transformation, and standardization. The database is randomly split into training and test data, and ten-fold cross-validation is used to test the error and accuracy of three data-driven models: Random Forest, eXtreme Gradient Boosting, and artificial neural networks. The results show that the coefficients of determination of all three proxy models exceed 0.8 and that the predictions agree well with the actual data. For small-sample, multi-variable oilfield data, appropriate data preprocessing has a significant effect on the accuracy of the cumulative oil production forecast. After data standardization, the Random Forest algorithm performs best (mean squared error 0.12, coefficient of determination 0.87) and can predict oil production quickly and accurately, which makes it the most suitable of the three for small-sample, multi-variable production forecasting.
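The preprocessing and evaluation steps named in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how missing values are filled (the median is assumed here), which logarithm is used (log(1 + x) is assumed), or how the folds are drawn, and all function names are hypothetical. The three proxy models themselves are omitted.

```python
import math
import random
import statistics


def fill_missing(values):
    """Replace None entries with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 indicator columns, one per sorted category."""
    categories = sorted(set(labels))
    return [[1.0 if label == c else 0.0 for c in categories] for label in labels]


def log_transform(values):
    """Apply log(1 + x) to compress a skewed, non-negative variable (assumed form)."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit population standard deviation.

    Assumes the values are not all identical (non-zero standard deviation).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and yield one (train, test) index pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In a real workflow, the standardization statistics would normally be computed on the training fold only and then applied to the held-out fold, so that no information from the test data leaks into the model. With only a few samples per fold, the per-fold coefficient of determination can vary widely, which is one reason to report the average over all ten folds.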

     
