Fault recognition based on principal component analysis and k-nearest neighbor algorithm

Zou Guangui (邹冠贵); Ren Ke (任珂); Ji Yin (吉寅); Ding Jianyu (丁建宇); Zhang Shaomin (张少敏)

doi:10.3969/j.issn.1001-1986.2021.04.003

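Abstract (Chinese version): Faults are a hazard-inducing geological factor for coal mine safety, and characterizing them is one of the main purposes of 3D seismic exploration in coal mines. The human-computer interactive approach used in conventional fault interpretation depends to some extent on the interpreter's experience. To improve the accuracy of fault interpretation, a method based on principal component analysis and the nearest neighbor algorithm is proposed for detecting the distribution of faults along a target horizon. First, the Yangdong Coal Mine in the Fengfeng Mining Area is selected as the study area, and 10 seismic attributes are extracted from the 3D seismic data obtained after high-precision processing. Next, principal component analysis (PCA) is used to integrate these 10 seismic attributes into 6 composite attributes. The attribute information is then combined with fault information at 139 points determined from 15 wells and 3 roadways in the mining area to build the known data. From these data, two data sets are assembled, data set 1 and data set 2, with training-to-testing ratios of 9:1 and 3:7, respectively. Using these data sets and 10-fold cross-validation, the fault recognition accuracy of the k-nearest neighbors (kNN) algorithm was tested: 87.75% for data set 1 and 71.63% for data set 2. This indicates that the larger the amount of training data, the higher the fault recognition accuracy, which in turn suggests that high-density 3D seismic surveys offer some advantage when this method is applied. When the classification performance of the kNN model was tested with the PCA-reduced data as input, the classification accuracies were 89.23% and 73.79%, respectively. This is because PCA lowers the dimensionality of the original input features, reducing the required computation and improving the representational power of the features. Overall, the results show that combining PCA and kNN can effectively identify fault distribution, reduce the influence of subjective human factors, and improve the efficiency of fault interpretation.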

Abstract: Faults are geological structures that can cause disasters and thereby affect the safety of coal mines. Determining the distribution of faults is one of the main purposes of 3D seismic exploration in coal mines. In conventional human-computer interactive fault interpretation, the reliability of the result depends to a certain extent on the interpreter's experience. We propose a method based on principal component analysis and the k-nearest neighbors algorithm to detect the distribution of faults along target horizons. The Yangdong Coal Mine of the Fengfeng Mining Area is selected as the research area, and ten seismic attributes are extracted from the data obtained via 3D seismic acquisition and high-precision processing of the mining area. Principal component analysis (PCA) is used to integrate these ten seismic attributes into six composite attributes. The attribute information is then combined with the fault information of 139 points determined from 15 wells and 3 roadways in the mining area to construct a known data set. From these data, two data sets were constructed, with training-to-testing ratios of 9:1 for data set 1 and 3:7 for data set 2. Using these data sets and 10-fold cross-validation, the accuracy of fault recognition based on the k-nearest neighbors (kNN) algorithm was determined to be 87.75% for data set 1 and 71.63% for data set 2. This indicates that the accuracy of fault identification is closely related to the amount of training data: when the training set is larger than the testing set, the accuracy is higher. When the attributes obtained after dimensionality reduction via PCA were used as inputs to the kNN model, the classification accuracy was 89.23% for data set 1 and 73.79% for data set 2. This is because PCA reduces the dimensionality of the original input features, thus reducing the amount of calculation required and increasing the characterization capability of these features. The results show that a combination of the PCA and kNN methods can effectively identify fault distribution and improve the efficiency of fault interpretation.
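
The workflow described in the abstract (reduce 10 attributes to 6 with PCA, then classify with kNN under 10-fold cross-validation) can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation. The data are synthetic. The function names (`pca_project`, `knn_predict`, `cross_val_accuracy`, `make_synthetic`), the choice of k = 5, the Euclidean distance, and the power-iteration approximation of PCA are all assumptions that the abstract does not specify.

```python
import math
import random
from collections import Counter


def pca_project(rows, n_components, iters=500, seed=0):
    """Center the data and project it onto its leading principal axes.

    The axes are approximated by power iteration with deflation.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(r, means)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            v = [c / norm for c in w]
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflation: subtract the axis just found before searching for the next.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(a * b for a, b in zip(r, axis)) for axis in axes]
            for r in centered]


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(x, y, folds=10, k=5, seed=0):
    """Fraction of points classified correctly when each fold is held out once."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = set(idx[f::folds])
        train_x = [x[i] for i in idx if i not in test]
        train_y = [y[i] for i in idx if i not in test]
        correct += sum(knn_predict(train_x, train_y, x[i], k) == y[i]
                       for i in test)
    return correct / len(x)


def make_synthetic(n=139, d=10, seed=1):
    """Hypothetical stand-in for 139 labelled points with 10 attributes each."""
    rng = random.Random(seed)
    x, y = [], []
    for i in range(n):
        label = i % 2  # 1 = fault, 0 = no fault (made-up labels)
        # The first four attributes shift with the label; the rest are noise.
        x.append([rng.gauss(3.0 * label if j < 4 else 0.0, 1.0) for j in range(d)])
        y.append(label)
    return x, y


x, y = make_synthetic()
reduced = pca_project(x, n_components=6)   # 10 attributes -> 6 composite attributes
acc_raw = cross_val_accuracy(x, y)         # kNN on the raw attributes
acc_pca = cross_val_accuracy(reduced, y)   # kNN on the PCA-reduced attributes
print(f"raw attributes: {acc_raw:.3f}, PCA attributes: {acc_pca:.3f}")
```

Because each of the ten folds is held out once, every round trains on roughly nine tenths of the points and tests on the remaining tenth. On real seismic attributes with different units, scaling each attribute before PCA would normally be needed, and a stricter protocol would fit the PCA inside each fold rather than on all points at once.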

Fault recognition based on principal component analysis and k-nearest neighbor algorithm