Identification of seismic patterns through spatial and temporal data analysis using K-Means in the Middle East region

doi:10.47310/srjecs.2024.v0i402.005

Research Article | Volume 4 Issue 2 (July-Dec, 2024) | Pages 1 - 6

Huthaifa Mohammed Kanoosh

Department of Computer Science, College of Computer Science and Mathematics, Tikrit University, Tikrit, Iraq

Under a Creative Commons license

Open Access

Received: July 2, 2024

Revised: July 27, 2024

Accepted: Aug. 16, 2024

Published: Oct. 10, 2024

Abstract

Earthquakes are among the most common natural events affecting large areas of the world, and the Middle East remains one of the most seismically active regions because of its proximity to tectonic plate boundaries. This research analyzes earthquake data for the Middle East region within the coordinates latitude 12°N to 42°N and longitude 32°E to 60°E, from 01/03/2000 11:30:32 to 31/12/2022 17:05:40, i.e. more than two decades of data. The data were collected from a single reliable database; 6555 earthquake events were recorded in this period, and each record includes location (longitude and latitude), magnitude and time. The K-Means machine learning technique was used to analyze these data and divide them into distinct groups (clusters) based on geography and level of activity. The aim of this study is to analyze the distribution of earthquakes in the region, to give insight into the patterns of seismic activity, and to identify areas with high and low activity. These results may contribute to improving emergency response coordination and to reducing the impact of earthquakes on local communities, and thus provide a contribution to research in planning and disaster management.

Keywords

Machine Learning

Earthquakes in the Middle East

Data Analysis

K-Means

ML.

INTRODUCTION

Earthquakes are among the natural hazards most destructive to life and property [1]. Despite significant advances in seismology, predicting the timing of earthquakes remains one of the major challenges facing scientists and engineers alike. Earthquake prediction is an experimental science, and traditional prediction methods often rely on statistical models and limited data analysis, which makes it difficult to predict the magnitude of earthquakes [2,3]. The problem also has direct implications for public safety and disaster preparedness [4,1,5]. In recent years, researchers have started to explore new approaches to earthquake data analysis that use the historical record of events to build better predictive models. Advances in machine learning have made it possible to analyze historical data and to build better prediction models for earthquakes [6,7].

Machine learning algorithms have shown very good prospects in diverse fields such as flood forecasting, weather analysis, disease identification, and many more [7,8]. The availability of extensive, labeled datasets in seismology, together with the ability of machine learning models to identify patterns, makes this area a particularly good fit for ML [9]. Much research has been conducted on predicting the magnitude, time, and location of earthquakes from historical datasets using machine learning algorithms such as Recurrent Neural Networks (RNN) and Extreme Learning Machines [7].

These methods have produced good results, showing that machine learning is a useful tool for understanding seismic events. Earthquakes are, nevertheless, notoriously difficult to forecast [10]. Because of their extreme complexity and apparently random behavior, it is argued that studying past earthquakes yields limited information, and that a model able to predict an earthquake with full reliability may not be achievable [11]. However, modern machine learning techniques provide a novel way to tackle this problem [2].

This research uses the K-Means clustering algorithm to identify and analyze patterns in historical earthquake data extracted from the USGS database. The geographic scope is the Middle East region, which lies approximately between:

- Latitude: 12°N to 42°N.

- Longitude: 32°E to 60°E.

The study covers the period between 2000 and 2023 and earthquakes of magnitude 4 and above. It tries to find collective patterns within these data that can improve our overall knowledge of earthquakes and of their prediction. The research examines the use of machine learning (specifically K-Means clustering) to predict earthquake magnitudes from historical data published by the USGS. It looks for trends and correlations in the data that might help us understand earthquake processes and thus, eventually, predict them and find ways [12] to reduce earthquake disasters.

LITERATURE REVIEW

A literature review is an essential step in understanding how machine learning techniques can be used in earthquake prediction and seismic data analysis. This section reviews traditional earthquake prediction methods and current applications of machine learning in this field, with a focus on the K-Means algorithm. According to [12], the K-Means algorithm is mainly used to detect general patterns and sub-patterns in large-scale data sets. Applied to seismic data, it groups earthquakes with similar magnitudes and locations to detect regions of similar seismic activity [12,13]. The authors of [13] used cluster analysis and discriminant analysis to classify recovery areas of post-earthquake road networks and to extract recovery patterns. They collected data on road type, distance from the epicenter, soil type, and days to recover. Roads were first grouped by cluster analysis (k-means or hierarchical clustering) into clusters with similar disaster-induced damage and recovery behavior, in order to identify areas that are homogeneous in terms of recovery. Discriminant analysis was then used to build a prediction model that classifies new road segments into the existing categories based on their characteristics. This allowed a deeper understanding of the factors that speed up or slow down recovery. They found that roads closer to the epicenter took longer to recover [13]. Using earthquake data from Indonesia, [14] applied an unsupervised learning method for seismic anomaly identification called “isolation forest”. Based on the path lengths in binary isolation trees, the isolation forest separates outlier observations and then records noteworthy observations and distinctive patterns in the seismic data. According to the authors, the isolation forest method was able to identify unusual observations successfully and accurately [14].

METHODOLOGY

The data used in this study were obtained from the USGS database, which documents a large number of earthquake events. The dataset includes each earthquake's magnitude, date (year), time, and location, which serve as inputs to the K-Means algorithm. K-Means is an unsupervised learning technique that partitions the data into k clusters of similar observations. The goal is to minimize the sum of squared distances between each data point and the centroid of the cluster to which it is assigned. Before the K-Means algorithm was applied, the earthquake data went through several preparation steps, including data processing, filtering, and the removal of contaminated records and outliers. The ideal number of clusters was then determined using the elbow method and silhouette analysis. After the optimal number was determined, the K-Means algorithm was applied to the data and the resulting clusters were analyzed. Finally, the spatial distribution of each cluster was visualized on a map to determine the pattern and direction of the earthquakes.

1. Data: Data quality check: The quality of the data was checked. Missing values, outliers, and uncertain values were handled using statistical techniques such as imputation and the removal of outliers.

The geographic location was converted to coordinates (longitude and latitude), and these, together with the magnitude, were used as features.

The data is read from the middle_east_earthquakes.json file and converted to a DataFrame using pandas. The important fields, such as latitude and longitude, magnitude, location, and time, are then extracted. The raw data is then stored in an Excel file.
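A minimal sketch of this loading step is given here. It assumes the JSON file follows the USGS GeoJSON layout (a `features` list whose items carry `properties.mag`, `properties.place`, `properties.time` in epoch milliseconds, and `geometry.coordinates` ordered as longitude, latitude, depth); the paper does not state the file layout. The function name `features_to_frame` and the sample values are hypothetical.

```python
import pandas as pd

def features_to_frame(geojson: dict) -> pd.DataFrame:
    """Flatten a USGS-style GeoJSON FeatureCollection into a table."""
    rows = []
    for feature in geojson.get("features", []):
        props = feature["properties"]
        # GeoJSON coordinates are ordered longitude, latitude, (depth).
        lon, lat = feature["geometry"]["coordinates"][:2]
        rows.append({
            "Latitude": lat,
            "Longitude": lon,
            "Magnitude": props.get("mag"),
            "Location": props.get("place"),
            # Assumed: event time stored as milliseconds since the Unix epoch (UTC).
            "Time": pd.to_datetime(props.get("time"), unit="ms", utc=True),
        })
    return pd.DataFrame(rows)

# Tiny in-memory stand-in for middle_east_earthquakes.json (illustrative values only).
sample = {
    "features": [
        {"properties": {"mag": 4.1, "place": "example place A", "time": 946899032000},
         "geometry": {"coordinates": [48.3346, 33.5482, 10.0]}},
        {"properties": {"mag": 4.5, "place": "example place B", "time": 1672506340000},
         "geometry": {"coordinates": [55.7860, 27.9530, 10.0]}},
    ]
}
df = features_to_frame(sample)
print(df[["Latitude", "Longitude", "Magnitude"]])
```

With the real file, the dictionary would come from `json.load` on the opened file, and `df.to_excel(...)` could write the Excel copy (pandas needs an Excel writer such as openpyxl for that).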

2. Determine the optimal number of clusters: Elbow Method: The elbow method was used to estimate the number of clusters by plotting the number of clusters against the within-cluster sum of squares (WCSS) and noting the point at which the rate of improvement starts to level off.

Silhouette Analysis: Silhouette analysis was also performed for the same purpose. This technique measures how well each data point fits its assigned cluster compared with the other clusters.
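The two criteria can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the study's code: the data are synthetic points around three made-up centres in (latitude, longitude, magnitude), and the range of k values tried is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (latitude, longitude, magnitude); not the study's data.
rng = np.random.default_rng(0)
centers = np.array([[33.5, 48.3, 4.2], [14.6, 39.9, 5.0], [28.0, 55.8, 5.8]])
X = np.vstack([c + rng.normal(scale=[0.5, 0.5, 0.1], size=(100, 3)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

wcss = {}        # within-cluster sum of squares per k (the elbow curve)
silhouette = {}  # mean silhouette coefficient per k
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    wcss[k] = model.inertia_
    silhouette[k] = silhouette_score(X_scaled, model.labels_)

# The elbow is read off the WCSS curve; the silhouette gives a direct ranking.
best_k = max(silhouette, key=silhouette.get)
for k in wcss:
    print(f"k={k}  WCSS={wcss[k]:.1f}  silhouette={silhouette[k]:.3f}")
print("k with the highest mean silhouette:", best_k)
```

In practice the WCSS values are plotted against k and the bend is judged by eye, so the silhouette ranking is a useful cross-check rather than a replacement.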

3. Apply K-Means Algorithm: Once the number of clusters is defined, the K-Means algorithm is applied to the data. The algorithm minimizes the sum of squared distances within each cluster, that is, the distance between the points and their cluster center.

4. Analyzing and Visualizing Results: Cluster Analysis: After K-Means is applied, cluster analysis shows how earthquakes are distributed in terms of magnitude and location. The characteristics of each cluster are examined for trends that may help in understanding the dynamics of earthquakes. In summary, the earthquake data were first preprocessed by handling missing values and outliers, and the optimal number of clusters was then explored using the elbow method and silhouette analysis.

Once the optimal number of clusters was determined, the K-Means algorithm was applied to the data and the resulting clusters were analyzed. The spatial distribution of the clusters was then visualized on a geographic map to identify any patterns or trends in earthquake locations.

Data Analysis Using K-Means Clustering

One approach to analyzing earthquake data is through the use of unsupervised machine learning techniques, such as K-Means clustering [15]. By applying K-Means clustering to earthquake data, including parameters such as magnitude, date, time, and location, researchers can identify patterns and groupings within the data that may reveal insights into the underlying factors influencing earthquake occurrences [15,16].

The K-Means algorithm partitions the data into K clusters, where each data point is assigned to the cluster with the nearest mean [17]. By optimizing the cluster assignments, the algorithm can identify distinct groups of earthquakes with similar characteristics, which may inform our understanding of the spatial and temporal distribution of seismic events [18].
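The partitioning criterion can be written compactly. The notation is introduced here for illustration and does not appear in the paper: $x_i$ is the feature vector of earthquake $i$ (for example scaled latitude, longitude and magnitude), $C_j$ is the set of points assigned to cluster $j$, and $\mu_j$ is that cluster's mean.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

K-Means alternates between assigning each $x_i$ to the cluster with the nearest $\mu_j$ and recomputing each $\mu_j$; neither step increases $J$, so the procedure converges, although only to a local minimum.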

Historical Earthquake Data

The application of machine learning techniques, such as K-Means clustering, to historical earthquake data from the USGS holds significant promise for improving our understanding and prediction of seismic events [2]. By identifying patterns and groupings within the data, researchers can gain valuable insights into the underlying factors that influence earthquake occurrence, including magnitude, location, depth, and temporal distribution [16].

MECHANISM OF ACTION

1. Data collection

Objective: Obtain seismic data from the USGS (https://www.usgs.gov).

The results are based on the code run for this study. The data comprise 6555 earthquakes in the Middle East region that matched the criteria specified in the search parameters (starttime, endtime, and minmagnitude).

Steps:

Collect earthquake data from the source (United States Geological Survey)

Save the data in a CSV file (earthquake_data_prepared).
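The collection step might look like the sketch below, which only builds the request URL and does not contact the server. It assumes the USGS FDSN event web service at `earthquake.usgs.gov` was the access point; the paper names only https://www.usgs.gov and the parameters starttime, endtime and minmagnitude. The bounding-box parameters, the date strings and the helper name `build_query_url` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint: the USGS FDSN event web service.
USGS_EVENT_API = "https://earthquake.usgs.gov/fdsnws/event/1/query"

def build_query_url(start, end, min_magnitude, bbox):
    """Return an event-query URL for a time window, magnitude floor and bounding box."""
    min_lat, max_lat, min_lon, max_lon = bbox
    params = {
        "format": "geojson",
        "starttime": start,
        "endtime": end,
        "minmagnitude": min_magnitude,
        "minlatitude": min_lat,
        "maxlatitude": max_lat,
        "minlongitude": min_lon,
        "maxlongitude": max_lon,
    }
    return f"{USGS_EVENT_API}?{urlencode(params)}"

# Region from the paper (12-42 N, 32-60 E); the dates here are illustrative.
url = build_query_url("2000-01-01", "2022-12-31", 4, (12, 42, 32, 60))
print(url)
```

The response of such a query could then be saved locally and converted to CSV before the later steps.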

2. Read data

Objective: Read and analyze the data.

Read the data file using pandas.

Check the columns available in the data.

3. Check Required Columns

Objective: Ensure that the required columns are available.

Check for the presence of the basic columns (Magnitude, Date and Time, Location).

If the columns are not available, use the available columns such as (Magnitude, Year, Month, Day, Hour).
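A short sketch of this column check with pandas follows. The column names are taken from the steps above; the helper `select_feature_columns` and the one-row example frame are hypothetical.

```python
import pandas as pd

PRIMARY = ["Magnitude", "Date and Time", "Location"]
FALLBACK = ["Magnitude", "Year", "Month", "Day", "Hour"]

def select_feature_columns(df: pd.DataFrame) -> list:
    """Return the primary columns if all are present, otherwise the fallback set."""
    if all(col in df.columns for col in PRIMARY):
        return PRIMARY
    missing = [col for col in FALLBACK if col not in df.columns]
    if missing:
        # Neither set is complete, so stop early with a clear message.
        raise KeyError(f"Missing required columns: {missing}")
    return FALLBACK

# Example frame that only has the fallback columns.
df = pd.DataFrame({"Magnitude": [4.1], "Year": [2000], "Month": [1], "Day": [3], "Hour": [11]})
print(select_feature_columns(df))
```

Failing early when neither set is complete avoids a less readable error later, inside the scaling or clustering call.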

4. Analyze the data using K-Means

Objective: Apply the K-Means algorithm to cluster the data.

Scale the data using StandardScaler.

Apply the K-Means algorithm to classify the data.

Save the results in a CSV file.
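These three steps can be sketched as below. The data are synthetic, the output file name `earthquake_clusters.csv` and the helper `cluster_events` are hypothetical, and the choice of three clusters simply mirrors the number reported in the results.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_events(df, features, k, seed=0):
    """Standardize the chosen columns and attach a K-Means label to each row."""
    X_scaled = StandardScaler().fit_transform(df[features])
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_scaled)
    return df.assign(Cluster=labels)

# Synthetic events around three made-up centres (not the study's data).
rng = np.random.default_rng(1)
centres = [(33.5, 48.3, 4.2), (14.6, 39.9, 5.0), (28.0, 55.8, 5.8)]
rows = [(lat + rng.normal(0, 0.5), lon + rng.normal(0, 0.5), mag + rng.normal(0, 0.1))
        for lat, lon, mag in centres for _ in range(50)]
events = pd.DataFrame(rows, columns=["Latitude", "Longitude", "Magnitude"])

clustered = cluster_events(events, ["Latitude", "Longitude", "Magnitude"], k=3)

# Written to the temp directory here; any writable path would do.
out_path = Path(tempfile.gettempdir()) / "earthquake_clusters.csv"
clustered.to_csv(out_path, index=False)
print(clustered["Cluster"].value_counts().sort_index())
```

Scaling matters because latitude and longitude span tens of degrees while magnitude spans only a few units; without it the distance computation would be dominated by the coordinates. Cluster numbers are arbitrary labels and can change between runs with different seeds.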

5. Plotting Data

Objective: Create graphs to analyze the results.

Plot the data using seaborn to visualize the distribution across the clusters.

Deal with any errors related to column names.

RESULT

The K-Means algorithm was applied to the collected earthquake data, and the earthquakes were classified into three main groups based on their geographical coordinates (latitude and longitude) and strength (magnitude). Table 1 shows a sample of these results:

Latitude	Longitude	Magnitude	Cluster
33.5482	48.3346	4.1	0
27.9530	55.7860	4.5	2
14.6191	39.9519	5.1	1
14.6695	39.8764	4.4	1
14.6133	39.9196	4.4	1

Table 1: Sample of the clustering results

Where:

Latitude represents the geographical location of the earthquake on the north-south axis.

Longitude represents the geographical location on the east-west axis.

Magnitude refers to the earthquake strength measured on the Richter scale.

Cluster represents the group into which the earthquake was classified based on geographical characteristics and earthquake strength.

Cluster analysis:

Cluster 0 includes earthquakes that occurred in locations with specific geographical coordinates and with relatively medium strength.

Cluster 1 includes earthquakes that occurred in different geographical areas with varying levels of strength.

Cluster 2 includes earthquakes with distinct spatial characteristics and with similar seismic strength.

By analyzing these clusters, it can be concluded that earthquakes in Cluster 1 occurred more often in a certain area and were similar in magnitude, which can help identify areas most susceptible to moderate earthquakes. Similarly, Cluster 0 and Cluster 2 may reflect a different geographic pattern in earthquake occurrence.

In addition to the numerical analysis of the data, the earthquakes were represented visually in a 3D plot that shows their distribution by geographic coordinates and seismic magnitude, as shown in the figure below:

Figure Visual representation of earthquakes

- The x-axis represents latitude.

- The y-axis represents longitude.

- The z-axis (the third dimension of the plot) represents seismic magnitude.

The different colors in the plot represent the clusters into which the earthquakes were classified by the K-Means algorithm. Earthquakes in Cluster 2 (in yellow) are concentrated in a specific area and have relatively higher seismic magnitudes, while earthquakes with intermediate magnitudes are distributed across the other clusters.
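A plot of this kind could be produced with matplotlib as sketched below. The points and cluster labels are random placeholders rather than the study's results, the output file name is hypothetical, and the `viridis` colormap is an assumption chosen because it renders the highest label in yellow.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Random placeholder events inside the study's bounding box.
rng = np.random.default_rng(2)
lat = rng.uniform(12, 42, 150)
lon = rng.uniform(32, 60, 150)
mag = rng.uniform(4.0, 6.5, 150)
cluster = rng.integers(0, 3, 150)  # placeholder labels, not K-Means output

fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")
points = ax.scatter(lat, lon, mag, c=cluster, cmap="viridis", s=12)
ax.set_xlabel("Latitude")
ax.set_ylabel("Longitude")
ax.set_zlabel("Magnitude")
fig.colorbar(points, ax=ax, label="Cluster", ticks=[0, 1, 2])

out_path = Path(tempfile.gettempdir()) / "clusters_3d.png"
fig.savefig(out_path, dpi=150)
plt.close(fig)
```

Replacing the placeholder arrays with the clustered table's columns would give one point per earthquake, colored by its assigned cluster.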

DISCUSSION

This study employed the K-Means algorithm to analyze historical earthquake data from the USGS database. The analysis covered the Middle East region during the period from 03/01/2000 11:30:32 to 31/12/2022 17:05:40, for which 6555 events were available. The results offer insight into the distribution and patterns of earthquakes, which may contribute to their prediction and aid in understanding their dynamic behavior.

The clustering results show that certain regions, i.e. Y, are more susceptible to higher-magnitude earthquakes, in particular during 2000-2010. This behavior is in accordance with previous studies suggesting that earthquakes tend to cluster in certain geographic areas and to recur there over time. This knowledge can enhance our ability to anticipate future events in high-risk areas and assist with disaster preparedness.

1. Interpretation of the results: Data analysis using the K-Means algorithm shows that earthquakes can be divided into distinct groups based on their characteristics. For example, one group contains high-magnitude earthquakes concentrated in specific areas, while other groups contain earthquakes of varying, lower magnitude. This suggests that strong earthquakes may occur mainly in specific areas, which is consistent with the results of previous studies on earthquake distribution (USGS, 2023).

2. Comparison with previous techniques: Our results compare favorably with traditional techniques that rely on statistical summaries or on limited data. According to Rousseeuw (1987), in his research on cluster analysis, clustering algorithms such as K-Means are a powerful tool for revealing hidden structure in data, which can improve the analysis. The use of the K-Means algorithm also improves our ability to understand the geographic distribution of earthquakes, which reflects recent trends in research [12].

3. Future directions: To improve predictions, other machine learning techniques, such as more advanced algorithms, can be integrated, possibly in collaboration with commercial partners. In addition, incorporating additional data such as geophysical and environmental data can improve our understanding of earthquake dynamics. These insights are essential for the development of more accurate alerts (USGS, 2023).

4. Practical applications: The results of this study can support improved earthquake forecasting and crisis planning. By understanding the patterns discovered, early warning alerts could be developed and issued by the responsible authorities, which would contribute to mitigating the effects of earthquakes on the population.

CONCLUSION

This research demonstrates how historical earthquake datasets can be analyzed with the K-Means clustering algorithm, an application of machine learning to clustering the spatial distribution of earthquakes over time. Because the USGS database is rich in data, this work attempts to fill a gap in earthquake prediction without relying too heavily on standard seismological approaches.

The conclusions of this study motivate further work in this domain, where machine learning technologies and complete earthquake datasets will enable the building of more refined and accurate earthquake forecasting models.

Acknowledgements:

I am pleased to present this work and express my deep gratitude to those who offered their valuable time and guidance in my time of need. It is a great honor to do this work at the esteemed Department of Computer Science, College of Computer Science and Mathematics, Tikrit University, Iraq.

Conflict of Interest:

The authors declare that they have no conflict of interest

Funding:

No funding sources

Ethical approval:

The study was approved by the Tikrit University, Tikrit, Iraq.

REFERENCES

[1] M. Mavrouli, S. Mavroulis, E. Lekkas, and A. Tsakris, “The Impact of Earthquakes on Public Health: A Narrative Review of Infectious Diseases in the Post-Disaster Period Aiming to Disaster Risk Reduction,” Feb. 01, 2023, MDPI. doi: 10.3390/microorganisms11020419.
[2] G. S. Baveja and J. Singh, “Earthquake Magnitude and b value prediction model using Extreme Learning Machine,” Jan. 2023, [Online]. Available: http://arxiv.org/abs/2301.09756
[3] P. Sbarra et al., “Inferring the depth and magnitude of pre-instrumental earthquakes from intensity attenuation curves,” Natural Hazards and Earth System Sciences, vol. 23, no. 3, pp. 1007–1028, Mar. 2023, doi: 10.5194/nhess-23-1007-2023.
[4] Y. Yumiya et al., “Emergency Medical Team Response during the Hokkaido Eastern Iburi Earthquake 2018: J-SPEED Data Analysis,” Prehosp Disaster Med, vol. 38, no. 3, pp. 332–337, Jun. 2023, doi: 10.1017/S1049023X23000432.
[5] G. Martinelli, Y. Fu, Y. Li, and F. Vallianatos, “Editorial: Pre-earthquake observations and methods for earthquake forecasting and seismic hazard reduction,” 2023, Frontiers Media S.A. doi: 10.3389/feart.2023.1150414.
[6] S. M. Mousavi and G. C. Beroza, “Deep-learning seismology,” Science, vol. 377, no. 6607, Aug. 2022, doi: 10.1126/science.abm4470.
[7] L. Laurenti, E. Tinti, F. Galasso, L. Franco, and C. Marone, “Deep learning for laboratory earthquake prediction and autoregressive forecasting of fault zone stress,” Earth Planet Sci Lett, vol. 598, p. 117825, Nov. 2022, doi: 10.1016/j.epsl.2022.117825.
[8] G. S. Baveja and J. Singh, “Earthquake Magnitude and b value prediction model using Extreme Learning Machine,” Jan. 2023, [Online]. Available: http://arxiv.org/abs/2301.09756
[9] G. C. Beroza, M. Segou, and S. Mostafa Mousavi, “Machine learning and earthquake forecasting—next steps,” Nat Commun, vol. 12, no. 1, p. 4761, Aug. 2021, doi: 10.1038/s41467-021-24952-6.
[10] S. Sawantt et al., “Earthquake prognosis using machine learning,” ITM Web of Conferences, vol. 56, p. 05017, 2023, doi: 10.1051/itmconf/20235605017.
[11] H. Tantyoko, D. Kartika Sari, and A. R. Wijaya, “PREDIKSI POTENSIAL GEMPA BUMI INDONESIA MENGGUNAKAN METODE RANDOM FOREST DAN FEATURE SELECTION,” 2023. [Online]. Available: http://jom.fti.budiluhur.ac.id/index.php/IDEALIS/index
[12] A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering,” ACM Comput Surv, vol. 31, no. 3, pp. 264–323, Sep. 1999, doi: 10.1145/331499.331504.
[13] J. Wu, M. Saito, and N. Endo, “Cluster Analysis and Discriminant Analysis for Determining Post-Earthquake Road Recovery Patterns,” Sensors, vol. 22, no. 6, Mar. 2022, doi: 10.3390/s22062213.
[14] G. Airlangga, “UNSUPERVISED MACHINE LEARNING FOR SEISMIC ANOMALY DETECTION: ISOLATION FOREST ALGORITHM APPLICATION TO INDONESIAN EARTHQUAKE DATA,” Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika, vol. 4, no. 3, pp. 1827–1836, Dec. 2023, doi: 10.46306/lb.v4i3.479.
[15] A. Kusmiran, “Clustering and Risk Analysis of the Earthquake in Sulawesi Using Mini Batch K-Means, K-Medoids, and Maximum Likelihood Method,” Journal of Islamic Science and Technology, vol. 9, no. 1, 2023, doi: 10.22373/ekw.v9i1.13027.
[16] J. B. Muir and Z. E. Ross, “A deep Gaussian process model for seismicity background rates,” Geophys J Int, vol. 234, no. 1, pp. 427–438, Feb. 2023, doi: 10.1093/gji/ggad074.
[17] M. Bin Aof, E. A. Awad, S. R. Omer, B. A. Ibraheem, and Z. A. Mustafa, “An Innovative Leukemia Detection System using Blood Samples via a Microscopic Accessory,” International Journal of Engineering and Manufacturing, vol. 13, no. 1, pp. 23–32, Feb. 2023, doi: 10.5815/ijem.2023.01.03.
[18] Z. E. Ross, W. Zhu, and K. Azizzadenesheli, “Neural mixture model association of seismic phases,” Jan. 2023.
