Anomaly Detection using Autoencoder
An autoencoder is a type of neural network that learns to compress its input and then reconstruct it. Data points that it reconstructs poorly can be flagged as anomalies.
Python3
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Exclude the datetime column again
data_tensor = tf.convert_to_tensor(
    data_converted.drop('timestamp', axis=1).values, dtype=tf.float32)

# Define the autoencoder model
input_dim = data_converted.shape[1] - 1  # every column except 'timestamp'
encoding_dim = 10
input_layer = Input(shape=(input_dim,))
encoder = Dense(encoding_dim, activation='relu')(input_layer)
# The ReLU output layer can only produce non-negative values
decoder = Dense(input_dim, activation='relu')(encoder)
autoencoder = Model(inputs=input_layer, outputs=decoder)

# Compile and fit the model, using the input as its own target
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(data_tensor, data_tensor, epochs=50, batch_size=32, shuffle=True)

# Calculate the reconstruction error for each data point
reconstructions = autoencoder.predict(data_tensor)
mse = tf.reduce_mean(tf.square(data_tensor - reconstructions), axis=1)
anomaly_scores = pd.Series(mse.numpy(), name='anomaly_scores')
anomaly_scores.index = data_converted.index
We define the autoencoder model and fit it to the cleaned data. The model is trained to minimize the mean squared error between its input and its output, so it learns the regular patterns in the data and reconstructs points that deviate from those patterns poorly. The trained model is then used to compute the reconstruction error of each data point, which serves as that point's anomaly score.
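To make the scoring step concrete, here is a minimal sketch of the per-sample reconstruction error in plain Python, without TensorFlow. The `inputs` and `reconstructions` values and the `row_mse` helper are hypothetical and only illustrate the arithmetic. They are not taken from the dataset used above.

```python
# Hypothetical inputs and their reconstructions (3 samples, 2 features each).
inputs = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]]
reconstructions = [[1.0, 2.0], [3.0, 3.0], [4.0, 2.0]]


def row_mse(x, x_hat):
    """Mean squared error between one sample and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# One score per sample: the worse the reconstruction, the higher the score.
toy_scores = [row_mse(x, x_hat) for x, x_hat in zip(inputs, reconstructions)]
print(toy_scores)  # [0.0, 0.5, 20.0]
```

The third sample is reconstructed worst, so it receives the highest score and would be the first candidate for an anomaly.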
Python3
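from sklearn.metrics import precision_recall_fscore_support

# Flag the top 1% of reconstruction errors as anomalies
threshold = anomaly_scores.quantile(0.99)
anomalous = anomaly_scores > threshold

# binary_labels is derived from `anomalous` itself, not from ground truth,
# so the metrics below are 1.0 by construction
binary_labels = anomalous.astype(int)
precision, recall, f1_score, _ = precision_recall_fscore_support(
    binary_labels, anomalous, average='binary')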
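Here, we set the anomaly detection threshold at the 99th percentile of the anomaly scores and assess the model using precision, recall, and F1 score. Precision is the ratio of true positives to all predicted positives, whereas recall is the ratio of true positives to all actual positives. The F1 score is the harmonic mean of precision and recall. Because the reference labels in this example are derived from the same thresholded scores as the predictions, all three metrics are 1.0 by construction. A meaningful evaluation requires independently labeled anomalies.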
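As an illustration of these definitions, the following sketch computes the three metrics by hand against independent ground-truth labels. The scores, the threshold of 0.5, and the `true_labels` list are made-up values chosen for the example, not results from the dataset used above.

```python
# Hypothetical anomaly scores and independently obtained ground-truth labels.
scores = [0.1, 0.7, 0.9, 0.3, 0.2, 0.8, 0.1, 0.2]
true_labels = [0, 0, 1, 1, 0, 1, 0, 0]

threshold = 0.5  # assumed cut-off for this toy example
predicted = [1 if s > threshold else 0 for s in scores]

# Count true positives, false positives, and false negatives.
pairs = list(zip(true_labels, predicted))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

precision = tp / (tp + fp)  # true positives / predicted positives
recall = tp / (tp + fn)     # true positives / actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(tp, fp, fn)
print(precision, recall, f1)
```

With one false positive and one false negative, precision and recall both fall below 1.0, which is what an evaluation against real labels would typically show.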
Python3
test = data_converted['value'].values
predictions = anomaly_scores.values

print("Precision: ", precision)
print("Recall: ", recall)
print("F1 Score: ", f1_score)
Output:
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Anomaly Detection in Time Series Data
Anomaly detection is the process of identifying data points or patterns in a dataset that deviate significantly from the norm. A time series is a sequence of data points recorded over time. Detecting anomalies in time series data is useful in many industries, including manufacturing, healthcare, and finance. It can be done with unsupervised learning approaches such as clustering, PCA (Principal Component Analysis), and autoencoders.
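As a rough illustration of the PCA approach, the sketch below scores points by how poorly they are reconstructed from the first principal component. It works the same way as the autoencoder's reconstruction error. To stay dependency-free it uses two-dimensional toy data and the closed-form principal axis of a 2x2 covariance matrix. The data points are hypothetical, and a real pipeline would more likely use a library implementation of PCA.

```python
import math

# Hypothetical toy data: ten points on the line y = 2x plus one outlier (last).
points = [(float(i), 2.0 * i) for i in range(10)] + [(5.0, 20.0)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 covariance matrix.
sxx = sum(dx * dx for dx, _ in centered) / n
syy = sum(dy * dy for _, dy in centered) / n
sxy = sum(dx * dy for dx, dy in centered) / n

# Direction of the first principal component (closed form for the 2x2 case).
theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
ux, uy = math.cos(theta), math.sin(theta)

# Project each point onto that direction, reconstruct it, and use the
# squared reconstruction error as the anomaly score.
pca_scores = []
for dx, dy in centered:
    proj = dx * ux + dy * uy
    pca_scores.append((dx - proj * ux) ** 2 + (dy - proj * uy) ** 2)

most_anomalous = max(range(n), key=lambda i: pca_scores[i])
print(most_anomalous)
```

The point with the largest score is the one farthest from the dominant direction of the data, which in this toy set is the appended outlier.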