NPBAD Basics

Analyzing Patterns and Anomalies with NPBAD

This tutorial is based on the main takeaways from the research paper Pattern-Based Anomaly Detection in a Network of Multivariate Time Series.

To explore the basic concepts, we’ll use the ipbad function to find interesting patterns and anomalies, demonstrated on two different time series datasets:

  1. The NYC taxi passengers dataset
  2. The server machine dataset

ipbad is described in detail in the paper and contains a variety of optimisations to minimise runtime and improve comprehension. Its steps are: (i) transform a time series into a set of discrete sequences; (ii) find frequent and cohesive patterns that best compress the data; (iii) create an anomaly score based on pattern occurrences.

Getting Started

Let’s import the packages that we’ll need to load, analyze, and plot the data:

In [1]:
!git clone https://len_feremans@bitbucket.org/len_feremans/pbad_network_private.git
Cloning into 'pbad_network_private'...
remote: Enumerating objects: 573, done.
remote: Counting objects: 100% (573/573), done.
remote: Compressing objects: 100% (517/517), done.
remote: Total 573 (delta 262), reused 146 (delta 38), pack-reused 0
Receiving objects: 100% (573/573), 314.46 MiB | 6.44 MiB/s, done.
Resolving deltas: 100% (262/262), done.
In [2]:
!cd pbad_network_private && pip install . -q
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
     |████████████████████████████████| 2.5 MB 5.3 MB/s 
  Building wheel for npbad (setup.py) ... done
In [3]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import json
from tqdm import tqdm
import os

from npbad.utils import load_numenta_data, cart_product

from npbad.visualisation import plot_data, plot_discrete, plot_anomalies, plot_patterns
from npbad.preprocess_timeseries import min_max_norm
from npbad.symbolic_representation import create_segments, create_windows
from npbad.symbolic_representation import discretise_segments_equal_distance_bins_global

from npbad.ipbad.main import save_transactions
from npbad.ipbad.pattern_mining_petsc import mine_patterns_and_create_embedding
from npbad.ipbad.minimum_description_length import post_process_mdl

from npbad.ipbad.main import run_ipbad
from npbad.eval_methods import eval_best_f1, eval_recall_at_k

What is a pattern?

Time series patterns are repeated subsequences found within a longer time series. In contrast to motifs, we compute exact occurrences after discretisation, allow patterns of varying length, and allow a number of gaps relative to the length of the pattern.
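
For example, the pattern [0, 1, 2] occurs in the discrete sequence [0, 3, 1, 2] with one gap (the symbol 3). A small hypothetical check, not part of npbad, could look like this:

def occurs_with_gaps(pattern, sequence, max_gaps):
    # greedy subsequence match that counts skipped symbols inside the occurrence as gaps
    gaps, i = 0, 0
    for symbol in sequence:
        if i == len(pattern):
            break
        if symbol == pattern[i]:
            i += 1
        elif i > 0:
            gaps += 1
    return i == len(pattern) and gaps <= max_gaps

print(occurs_with_gaps([0, 1, 2], [0, 3, 1, 2], max_gaps=1))  # True
print(occurs_with_gaps([0, 1, 2], [0, 3, 1, 2], max_gaps=0))  # False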

Loading the NYC Taxi Passengers Dataset

First, we’ll download historical data that represents the half-hourly average number of NYC taxi passengers from July 2014 through January 2015, available in The Numenta Anomaly Benchmark. We also download the labelled timestamps for the taxi dataset, saved as JSON.

In [4]:
!wget -nv https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv
!wget -nv https://raw.githubusercontent.com/numenta/NAB/master/labels/combined_labels.json
2022-05-23 12:06:26 URL:https://raw.githubusercontent.com/numenta/NAB/master/data/realKnownCause/nyc_taxi.csv [265771/265771] -> "nyc_taxi.csv" [1]
2022-05-23 12:06:26 URL:https://raw.githubusercontent.com/numenta/NAB/master/labels/combined_labels.json [7015/7015] -> "combined_labels.json" [1]

We extract the data and insert it into a pandas dataframe, making sure the timestamps are stored as datetime objects and the values are of type float64. The labels, or anomalies, are simply a list of numpy datetime objects. Here we use the utility function npbad.utils.load_numenta_data to do this.
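
A minimal sketch of what such loading could look like without the utility function, assuming the standard NAB file layout (load_numenta_data additionally fills in missing timestamps, as its log below shows):

import json
import pandas as pd

df = pd.read_csv('nyc_taxi.csv', parse_dates=['timestamp'])
df = df.rename(columns={'timestamp': 'time', 'value': 'average_value'})
df['average_value'] = df['average_value'].astype('float64')

with open('combined_labels.json') as f:
    labels = json.load(f)
anomalies = [pd.Timestamp(ts).to_datetime64() for ts in labels['realKnownCause/nyc_taxi.csv']]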

In [5]:
df, anomalies = load_numenta_data('nyc_taxi.csv', 'combined_labels.json', key='realKnownCause/nyc_taxi.csv')
display(df)
fill_missing: adding (relative) +0.0% more values (from 10320 to 10320)
loading data. start: 2014-07-01 00:00:00. end: 2015-01-31 23:30:00. interval: 30.0m
anomalies: #5. timestamps: [numpy.datetime64('2014-11-01T19:00:00'), numpy.datetime64('2014-11-27T15:30:00'), numpy.datetime64('2014-12-25T15:00:00'), numpy.datetime64('2015-01-01T01:00:00'), numpy.datetime64('2015-01-27T00:00:00')]
average_value time
0 10844 2014-07-01 00:00:00
1 8127 2014-07-01 00:30:00
2 6210 2014-07-01 01:00:00
3 4656 2014-07-01 01:30:00
4 3820 2014-07-01 02:00:00
... ... ...
10315 24670 2015-01-31 21:30:00
10316 25721 2015-01-31 22:00:00
10317 27309 2015-01-31 22:30:00
10318 26591 2015-01-31 23:00:00
10319 26288 2015-01-31 23:30:00

10320 rows × 2 columns

Visualizing the Taxi Dataset

We can visualise the time series and labelled anomalies using matplotlib, or use the utility function npbad.visualisation.plot_data, which shows both.

In [6]:
fig, axs = plt.subplots(1, figsize=(20,3))
df_sample = df[df.time > np.datetime64('2014-10-15')]
plot_data(axs, df_sample, anomalies)

Symbolic representation

Before we find frequent patterns we require a symbolic representation, or sequence database. Therefore, we first normalise the time series and apply a sliding window.

In [7]:
ts_i = min_max_norm(df)
ts_i_sample = ts_i[ts_i.time > np.datetime64('2014-12-20')]
#symbolic representation
windows = create_windows(ts_i_sample, interval=24, stride=24) 
segments = create_segments(ts_i_sample, windows)
print(f"Total windows: {len(windows)}. First: {windows[0]}. Last: {windows[-1]}. Length segment: {len(segments[0])}")
Total windows: 42. First: (Timestamp('2014-12-20 00:30:00'), Timestamp('2014-12-21 00:30:00')). Last: (Timestamp('2015-01-30 00:30:00'), Timestamp('2015-01-31 00:30:00')). Length segment: 48

Next we discretise each window by taking average values every x steps, i.e. we apply the Piecewise Aggregate Approximation (PAA), and then discretise the average values into k equal-width bins.
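
Conceptually, PAA followed by binning with fixed bin edges could look like the sketch below (illustrative only, assuming min-max normalised values in [0, 1]; the library function discretise_segments_equal_distance_bins_global derives the bin edges globally over all segments):

import numpy as np

def paa(segment, no_symbols):
    # average the normalised values in no_symbols equally sized chunks
    chunks = np.array_split(np.asarray(segment, dtype=float), no_symbols)
    return np.array([chunk.mean() for chunk in chunks])

def discretise(avg_values, no_bins, lo=0.0, hi=1.0):
    # map each average onto one of no_bins equal-width bins over [lo, hi]
    edges = np.linspace(lo, hi, no_bins + 1)[1:-1]
    return np.digitize(avg_values, edges)

example_segment = np.random.rand(48)   # stand-in for one window of normalised values
symbols = discretise(paa(example_segment, no_symbols=10), no_bins=5)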

In [8]:
segments_d = discretise_segments_equal_distance_bins_global(segments, no_symbols=10, no_bins=5)
print(segments_d[0])
[2 1 0 1 2 2 2 3 2 3]

We can visualise the discrete representation directly, since the binning is global.

In [9]:
fig, axs = plt.subplots(2, figsize=(30,6), sharex=True)
anomalies = [an for an in anomalies if an > np.datetime64('2014-12-01')]
plot_data(axs[0], ts_i_sample, anomalies)
plot_discrete(axs[1], segments_d, windows, interval_width=24, stride=24, no_symbols=10, no_bins=5)
plot_anomalies(axs[1], ts_i_sample, anomalies)
discrete plot len w 420, len s420

Manually finding a pattern-based outlier

Given a set of discrete sequences, we can report an anomaly as a sequence that occurs infrequently or, vice versa, one that is not covered by frequent patterns. First, we identify a pattern manually.

In [10]:
def matches(pattern, segment):
  '''return True if all symbols of the pattern occur in the same order in the segment'''
  for x in pattern:
    if x in segment:
      # keep the remainder of the segment, starting at the matched symbol
      segment = segment[segment.index(x):]
    else:
      return False
  return True

frequent_pattern = [0,1,2,2,2,2,2]
occurrences1 = [1 if matches(frequent_pattern,segment.tolist()) else 0 for segment in segments_d]
infrequent_pattern = [3,2,0,0,1]
occurrences2 = [1 if matches(infrequent_pattern,segment.tolist()) else 0 for segment in segments_d]

fig, axs = plt.subplots(3, figsize=(30,6), sharex=True)
plot_discrete(axs[0], segments_d, windows, interval_width=24, stride=24, no_symbols=10, no_bins=5)
plot_anomalies(axs[0], ts_i_sample, anomalies)
axs[1].bar([window[0] + np.timedelta64(6,'h') for window in windows], occurrences1)
axs[1].set_ylabel(f'{frequent_pattern}')
axs[2].bar([window[0] + np.timedelta64(6,'h') for window in windows], occurrences2, color='green')
axs[2].set_ylabel(f'{infrequent_pattern}')
discrete plot len w 420, len s420
Out[10]:
Text(0, 0.5, '[3, 2, 0, 0, 1]')

From the above plot we see that the first, frequent pattern matches 40/42 windows and that the two remaining windows are in fact anomalous. The second, infrequent pattern matches only a single window, which is also anomalous. If we were to create an embedding for each window and store whether either pattern occurs, it would not be hard to decide whether a window is anomalous. In the following section we will automatically discover patterns and report anomalies.
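
For instance, a tiny two-dimensional embedding built from these two hand-picked patterns already separates the anomalous windows (illustrative only; ipbad builds its embedding from all mined patterns, as shown in the next section):

embedding_2d = list(zip(occurrences1, occurrences2))
# flag windows not covered by the frequent pattern, or matched by the infrequent one
suspect_windows = [windows[i][0] for i, (freq, infreq) in enumerate(embedding_2d)
                   if freq == 0 or infreq == 1]
print(suspect_windows)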

Finding patterns using NPBAD

First we find the top-k most frequent sequential patterns (here k = 10,000) with a constraint on the relative duration. For instance, with a relative duration of 1.0 we allow no gaps, and with a relative duration of 1.2 an occurrence of a pattern of length 5 can have at most 1 gap (5*1.2 - 5 = 1). We also filter out sequential patterns with a length smaller than 4.
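
The maximum number of gaps allowed for an occurrence follows directly from the pattern length and the relative duration, for example (illustrative; the exact rounding used by the miner may differ):

import math

def max_gaps(pattern_length, relative_duration=1.2):
    # an occurrence may span at most floor(pattern_length * relative_duration) symbols;
    # the small epsilon guards against floating-point rounding
    return math.floor(pattern_length * relative_duration + 1e-9) - pattern_length

print(max_gaps(5))   # 1
print(max_gaps(10))  # 2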

In [11]:
save_transactions('./temp/sequences.txt', segments_d)
patterns_df, embedding = mine_patterns_and_create_embedding('./temp/sequences.txt', no_transactions = len(segments_d), 
                                                            top_k=10000, min_size=4, max_size=10, 
                                                            duration=1.2, 
                                                            patterns_fname='./temp/patterns.txt', embedding_fname='./temp/embedding.txt', verbose=True)
print(f"Total patterns before MDL: {patterns_df.shape[0]}")
display(patterns_df)
['java', '-Xmx16G', '-cp', '/usr/local/lib/python3.7/dist-packages/npbad/ipbad/../../lib/petsc-miner-pbad-0.1.0-with-dependencies.jar', 'be.uantwerpen.mining.MineTopKSequentialPatternsNoDiscretisation', '-input_transactions', './temp/sequences.txt', '-min_size', '4', '-max_size', '10', '-duration', '1.2', '-top_k', '10000', '-output_patterns', './temp/patterns.txt', '-output_occurrences', './temp/embedding.txt']
saved embedding to ./temp/embedding.txt elapsed 1.2886779308319092s
loading #749 patterns. Min/Max rsupport :0.024/0.833
loading embedding #42. Min/Max patterns in segment :25/79. First: [(0, 0.8333333333333334), (1, 0.8095238095238095), (2, 0.7142857142857143), (3, 0.7142857142857143), (4, 0.6904761904761905), (18, 0.30952380952380953), (21, 0.2857142857142857), (22, 0.2857142857142857), (23, 0.2857142857142857), (24, 0.2619047619047619), (25, 0.2619047619047619), (26, 0.2619047619047619), (28, 0.23809523809523808), (29, 0.23809523809523808), (30, 0.23809523809523808), (32, 0.21428571428571427), (33, 0.21428571428571427), (36, 0.21428571428571427), (38, 0.21428571428571427), (39, 0.21428571428571427), (40, 0.21428571428571427), (46, 0.19047619047619047), (48, 0.19047619047619047), (56, 0.16666666666666666), (61, 0.16666666666666666), (70, 0.14285714285714285), (74, 0.14285714285714285), (75, 0.14285714285714285), (76, 0.14285714285714285), (77, 0.14285714285714285), (79, 0.14285714285714285), (82, 0.14285714285714285), (83, 0.14285714285714285), (89, 0.11904761904761904), (90, 0.11904761904761904), (115, 0.09523809523809523), (116, 0.09523809523809523), (147, 0.09523809523809523), (148, 0.09523809523809523), (170, 0.07142857142857142), (171, 0.07142857142857142), (172, 0.07142857142857142), (173, 0.07142857142857142), (190, 0.07142857142857142), (206, 0.07142857142857142), (208, 0.07142857142857142), (210, 0.07142857142857142), (211, 0.07142857142857142), (216, 0.07142857142857142), (218, 0.07142857142857142), (231, 0.07142857142857142), (233, 0.07142857142857142), (234, 0.07142857142857142), (236, 0.07142857142857142), (258, 0.047619047619047616), (262, 0.047619047619047616), (270, 0.047619047619047616), (276, 0.047619047619047616), (290, 0.047619047619047616), (292, 0.047619047619047616), (293, 0.047619047619047616), (314, 0.047619047619047616), (316, 0.047619047619047616), (325, 0.047619047619047616), (326, 0.047619047619047616), (351, 0.047619047619047616), (353, 0.047619047619047616), (354, 0.047619047619047616), (355, 0.047619047619047616), (358, 0.047619047619047616), (359, 0.047619047619047616), (377, 0.047619047619047616), (381, 0.047619047619047616), (383, 0.047619047619047616), (388, 0.047619047619047616), (389, 0.047619047619047616), (390, 0.047619047619047616), (391, 0.047619047619047616), (400, 0.047619047619047616)]. Last: [(0, 0.8333333333333334), (1, 0.8095238095238095), (2, 0.7142857142857143), (3, 0.7142857142857143), (4, 0.6904761904761905), (5, 0.6666666666666666), (6, 0.6428571428571429), (7, 0.6428571428571429), (8, 0.5952380952380952), (9, 0.5476190476190477), (10, 0.5238095238095238), (11, 0.5238095238095238), (12, 0.40476190476190477), (21, 0.2857142857142857), (22, 0.2857142857142857), (23, 0.2857142857142857), (24, 0.2619047619047619), (25, 0.2619047619047619), (26, 0.2619047619047619), (28, 0.23809523809523808), (29, 0.23809523809523808), (30, 0.23809523809523808), (35, 0.21428571428571427), (51, 0.19047619047619047), (55, 0.16666666666666666), (57, 0.16666666666666666), (64, 0.16666666666666666), (69, 0.14285714285714285), (94, 0.11904761904761904), (95, 0.11904761904761904), (99, 0.11904761904761904), (100, 0.11904761904761904), (101, 0.11904761904761904), (102, 0.11904761904761904), (103, 0.11904761904761904), (108, 0.11904761904761904), (112, 0.11904761904761904), (149, 0.09523809523809523), (150, 0.09523809523809523), (154, 0.09523809523809523), (184, 0.07142857142857142), (225, 0.07142857142857142), (229, 0.07142857142857142), (235, 0.07142857142857142), (298, 0.047619047619047616), (401, 0.047619047619047616)]
Total patterns before MDL: 749
pattern support rsupport id
0 [1, 2, 2, 2] 35 0.833333 0
1 [0, 1, 2, 2, 2] 34 0.809524 1
2 [1, 2, 2, 2, 2] 30 0.714286 2
3 [0, 1, 2, 2, 2, 2] 30 0.714286 3
4 [0, 1, 2, 2] 29 0.690476 4
... ... ... ... ...
516 [0, 0, 1, 2, 1, 1, 1, 2, 2, 1] 1 0.023810 744
515 [0, 0, 1, 2, 1, 1, 1, 2, 1] 1 0.023810 745
514 [2, 0, 0, 1, 2, 2, 3, 2] 1 0.023810 746
513 [2, 0, 0, 1, 2, 2, 3, 2, 3] 1 0.023810 747
748 [0, 0, 2, 2, 2, 2, 2, 2, 1] 1 0.023810 748

749 rows × 4 columns

To improve comprehension, we post-process the patterns and keep only those that compress the data according to the minimum description length (MDL) principle, which dramatically decreases the number of patterns.
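
As a rough intuition (this is not the exact encoding used by post_process_mdl), a pattern saves bits when storing the pattern once plus a reference per occurrence is cheaper than spelling out its symbols in every occurrence:

import math

def naive_bits_saved(pattern_length, support, no_symbols, no_patterns):
    bits_per_symbol = math.log2(no_symbols)
    # encode every occurrence symbol by symbol
    cost_without = support * pattern_length * bits_per_symbol
    # store the pattern once, plus one reference per occurrence
    cost_with = pattern_length * bits_per_symbol + support * math.log2(max(no_patterns, 2))
    return cost_without - cost_with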

In [12]:
patterns_df, embedding = post_process_mdl(patterns_df, embedding, segments_d, no_symbols=10, duration=1.2, verbose=False)
print(f"Total patterns after MDL: {patterns_df.shape[0]}")
display(patterns_df)
print(embedding[0])
Total patterns after MDL: 50
pattern support rsupport id instances bits_saved
3 [0, 1, 2, 2, 2, 2] 30 0.714286 3 [0, 1, 2, 3, 4, 7, 9, 10, 11, 13, 14, 18, 19, ... 205
7 [0, 0, 1, 2, 2, 2] 27 0.642857 7 [2, 3, 4, 7, 9, 10, 11, 12, 13, 14, 15, 18, 19... 192
2 [1, 2, 2, 2, 2] 30 0.714286 2 [0, 1, 2, 3, 4, 7, 9, 10, 11, 13, 14, 18, 19, ... 187
9 [0, 0, 1, 2, 2, 2, 2] 23 0.547619 9 [2, 3, 4, 7, 9, 10, 11, 13, 14, 18, 19, 23, 24... 182
1 [0, 1, 2, 2, 2] 34 0.809524 1 [0, 1, 2, 3, 4, 7, 9, 10, 11, 12, 13, 14, 15, ... 178
6 [0, 0, 1, 2, 2] 27 0.642857 6 [2, 3, 4, 7, 9, 10, 11, 12, 13, 14, 15, 18, 19... 169
5 [0, 0, 1, 2] 28 0.666667 5 [2, 3, 4, 7, 10, 11, 12, 14, 16, 17, 18, 19, 2... 152
0 [1, 2, 2, 2] 35 0.833333 0 [0, 1, 2, 3, 4, 7, 9, 10, 11, 12, 13, 14, 15, ... 147
8 [2, 2, 2, 2] 25 0.595238 8 [1, 4, 7, 9, 10, 11, 13, 14, 18, 19, 22, 23, 2... 141
4 [0, 1, 2, 2] 29 0.690476 4 [0, 1, 2, 3, 4, 7, 10, 11, 12, 14, 18, 19, 21,... 126
16 [0, 0, 1, 2, 2, 2, 2, 2] 15 0.357143 16 [2, 3, 7, 10, 11, 13, 14, 18, 23, 24, 25, 26, ... 124
14 [0, 1, 2, 2, 2, 2, 2] 16 0.380952 14 [1, 2, 3, 7, 10, 11, 13, 14, 18, 23, 24, 25, 2... 122
15 [1, 2, 2, 2, 2, 2] 16 0.380952 15 [1, 2, 3, 7, 10, 11, 13, 14, 18, 23, 24, 25, 2... 111
13 [2, 2, 2, 2, 2] 16 0.380952 13 [1, 2, 3, 7, 10, 11, 13, 14, 18, 23, 24, 25, 2... 106
12 [0, 0, 2, 2, 2, 2] 17 0.404762 12 [4, 7, 10, 11, 14, 18, 19, 23, 24, 25, 26, 27,... 92
24 [0, 1, 2, 2, 2, 2, 3] 11 0.261905 24 [0, 11, 18, 19, 26, 27, 28, 33, 34, 35, 41] 91
28 [0, 1, 2, 2, 2, 3, 3] 10 0.238095 28 [0, 19, 20, 21, 27, 28, 33, 34, 35, 41] 87
25 [1, 2, 2, 2, 2, 3] 11 0.261905 25 [0, 11, 18, 19, 26, 27, 28, 33, 34, 35, 41] 84
11 [0, 0, 2, 2, 2] 22 0.523810 10 [1, 4, 7, 10, 11, 14, 18, 19, 22, 23, 24, 25, ... 82
23 [0, 1, 2, 2, 2, 3] 12 0.285714 21 [0, 11, 18, 19, 20, 21, 27, 28, 33, 34, 35, 41] 81
29 [1, 2, 2, 2, 3, 3] 10 0.238095 29 [0, 19, 20, 21, 27, 28, 33, 34, 35, 41] 81
19 [1, 2, 2, 2, 1] 13 0.309524 19 [2, 3, 4, 9, 12, 15, 16, 17, 22, 29, 30, 31, 36] 78
30 [2, 2, 2, 3, 3] 10 0.238095 30 [0, 19, 20, 21, 27, 28, 33, 34, 35, 41] 65
18 [1, 1, 2, 2, 2] 13 0.309524 18 [0, 1, 9, 13, 15, 16, 17, 22, 28, 29, 31, 34, 36] 63
26 [2, 2, 2, 2, 3] 11 0.261905 26 [0, 11, 18, 19, 26, 27, 28, 33, 34, 35, 41] 62
21 [1, 2, 2, 2, 3] 12 0.285714 22 [0, 11, 18, 19, 21, 26, 27, 28, 33, 34, 35, 41] 61
10 [0, 2, 2, 2, 2] 22 0.523810 11 [2, 3, 4, 7, 10, 11, 12, 14, 18, 19, 21, 23, 2... 56
17 [2, 2, 2, 1] 15 0.357143 17 [1, 2, 3, 4, 9, 12, 15, 16, 17, 22, 23, 29, 30... 54
108 [2, 2, 2, 3, 3, 3] 5 0.119048 94 [20, 21, 27, 34, 41] 40
101 [0, 1, 2, 2, 2, 2, 3, 3] 5 0.119048 99 [19, 27, 33, 34, 41] 37
27 [0, 2, 2, 2, 2, 2] 10 0.238095 27 [1, 7, 10, 14, 23, 24, 25, 26, 32, 40] 36
75 [2, 1, 0, 1, 2, 2, 2, 2] 6 0.142857 77 [0, 1, 22, 28, 29, 36] 33
110 [1, 2, 2, 2, 2, 3, 2] 5 0.119048 96 [11, 18, 19, 26, 33] 32
111 [2, 2, 2, 2, 3, 2] 5 0.119048 97 [11, 18, 19, 26, 33] 30
99 [0, 2, 2, 2, 2, 3, 3] 5 0.119048 100 [19, 27, 33, 34, 41] 29
32 [1, 0, 1, 2, 2, 2] 9 0.214286 39 [0, 1, 12, 21, 22, 28, 29, 35, 36] 18
77 [2, 1, 0, 1, 2, 2] 6 0.142857 75 [0, 1, 22, 28, 29, 36] 18
105 [2, 2, 2, 2, 1, 1] 5 0.119048 91 [4, 22, 29, 30, 36] 17
33 [1, 0, 1, 2, 2] 9 0.214286 38 [0, 1, 12, 21, 22, 28, 29, 35, 36] 16
39 [2, 0, 1, 2, 2, 2] 9 0.214286 33 [0, 1, 11, 14, 22, 28, 29, 34, 36] 12
153 [2, 1, 0, 1, 2, 2, 2, 2, 1] 4 0.095238 124 [1, 22, 29, 36] 11
22 [2, 2, 2, 3] 12 0.285714 20 [1, 2, 3, 4, 9, 18, 22, 23, 29, 30, 31, 36] 11
130 [2, 1, 0, 2, 2, 2, 2] 4 0.095238 153 [1, 22, 29, 36] 10
76 [2, 1, 0, 2, 2] 6 0.142857 74 [0, 1, 22, 28, 29, 36] 10
115 [0, 1, 2, 2, 3] 4 0.095238 148 [0, 21, 28, 35] 9
40 [2, 1, 1, 2, 2] 9 0.214286 32 [0, 1, 11, 14, 22, 28, 29, 34, 36] 8
134 [1, 0, 0, 1] 4 0.095238 136 [7, 11, 14, 15] 6
133 [1, 1, 1, 2, 2] 4 0.095238 140 [16, 17, 37, 39] 3
136 [1, 0, 0, 1, 2, 2] 4 0.095238 134 [8, 11, 14, 15] 3
186 [0, 0, 1, 2, 1, 2, 2, 2, 1] 3 0.071429 230 [16, 17, 31] 2
[(0, 0.8333333333333334), (1, 0.8095238095238095), (2, 0.7142857142857143), (3, 0.7142857142857143), (4, 0.6904761904761905), (18, 0.30952380952380953), (21, 0.2857142857142857), (22, 0.2857142857142857), (24, 0.2619047619047619), (25, 0.2619047619047619), (26, 0.2619047619047619), (28, 0.23809523809523808), (29, 0.23809523809523808), (30, 0.23809523809523808), (32, 0.21428571428571427), (33, 0.21428571428571427), (38, 0.21428571428571427), (39, 0.21428571428571427), (74, 0.14285714285714285), (75, 0.14285714285714285), (77, 0.14285714285714285), (148, 0.09523809523809523)]

We can visualise each pattern using npbad.visualisation.plot_patterns, which also shows the number of patterns in the pattern-based embedding for each window.

In [13]:
plot_patterns(ts_i_sample, anomalies, segments_d, windows, patterns_df, embedding, interval_width=24,stride=24,no_symbols=10)
discrete plot len w 420, len s420

Finding anomalies using NPBAD

After we have discovered patterns and created a pattern-based embedding for each window, we compute an anomaly score using an isolation forest. Alternatively, we use the frequent pattern outlier factor (FPOF), which is the sum of the relative supports of all patterns matching a window divided by the total number of patterns.
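
As a sketch of this definition (not necessarily the exact implementation of npbad.anomaly_detection.fpof), the FPOF of a single window can be computed from its embedding, i.e. the list of (pattern id, relative support) pairs shown earlier:

def fpof_window(window_embedding, no_patterns):
    # window_embedding: list of (pattern_id, relative_support) pairs for patterns matching the window
    return sum(rsup for _, rsup in window_embedding) / no_patterns

A window covered by few, or only infrequent, patterns gets a low FPOF and is therefore more likely to be anomalous.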

In [14]:
from npbad.anomaly_detection import iso_forest, fpof
#fpof results:
fpof_scores = fpof(embedding, patterns_df.shape[0])
fpof_scores = list(zip(windows, fpof_scores))
(f1,prec,rec,TP,FP,FN,TN, t) = eval_best_f1(fpof_scores, anomalies)
print(f"IPBAD-FPOF: F1@best: {f1:.4f}, precision: {prec:.4f}, recall: {rec:.4f}, threshold: {t:.4f}, TP: {TP}, FP: {FP}, FN:{FN}, TN:{TN}")

#plot
def plot_predicted_anomalies(df, anomalies, an_scores1, t):
    fig, axs = plt.subplots(2, figsize=(30,3), sharex=True)
    plot_data(axs[0], df, anomalies)
    #plot fpof
    timestamps = [window[0] + np.timedelta64(12,'h') for window, score in an_scores1]
    scores1 = [score for window, score in an_scores1]
    axs[1].plot(timestamps, scores1, color='red',alpha=0.5)
    axs[1].axhline(y=t, color='b', linestyle='--')
    axs[1].set_ylabel('FPOF')

plot_predicted_anomalies(ts_i_sample, anomalies, fpof_scores, t)
eval_best_f1: #an_scores: 42, min score:  0.760-max score:  1.000 #anomalies: 3
IPBAD-FPOF: F1@best: 0.4444, precision: 0.3333, recall: 0.6667, threshold: 0.9519, TP: 2, FP: 4, FN:1, TN:35

The previous steps (discretisation, pattern mining and embedding, anomaly detection) can be applied to the full taxi dataset in a single call to npbad.ipbad.main.run_ipbad. In this case we set the stride to 1.

In [15]:
df, anomalies = load_numenta_data('nyc_taxi.csv', 'combined_labels.json', key='realKnownCause/nyc_taxi.csv')
prep_par = {'interval': 96, 'stride':1, 'no_symbols': 5, 'no_bins': 5, 'remove_outliers':False} 
parameter = {'use_iso': True, 'use_MDL': True}

#run anomaly detection
an_scores, ts_i, segments_discretised, windows = run_ipbad(
                df=df, interval=prep_par["interval"], stride=prep_par["stride"], 
                no_symbols=prep_par["no_symbols"], no_bins=prep_par["no_bins"], 
                preprocess_outlier=prep_par["remove_outliers"], 
                binning_method='normal',
                use_MDL=parameter['use_MDL'], use_iso=parameter['use_iso'],
                verbose=False)   

#evaluate                                                                          
(f1,prec,rec,TP,FP,FN,TN, t) = eval_best_f1(an_scores, anomalies)
print(f"F1@best: {f1:.4f}, precision: {prec:.4f}, recall: {rec:.4f}, threshold: {t:.4f}, TP: {TP}, FP: {FP}, FN:{FN}, TN:{TN}")

#plot
plot_predicted_anomalies(df, anomalies, an_scores, t)
fill_missing: adding (relative) +0.0% more values (from 10320 to 10320)
loading data. start: 2014-07-01 00:00:00. end: 2015-01-31 23:30:00. interval: 30.0m
anomalies: #5. timestamps: [numpy.datetime64('2014-11-01T19:00:00'), numpy.datetime64('2014-11-27T15:30:00'), numpy.datetime64('2014-12-25T15:00:00'), numpy.datetime64('2015-01-01T01:00:00'), numpy.datetime64('2015-01-27T00:00:00')]
create anomaly scores (isolation_forest). Number of scores: 5064. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 00:00:00')), 0.4463487956354264), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 01:00:00')), 0.4463487956354264), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 02:00:00')), 0.4339457410193874), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 03:00:00')), 0.4421260164701807), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 04:00:00')), 0.4421260164701807)]
eval_best_f1: #an_scores: 5064, min score:  0.434-max score:  0.551 #anomalies: 5
F1@best: 0.8389, precision: 0.8798, recall: 0.8017, threshold: 0.4807, TP: 388, FP: 53, FN:96, TN:4527

Optimising parameters

The performance of anomaly detection depends on a good symbolic representation. Therefore we should optimise the hyper-parameters. An example using grid search is shown below.

In [16]:
results_ipbad = []
grid = cart_product(interval=[110,96,72], no_symbols=[5,6,7], no_bins=[5,6,7])   
for preprocess_parameter in tqdm(grid):
  try:
    an_scores, ts_i, segments_discretised, windows = run_ipbad(df=df, interval=preprocess_parameter["interval"], stride=1, 
                                                               no_symbols=preprocess_parameter["no_symbols"], no_bins=preprocess_parameter["no_bins"], 
                                                               preprocess_outlier=False, binning_method='normal',verbose=False,
                                                               use_MDL=True, use_iso=True)
    precAt10, recAt10 = eval_recall_at_k(an_scores, anomalies, threshold_top_k=50, K=10)
    results_ipbad.append((recAt10, precAt10, preprocess_parameter))
  except Exception as e:
    print(e)
df_results = pd.DataFrame.from_records(results_ipbad, columns=["recall@10", "precision@10", "parameters"])
df_results = df_results.sort_values(by=["precision@10","recall@10"], ascending=False)
display(df_results)
  4%|▎         | 1/27 [00:12<05:21, 12.38s/it]
create anomaly scores (isolation_forest). Number of scores: 5050. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 14:00:00')), 0.42616524372456305), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 15:00:00')), 0.4434734481791104), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 16:00:00')), 0.4434734481791104), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 17:00:00')), 0.4434734481791104), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 18:00:00')), 0.4446216854351137)]
  7%|▋         | 2/27 [00:23<04:57, 11.91s/it]
create anomaly scores (isolation_forest). Number of scores: 5050. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 14:00:00')), 0.4525372950073978), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 15:00:00')), 0.4525372950073978), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 16:00:00')), 0.4525372950073978), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 17:00:00')), 0.4525372950073978), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 18:00:00')), 0.4525372950073978)]
 11%|█         | 3/27 [00:35<04:45, 11.88s/it]
create anomaly scores (isolation_forest). Number of scores: 5050. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 14:00:00')), 0.44800609398808533), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 15:00:00')), 0.44800609398808533), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 16:00:00')), 0.44800609398808533), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 17:00:00')), 0.44800609398808533), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 18:00:00')), 0.45394088377986963)]
 15%|█▍        | 4/27 [00:49<04:45, 12.42s/it]
create anomaly scores (isolation_forest). Number of scores: 5050. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 14:00:00')), 0.4356586872495356), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 15:00:00')), 0.4356586872495356), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 16:00:00')), 0.4356586872495356), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 17:00:00')), 0.4512666884361035), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 18:00:00')), 0.4512666884361035)]
 19%|█▊        | 5/27 [01:03<04:50, 13.19s/it]
create anomaly scores (isolation_forest). Number of scores: 5050. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 14:00:00')), 0.4384550040007865), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 15:00:00')), 0.4384550040007865), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 16:00:00')), 0.4384550040007865), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 17:00:00')), 0.4384550040007865), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 18:00:00')), 0.4384550040007865)]
 22%|██▏       | 6/27 [01:17<04:43, 13.49s/it]
create anomaly scores (isolation_forest). Number of scores: 5050. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 14:00:00')), 0.4488260144691304), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 15:00:00')), 0.4488260144691304), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 16:00:00')), 0.4488260144691304), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 17:00:00')), 0.43472247198116454), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 18:00:00')), 0.43366573647480133)]
 26%|██▌       | 7/27 [01:34<04:49, 14.47s/it]
create anomaly scores (isolation_forest). Number of scores: 5050. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 14:00:00')), 0.43656184426414035), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 15:00:00')), 0.43656184426414035), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 16:00:00')), 0.43656184426414035), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 17:00:00')), 0.43656184426414035), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 18:00:00')), 0.4468493083525255)]
 30%|██▉       | 8/27 [01:52<05:00, 15.83s/it]
create anomaly scores (isolation_forest). Number of scores: 5050. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 14:00:00')), 0.4460606977854477), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 15:00:00')), 0.4361290626640682), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 16:00:00')), 0.4301848572228986), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 17:00:00')), 0.4272361604328966), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 18:00:00')), 0.4272361604328966)]
 33%|███▎      | 9/27 [02:14<05:18, 17.69s/it]
create anomaly scores (isolation_forest). Number of scores: 5050. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 14:00:00')), 0.4392489243124804), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 15:00:00')), 0.4392489243124804), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 16:00:00')), 0.4392489243124804), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 17:00:00')), 0.4392489243124804), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 18:00:00')), 0.4392489243124804)]
 37%|███▋      | 10/27 [02:26<04:30, 15.92s/it]
create anomaly scores (isolation_forest). Number of scores: 5064. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 00:00:00')), 0.4466264968419477), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 01:00:00')), 0.4466264968419477), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 02:00:00')), 0.43425694752672583), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 03:00:00')), 0.44059435855515694), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 04:00:00')), 0.44059435855515694)]
 41%|████      | 11/27 [02:38<03:57, 14.82s/it]
create anomaly scores (isolation_forest). Number of scores: 5064. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 00:00:00')), 0.43144829955473174), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 01:00:00')), 0.43144829955473174), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 02:00:00')), 0.43144829955473174), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 03:00:00')), 0.41400053546368687), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 04:00:00')), 0.41400053546368687)]
 44%|████▍     | 12/27 [02:51<03:33, 14.21s/it]
create anomaly scores (isolation_forest). Number of scores: 5064. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 00:00:00')), 0.41818142374178846), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 01:00:00')), 0.41818142374178846), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 02:00:00')), 0.41818142374178846), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 03:00:00')), 0.41818142374178846), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 04:00:00')), 0.4290821957478095)]
 48%|████▊     | 13/27 [03:05<03:17, 14.09s/it]
create anomaly scores (isolation_forest). Number of scores: 5064. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 00:00:00')), 0.4420975494329572), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 01:00:00')), 0.4420975494329572), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 02:00:00')), 0.4420975494329572), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 03:00:00')), 0.4420975494329572), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 04:00:00')), 0.45072146313974154)]
 52%|█████▏    | 14/27 [03:20<03:05, 14.23s/it]
create anomaly scores (isolation_forest). Number of scores: 5064. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 00:00:00')), 0.4338708111020696), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 01:00:00')), 0.4398760066262446), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 02:00:00')), 0.4281394696661027), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 03:00:00')), 0.4281394696661027), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 04:00:00')), 0.4281394696661027)]
 56%|█████▌    | 15/27 [03:35<02:56, 14.67s/it]
create anomaly scores (isolation_forest). Number of scores: 5064. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 00:00:00')), 0.41893429290118167), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 01:00:00')), 0.41893429290118167), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 02:00:00')), 0.41893429290118167), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 03:00:00')), 0.41893429290118167), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 04:00:00')), 0.4393609890483229)]
 59%|█████▉    | 16/27 [03:53<02:49, 15.42s/it]
create anomaly scores (isolation_forest). Number of scores: 5064. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 00:00:00')), 0.46073427365802466), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 01:00:00')), 0.4460096692920554), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 02:00:00')), 0.4460096692920554), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 03:00:00')), 0.4460096692920554), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 04:00:00')), 0.4460096692920554)]
 63%|██████▎   | 17/27 [04:12<02:46, 16.63s/it]
create anomaly scores (isolation_forest). Number of scores: 5064. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 00:00:00')), 0.4258565446359143), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 01:00:00')), 0.4258565446359143), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 02:00:00')), 0.4279429057058123), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 03:00:00')), 0.4279429057058123), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 04:00:00')), 0.4279429057058123)]
 67%|██████▋   | 18/27 [04:34<02:45, 18.34s/it]
create anomaly scores (isolation_forest). Number of scores: 5064. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-05 00:00:00')), 0.43940117699650816), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-05 01:00:00')), 0.43752465700684195), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-05 02:00:00')), 0.4466255955411036), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-05 03:00:00')), 0.4466255955411036), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-05 04:00:00')), 0.4466255955411036)]
 70%|███████   | 19/27 [04:47<02:13, 16.73s/it]
create anomaly scores (isolation_forest). Number of scores: 5088. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-04 00:00:00')), 0.45334777293701833), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-04 01:00:00')), 0.4250638592139693), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-04 02:00:00')), 0.4250638592139693), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-04 03:00:00')), 0.4250638592139693), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-04 04:00:00')), 0.4250638592139693)]
 74%|███████▍  | 20/27 [05:00<01:48, 15.46s/it]
create anomaly scores (isolation_forest). Number of scores: 5088. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-04 00:00:00')), 0.4344744103558306), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-04 01:00:00')), 0.4344744103558306), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-04 02:00:00')), 0.4232932222614637), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-04 03:00:00')), 0.44765298866788117), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-04 04:00:00')), 0.4232932222614637)]
 78%|███████▊  | 21/27 [05:12<01:27, 14.64s/it]
create anomaly scores (isolation_forest). Number of scores: 5088. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-04 00:00:00')), 0.42057512451717505), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-04 01:00:00')), 0.42057512451717505), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-04 02:00:00')), 0.42057512451717505), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-04 03:00:00')), 0.42057512451717505), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-04 04:00:00')), 0.42057512451717505)]
 81%|████████▏ | 22/27 [05:26<01:11, 14.36s/it]
create anomaly scores (isolation_forest). Number of scores: 5088. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-04 00:00:00')), 0.4219630897960936), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-04 01:00:00')), 0.4219630897960936), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-04 02:00:00')), 0.4219630897960936), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-04 03:00:00')), 0.4219630897960936), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-04 04:00:00')), 0.4219630897960936)]
 85%|████████▌ | 23/27 [05:41<00:57, 14.50s/it]
create anomaly scores (isolation_forest). Number of scores: 5088. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-04 00:00:00')), 0.4529256257097837), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-04 01:00:00')), 0.4409042125674256), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-04 02:00:00')), 0.4196662422665574), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-04 03:00:00')), 0.4276203953315386), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-04 04:00:00')), 0.4101258571933938)]
 89%|████████▉ | 24/27 [05:57<00:45, 15.03s/it]
create anomaly scores (isolation_forest). Number of scores: 5088. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-04 00:00:00')), 0.43070524561655615), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-04 01:00:00')), 0.4359067015035589), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-04 02:00:00')), 0.4194502969248387), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-04 03:00:00')), 0.4194502969248387), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-04 04:00:00')), 0.41136464454434124)]
 93%|█████████▎| 25/27 [06:18<00:33, 16.62s/it]
create anomaly scores (isolation_forest). Number of scores: 5088. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-04 00:00:00')), 0.43180270745884475), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-04 01:00:00')), 0.44375367807657096), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-04 02:00:00')), 0.43172674581968584), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-04 03:00:00')), 0.43172674581968584), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-04 04:00:00')), 0.43172674581968584)]
 96%|█████████▋| 26/27 [06:42<00:18, 18.96s/it]
create anomaly scores (isolation_forest). Number of scores: 5088. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-04 00:00:00')), 0.448582958014484), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-04 01:00:00')), 0.43080564349464634), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-04 02:00:00')), 0.4470246789370688), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-04 03:00:00')), 0.4470246789370688), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-04 04:00:00')), 0.4383292027388669)]
100%|██████████| 27/27 [07:09<00:00, 15.90s/it]
create anomaly scores (isolation_forest). Number of scores: 5088. First 5: [((Timestamp('2014-07-01 00:00:00'), Timestamp('2014-07-04 00:00:00')), 0.43916756063615414), ((Timestamp('2014-07-01 01:00:00'), Timestamp('2014-07-04 01:00:00')), 0.43916756063615414), ((Timestamp('2014-07-01 02:00:00'), Timestamp('2014-07-04 02:00:00')), 0.4369888347898041), ((Timestamp('2014-07-01 03:00:00'), Timestamp('2014-07-04 03:00:00')), 0.4442076248399467), ((Timestamp('2014-07-01 04:00:00'), Timestamp('2014-07-04 04:00:00')), 0.4332083500723546)]

recall@10 precision@10 parameters
9 1.2 1.000000 {'interval': 96, 'no_symbols': 5, 'no_bins': 5}
3 1.0 1.000000 {'interval': 110, 'no_symbols': 6, 'no_bins': 5}
11 0.8 1.000000 {'interval': 96, 'no_symbols': 5, 'no_bins': 7}
12 0.6 1.000000 {'interval': 96, 'no_symbols': 6, 'no_bins': 5}
0 0.2 1.000000 {'interval': 110, 'no_symbols': 5, 'no_bins': 5}
2 0.2 1.000000 {'interval': 110, 'no_symbols': 5, 'no_bins': 7}
18 1.0 0.454545 {'interval': 72, 'no_symbols': 5, 'no_bins': 5}
1 0.6 0.375000 {'interval': 110, 'no_symbols': 5, 'no_bins': 6}
19 0.8 0.363636 {'interval': 72, 'no_symbols': 5, 'no_bins': 6}
4 0.4 0.200000 {'interval': 110, 'no_symbols': 6, 'no_bins': 6}
10 0.4 0.200000 {'interval': 96, 'no_symbols': 5, 'no_bins': 6}
5 0.4 0.181818 {'interval': 110, 'no_symbols': 6, 'no_bins': 7}
15 0.4 0.181818 {'interval': 96, 'no_symbols': 7, 'no_bins': 5}
21 0.4 0.181818 {'interval': 72, 'no_symbols': 6, 'no_bins': 5}
6 0.2 0.090909 {'interval': 110, 'no_symbols': 7, 'no_bins': 5}
8 0.2 0.090909 {'interval': 110, 'no_symbols': 7, 'no_bins': 7}
13 0.2 0.090909 {'interval': 96, 'no_symbols': 6, 'no_bins': 6}
14 0.2 0.090909 {'interval': 96, 'no_symbols': 6, 'no_bins': 7}
16 0.2 0.090909 {'interval': 96, 'no_symbols': 7, 'no_bins': 6}
23 0.2 0.090909 {'interval': 72, 'no_symbols': 6, 'no_bins': 7}
26 0.2 0.090909 {'interval': 72, 'no_symbols': 7, 'no_bins': 7}
7 0.0 0.000000 {'interval': 110, 'no_symbols': 7, 'no_bins': 6}
17 0.0 0.000000 {'interval': 96, 'no_symbols': 7, 'no_bins': 7}
20 0.0 0.000000 {'interval': 72, 'no_symbols': 5, 'no_bins': 7}
22 0.0 0.000000 {'interval': 72, 'no_symbols': 6, 'no_bins': 6}
24 0.0 0.000000 {'interval': 72, 'no_symbols': 7, 'no_bins': 5}
25 0.0 0.000000 {'interval': 72, 'no_symbols': 7, 'no_bins': 6}

Multivariate or entity-based anomaly detection

As a second use case we detect anomalies in a multivariate dataset containing multiple time series, or sensors, related to a server, such as CPU and RAM usage.

Loading the SMD dataset

The Server Machine Dataset (SMD) is split into a train and a test part, where the test part consists of the last 50% of sensor values. For model training we concatenate both parts and use all data for frequent pattern mining, ignoring the labels until evaluation. We load the first machine from SMD; the complete dataset is available here. In SMD each machine consists of 38 different sensors.

In [17]:
data_folder = './pbad_network_private/data/SMD_sample/'
machine = 'machine-1-1'
machine_1_test = f'{machine}_test.csv'
machine_1_train = f'{machine}_train.csv'
machine_1_test_labels = f'{machine}_test_label.csv'
machine_1_interpretation_label = f'{machine}_interpretation_label.csv'

#Load training and test data. The original data has only 'time_id', so we add actual timestamps.
#We concatenate both datasets, since the test data comes after the training data.
#We only have labels for the test part and thus only predict anomalies during the test period.
df_train = pd.read_csv(data_folder + machine_1_train) #i.e. contains metric-1,...,metric-38,time_id,time
df_train['time'] = pd.to_datetime(df_train['time'])
df_train = df_train.drop(columns=['time_id'])
df_test = pd.read_csv(data_folder + machine_1_test) #i.e. contains metric-1,...,metric-38,time_id,time
df_test['time'] = pd.to_datetime(df_test['time'])
df_test = df_test.drop(columns=['time_id'])
#Test data comes right after the training data, so shift its timestamps accordingly
diff_seconds = int((df_train['time'].max() - df_test['time'].min()) / np.timedelta64(1, 's'))
df_test['time'] = df_test['time'].apply(lambda x: x + np.timedelta64(diff_seconds,'s'))
df = pd.concat([df_train, df_test])    
labels_df = pd.read_csv(data_folder + machine_1_test_labels) #i.e. contains label,time_id,time where label==1
labels_df['time'] = pd.to_datetime(labels_df['time'])
#also for labels
labels_df['time'] = labels_df['time'].apply(lambda x: x + np.timedelta64(diff_seconds,'s'))
anomalies = [row['time'] for idx, row in labels_df.iterrows()]
print(f'No anomalies: {len(anomalies)}. First: {anomalies[0]}')
pd.options.display.float_format = '{:,.3f}'.format
display(df)
No anomalies: 2694. First: 2022-01-31 18:47:00
metric-1 metric-2 metric-3 metric-4 metric-5 metric-6 metric-7 metric-8 metric-9 metric-10 ... metric-30 metric-31 metric-32 metric-33 metric-34 metric-35 metric-36 metric-37 metric-38 time
0 0.032 0.039 0.028 0.024 0.000 0.915 0.344 0.000 0.020 0.000 ... 0.004 0.030 0.022 0.000 0.000 0.035 0.035 0.000 0.000 2022-01-01 00:00:00
1 0.043 0.049 0.033 0.026 0.000 0.915 0.345 0.000 0.019 0.002 ... 0.004 0.030 0.029 0.000 0.000 0.036 0.036 0.000 0.000 2022-01-01 00:01:00
2 0.043 0.035 0.032 0.026 0.000 0.915 0.345 0.000 0.020 0.000 ... 0.004 0.026 0.021 0.000 0.000 0.033 0.033 0.000 0.000 2022-01-01 00:02:00
3 0.032 0.029 0.030 0.024 0.000 0.913 0.343 0.000 0.021 0.000 ... 0.004 0.030 0.026 0.000 0.000 0.035 0.035 0.000 0.000 2022-01-01 00:03:00
4 0.032 0.019 0.027 0.023 0.000 0.913 0.343 0.000 0.019 0.000 ... 0.004 0.027 0.023 0.000 0.000 0.033 0.034 0.000 0.000 2022-01-01 00:04:00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
28474 0.075 0.052 0.048 0.048 0.000 0.908 0.257 0.000 0.044 0.000 ... 0.032 0.047 0.040 0.000 0.000 0.043 0.043 0.000 0.000 2022-02-09 13:12:00
28475 0.065 0.025 0.039 0.044 0.000 0.905 0.257 0.000 0.033 0.000 ... 0.032 0.047 0.049 0.000 0.000 0.047 0.047 0.000 0.000 2022-02-09 13:13:00
28476 0.065 0.081 0.050 0.048 0.000 0.908 0.258 0.000 0.026 0.001 ... 0.032 0.047 0.040 0.000 0.000 0.043 0.043 0.000 0.000 2022-02-09 13:14:00
28477 0.065 0.056 0.048 0.046 0.000 0.903 0.257 0.000 0.033 0.000 ... 0.032 0.042 0.043 0.000 0.000 0.040 0.040 0.000 0.000 2022-02-09 13:15:00
28478 0.011 0.030 0.041 0.044 0.000 0.892 0.256 0.000 0.031 0.002 ... 0.034 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 2022-02-09 13:16:00

56958 rows × 39 columns

Visualising the SMD dataset

In [18]:
from npbad.visualisation import plot_mv_data
plot_mv_data('machine-1-1', df, anomalies)

Pre-processing the multivariate dataset

In the raw data we find many correlated time series. For anomaly detection we pre-process the data and remove correlated time series. The function pre_process_mv_df does the following (a rough sketch of the correlation filter is shown below):

  1. Removes time series that are almost perfectly correlated with another series (Pearson correlation coefficient > 0.95)
  2. Removes straight lines, i.e. constant time series
  3. Clusters the remaining time series that are correlated with a Pearson correlation coefficient > 0.85

After pre-processing we keep only 16 of the 38 time series for the first device.
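
As a rough sketch of the first step (a hypothetical helper, not the library implementation), assuming a dataframe with a 'time' column and one column per sensor:

import pandas as pd

def drop_correlated(df, threshold=0.95):
    # greedily keep a sensor only if it is not strongly correlated with an already kept one
    value_cols = [c for c in df.columns if c != 'time']
    corr = df[value_cols].corr(method='pearson').abs()
    keep = []
    for col in value_cols:
        if all(corr.loc[col, kept] <= threshold for kept in keep):
            keep.append(col)
    return df[['time'] + keep]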

In [19]:
from npbad.preprocess_timeseries import pre_process_mv_df

df2, components = pre_process_mv_df(df)
plot_mv_data('machine-1-1', df2, anomalies)
100%|██████████| 703/703 [00:01<00:00, 649.06it/s]
find_similarities_pearson_simple: elapsed: 1.0972445011138916s
found 4 components: [{1, 2, 3, 34, 35, 18, 19, 20, 21, 24, 27, 30, 31}, {10, 15}, {25, 23}, {32, 33, 28}]
Removing: Before: 38->16 straight: ['metric-5', 'metric-8', 'metric-17', 'metric-18', 'metric-37', 'metric-38'] and perfect correlated: ['metric-3', 'metric-4', 'metric-19', 'metric-20', 'metric-21', 'metric-22', 'metric-25', 'metric-28', 'metric-31', 'metric-32', 'metric-35', 'metric-36', 'metric-16', 'metric-26', 'metric-33', 'metric-34']
100%|██████████| 120/120 [00:00<00:00, 702.37it/s]
find_similarities_pearson_simple: elapsed: 0.1827538013458252s
found 3 components: [{9, 4}, {8, 7}, {12, 13, 14}]
[['metric-14', 'metric-9'], ['metric-12', 'metric-13'], ['metric-24', 'metric-27', 'metric-29']]

Finding anomalies using an isolation forest

In [20]:
from npbad.baselines.iso_forest import run_iso_forest_mv
preprocess_parameter = {'interval': 72, 'stride':1, 'no_symbols': 4, 'no_bins': 5,  'preprocess_outlier': False}

#run iso forest
an_scores = run_iso_forest_mv(df2, interval=preprocess_parameter['interval'], 
                                   stride=preprocess_parameter['stride'], 
                              preprocess_outlier=preprocess_parameter['preprocess_outlier'])

#evaluate (using only test data which has labelled anomalies)
an_score_second_half = []
for window, score in an_scores: 
  if window[0] > df_test['time'].min():
        an_score_second_half.append((window,score))

(f1,prec,rec,TP,FP,FN,TN, t) = eval_best_f1(an_score_second_half, anomalies)
print(f"ISO F1@best: {f1:.4f}, precision: {prec:.4f}, recall: {rec:.4f}, threshold: {t:.4f}, TP: {TP}, FP: {FP}, FN:{FN}, TN:{TN}")

#without point-adjust:
(f1,prec,rec,TP,FP,FN,TN, t) = eval_best_f1(an_score_second_half, anomalies, point_adjust=False)
print(f"ISO W/O F1@best: {f1:.4f}, precision: {prec:.4f}, recall: {rec:.4f}, threshold: {t:.4f}, TP: {TP}, FP: {FP}, FN:{FN}, TN:{TN}")
run_iso_forest_mv
/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_iforest.py:293: UserWarning: max_samples (1500) is greater than the total number of samples (878). max_samples will be set to n_samples for estimation.
  % (self.max_samples, n_samples)
eval_best_f1: #an_scores: 403, min score:  0.423-max score:  0.475 #anomalies: 2694
ISO F1@best: 1.0000, precision: 1.0000, recall: 1.0000, threshold: 0.4439, TP: 211, FP: 0, FN:0, TN:192
eval_best_f1: #an_scores: 403, min score:  0.423-max score:  0.475 #anomalies: 2694
ISO W/O F1@best: 0.8138, precision: 0.9273, recall: 0.7251, threshold: 0.4387, TP: 153, FP: 12, FN:58, TN:180

Finding anomalies using NPBAD

In [22]:
from npbad.ipbad.main import run_ipbad_mv
an_scores = run_ipbad_mv(df2,  interval=preprocess_parameter['interval'], stride=preprocess_parameter['stride'], 
                              no_symbols=preprocess_parameter['no_symbols'], no_bins=preprocess_parameter['no_bins'], 
                              preprocess_outlier=preprocess_parameter['preprocess_outlier'], 
                              use_MDL=True, binning_method='normal', use_iso=True)
#eval after time
an_score_second_half = []
for window, score in an_scores:
  if window[0] > df_test['time'].min():
      an_score_second_half.append((window,score))
(f1,prec,rec,TP,FP,FN,TN, t) = eval_best_f1(an_score_second_half, anomalies)
print(f"IPBAD F1@best: {f1:.4f}, precision: {prec:.4f}, recall: {rec:.4f}, threshold: {t:.4f}, TP: {TP}, FP: {FP}, FN:{FN}, TN:{TN}")

#without point-adjust:
(f1,prec,rec,TP,FP,FN,TN, t) = eval_best_f1(an_score_second_half, anomalies, point_adjust=False)
print(f"IPBAD W/O F1@best: {f1:.4f}, precision: {prec:.4f}, recall: {rec:.4f}, threshold: {t:.4f}, TP: {TP}, FP: {FP}, FN:{FN}, TN:{TN}")
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
Warning: ipbad: fewer than 5 patterns
/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_iforest.py:293: UserWarning: max_samples (1500) is greater than the total number of samples (878). max_samples will be set to n_samples for estimation.
  % (self.max_samples, n_samples)
run_mv_hetero elapsed 31.217990159988403s
eval_best_f1: #an_scores: 403, min score:  0.421-max score:  0.549 #anomalies: 2694
IPBAD F1@best: 1.0000, precision: 1.0000, recall: 1.0000, threshold: 0.4980, TP: 211, FP: 0, FN:0, TN:192
eval_best_f1: #an_scores: 403, min score:  0.421-max score:  0.549 #anomalies: 2694
IPBAD W/O F1@best: 0.7121, precision: 0.6040, recall: 0.8673, threshold: 0.4342, TP: 183, FP: 120, FN:28, TN:72
In [23]:
plot_mv_data('machine-1-1', df2, anomalies, an_scores_computed=an_score_second_half, threshold=t)

Optimising parameters

We can use random search to find optimal parameters.

Note that for parameter optimisation we prefer to set point-adjust to False. In that case an anomaly counts as detected only if a window contains the anomaly and that window's anomaly score is above the threshold. With point-adjust we also consider other (overlapping) windows and check whether one of those has a high anomaly score: we report a true positive if a neighbouring window with a higher anomaly score covers that anomaly, even when the current window scores below the threshold. This is, however, an (overly) optimistic evaluation protocol.
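
The difference can be illustrated with a small hypothetical helper (not the eval_best_f1 implementation): with point-adjust, an anomaly is credited as soon as any window covering it exceeds the threshold.

def point_adjust_tp(an_scores, anomalies, threshold):
    # an_scores: list of ((window_start, window_end), score); anomalies: list of timestamps
    tp = 0
    for anomaly in anomalies:
        covering = [score for (start, end), score in an_scores if start <= anomaly <= end]
        if covering and max(covering) > threshold:
            tp += 1  # counted even if some covering windows score below the threshold
    return tp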

In [24]:
import random
grid={"interval": [24,48,72,96],
      "stride": [6],
      "no_symbols": list(range(4,20,2)), 
      "no_bins" : list(range(3,20,2)) + [50,100],
      "preprocess_outlier": [True, False], 
      "use_MDL": [True],
      "binning_method": ['normal'],
      "use_iso": [True]
}

no_iter = 5
best_parameters = None
best_metrics = None
best_an_scores = None
while no_iter > 0:
    random_parameter = {}
    for key, options in grid.items():
        idx = random.randint(0,len(options)-1)
        random_parameter[key] = options[idx]
    try:
        an_scores = run_ipbad_mv(df2,  **random_parameter)
        an_score_second_half = []
        for window, score in an_scores:
            if window[0] > df_test['time'].min():
                an_score_second_half.append((window,score))
        (f1,prec,rec,TP,FP,FN,TN, t) = eval_best_f1(an_score_second_half, anomalies, point_adjust=False)
        print(f"Parameters: {random_parameter},\n F1@best: {f1:.4f}, precision: {prec:.4f}, recall: {rec:.4f}, threshold: {t:.4f}, TP: {TP}, FP: {FP}, FN:{FN}")
        no_iter = no_iter - 1
        if best_metrics is None or f1 > best_metrics[0]:
            best_metrics =   (f1,prec,rec,TP,FP,FN,TN, t)
            best_parameters = random_parameter
            best_an_scores = an_scores
    except Exception as e:
        print(e)
print('#### Best result ####')
(f1,prec,rec,TP,FP,FN,TN, t) = best_metrics
print(f"Parameters: {best_parameters}")
print(f"F1@best: {f1:.4f}, precision: {prec:.4f}, recall: {rec:.4f}, threshold: {t:.4f}, TP: {TP}, FP: {FP}, FN:{FN}, TN: {TN}")
#best parameters
#Parameters: {'interval': 96, 'stride': 6, 'no_symbols': 12, 'no_bins': 50, 'preprocess_outlier': False, 'use_MDL': True, 'binning_method': 'normal', 'use_iso': True},
#F1@best: 0.8451, precision: 0.8333, recall: 0.8571, threshold: 0.4618, TP: 30, FP: 6, FN:5
/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_iforest.py:293: UserWarning: max_samples (1500) is greater than the total number of samples (147). max_samples will be set to n_samples for estimation.
  % (self.max_samples, n_samples)
run_mv_hetero elapsed 39.26453971862793s
eval_best_f1: #an_scores: 67, min score:  0.439-max score:  0.508 #anomalies: 2694
Parameters: {'interval': 72, 'stride': 6, 'no_symbols': 10, 'no_bins': 11, 'preprocess_outlier': True, 'use_MDL': True, 'binning_method': 'normal', 'use_iso': True},
 F1@best: 0.6863, precision: 0.5224, recall: 1.0000, threshold: 0.4393, TP: 35, FP: 32, FN:0
/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_iforest.py:293: UserWarning: max_samples (1500) is greater than the total number of samples (155). max_samples will be set to n_samples for estimation.
  % (self.max_samples, n_samples)
run_mv_hetero elapsed 16.383702993392944s
eval_best_f1: #an_scores: 75, min score:  0.434-max score:  0.484 #anomalies: 2694
Parameters: {'interval': 24, 'stride': 6, 'no_symbols': 6, 'no_bins': 13, 'preprocess_outlier': True, 'use_MDL': True, 'binning_method': 'normal', 'use_iso': True},
 F1@best: 0.5577, precision: 0.3867, recall: 1.0000, threshold: 0.4341, TP: 29, FP: 46, FN:0
/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_iforest.py:293: UserWarning: max_samples (1500) is greater than the total number of samples (147). max_samples will be set to n_samples for estimation.
  % (self.max_samples, n_samples)
run_mv_hetero elapsed 36.10798454284668s
eval_best_f1: #an_scores: 67, min score:  0.461-max score:  0.519 #anomalies: 2694
Parameters: {'interval': 72, 'stride': 6, 'no_symbols': 14, 'no_bins': 7, 'preprocess_outlier': False, 'use_MDL': True, 'binning_method': 'normal', 'use_iso': True},
 F1@best: 0.7292, precision: 0.5738, recall: 1.0000, threshold: 0.4729, TP: 35, FP: 26, FN:0
/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_iforest.py:293: UserWarning: max_samples (1500) is greater than the total number of samples (147). max_samples will be set to n_samples for estimation.
  % (self.max_samples, n_samples)
run_mv_hetero elapsed 48.92760396003723s
eval_best_f1: #an_scores: 67, min score:  0.431-max score:  0.558 #anomalies: 2694
Parameters: {'interval': 72, 'stride': 6, 'no_symbols': 16, 'no_bins': 5, 'preprocess_outlier': False, 'use_MDL': True, 'binning_method': 'normal', 'use_iso': True},
 F1@best: 0.8571, precision: 0.7857, recall: 0.9429, threshold: 0.4945, TP: 33, FP: 9, FN:2
/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_iforest.py:293: UserWarning: max_samples (1500) is greater than the total number of samples (147). max_samples will be set to n_samples for estimation.
  % (self.max_samples, n_samples)
run_mv_hetero elapsed 45.18245053291321s
eval_best_f1: #an_scores: 67, min score:  0.463-max score:  0.502 #anomalies: 2694
Parameters: {'interval': 72, 'stride': 6, 'no_symbols': 12, 'no_bins': 5, 'preprocess_outlier': True, 'use_MDL': True, 'binning_method': 'normal', 'use_iso': True},
 F1@best: 0.6863, precision: 0.5224, recall: 1.0000, threshold: 0.4632, TP: 35, FP: 32, FN:0
#### Best result ####
Parameters: {'interval': 72, 'stride': 6, 'no_symbols': 16, 'no_bins': 5, 'preprocess_outlier': False, 'use_MDL': True, 'binning_method': 'normal', 'use_iso': True}
F1@best: 0.8571, precision: 0.7857, recall: 0.9429, threshold: 0.4945, TP: 33, FP: 9, FN:2, TN: 23

Interpretable multivariate anomaly detection based on the frequent-pattern outlier factor (FPOF)

In [39]:
par_fpof = {'interval': 6, 'stride': 6, 'no_symbols': 10, 'no_bins': 10, 'preprocess_outlier': False,
            'use_MDL': True, 'binning_method': 'normal', 'use_iso': False}
df2_sample = df2  # alternative: df2[df2['time'] > np.datetime64('2022-02-01')]
an_scores = run_ipbad_mv(df2_sample, **par_fpof)
# keep only the scores for windows after the start of df_test
an_score_second_half = []
for window, score in an_scores:
    if window[0] > df_test['time'].min():
        an_score_second_half.append((window, score))
(f1, prec, rec, TP, FP, FN, TN, t) = eval_best_f1(an_score_second_half, anomalies, point_adjust=False)
print(f"Parameters: {par_fpof},\n F1@best: {f1:.4f}, precision: {prec:.4f}, recall: {rec:.4f}, threshold: {t:.4f}, TP: {TP}, FP: {FP}, FN:{FN}")
# previously: F1@best: 0.4528, precision: 0.3077, recall: 0.8571, threshold: 0.9876, TP: 12, FP: 27, FN:2
# previously: F1@best: 0.5000, precision: 0.6000, recall: 0.4286, threshold: 0.9964, TP: 6, FP: 4, FN:8
run_mv_hetero elapsed 20.057478666305542s
eval_best_f1: #an_scores: 78, min score:  0.978-max score:  1.000 #anomalies: 2694
Parameters: {'interval': 6, 'stride': 6, 'no_symbols': 10, 'no_bins': 10, 'preprocess_outlier': False, 'use_MDL': True, 'binning_method': 'normal', 'use_iso': False},
 F1@best: 0.5217, precision: 0.6667, recall: 0.4286, threshold: 0.9933, TP: 6, FP: 3, FN:8

Plot of discrete representation and known anomalies per sensor

In SMD, additional interpretation labels indicate which sensors caused each anomaly. First, we load this data.

In [40]:
# SMD contains an interpretation label, which shows which sensors caused each anomaly
df_int_labels = pd.read_csv(data_folder + machine_1_interpretation_label)
df_int_labels = pd.merge(df_int_labels, labels_df, how='left', left_on='from_time_id', right_on='time_id')
df_int_labels = df_int_labels.rename(columns={"time": "from_time"})
# convert to_time_id (a minute index) to a timestamp; time_id 15849 corresponds to 2022-01-31 18:47:00
df_int_labels['to_time'] = df_int_labels['to_time_id'].apply(lambda i: np.datetime64('2022-01-31 18:47:00') + np.timedelta64(i - 15849, 'm'))

display(df_int_labels)

def get_anomalies_col(anomalies, col):
    """Return the anomalies that, according to the interpretation labels, involve the given sensor."""
    matching_anomalies = []
    for anomaly in anomalies:
        found = False
        filter_df = df_int_labels[(df_int_labels['from_time'] <= anomaly) & (df_int_labels['to_time'] >= anomaly)]
        for idx, row in filter_df.iterrows():
            if '-C' in col:
                col = col[0:col.index('-C')]  # strip the cluster suffix, e.g. 'metric-14-C1' -> 'metric-14'
            if col in row['dimensions']:
                found = True
                break
        if found:
            matching_anomalies.append(anomaly)
    return matching_anomalies
   from_time_id  to_time_id  dimensions                                          label  time_id  from_time            to_time
0  15849         16368       ['metric-1', 'metric-9', 'metric-10', 'metric-...  1.000  15849    2022-01-31 18:47:00  2022-02-01 03:26:00
1  16963         17517       ['metric-1', 'metric-2', 'metric-3', 'metric-4...  1.000  16963    2022-02-01 13:21:00  2022-02-01 22:35:00
2  18071         18528       ['metric-1', 'metric-2', 'metric-9', 'metric-1...  1.000  18071    2022-02-02 07:49:00  2022-02-02 15:26:00
3  19367         20088       ['metric-1', 'metric-2', 'metric-3', 'metric-4...  1.000  19367    2022-02-03 05:25:00  2022-02-03 17:26:00
4  20786         21195       ['metric-1', 'metric-9', 'metric-10', 'metric-...  1.000  20786    2022-02-04 05:04:00  2022-02-04 11:53:00
5  24679         24682       ['metric-9', 'metric-13', 'metric-14', 'metric...  1.000  24679    2022-02-06 21:57:00  2022-02-06 22:00:00
6  26114         26116       ['metric-9', 'metric-13', 'metric-14', 'metric...  1.000  26114    2022-02-07 21:52:00  2022-02-07 21:54:00
7  27554         27556       ['metric-9', 'metric-13', 'metric-14', 'metric...  1.000  27554    2022-02-08 21:52:00  2022-02-08 21:54:00

Next, we show each time series, the anomalies associated with it, and the corresponding discretised time series.

In [41]:
no_dim = len(df2.columns)
fig, axs = plt.subplots(no_dim, figsize=(30,30), sharex=True)
for i, col in enumerate(df2.columns):
  if col == 'time':
    continue
  axs[i].plot(df2_sample['time'], df2_sample[col], color='blue',alpha=0.5)
  #show anomalies for known sensors
  anomalies_dim = get_anomalies_col(anomalies, col)
  axs[i].scatter(anomalies_dim, [1.0 for anomaly in anomalies_dim], color='red', marker='X', alpha=0.5, s=5)
  #discretise
  ts_i = df2_sample[['time',col]].copy()
  ts_i = ts_i.rename(columns={col:'average_value'})
  windows = create_windows(ts_i, interval=par_fpof['interval'], stride=par_fpof['stride']) 
  segments = create_segments(ts_i, windows)
  segments_d = discretise_segments_equal_distance_bins_global(segments, par_fpof['no_bins'], par_fpof['no_symbols'])
  print(windows[0],segments_d[0],len(windows))
  #plot discretise
  axs[i].text(df2_sample['time'].min() - np.timedelta64(1,'D'), 0, f'{col}', fontsize=12)
  plot_discrete(axs[i], segments_d, windows, interval_width=par_fpof['interval'], stride=par_fpof['stride'], no_symbols=par_fpof['no_symbols'], no_bins=par_fpof['no_bins'])
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [9 9 9 9 9 9 9 9 9 9] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [3 3 3 3 3 3 3 3 3 3] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [1 1 1 1 1 1 1 1 1 1] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [4 4 3 2 1 2 1 1 1 1] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580
(Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')) [0 0 0 0 0 0 0 0 0 0] 158
discrete plot len w 1580, len s1580

FPOF score for each sensor

Next, we compute the frequent-pattern outlier factor (FPOF) for each time series individually. The FPOF anomaly score is often less accurate than using an isolation forest, but it is easy to interpret: roughly, it is the ratio of the number of patterns matching each window to the total number of patterns. To be fully technically correct, the frequent-pattern outlier factor is the weighted sum of the relative supports of the matching patterns, divided by the total number of patterns.
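To make this definition concrete, here is a minimal sketch of an FPOF-style score computed on a hypothetical binary pattern-occurrence matrix. This illustrates the idea only and is not the npbad implementation; the occurrence values are made up for the example.

import numpy as np

# Hypothetical occurrence matrix: rows are windows, columns are patterns,
# entry [w, p] = 1 if pattern p occurs in window w (made-up values for illustration).
occurrences = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],   # window that matches only one, rare, pattern
    [1, 1, 1, 1],
])

# Relative support of each pattern: the fraction of windows in which it occurs.
support = occurrences.mean(axis=0)

# FPOF-style score per window: weighted sum of the supports of the matching patterns,
# divided by the total number of patterns. In the classical FPOF formulation, windows
# with a low score contain few frequent patterns and are considered the most anomalous.
fpof = (occurrences * support).sum(axis=1) / occurrences.shape[1]
print(fpof)  # the third window receives the lowest score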

In the first plot, we render each time series of the first device in the SMD dataset, the corresponding discretised time series, and the anomaly score computed with FPOF. We skip time series without patterns, i.e. time series that are mostly flat after discretisation.

In [42]:
from npbad.visualisation import *

# show the FPOF score for each dimension
columns = [col for col in df2_sample.columns if col != 'time']

fig, axs = plt.subplots(len(columns), figsize=(30, 30), sharex=True)

records = []
for i, col in enumerate(columns):
    ts_i = df2_sample[['time', col]].copy()
    ts_i = ts_i.rename(columns={col: 'average_value'})
    print(f'run-single(col={col})')
    result = run_ipbad(ts_i, interval=par_fpof['interval'], stride=par_fpof['stride'],
                       no_symbols=par_fpof['no_symbols'], no_bins=par_fpof['no_bins'],
                       preprocess_outlier=par_fpof['preprocess_outlier'],
                       use_MDL=par_fpof['use_MDL'], binning_method=par_fpof['binning_method'],
                       use_iso=False, verbose=False, return_patterns=True)  # return_patterns=True also returns the mined patterns and embedding
    if len(result) == 4:
        an_scores, ts_i, segments_discretised, windows = result
        records.append([an_scores, ts_i, segments_discretised, windows, None, None])
    else:
        an_scores, ts_i, segments_discretised, windows, patterns_df, embedding = result
        records.append([an_scores, ts_i, segments_discretised, windows, patterns_df, embedding])
    # plot the raw time series
    axs[i].plot(df2_sample['time'], df2_sample[col], color='blue', alpha=0.5)
    # show anomalies for known sensors
    anomalies_dim = get_anomalies_col(anomalies, col)
    axs[i].scatter(anomalies_dim, [1.0 for anomaly in anomalies_dim], color='red', marker='X', alpha=0.5, s=5)
    # plot the discretised segments
    plot_discrete(axs[i], segments_discretised, windows,
                  interval_width=par_fpof['interval'], stride=par_fpof['stride'], no_symbols=par_fpof['no_symbols'], no_bins=par_fpof['no_bins'])
    # plot the (min-max normalised) anomaly scores
    if len(result) != 6:
        continue
    timestamps = [average_timestamps(window[0], window[1]) for window, score in an_scores]
    scores = [score for window, score in an_scores]
    if (max(scores) - min(scores)) != 0:
        scores = [(score - min(scores)) / (max(scores) - min(scores)) for score in scores]
    axs[i].plot(timestamps, scores, color='orange', marker='X', alpha=0.5)
    axs[i].text(timestamps[0] - np.timedelta64(2, 'D'), 0, f'{col}', fontsize=12)
run-single(col=metric-1)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.9228256431196407), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.9228256431196407), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.9228256431196407), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.9228256431196407), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.9228256431196407)]
discrete plot len w 1580, len s1580
run-single(col=metric-2)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.9145569620253164), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.9145569620253164), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.9145569620253164), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.9145569620253164), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.9145569620253164)]
discrete plot len w 1580, len s1580
run-single(col=metric-6)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.8349433710859427), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.8349433710859427), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.8349433710859427), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.8349433710859427), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.8349433710859427)]
discrete plot len w 1580, len s1580
run-single(col=metric-7)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.9525316455696202), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.9525316455696202), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.9525316455696202), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.9525316455696202), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.9525316455696202)]
discrete plot len w 1580, len s1580
run-single(col=metric-10)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.0), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.0), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.0), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.0), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.0)]
discrete plot len w 1580, len s1580
run-single(col=metric-11)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.36867088607594933), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.36867088607594933), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.36867088607594933), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.36867088607594933), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.36867088607594933)]
discrete plot len w 1580, len s1580
run-single(col=metric-15)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.9617754830113258), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.9191372418387742), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.95236508994004), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.9372085276482345), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.9617754830113258)]
discrete plot len w 1580, len s1580
run-single(col=metric-23)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.9843328491388255), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 1.0), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 1.0), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 1.0), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.9860967005602822)]
discrete plot len w 1580, len s1580
run-single(col=metric-30)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.0007911392405063333)]
discrete plot len w 1580, len s1580
run-single(col=metric-14-C1)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.0015822784810126667), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.0015822784810126667), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.0015822784810126667), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.0015822784810126667), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.0015822784810126667)]
discrete plot len w 1580, len s1580
run-single(col=metric-9-C1)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.0), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.0), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.0), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.0), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.0)]
discrete plot len w 1580, len s1580
run-single(col=metric-12-C2)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.9539922103213243), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.9028724440116845), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.9253489126906849), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.8952450503083415), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.9539922103213243)]
discrete plot len w 1580, len s1580
run-single(col=metric-13-C2)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.0015822784810126667), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.0015822784810126667), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.0015822784810126667), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.0015822784810126667), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.0015822784810126667)]
discrete plot len w 1580, len s1580
run-single(col=metric-24-C3)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.0007911392405063333)]
discrete plot len w 1580, len s1580
run-single(col=metric-27-C3)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.0007911392405063333)]
discrete plot len w 1580, len s1580
run-single(col=metric-29-C3)
create anomaly scores (fpof). Number of scores: 158. First 5: [((Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-01 06:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 06:00:00'), Timestamp('2022-01-01 12:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 12:00:00'), Timestamp('2022-01-01 18:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-01 18:00:00'), Timestamp('2022-01-02 00:00:00')), 0.0007911392405063333), ((Timestamp('2022-01-02 00:00:00'), Timestamp('2022-01-02 06:00:00')), 0.0007911392405063333)]
discrete plot len w 1580, len s1580

Showing pattern occurrences for each sensor

In this plot we show the pattern-based embedding, that is, all patterns and every pattern occurrence for each sensor, using the results from the previous cell.
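Before plotting, it can help to print a short textual summary of the mined patterns per sensor. The sketch below is an optional aid, not part of the library; it only relies on the patterns_df columns already used in this notebook ('instances' and 'bits_saved') and assumes 'instances' holds the list of occurrences of a pattern.

# Optional: summarise the mined patterns per sensor before plotting.
for i, col in enumerate(columns):
    an_scores, ts_i, segments_discretised, windows, patterns_df, embedding = records[i]
    if patterns_df is None or patterns_df.shape[0] == 0:
        print(f'{col}: no patterns (mostly flat after discretisation)')
        continue
    n_occurrences = patterns_df['instances'].apply(len).sum()  # assumes 'instances' is a list of occurrences
    top = patterns_df.sort_values('bits_saved', ascending=False).head(3)
    print(f"{col}: {patterns_df.shape[0]} patterns, {n_occurrences} occurrences, "
          f"top bits_saved: {top['bits_saved'].tolist()}")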

In [43]:
def remove_redundant_patterns(patterns_df):
    # drop patterns whose occurrences are identical to those of another pattern that saves more bits
    redundant = set()
    for idx, pattern_row in patterns_df.iterrows():
        for idx2, pattern_row2 in patterns_df.iterrows():
            if pattern_row['instances'] == pattern_row2['instances'] and pattern_row2['bits_saved'] > pattern_row['bits_saved']:
                redundant.add(pattern_row['id'])
    return patterns_df[~patterns_df['id'].isin(redundant)]

for i, col in enumerate(columns):
    an_scores, ts_i, segments_discretised, windows, patterns_df, embedding = records[i]
    if patterns_df is None or patterns_df.shape[0] == 0:
        continue
    #display(patterns_df)
    #display(remove_redundant_patterns(patterns_df))
    anomalies_dim = get_anomalies_col(anomalies, col)
    patterns_df = remove_redundant_patterns(patterns_df)
    plot_patterns(ts_i, anomalies_dim, segments_discretised, windows, patterns_df, embedding,
                  interval_width=par_fpof['interval'], stride=par_fpof['stride'], no_symbols=par_fpof['no_symbols'])
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
discrete plot len w 1580, len s1580
In [46]:
!jupyter nbconvert --to html tutorial_npbad.ipynb
[NbConvertApp] Converting notebook tutorial_npbad.ipynb to html