spikefilter.conf

This file format defines sets of processes used to generate segments around detected spikes. The direct handler is data.segmentation.spikefilter, which is called on different instances by data.editquality and data.edit.mentor.generate.spikefilter.

Format

Lines beginning with '#' are treated as comments. The format consists of comma-separated lists of keys and values. Keys are case-insensitive. The file is further broken down into a number of nested definitions, each scope defining part of the filter. The highest level, the “set” level, defines a complete set of filters.
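The line-level rules above can be sketched in Python as follows. This is only an illustration, not the handler's actual parser: it assumes the key is the first field on each line, that blank lines are skipped, and that quoting follows Python's csv module. The function name parse_lines is hypothetical.

```python
import csv

def parse_lines(text):
    """Yield (key, values) pairs, skipping comment and blank lines."""
    for raw in text.splitlines():
        if raw.startswith("#") or not raw.strip():
            continue  # '#' comments; skipping blank lines is an assumption
        fields = next(csv.reader([raw]))
        # Keys are case-insensitive, so normalize them to lower case.
        key = fields[0].strip().lower()
        values = [field.strip() for field in fields[1:]]
        yield key, values

sample = "# a comment\nBeginSet,0,,invalidate\nENDSET\n"
entries = list(parse_lines(sample))
```

Lower-casing the key once at parse time lets later code compare against a single spelling of each key.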

Within each set there can be a number of independent filters, each of which gets a vote on whether a specific point is part of a spike. These votes are combined with a voting function and checked against a threshold. If the combined vote exceeds the threshold, the value can be “bled” to neighboring points and the process repeated. By default, each filter has a vote weight of one and no bleeding is done. That is, if any filter marks a point as a spike, the output point is considered a spike.
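As a rough illustration of the voting step, the sketch below sums the weighted votes for each point and compares the total against a threshold. The function name, the 0/1 vote values, and the threshold of 0.5 are assumptions made for the example; the document only states that each filter defaults to a weight of one and that any single vote then marks the point.

```python
def combine_votes(filter_votes, weights=None, threshold=0.5):
    """Combine per-filter votes into one spike flag per point.

    filter_votes: one list of votes per filter, all of equal length.
    weights: one weight per filter (defaults to one each).
    threshold: hypothetical value chosen so a single unit vote triggers.
    """
    if weights is None:
        weights = [1.0] * len(filter_votes)
    flagged = []
    for point_votes in zip(*filter_votes):
        total = sum(w * v for w, v in zip(weights, point_votes))
        flagged.append(total > threshold)
    return flagged

# Two filters, three points: only the first filter votes for the middle point.
flags = combine_votes([[0, 1, 0], [0, 0, 0]])
```

With the default weights, any one filter voting for a point is enough to flag it; lowering the weights makes agreement between filters necessary.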

Set Scope Definition

A set begins with the key “BeginSet” and ends with the key “EndSet”. BeginSet can take up to four parameters: the start, the end, the tag name, and the tag records, in that order.

Start and End

These determine the time range over which this set of filters is applied to any input. Either bound can be zero or blank to indicate no limit; otherwise it can be given in any convertible time format.
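The BeginSet parameters and the zero-or-blank rule for the time bounds could be handled roughly as in the sketch below. The SetHeader type and parse_begin_set function are hypothetical names; time values are kept as raw strings because the set of convertible time formats is not described here, and the tag records are kept as a single raw field because their encoding is likewise not specified.

```python
from typing import NamedTuple, Optional

class SetHeader(NamedTuple):
    start: Optional[str]        # None means unlimited
    end: Optional[str]          # None means unlimited
    tag_name: Optional[str]
    tag_records: Optional[str]  # raw field; encoding not specified here

def parse_begin_set(values):
    """Interpret up to four BeginSet parameters, in their documented order."""
    if len(values) > 4:
        raise ValueError("BeginSet takes at most four parameters")
    padded = list(values) + [""] * (4 - len(values))
    start, end, tag_name, tag_records = (v.strip() or None for v in padded)
    # A bound of zero or blank means the set applies without a time limit.
    if start == "0":
        start = None
    if end == "0":
        end = None
    return SetHeader(start, end, tag_name, tag_records)

header = parse_begin_set(["0", "", "invalidate"])
```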

Tag Name

The “name” or “type” of any segments that this set of filters generates. See the output format. This can be used by a calling program to determine handling of the segments. For example, data.edit.mentor.generate recognizes “invalidate” and “contaminate” in this field for the two types of edits it can generate. In that example, it ignores the tag records and variables (below) for contaminate edits, but uses them for invalidate edits.

Tag Records

The output tagged records for this filter set. These are the records that follow the tag name in the output format. By default this will be any records used by the filter.

Set Scope

There are several direct keys that can be placed within a set scope as well as several other scopes that can be nested within it:

General Keys

These keys are valid in both the set global scope and specific filter scopes:

VoteBleeding Scope

All keys until “End” are part of the scope. The scope defines the bleeding function applied when any vote value exceeds the threshold. Bleeding allows a large vote to cause nearby points to trigger as well. The following keys are valid within the scope:
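To make the idea of bleeding concrete, here is one possible shape for it in Python. The actual bleeding function and its keys are not reproduced here, so everything in this sketch is an assumption: a triggering point adds a fixed fraction of its vote to neighbors within a fixed radius, and the pass is repeated until no new point exceeds the threshold.

```python
def bleed_votes(votes, threshold, radius=1, fraction=0.25, max_passes=10):
    """Spread large votes to neighbors; return (final votes, triggered indices).

    radius, fraction and max_passes are hypothetical parameters.
    """
    votes = list(votes)
    triggered = set()
    for _ in range(max_passes):
        snapshot = votes[:]  # bleed from the values at the start of the pass
        new = [i for i, v in enumerate(snapshot)
               if v > threshold and i not in triggered]
        if not new:
            break  # nothing new triggered, so the process has settled
        for i in new:
            triggered.add(i)
            lo = max(0, i - radius)
            hi = min(len(votes), i + radius + 1)
            for j in range(lo, hi):
                if j != i:
                    votes[j] += fraction * snapshot[i]
    return votes, sorted(triggered)

_, hit = bleed_votes([0.0, 0.6, 3.0, 0.6, 0.0], threshold=1.0)
```

In this example the large central vote pushes its two neighbors over the threshold, while the outer points stay below it.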

VoteFunction Scope

All keys until “End” are part of the scope. This scope defines the transformation of an input vote value (usually proportional to the confidence of the filter) of a point to the value passed to the global voting handler. It can consist of the following keys:

VoteWeight Scope

All keys until “End” are part of the scope. This scope defines the weighting of a specific filter's vote output into the global vote value. That value is the sum of all the weighted inputs and is compared against the global threshold. It can consist of the following keys:

ResidualNoiseFunction Scope

All keys until “End” are part of the scope. This scope defines the transformation from a residual of some smoother to a single value representing the noise of the variable. The output is the sum of the values transformed by the parameters below, divided by the number of points used. The following keys are valid:
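The "sum of transformed values divided by the number of points used" can be written as a short helper. The transform itself is controlled by keys not reproduced here, so the sketch takes it as a caller-supplied function and uses the absolute value purely as a placeholder; treating None as "point not used" is also an assumption.

```python
def residual_noise(residuals, transform=abs):
    """Mean of the transformed residuals over the points actually used."""
    used = [r for r in residuals if r is not None]
    if not used:
        return None  # no points available, so no noise estimate
    return sum(transform(r) for r in used) / len(used)

noise_abs = residual_noise([1.0, -3.0, None])
noise_sq = residual_noise([1.0, -3.0], transform=lambda r: r * r)
```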

Limits Scope

All keys until “End” are part of the scope. The scope defines a spike trigger that occurs whenever any of its input variables exceeds a minimum and/or maximum value. A limits failure causes other filters to not consider the point. The following keys are valid:
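A limits check of this kind might look like the following sketch. The function and argument names are hypothetical, a bound of None stands for "no limit on that side", and treating values exactly equal to a bound as acceptable is an assumption.

```python
def limits_trigger(point, limits):
    """Return True if any variable in the point falls outside its limits.

    point: mapping of variable name to value.
    limits: mapping of variable name to (minimum, maximum); either may be None.
    """
    for name, (minimum, maximum) in limits.items():
        value = point.get(name)
        if value is None:
            continue  # missing variable: nothing to test
        if minimum is not None and value < minimum:
            return True
        if maximum is not None and value > maximum:
            return True
    return False

limits = {"temperature": (-40.0, 60.0), "pressure": (None, 1100.0)}
failed = limits_trigger({"temperature": 75.0, "pressure": 1000.0}, limits)
```

A caller that follows the rule above would skip the remaining filters for any point where this returns True.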

ResidualSPLP Scope

All keys until “End” are part of the scope. The scope defines a spike trigger that is based on the residual from a single-pole low-pass digital filter, run forwards and backwards in time. The noise estimate is based on a median smoother. The following keys are valid:
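The residual from a single-pole low-pass filter run in both directions can be illustrated as below. This is a sketch under stated assumptions: the two passes are cascaded (forward, then backward over the forward output), the smoothing constant alpha is a hypothetical parameter, and the median-smoother noise estimate is not reproduced.

```python
def single_pole_low_pass(values, alpha):
    """One pass of y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out = []
    prev = values[0]
    for v in values:
        prev = prev + alpha * (v - prev)
        out.append(prev)
    return out

def forward_backward_residual(values, alpha=0.3):
    """Residual of the input from a forward-then-backward smoothed copy."""
    if not values:
        return []
    forward = single_pole_low_pass(values, alpha)
    # Running the second pass over the reversed series cancels the time lag
    # that a single pass would introduce.
    smoothed = single_pole_low_pass(forward[::-1], alpha)[::-1]
    return [v - s for v, s in zip(values, smoothed)]

series = [1.0] * 5 + [10.0] + [1.0] * 5
residuals = forward_backward_residual(series)
spike_index = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
```

The largest residual sits at the isolated high point, which is the property a residual-based trigger relies on.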

ProbitSVM Scope

All keys until “End” are part of the scope. The scope defines a spike trigger that is based on the residual from an SVM regression fit (epsilon regression of radial basis functions) of the variables. The trigger threshold is based on some portion of the residuals in the cumulative frequency space (in probit values). The following keys are valid:
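The phrase "cumulative frequency space (in probit values)" can be made concrete with a small conversion. The sketch below does not perform the SVM fit and does not derive the trigger threshold; it only shows how a list of residuals could be mapped to probit coordinates. The plotting position (rank + 0.5) / n is an assumption, as is the function name.

```python
from statistics import NormalDist

def probit_positions(residuals):
    """Map each residual to the probit of its empirical cumulative frequency."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    probits = [0.0] * n
    for rank, index in enumerate(order):
        cumulative = (rank + 0.5) / n  # assumed plotting position
        probits[index] = std_normal.inv_cdf(cumulative)
    return probits

positions = probit_positions([0.1, -0.2, 5.0])
```

Residuals that follow a normal distribution fall close to a straight line when plotted against these values, so points far off that line stand out as candidates for a spike.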