Threshold Levels in Validoopsie
Validoopsie uses threshold arguments to specify the acceptable failure rate before a validation fails. This allows for more flexible validation rules that can tolerate a certain percentage of exceptions.
Understanding Thresholds
Thresholds in Validoopsie represent the maximum allowable fraction of records that can fail validation, expressed as a number between 0.0 and 1.0:
Threshold | Description | Behavior |
---|---|---|
0.0 | No failures allowed (default) | Validation fails if any records don't meet criteria |
0.1 | 10% failure tolerance | Validation passes if 90% or more records meet criteria |
0.5 | 50% failure tolerance | Validation passes if at least half of records meet criteria |
1.0 | 100% failure tolerance | Validation always passes regardless of failures |
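The rule in the table can be sketched in plain Python. This is only an illustration of the comparison the table describes, not Validoopsie's internal implementation; the function name `passes_threshold` and the handling of an empty dataset are assumptions made for this example.

```python
def passes_threshold(failed: int, total: int, threshold: float = 0.0) -> bool:
    """Return True when the share of failing records is within the threshold.

    Hypothetical helper: it mirrors the table above, where a failure rate
    equal to the threshold still counts as passing.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if total == 0:
        return True  # assumption: with no records, nothing can fail
    return failed / total <= threshold


# 1 failing record out of 10 is a 10% failure rate.
print(passes_threshold(1, 10))         # False: default threshold 0.0 allows no failures
print(passes_threshold(1, 10, 0.1))    # True: 90% of records meet the criteria
print(passes_threshold(6, 10, 0.5))    # False: fewer than half of the records pass
print(passes_threshold(10, 10, 1.0))   # True: a threshold of 1.0 always passes
```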
How to Set Thresholds
Thresholds can be set on any validation:
from validoopsie import Validate
import pandas as pd
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [10, 20, 30, 200, 40],
    # Sample addresses so the PatternMatch check below has an "email" column to read
    "email": ["a@example.com", "b@example.com", "c@example.com", "d@example.com", "e@example.com"],
})
validator = Validate(df)
# No tolerance - fails if any values are outside range
validator.ValuesValidation.ColumnValuesToBeBetween(
    column="value",
    min_value=0,
    max_value=100,
    threshold=0.0,  # This is the default if not specified
)
# 20% tolerance - passes if at least 80% of values are within range
validator.ValuesValidation.ColumnValuesToBeBetween(
    column="value",
    min_value=0,
    max_value=100,
    threshold=0.2,
)
# Combine with impact levels for complete validation control
validator.StringValidation.PatternMatch(
    column="email",
    pattern=r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$",
    threshold=0.05,  # Allow 5% of emails to be invalid
    impact="medium",  # Record failures but don't interrupt processing
)
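To see why the two range checks above behave differently, it may help to work out the failure rate for the sample `value` column by hand. The sketch below uses plain Python rather than Validoopsie, and it assumes the range bounds are inclusive and that a failure rate equal to the threshold still passes.

```python
values = [10, 20, 30, 200, 40]
min_value, max_value = 0, 100

# Collect the values that fall outside the allowed range (bounds assumed inclusive).
failed = [v for v in values if not min_value <= v <= max_value]
failure_rate = len(failed) / len(values)

print(failed)        # [200]
print(failure_rate)  # 0.2

# Compare the observed failure rate against the two thresholds used above.
for threshold in (0.0, 0.2):
    outcome = "pass" if failure_rate <= threshold else "fail"
    print(f"threshold={threshold}: {outcome}")
```

One out of five values is out of range, so the check with `threshold=0.0` fails while the check with `threshold=0.2` is within tolerance.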
When to Use Thresholds
- Zero Threshold (0.0): Use for strict validations where no exceptions are allowed
  - Example: Primary key uniqueness
  - Example: Required fields that must be present
- Low Threshold (< 0.1): Use for validations that should generally pass but can tolerate rare exceptions
  - Example: Email format validation that might have a few legacy exceptions
  - Example: Date format validation with occasional special cases
- Medium Threshold (0.1 - 0.5): Use when a significant portion of data might legitimately fail validation
  - Example: Missing values in optional fields
  - Example: Pattern matching for fields with multiple valid formats
- High Threshold (> 0.5): Use rarely, typically for informational validations
  - Example: Checking if most records follow a new standard that's being gradually implemented
  - Example: Monitoring the adoption rate of new data formats
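The bands above can be encoded as a small lookup, which may be useful when reviewing a set of validation rules for unexpectedly lenient settings. The helper `threshold_band` is hypothetical and is not part of Validoopsie; it simply applies the ranges listed above.

```python
def threshold_band(threshold: float) -> str:
    """Classify a threshold into the bands described above (hypothetical helper)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if threshold == 0.0:
        return "zero"    # strict: no exceptions allowed
    if threshold < 0.1:
        return "low"     # rare exceptions tolerated
    if threshold <= 0.5:
        return "medium"  # a significant share may legitimately fail
    return "high"        # mostly informational


for t in (0.0, 0.05, 0.2, 0.8):
    print(t, threshold_band(t))
```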
Default Threshold
If not specified, all validations default to a threshold of 0.0 (no failures allowed).