Contributing a new Validation to Validoopsie
Creating a new validation might seem complex at first, but don’t worry: we’ve made the process as straightforward as possible. Validations are dynamically imported from the validoopsie/validation_catalogue directory, and once you create your first one, everything will start to make sense!
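Validoopsie’s actual loading code isn’t shown in this guide. As a rough illustration, assuming each category is a subpackage and each validation is a class inside one of its modules, discovery could be done with the standard pkgutil and importlib modules. The discover function and the demo_catalogue package are hypothetical. The sketch builds a throwaway package in a temporary directory so it runs on its own.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover(package_name: str) -> dict[str, dict[str, type]]:
    """Hypothetical: map each category subpackage to the classes in its modules."""
    package = importlib.import_module(package_name)
    registry: dict[str, dict[str, type]] = {}
    for category in pkgutil.iter_modules(package.__path__):
        if not category.ispkg:
            continue  # only subpackages are treated as categories
        sub = importlib.import_module(f"{package_name}.{category.name}")
        classes: dict[str, type] = {}
        for info in pkgutil.iter_modules(sub.__path__):
            module = importlib.import_module(f"{sub.__name__}.{info.name}")
            for name, obj in inspect.getmembers(module, inspect.isclass):
                # Skip classes that were merely imported into the module.
                if obj.__module__ == module.__name__:
                    classes[name] = obj
        registry[category.name] = classes
    return registry


# Build a throwaway catalogue on disk so the sketch is self-contained.
root = Path(tempfile.mkdtemp())
category_dir = root / "demo_catalogue" / "ValuesValidation"
category_dir.mkdir(parents=True)
(root / "demo_catalogue" / "__init__.py").write_text("")
(category_dir / "__init__.py").write_text("")
(category_dir / "column_values_to_be_between.py").write_text(
    "class ColumnValuesToBeBetween:\n    pass\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the new files are visible to the import system

registry = discover("demo_catalogue")
print(sorted(registry["ValuesValidation"]))  # ['ColumnValuesToBeBetween']
```

Because discovery is driven by the folder layout, dropping a correctly named file into the right category folder is, in this sketch, all it takes for the class to be picked up.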
Follow these steps to create a new validation:
1. Organize Your Validation
In the validoopsie/validation_catalogue directory, validations are organized by category. For this example, we’ll create a validation that checks whether the values in a column fall between two numbers.
Since this is a value-based validation, you’ll create the file inside the ValuesValidation folder.
2. Create the Validation File
- Create a new file in validoopsie/validation_catalogue/ValuesValidation.
- Name the file and the class using this convention:
  File Name: column_values_to_be_between.py
  Class Name: ColumnValuesToBeBetween
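The convention pairs a snake_case file name with the same words in PascalCase for the class. The helper below is hypothetical (not part of Validoopsie) and only spells out that mapping so you can check a new name before creating the file.

```python
def class_name_from_file(file_name: str) -> str:
    """Hypothetical helper: derive the expected class name from a file name."""
    stem = file_name.removesuffix(".py")
    # Capitalize each underscore-separated word and join them.
    return "".join(word.capitalize() for word in stem.split("_"))


print(class_name_from_file("column_values_to_be_between.py"))  # ColumnValuesToBeBetween
```

Note that str.capitalize lowercases the rest of each word, so a name containing an acronym would not round-trip through this sketch.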
3. Define the Validation Class
Your class should inherit from BaseValidation. Here’s an example:
# Literal, nw, and Frame are used in the steps below.
from typing import Literal

import narwhals as nw
from narwhals.typing import Frame

from validoopsie.base import BaseValidation


class ColumnValuesToBeBetween(BaseValidation):
4. Add a Docstring
Include a clear docstring with doctests for both the class and its methods; these docstrings will also be reused in the stub file later on. Here’s an example structure:
    """Check if the values in a column are within a specific range.

    Args:
        column (str): Column to validate.
        min_value (float): Minimum value.
        max_value (float): Maximum value.
        impact (str, optional): Impact level of validation. Defaults to "low".
        threshold (float, optional): Validation threshold. Defaults to 0.0.
        kwargs (dict): Additional keyword arguments.

    Examples:
        >>> import pandas as pd
        >>> from validoopsie import Validate
        >>>
        >>> # Validate numeric range
        >>> df = pd.DataFrame({
        ...     "age": [25, 30, 42, 18, 65]
        ... })
        >>>
        >>> vd = (
        ...     Validate(df)
        ...     .ValuesValidation.ColumnValuesToBeBetween(
        ...         column="age",
        ...         min_value=18,
        ...         max_value=65
        ...     )
        ... )
        >>> key = "ColumnValuesToBeBetween_age"
        >>> vd.results[key]["result"]["status"]
        'Success'
        >>>
        >>> # Calling validate() on a successful validation raises no error.
        >>> vd.validate()
    """
Doctests serve a dual purpose: they provide usage examples that are automatically showcased in the documentation, and they also act as test cases for the validation.
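To make “doctests are test cases” concrete, the stdlib-only sketch below runs a docstring’s examples directly with Python’s doctest module; pytest’s --doctest-modules option builds on the same module. The values_outside function is a hypothetical plain-Python stand-in for a validation, not Validoopsie code.

```python
import doctest


def values_outside(values: list[float], min_value: float, max_value: float) -> list[float]:
    """Return the values that fall outside the inclusive range.

    >>> values_outside([25, 30, 42, 18, 65], 18, 65)
    []
    >>> values_outside([1, 2, 3, 4, 5], 1, 2)
    [3, 4, 5]
    """
    return [v for v in values if not (min_value <= v <= max_value)]


# Parse the examples out of the docstring and execute them.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    values_outside.__doc__, {"values_outside": values_outside}, "values_outside", None, 0
)
result = doctest.DocTestRunner(verbose=False).run(test)
print(f"failed={result.failed} attempted={result.attempted}")  # failed=0 attempted=2
```

If an expected output in the docstring drifts from the real behavior, the runner reports a failure, which is why a stale example shows up as a broken test rather than silently misleading readers.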
5. Implement the __init__ Method
The __init__ method must accept a column parameter. (If your validation works on a combination of columns or some other variation, build a suitable column value yourself and pass that to super().__init__ instead.)
In this case we want to find values between two numbers, so the required parameters are column, min_value, and max_value.
It should also include the standard impact and threshold parameters required by the base class (BaseValidation).
- Always pass column, impact, and threshold to the base class.
- If your validation doesn’t require a column, generate one in the __init__ method.
Example:
    def __init__(
        self,
        column: str,
        min_value: float,
        max_value: float,
        impact: Literal["low", "medium", "high"] = "low",
        threshold: float = 0.00,
        **kwargs: object,
    ) -> None:
        # column, impact, and threshold are always forwarded to BaseValidation.
        super().__init__(column, impact, threshold, **kwargs)
        self.min_value = min_value
        self.max_value = max_value
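The two rules above (always forward column, impact, and threshold; generate a column when none applies) can be sketched without Validoopsie installed. StandInBaseValidation is a hypothetical stand-in that only stores the forwarded values, and ExamplePairValidation is a hypothetical two-column validation; deriving its column label as "a-b" is an assumption for illustration, not the library’s convention.

```python
from typing import Literal

Impact = Literal["low", "medium", "high"]


class StandInBaseValidation:
    """Hypothetical stand-in for BaseValidation; it only records its arguments."""

    def __init__(self, column: str, impact: Impact = "low", threshold: float = 0.00, **kwargs: object) -> None:
        self.column = column
        self.impact = impact
        self.threshold = threshold
        self.kwargs = kwargs


class ColumnValuesToBeBetween(StandInBaseValidation):
    def __init__(self, column: str, min_value: float, max_value: float,
                 impact: Impact = "low", threshold: float = 0.00, **kwargs: object) -> None:
        super().__init__(column, impact, threshold, **kwargs)
        self.min_value = min_value
        self.max_value = max_value


class ExamplePairValidation(StandInBaseValidation):
    """Hypothetical validation over two columns, so no single column applies."""

    def __init__(self, column_a: str, column_b: str,
                 impact: Impact = "low", threshold: float = 0.00, **kwargs: object) -> None:
        # Generate a column label from both inputs and forward it to the base class.
        super().__init__(f"{column_a}-{column_b}", impact, threshold, **kwargs)
        self.column_a = column_a
        self.column_b = column_b


between = ColumnValuesToBeBetween("age", 18, 65, impact="high")
pair = ExamplePairValidation("start", "end")
print(between.column, between.impact, between.threshold)  # age high 0.0
print(pair.column)  # start-end
```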
6. Add a Fail Message
The fail message is included in the output report if the validation fails. Define it using a property:
    @property
    def fail_message(self) -> str:
        """Return the fail message used in the report."""
        return (
            f"The column '{self.column}' has values that are not "
            f"between {self.min_value} and {self.max_value}."
        )
7. Define the Validation Logic (__call__ Method)
The __call__ method contains the core logic of your validation. It receives the frame and should return only the values that fail the check.
Example:
    def __call__(self, frame: Frame) -> Frame:
        """Check if the values in a column are within the specified range.

        The result will be used during execution.
        """
        # Count each distinct value, then keep only the values outside the
        # range (is_between includes both bounds by default).
        return (
            frame.group_by(self.column)
            .agg(nw.col(self.column).count().alias(f"{self.column}-count"))
            .filter(
                ~nw.col(self.column).is_between(self.min_value, self.max_value),
            )
        )
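For readers less familiar with narwhals, here is the same idea in plain Python, a sketch rather than Validoopsie code: group identical values, count them, and keep only the groups that fall outside the inclusive range. It assumes the column has no missing values; failed_value_counts is a hypothetical name.

```python
from collections import Counter


def failed_value_counts(values: list[float], min_value: float, max_value: float) -> dict[float, int]:
    """Count each distinct value lying outside the closed range [min_value, max_value]."""
    counts = Counter(values)  # plays the role of group_by(...).agg(count())
    return {
        value: count
        for value, count in counts.items()
        if not (min_value <= value <= max_value)  # plays the role of ~is_between(...)
    }


print(failed_value_counts([1, 2, 2, 3, 7, 7, 9], 2, 7))  # {1: 1, 9: 1}
```

Returning one row per failing value with its count, rather than every failing row, keeps the report compact when the same bad value repeats many times.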
8. Test Your Validation
Before writing dedicated tests, you can run the doctests you already added to the docstring:
uv run pytest validoopsie --doctest-modules
- Create a test file under tests/test_validation_catalogue/test_ValuesValidation.
- Use the @create_frame_fixture decorator to define test data for different DataFrame libraries (pandas, polars, pyarrow, etc.).
Example:
from tests.utils.create_frames import create_frame_fixture


@create_frame_fixture
def lf() -> dict[str, list]:
    return {
        "A": [1, 2, 3, 4, 5],
        "B": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
- Write test cases for your validation:
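from narwhals.typing import Frame

from validoopsie.validation_catalogue.ValuesValidation.column_values_to_be_between import (
    ColumnValuesToBeBetween,
)


def test_column_values_to_be_between(lf: Frame) -> None:
    # Values 3, 4 and 5 in column "A" fall outside [1, 2], so the check should fail.
    test = ColumnValuesToBeBetween("A", 1, 2)
    result = test.__execute_check__(frame=lf)
    assert result["result"]["status"] == "Fail"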
9. Update the Stub File for Type Hinting
To enable proper type hinting for your new validation, update the stub file (validate.pyi). This step is essential because validations are dynamically imported, so type checkers and editors cannot discover them on their own.
Add your validation like this:
class Validate:
    class ValuesValidation:
        @staticmethod
        def ColumnValuesToBeBetween(
            column: str,
            min_value: float,
            max_value: float,
            impact: str = "low",
            threshold: float = 0.00,
            **kwargs: KwargsType,
        ) -> Validate:
            """Check if the values in a column are within a specific range.

            Args:
                column (str): Column to validate.
                min_value (float): Minimum value.
                max_value (float): Maximum value.
                impact (str, optional): Impact level of validation. Defaults to "low".
                threshold (float, optional): Validation threshold. Defaults to 0.0.
                kwargs (dict): Additional keyword arguments.

            Examples:
                >>> import pandas as pd
                >>> from validoopsie import Validate
                >>>
                >>> # Validate numeric range
                >>> df = pd.DataFrame({
                ...     "age": [25, 30, 42, 18, 65]
                ... })
                >>>
                >>> vd = (
                ...     Validate(df)
                ...     .ValuesValidation.ColumnValuesToBeBetween(
                ...         column="age",
                ...         min_value=18,
                ...         max_value=65
                ...     )
                ... )
                >>> key = "ColumnValuesToBeBetween_age"
                >>> vd.results[key]["result"]["status"]
                'Success'
                >>>
                >>> # Calling validate() on a successful validation raises no error.
                >>> vd.validate()
            """
Make sure to include the docstring here as well: because validations are dynamically imported, editors can only show the documentation that appears in the stub.
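Why the stub is needed becomes clearer with a stripped-down model of the chaining API. The Validate class below is a hypothetical stand-in (the real one also takes a frame and runs the checks): it attaches category namespaces with setattr at runtime, so the attribute names never appear in source code a type checker can read. The results key format mirrors the "ColumnValuesToBeBetween_age" key used in the doctest.

```python
from collections.abc import Callable
from types import SimpleNamespace
from typing import Any


class Validate:
    """Hypothetical stand-in: categories are attached at runtime, not declared."""

    def __init__(self, catalogue: dict[str, dict[str, type]]) -> None:
        self.results: dict[str, Any] = {}
        for category, classes in catalogue.items():
            methods = {name: self._make_method(cls) for name, cls in classes.items()}
            # Static analysis cannot see these attributes; only a .pyi stub can declare them.
            setattr(self, category, SimpleNamespace(**methods))

    def _make_method(self, cls: type) -> Callable[..., "Validate"]:
        def method(*args: Any, **kwargs: Any) -> "Validate":
            check = cls(*args, **kwargs)
            self.results[f"{cls.__name__}_{check.column}"] = check
            return self  # returning self is what makes chaining possible
        return method


class ColumnValuesToBeBetween:
    def __init__(self, column: str, min_value: float, max_value: float) -> None:
        self.column, self.min_value, self.max_value = column, min_value, max_value


vd = Validate({"ValuesValidation": {"ColumnValuesToBeBetween": ColumnValuesToBeBetween}})
vd.ValuesValidation.ColumnValuesToBeBetween(column="age", min_value=18, max_value=65)
print(list(vd.results))  # ['ColumnValuesToBeBetween_age']
```

This is also why the stub declares the method’s return type as Validate: each call hands back the same object so the next validation can be chained onto it.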
All Done!
Congratulations! You’ve just created your first validation. 🎉
If you have any questions or run into issues, feel free to reach out for help. Thank you for contributing!