Pandantic usage

Installation

First, install pandantic by using pip (or any other package managing tool).

pip install pandantic

Using the validator

The validator supports two modes:

  • errors="raise": Raises a ValueError if any row fails validation

  • errors="skip": Returns a new DataFrame with only the valid rows

Advanced features

Strict Type Validation

The validator supports Pydantic’s strict types for more rigorous validation:

from pydantic import BaseModel
from pydantic.types import StrictInt
from pandantic import Pandantic

class StrictSchema(BaseModel):
    example_str: str
    example_int: StrictInt  # Will only accept actual integers

validator = Pandantic(schema=StrictSchema)
df = pd.DataFrame({
    "example_str": ["foo", "bar"],
    "example_int": [1, "2"]  # Second value will fail as it's a string
})

# This will only keep the first row
df_valid = validator.validate(dataframe=df, errors="skip")

Custom Validators

You can still use all of Pydantic’s validation features in your schema:

from pydantic import BaseModel, field_validator
from pandantic import Pandantic

class CustomSchema(BaseModel):
    example_str: str
    example_int: int

    @field_validator("example_int")
    def must_be_even(cls, v: int) -> int:
        if v % 2 != 0:
            raise ValueError("Number must be even")
        return v

validator = Pandantic(schema=CustomSchema)

Optional Fields

As the DataFrame is being parsed into a dict, a None value is considered as a nan value in cases there are different values in the dict. Therefore, specifying Optional columns (where the value can be empty) can be specified by using the custom pandantic.Optional type. This type is a replacement for typing.Optional.

from pydantic import BaseModel
from pandantic import Optional  # pylint: disable=import-outside-toplevel

# GIVEN
class Model(BaseModel):
    a: Optional[int] = None
    b: int

df_example = pd.DataFrame({"a": [1, None, 2], "b": ["str", 2, 3]})

validator = Pandantic(schema=Model)