Welcome to Pandantic’s documentation!
Gone are the days of black-box dataframes in otherwise type-safe code!
Pandantic builds off the Pydantic API to enable validation and filtering of the usual dataframe types (i.e., pandas, etc.)
Installation
Install Pandantic using pip:
$ pip install pandantic
Quick Start
Here’s a simple example demonstrating how to validate a pandas DataFrame:
import pandas as pd
from pydantic import BaseModel
from pandantic import Pandantic
# Define your schema using Pydantic
class EmployeeSchema(BaseModel):
name: str
salary: int
department: str
# Create sample DataFrame with mixed valid/invalid data
df = pd.DataFrame({
"name": ["Alice Smith", "Bob Jones", 123], # Last row: invalid name
"salary": [50000, "high", 60000], # Second row: invalid salary
"department": ["Engineering", "Sales", "Marketing"]
})
# Initialize validator
validator = Pandantic(schema=EmployeeSchema)
# Method 1: Skip invalid rows
df_valid = validator.validate(dataframe=df, errors="skip")
print(f"Valid rows: {len(df_valid)} out of {len(df)}")
# Method 2: Raise error on invalid data
try:
validator.validate(dataframe=df, errors="raise")
except ValueError as e:
print(f"Validation error: {e}")
Key Features
Validate DataFrame columns against Pydantic models
Two validation modes: skip invalid rows or raise errors
Full compatibility with Pydantic’s type system and validators
Simple, intuitive API following pandas conventions