has_columns()

Check whether one or more columns exist in a table.

Usage

has_columns(*columns)

This function returns a callable that, when given a table, checks whether all specified columns are present. It is primarily designed for use with the active= parameter of validation methods. When a validation step has active=has_columns("col_a", "col_b"), the step will be skipped (made inactive) if either col_a or col_b is missing from the target table.

The callable is evaluated against the original table before any pre= processing is applied. This means the column check is performed on the raw input data, not on a pre-processed version of it.
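
The ordering described above can be illustrated with a small, self-contained sketch. Everything here is hypothetical: run_step(), add_c(), and needs_c() are illustration-only names, and the toy dict-based table stands in for a real DataFrame. It is not pointblank's implementation; it only shows why a column created by pre= cannot satisfy the column check.

```python
from typing import Callable, Dict, List, Optional

Table = Dict[str, List[int]]  # toy table: column name -> values


def run_step(
    table: Table,
    active: Callable[[Table], bool],
    pre: Optional[Callable[[Table], Table]] = None,
) -> str:
    """Hypothetical step runner; illustrates the order of operations only."""
    # The active check sees the raw table, before `pre` is applied.
    if not active(table):
        return "skipped"
    prepared = pre(table) if pre is not None else table
    return f"ran on columns {sorted(prepared)}"


def add_c(t: Table) -> Table:
    # Derive column "c" from "a"; stands in for a pre= transformation.
    return {**t, "c": [v * 2 for v in t["a"]]}


def needs_c(t: Table) -> bool:
    # Stand-in for a column check that requires column "c".
    return "c" in t


raw = {"a": [1, 2, 3]}

# "c" only exists after pre-processing, so the check on the raw table fails.
result_c = run_step(raw, active=needs_c, pre=add_c)

# "a" exists in the raw table, so the step runs and pre= is then applied.
result_a = run_step(raw, active=lambda t: "a" in t, pre=add_c)
```

In this sketch the step that depends on the derived column c is skipped even though pre= would have created it, which mirrors the behavior described for the raw-table check.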

A note is attached to any skipped step in the validation report explaining which columns were not found.

Parameters

*columns: str | list[str]
One or more column names to check for in the table. Each argument can be a string or a list of strings. All specified columns must be present for the callable to return True.

Returns

Callable[[Any], bool]
A callable that accepts a table and returns True if every column in columns exists in the table, False otherwise.

Raises

ValueError
If no column names are provided.

TypeError
If any of the provided column names is not a string or list of strings.
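
As a rough mental model of the parameters, return value, and errors listed above, the following sketch builds a comparable callable in plain Python. The name has_columns_sketch() is hypothetical, and the sketch assumes the table exposes its column names through a .columns attribute (as Polars and pandas DataFrames do). How the library itself inspects tables or words its error messages is not shown in this page, so treat those details as assumptions.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Union


def has_columns_sketch(*columns: Union[str, List[str]]) -> Callable[[Any], bool]:
    """Hypothetical stand-in for has_columns(); not the library's code."""
    # Flatten the mix of strings and lists of strings into one list of names.
    names: List[str] = []
    for arg in columns:
        if isinstance(arg, str):
            names.append(arg)
        elif isinstance(arg, list) and all(isinstance(c, str) for c in arg):
            names.extend(arg)
        else:
            raise TypeError(f"Expected str or list[str], got {arg!r}.")

    if not names:
        raise ValueError("At least one column name must be provided.")

    def check(table: Any) -> bool:
        # Assumption: column names are available as `table.columns`.
        present = set(table.columns)
        return all(name in present for name in names)

    return check


# A minimal object with a `.columns` attribute stands in for a real table.
tbl_like = SimpleNamespace(columns=["a", "b"])

both_present = has_columns_sketch("a", "b")(tbl_like)
from_list = has_columns_sketch(["a", "b"])(tbl_like)
one_missing = has_columns_sketch("a", ["z"])(tbl_like)
```

Here both_present and from_list evaluate to True because every requested name is in the table, while one_missing is False because z is absent.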

Examples

Using has_columns() with the active= parameter to conditionally run a validation step:

import pointblank as pb
import polars as pl

tbl = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

validation = (
    pb.Validate(data=tbl)
    .col_vals_gt(columns="a", value=0, active=pb.has_columns("a"))
    .col_vals_gt(columns="a", value=0, active=pb.has_columns("z"))
    .interrogate()
)

validation
STEP  TYPE           COLUMNS  VALUES  UNITS  PASS      FAIL
1     col_vals_gt()  a        0       3      3 (1.00)  0 (0.00)
2     col_vals_gt()  a        0

Notes

Step 2 (active_check) Step skipped — Column check failed: missing column(s) z.

The first step ran because column a exists. The second step was skipped because column z is missing, and the report note explains which column was not found.

When checking for multiple columns, the step is only active when all columns are present:

validation = (
    pb.Validate(data=tbl)
    .col_vals_gt(columns="a", value=0, active=pb.has_columns("a", "b"))
    .col_vals_gt(columns="a", value=0, active=pb.has_columns("a", "x", "y"))
    .interrogate()
)

validation
STEP  TYPE           COLUMNS  VALUES  UNITS  PASS      FAIL
1     col_vals_gt()  a        0       3      3 (1.00)  0 (0.00)
2     col_vals_gt()  a        0

Notes

Step 2 (active_check) Step skipped — Column check failed: missing column(s) x, y.

The first step is active because both a and b exist. The second step is skipped because x and y are missing.

Column names can also be provided as a list:

validation = (
    pb.Validate(data=tbl)
    .col_vals_gt(columns="a", value=0, active=pb.has_columns(["a", "b"]))
    .interrogate()
)

validation
STEP  TYPE           COLUMNS  VALUES  UNITS  PASS      FAIL
1     col_vals_gt()  a        0       3      3 (1.00)  0 (0.00)