Get the number of rows in a table.
Usage
The get_row_count() function returns the number of rows in a table. The function works with any table that is supported by the pointblank library, including Pandas, Polars, and Ibis backend tables (e.g., DuckDB, MySQL, PostgreSQL, SQLite, Parquet, etc.). It also supports direct input of CSV files, Parquet files, and database connection strings.
data: Any
Returns: int
The data= parameter can be given any of the following table types:
Polars DataFrame ("polars")
Pandas DataFrame ("pandas")
PySpark table ("pyspark")
DuckDB table ("duckdb")*
MySQL table ("mysql")*
PostgreSQL table ("postgresql")*
SQLite table ("sqlite")*
Microsoft SQL Server table ("mssql")*
Snowflake table ("snowflake")*
Databricks table ("databricks")*
BigQuery table ("bigquery")*
Parquet table ("parquet")*
CSV files (string path or pathlib.Path object with .csv extension)
Parquet files (string path, pathlib.Path object, glob pattern, directory with .parquet extension, or partitioned dataset)
The table types marked with an asterisk need to be prepared as Ibis tables (of type ibis.expr.types.relations.Table). Furthermore, using get_row_count() with these types of tables requires the Ibis library (v9.5.0 or above) to be installed. If the input table is a Polars or Pandas DataFrame, Ibis is not required.
To use a CSV file, ensure that a string or pathlib.Path object with a .csv extension is provided. The file will be automatically detected and loaded using the best available DataFrame library. The loading preference is Polars first, then Pandas as a fallback.
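The loading preference described above (Polars first, then Pandas) can be illustrated with a small standalone sketch. This is not pointblank's actual implementation: the helper name count_csv_rows() is hypothetical, and the final standard-library fallback is an added assumption so that the sketch runs even when neither DataFrame library is installed.

```python
import csv
import tempfile
from pathlib import Path


def count_csv_rows(path):
    """Count data rows in a CSV file, preferring Polars, then Pandas (hypothetical helper)."""
    path = Path(path)
    if path.suffix.lower() != ".csv":
        raise ValueError(f"expected a file with a .csv extension, got: {path}")
    try:
        import polars as pl  # first choice

        return pl.read_csv(path).height
    except ImportError:
        pass
    try:
        import pandas as pd  # fallback

        return len(pd.read_csv(path))
    except ImportError:
        pass
    # Assumption (not part of the documented behavior): use the standard
    # library when neither Polars nor Pandas is available.
    with path.open(newline="") as f:
        return max(sum(1 for _ in csv.reader(f)) - 1, 0)  # subtract the header row


# Small demonstration with a temporary three-row CSV file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("id,amount\n1,9.99\n2,4.50\n3,12.00\n")
    n_rows = count_csv_rows(sample)

print(n_rows)
```

In practice, passing the same path to get_row_count() makes this kind of helper unnecessary; the sketch only shows the order in which the libraries are tried.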
GitHub URLs pointing to CSV or Parquet files are automatically detected and converted to raw content URLs for downloading. The URL format should be: https://github.com/user/repo/blob/branch/path/file.csv or https://github.com/user/repo/blob/branch/path/file.parquet
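To make the URL conversion concrete, here is a minimal sketch of how a GitHub "blob" URL maps to a raw content URL. The function name github_blob_to_raw() is hypothetical and the logic is an assumption about the general rewrite (github.com/.../blob/... to raw.githubusercontent.com/...), not a copy of what pointblank does internally. Branch names containing slashes are not handled.

```python
from urllib.parse import urlparse


def github_blob_to_raw(url):
    """Rewrite a github.com 'blob' URL as a raw.githubusercontent.com URL (hypothetical helper)."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    # Expected path layout: user / repo / "blob" / branch / path/to/file
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        return url  # not a recognized blob URL; return it unchanged
    user, repo, _, branch, *file_parts = parts
    file_path = "/".join(file_parts)
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{file_path}"


raw_url = github_blob_to_raw("https://github.com/user/repo/blob/branch/path/file.csv")
print(raw_url)
```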
Connection strings follow database URL formats and must also specify a table using the ::table_name suffix. Examples include:
"duckdb:///path/to/database.ddb::table_name"
"sqlite:///path/to/database.db::table_name"
"postgresql://user:password@localhost:5432/database::table_name"
"mysql://user:password@localhost:3306/database::table_name"
"bigquery://project/dataset::table_name"
"snowflake://user:password@account/database/schema::table_name"
When using connection strings, the Ibis library with the appropriate backend driver is required.
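The ::table_name convention can be illustrated by splitting a connection string into its backend, database URL, and table name. This is only a sketch of the string format shown above; split_connection_string() is a hypothetical helper, and pointblank's own parsing may differ.

```python
def split_connection_string(conn):
    """Split 'backend://...::table_name' into (backend, base_url, table_name)."""
    # rpartition splits on the last '::', so colons in credentials or ports are left intact.
    base_url, sep, table_name = conn.rpartition("::")
    if not sep or not base_url or not table_name:
        raise ValueError("connection string must end with a '::table_name' suffix")
    backend = base_url.split("://", 1)[0]
    return backend, base_url, table_name


result = split_connection_string(
    "postgresql://user:password@localhost:5432/database::table_name"
)
print(result)
```

Splitting on the last occurrence of :: keeps the database URL itself untouched, so it can be handed to a backend driver as-is.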
Getting the number of rows in a table is easily done by using the get_row_count() function. Here’s an example using the game_revenue dataset (itself loaded using the load_dataset() function):
import pointblank as pb
game_revenue_polars = pb.load_dataset("game_revenue")
pb.get_row_count(game_revenue_polars)
2000
This table is a Polars DataFrame, but the get_row_count() function works with any table supported by pointblank, including Pandas DataFrames and Ibis backend tables. Here’s an example using a DuckDB table handled by Ibis:
game_revenue_duckdb = pb.load_dataset("game_revenue", tbl_type="duckdb")
pb.get_row_count(game_revenue_duckdb)
2000
The get_row_count() function can also accept CSV file paths directly.
The function supports various Parquet input formats:
# Single Parquet file from package data
parquet_path = pb.get_data_path("nycflights", "parquet")
pb.get_row_count(parquet_path)
336776
You can also use glob patterns and directories of Parquet files.
The function supports database connection strings for direct access to database tables:
# Get path to a DuckDB database file from package data
duckdb_path = pb.get_data_path("game_revenue", "duckdb")
pb.get_row_count(f"duckdb:///{duckdb_path}::game_revenue")
2000
The function always returns the number of rows in the table as an integer value, which is 2000 for the game_revenue dataset.