In PostgreSQL, a table is a structured collection of rows and columns where data is stored.
You define a table using the CREATE TABLE command, specifying the column names, data types, and optional constraints (like PRIMARY KEY, NOT NULL, or UNIQUE).
Tables belong to a schema within a database. By default, they are created inside the public schema unless another one is specified.
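Basic Syntax
A generic sketch of the command's shape, reconstructed from the element descriptions below (angle brackets mark placeholders, square brackets mark optional parts):
CREATE TABLE <table_name> (
    <column_name> <data_type> [constraint],
    ...
    [table_constraints]
);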
column_name: the name of each column in that table.
data_type: defines the kind of data stored (e.g., INTEGER, TEXT, BOOLEAN, DATE, etc.).
[constraint]: optional restrictions like NOT NULL, DEFAULT, etc.
[table_constraints]: rules applying to multiple columns, e.g. primary keys, foreign keys.
Example: Creating a Simple Table
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
hire_date DATE NOT NULL DEFAULT CURRENT_DATE,
active BOOLEAN DEFAULT TRUE
);
SERIAL automatically creates an auto-incrementing integer column (in modern PostgreSQL, use GENERATED ALWAYS AS IDENTITY instead).
PRIMARY KEY ensures each row has a unique, non-null identifier.
VARCHAR(50) defines a text field up to 50 characters.
DEFAULT CURRENT_DATE auto-fills with today's date when no value is given.
Modern Auto-Increment Syntax (Preferred)
CREATE TABLE employees (
employee_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
department TEXT DEFAULT 'General'
);
GENERATED ALWAYS AS IDENTITY is the SQL-standard version of SERIAL.
You can also use GENERATED BY DEFAULT AS IDENTITY if you want to allow manual insertion of custom IDs.
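A minimal sketch of the difference, using a hypothetical gadgets table:
CREATE TABLE gadgets (
    gadget_id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL
);
-- BY DEFAULT accepts a manually supplied ID:
INSERT INTO gadgets (gadget_id, name) VALUES (100, 'Custom ID');
-- Omitting the column falls back to the sequence:
INSERT INTO gadgets (name) VALUES ('Auto ID');
With GENERATED ALWAYS, the first insert would be rejected unless you add OVERRIDING SYSTEM VALUE.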
Adding Table-Level Constraints
CREATE TABLE orders (
order_id INT GENERATED ALWAYS AS IDENTITY,
customer_id INT NOT NULL,
order_total NUMERIC(10,2) NOT NULL CHECK (order_total >= 0),
order_date TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (order_id),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CHECK (order_total >= 0) ensures totals can't be negative.
PRIMARY KEY and FOREIGN KEY constraints can appear at the bottom as table-level definitions.
NUMERIC(10,2) means up to 10 digits total, 2 after the decimal point.
Common Data Types
Type                      | Example            | Description
--------------------------+--------------------+--------------------------------------------------
INTEGER, BIGINT, SMALLINT | 42                 | Whole numbers, various ranges.
NUMERIC(p, s)             | 1234.56            | Fixed-point precision numbers (money, totals).
TEXT, VARCHAR(n), CHAR(n) | 'Hello'            | Variable or fixed-length strings.
DATE, TIME, TIMESTAMP     | '2025-01-01'       | Date and time storage.
BOOLEAN                   | TRUE / FALSE       | Logical values.
JSON, JSONB               | '{"key": "value"}' | Structured JSON data (JSONB is binary-optimized).
Creating Tables in a Specific Schema
CREATE SCHEMA company;
CREATE TABLE company.departments (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
By default, PostgreSQL creates tables under the public schema.
Use CREATE SCHEMA to organize tables logically (e.g., per project or domain).
You can qualify table names like schema_name.table_name.
Viewing Tables
-- List all tables in current database
\dt
-- List tables in a specific schema
\dt company.*
-- Show table structure
\d employees
These commands are used inside the psql interactive terminal.
\dt lists tables, \d <table> describes structure, columns, and constraints.
Altering or Dropping Tables
ALTER TABLE employees ADD COLUMN email TEXT;
ALTER TABLE employees DROP COLUMN active;
DROP TABLE employees;
ALTER TABLE allows adding/removing columns or modifying constraints.
DROP TABLE permanently deletes the table (and all its data).
Understanding Schemas in PostgreSQL
What is a Schema?
A schema in PostgreSQL is like a folder or namespace inside a database.
Each schema contains its own set of tables, views, functions, and other objects — just like different folders can contain different files.
This allows you to:
Organize your database objects logically (e.g. hr.employees, sales.orders).
Have tables with the same name in different schemas without conflict (e.g. test.users vs prod.users).
Control access permissions at a schema level.
Database vs Schema
PostgreSQL databases are the top-level containers.
A schema lives inside a database, and tables live inside a schema.
Database → Schema → Table → Row
For example, you might have a database called company_db that contains two schemas: public and analytics.
This way, you can separate your operational data (public) from analytical data (analytics).
The Default Schema: public
When you first create a new database, PostgreSQL automatically creates a schema named public.
Unless you specify otherwise, every table, view, or function you create goes into the public schema.
-- This table is created inside the default "public" schema
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
Its full name is actually public.users, even if you normally omit the schema prefix.
Creating a Custom Schema
CREATE SCHEMA hr;
CREATE SCHEMA sales;
Now you have two additional namespaces: hr and sales.
You can create tables inside them using a schema-qualified name:
CREATE TABLE hr.employees (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
position TEXT
);
CREATE TABLE sales.orders (
id SERIAL PRIMARY KEY,
amount NUMERIC(10,2),
customer TEXT
);
Now hr.employees and sales.orders are two independent tables, even though both live in the same database.
Accessing Tables in Schemas
-- Access table using schema-qualified name
SELECT * FROM hr.employees;
-- Access table from the default "public" schema
SELECT * FROM public.users;
You always access tables by schema_name.table_name.
If you omit the schema, PostgreSQL looks for the table in the schemas listed in the current search_path (see below).
The Search Path
The search_path is the list of schemas that PostgreSQL looks through when you reference an object without a schema prefix.
By default, it is usually:
SHOW search_path;
-- Output example:
-- "$user", public
This means:
First, PostgreSQL looks for a schema with the same name as the current user (this entry is skipped if no such schema exists).
Then, it looks for the object in public.
You can change the search path to prioritize another schema:
SET search_path TO hr, public;
-- Now "employees" means "hr.employees"
SELECT * FROM employees;
When working on large projects, this helps you avoid writing hr. or sales. all the time.
Dropping a Schema
DROP SCHEMA hr;
-- Or remove all its objects automatically:
DROP SCHEMA hr CASCADE;
DROP SCHEMA hr; will fail if the schema is not empty.
CASCADE deletes the schema and all contained tables, views, and functions.
Use CASCADE with caution — this is irreversible!
Schema Permissions
You can grant or revoke access to schemas independently of table permissions.
-- Grant a user permission to create objects in a schema
GRANT CREATE ON SCHEMA sales TO alice;
-- Allow a user to use (read from) the schema
GRANT USAGE ON SCHEMA hr TO bob;
USAGE allows access to existing objects inside the schema.
CREATE allows creating new objects within that schema.
Combining these allows a user to both read and create inside that namespace.
When to Use Schemas
Organizing modules: for example, hr, finance, and sales schemas in a company database.
Separating environments: e.g., dev, test, prod schemas in the same database for testing purposes.
Multi-tenant applications: one schema per customer, while keeping all data in a single database.
Access control: different roles can have access to different schemas.
Summary
A schema is a namespace inside a PostgreSQL database that groups related tables and objects together.
By default, all objects go into the public schema, but you can create your own with CREATE SCHEMA.
Access tables as schema_name.table_name, or adjust search_path for convenience.
Use schemas to organize projects, control permissions, and keep your database clean and modular.
Inserting Data into Tables
Introduction
Once you've created a table, you can add rows of data using the INSERT INTO statement.
Each inserted row corresponds to one record in the table, containing values for one or more columns.
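A minimal example against the employees table defined earlier (the inserted names are placeholders):
INSERT INTO employees (first_name, last_name)
VALUES ('Anna', 'Schmidt')
RETURNING employee_id;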
The RETURNING clause immediately outputs data from the newly inserted row(s).
You can return multiple columns:
RETURNING employee_id, first_name, hire_date;
Handling Conflicts (Upserts)
INSERT INTO employees (employee_id, first_name, last_name, department)
VALUES (1, 'Ian', 'Miller', 'Engineering')
ON CONFLICT (employee_id) DO UPDATE SET department = EXCLUDED.department;
ON CONFLICT handles cases where an insert violates a unique or primary key constraint.
EXCLUDED refers to the new row that caused the conflict.
You can:
DO NOTHING to skip the row, or
DO UPDATE to modify the existing row.
Example: Full Insert Scenario
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO customers (name, email)
VALUES
('Alice Green', 'alice@example.com'),
('Bob White', 'bob@example.com');
INSERT INTO customers (name, email)
VALUES ('Alice Green', 'alice@example.com')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;
The first insert adds two new customers.
The second one updates an existing customer (instead of erroring) because of the ON CONFLICT rule.
NOW() automatically sets the timestamp.
Basics on Selecting Data from Tables
Introduction
The SELECT statement in PostgreSQL is used to query data from one or more tables.
PostgreSQL processes SELECT queries logically in this order:
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
This order matters when you design complex queries.
Basic Syntax
SELECT <column1>, <column2>, ...
FROM <table_name>;
You can select all columns using *, but it's better to specify only what you need.
Example: Select All Columns
SELECT * FROM employees;
Retrieves every column and every row from employees.
Useful for quick checks, but inefficient in production queries.
Filtering Rows with WHERE
SELECT first_name, last_name
FROM employees
WHERE department = 'Engineering';
WHERE filters rows based on a condition.
Operators include:
= equal
<> or != not equal
>, <, >=, <=
AND, OR, NOT for logical combinations
Filtering with Pattern Matching (LIKE)
SELECT first_name, last_name
FROM employees
WHERE last_name LIKE 'S%';
LIKE is used for pattern matching with wildcards:
%: matches any sequence of characters.
_: matches exactly one character.
Example: 'S%' finds all last names starting with "S".
Sorting Results with ORDER BY
SELECT first_name, last_name, hire_date
FROM employees
ORDER BY hire_date DESC;
ORDER BY sorts rows by one or more columns.
Use ASC (default) or DESC for descending order.
You can sort by multiple columns:
ORDER BY department ASC, last_name ASC;
Limiting Results with LIMIT and OFFSET
SELECT * FROM employees
ORDER BY employee_id
LIMIT 5 OFFSET 10;
LIMIT restricts how many rows are returned.
OFFSET skips the given number of rows before starting output.
Commonly used for pagination (e.g., "page 3, 10 results per page").
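For instance, page 3 at 10 results per page skips (3 - 1) * 10 = 20 rows:
SELECT * FROM employees
ORDER BY employee_id
LIMIT 10 OFFSET 20;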
Aliasing Columns and Tables
SELECT
first_name AS given,
last_name AS family
FROM employees AS e;
Aliases give temporary names to columns or tables for readability.
You can omit AS; both forms work:
SELECT first_name given, last_name family FROM employees e;
Combining Conditions
SELECT first_name, last_name, department
FROM employees
WHERE department = 'Sales'
AND hire_date >= '2024-01-01'
OR active = TRUE;
Use parentheses to control logical order:
WHERE (department = 'Sales' AND hire_date >= '2024-01-01') OR active = TRUE;
Distinct Rows
SELECT DISTINCT department
FROM employees;
DISTINCT removes duplicate rows from the result set.
You can apply it to one or multiple columns:
SELECT DISTINCT department, active FROM employees;
Computed Columns and Expressions
SELECT
first_name || ' ' || last_name AS full_name,
EXTRACT(YEAR FROM hire_date) AS hire_year
FROM employees;
You can create new columns by combining or transforming existing ones.
|| concatenates strings.
EXTRACT() extracts a specific part (like year, month, day) from a date or timestamp.
Aggregating Data with GROUP BY
SELECT
department,
COUNT(*) AS total_employees
FROM employees
GROUP BY department;
GROUP BY groups rows by unique combinations of column values.
Often used with aggregate functions (combined in the sketch after this list):
COUNT(): number of rows.
SUM(): total sum.
AVG(): average.
MIN() / MAX(): smallest / largest value.
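For instance, several aggregates can be combined in one grouped query (the salary column is an assumption for illustration):
SELECT
  department,
  COUNT(*)    AS total_employees,
  AVG(salary) AS avg_salary,
  MAX(salary) AS max_salary
FROM employees
GROUP BY department;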
Filtering Groups with HAVING
SELECT department, COUNT(*) AS total_employees
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
HAVING filters groups (after aggregation), while WHERE filters individual rows (before aggregation).
Example: only show departments with more than 5 employees.
Combining Results with UNION
SELECT first_name, last_name FROM employees
UNION
SELECT first_name, last_name FROM contractors;
UNION merges two result sets and removes duplicates.
Use UNION ALL to keep duplicates.
Each SELECT must have the same number and type of columns.
Concatenating Strings in PostgreSQL
Introduction
String concatenation means joining two or more strings together into a single text value.
In PostgreSQL, you can concatenate strings using:
The concatenation operator || (most common and SQL-standard).
The CONCAT() function.
The CONCAT_WS() function (concatenate with a separator).
Using the || Operator
SELECT 'Hello' || ' ' || 'World' AS greeting;
The || operator joins strings directly.
In this example, PostgreSQL returns Hello World.
You can concatenate columns and literals together as well:
SELECT
first_name || ' ' || last_name AS full_name
FROM employees;
This is the most widely used approach for text combination in PostgreSQL.
Handling NULLs with ||
SELECT 'Hello' || NULL || 'World' AS result;
If any operand is NULL, the entire concatenated result becomes NULL.
Example result: NULL.
To avoid this, use the COALESCE() function to replace NULL with an empty string:
SELECT 'Hello' || COALESCE(NULL, '') || 'World' AS safe_concat;
Result: HelloWorld.
Using CONCAT()
SELECT CONCAT('Hello', ' ', 'World') AS greeting;
CONCAT() joins any number of arguments together.
Unlike ||, it automatically treats NULL as an empty string.
Example:
SELECT CONCAT('Hello', NULL, 'World') AS greeting;
Result: HelloWorld (not NULL).
This makes CONCAT() safer when dealing with nullable columns.
Using CONCAT_WS() — Concatenate with Separator
SELECT CONCAT_WS('-', '2025', '11', '10') AS formatted_date;
CONCAT_WS(separator, arg1, arg2, ...) stands for "Concatenate With Separator."
The first argument is used as the separator between all other arguments.
NULL values are ignored automatically.
Example with NULL:
SELECT CONCAT_WS(' ', 'Alice', NULL, 'Johnson');
Result: Alice Johnson.
Concatenating Columns Example
SELECT
first_name || ' ' || last_name AS full_name,
department || ' Department' AS dept_name
FROM employees;
You can use concatenation when updating data, for example, to fill a derived column.
PostgreSQL evaluates the expressions on the right-hand side for each row.
Concatenation with Numbers and Other Types
SELECT
'Employee ID: ' || employee_id AS label
FROM employees;
PostgreSQL automatically casts numbers, booleans, and dates to text when concatenated with a string.
You can also use explicit conversion for clarity:
SELECT 'Employee ID: ' || CAST(employee_id AS TEXT) FROM employees;
Formatting Example: Full Address
SELECT
CONCAT_WS(', ',
street,
city,
postal_code,
country
) AS full_address
FROM customers;
This builds a clean formatted address string, skipping NULL parts automatically.
Example result: "Main Street 5, Berlin, 10115, Germany".
Dropping Tables in PostgreSQL
Introduction
The DROP TABLE statement permanently removes a table and all its data from a PostgreSQL database.
When you drop a table, PostgreSQL deletes:
All rows stored in the table.
Associated indexes and constraints.
Any dependent objects (if you use CASCADE).
Once dropped, the data cannot be recovered unless you have a backup.
Basic Syntax
DROP TABLE <table_name>;
table_name — the name of the table you want to delete.
If the table doesn't exist, PostgreSQL will raise an error unless you use DROP TABLE IF EXISTS.
Example: Dropping a Table
DROP TABLE employees;
Deletes the employees table completely from the database.
All its data, indexes, and constraints are removed permanently.
Safely Dropping a Table
DROP TABLE IF EXISTS employees;
Prevents an error if the table does not exist.
This is a safer way to clean up tables, especially in scripts or migrations.
Dropping Multiple Tables
DROP TABLE IF EXISTS employees, departments, projects;
You can drop multiple tables in one command by separating their names with commas.
All listed tables are removed in a single atomic command.
Using CASCADE and RESTRICT
DROP TABLE employees CASCADE;
CASCADE automatically drops any dependent objects (like foreign keys in other tables, views, or triggers).
Use carefully — this can remove more than you expect!
To prevent accidental cascade deletion, use RESTRICT (the default behavior):
DROP TABLE employees RESTRICT;
PostgreSQL will refuse to drop the table if anything depends on it.
Example: Dropping with Dependencies
-- Example: "customers" table references "orders" through a foreign key
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY
);
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(order_id)
);
-- Dropping the parent table with CASCADE
DROP TABLE orders CASCADE;
CASCADE tells PostgreSQL to remove not only the orders table itself, but also any objects that depend on it — such as foreign key constraints, views, or triggers.
In this example:
The table orders is permanently deleted.
The foreign key constraint (inside customers) is automatically removed, so customers.order_id is no longer validated against orders.
The customers table itself remains intact — only the relationship to orders is dropped.
PostgreSQL will also print a notice similar to:
NOTICE: drop cascades to constraint customers_order_id_fkey on table customers
DROP TABLE
Always double-check before using CASCADE, as it can remove multiple dependent objects automatically.
Dropping a Table in a Specific Schema
DROP TABLE company.employees;
If the table belongs to a non-default schema, include the schema name.
Example: company.employees drops the employees table inside the company schema.
Checking if a Table Exists
-- Using psql meta-command
\dt
-- Or query the catalog directly
SELECT tablename
FROM pg_catalog.pg_tables
WHERE tablename = 'employees';
\dt lists all tables in the current database (works in psql).
Alternatively, query the pg_catalog.pg_tables system table to check existence programmatically.
Practical Tip: Use Transactions
BEGIN;
DROP TABLE employees;
ROLLBACK;
During testing, wrap DROP TABLE inside a transaction.
ROLLBACK cancels the deletion if you change your mind before committing.
This helps prevent irreversible data loss while experimenting.
Filtering Rows with the WHERE Clause
Introduction
The WHERE clause in PostgreSQL filters rows returned by a SELECT, UPDATE, or DELETE statement based on one or more conditions.
It is evaluated before grouping or ordering results — only rows satisfying the condition(s) are kept.
You can use logical operators (AND, OR, NOT), comparison operators (=, <, >, etc.), and range or set operators like BETWEEN, IN, and LIKE.
Basic Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;
condition can include comparisons, logical connectors, or functions that evaluate to TRUE or FALSE.
Rows for which the condition is TRUE are returned; FALSE or NULL rows are excluded.
Comparison Operators
SELECT * FROM employees
WHERE department = 'Engineering';
Common operators:
Operator | Description              | Example
---------+--------------------------+---------------------------
=        | Equal to                 | salary = 50000
<> or != | Not equal to             | department <> 'IT'
>        | Greater than             | age > 30
<        | Less than                | salary < 40000
>=       | Greater than or equal to | age >= 18
<=       | Less than or equal to    | hire_date <= '2025-01-01'
Combining Conditions with AND, OR, NOT
SELECT * FROM employees
WHERE department = 'Sales'
AND hire_date >= '2024-01-01'
OR active = TRUE;
It is recommended to use parentheses for clarity:
WHERE (department = 'Sales' AND hire_date >= '2024-01-01') OR active = TRUE;
Filtering Ranges with BETWEEN
SELECT * FROM employees
WHERE salary BETWEEN 40000 AND 60000;
BETWEEN a AND b is equivalent to (salary >= a AND salary <= b).
It includes both boundary values.
You can use it with dates, numbers, or text:
WHERE hire_date BETWEEN '2024-01-01' AND '2024-12-31';
Filtering by a Set with IN
SELECT * FROM employees
WHERE department IN ('HR', 'IT', 'Finance');
IN checks whether a value matches any value in the provided list.
Equivalent to multiple OR conditions:
WHERE department = 'HR' OR department = 'IT' OR department = 'Finance';
You can also use a subquery:
WHERE department_id IN (SELECT id FROM departments WHERE active = TRUE);
Pattern Matching with LIKE and ILIKE
SELECT * FROM employees
WHERE last_name LIKE 'S%';
LIKE is case-sensitive pattern matching.
ILIKE is the case-insensitive version (PostgreSQL extension).
Examples:
WHERE name LIKE 'A%'; -- starts with A
WHERE name LIKE '%son'; -- ends with son
WHERE name ILIKE '%SMITH%'; -- contains 'smith' (case-insensitive)
Filtering NULL Values
SELECT * FROM employees
WHERE department IS NULL;
SELECT * FROM employees
WHERE department IS NOT NULL;
= NULL does not work — use IS NULL and IS NOT NULL.
This is because NULL represents "unknown", not an actual value.
Checking Membership with EXISTS
SELECT *
FROM employees e
WHERE EXISTS (
SELECT 1 FROM departments d
WHERE d.id = e.department_id
);
EXISTS checks whether a subquery returns at least one row.
If it does, the condition is TRUE, otherwise FALSE.
Often used for correlated subqueries and conditional joins.
Using ANY and ALL
SELECT * FROM employees
WHERE salary > ANY (SELECT salary FROM interns);
SELECT * FROM employees
WHERE salary > ALL (SELECT salary FROM interns);
ANY — condition is true if it holds for at least one value returned by the subquery.
ALL — condition must hold for every value returned by the subquery.
These are useful when comparing against ranges of values dynamically.
Combining Multiple Conditions
SELECT * FROM employees
WHERE department = 'Finance'
AND (salary BETWEEN 40000 AND 60000)
AND active = TRUE
AND last_name ILIKE 'J%';
Conditions can be chained together in flexible and readable ways.
Parentheses make the logic explicit and prevent ambiguity.
Database Design in PostgreSQL
Introduction
Database design is the process of organizing data into structured tables that represent real-world entities and their relationships.
Good design ensures:
Data integrity (accuracy and consistency).
Scalability (efficient growth with more data).
Performance (fast queries and updates).
Maintainability (easy schema evolution and clear relationships).
PostgreSQL, as a relational database, uses tables, keys, and relationships to represent structured data logically and efficiently.
Step 1: Identify Entities and Attributes
Begin by identifying entities — objects or concepts you need to store data about (e.g. customers, orders, products).
For each entity, determine its attributes — descriptive properties or fields.
Use the PRIMARY KEY constraint on a column to define it as the table's unique identifier.
PostgreSQL automatically makes that column NOT NULL and UNIQUE.
Example: Simple Primary Key
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL
);
SERIAL automatically creates an integer column with an internal sequence generator.
The primary key constraint guarantees that every employee_id value is unique and non-null.
Primary keys are often numeric because they are efficient for indexing and joining tables.
Using GENERATED AS IDENTITY (Modern SQL)
CREATE TABLE departments (
dept_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name TEXT NOT NULL
);
GENERATED ALWAYS AS IDENTITY is the SQL-standard replacement for SERIAL.
PostgreSQL manages the sequence automatically and ensures strict consistency.
You can also use GENERATED BY DEFAULT AS IDENTITY to allow manual insertion of custom IDs if necessary.
Composite (Multi-Column) Primary Keys
CREATE TABLE enrollments (
student_id INT NOT NULL,
course_id INT NOT NULL,
enrollment_date DATE DEFAULT CURRENT_DATE,
PRIMARY KEY (student_id, course_id)
);
Composite keys use multiple columns to uniquely identify a record.
Here, the combination of student_id and course_id must be unique — but each one individually may repeat.
Common in many-to-many relationship tables (also called junction tables).
Adding a Primary Key to an Existing Table
ALTER TABLE employees
ADD CONSTRAINT employees_pkey PRIMARY KEY (employee_id);
If the table was created without a primary key, you can add one later using ALTER TABLE.
The constraint name (here employees_pkey) can be chosen manually or left for PostgreSQL to assign automatically.
Dropping a Primary Key
ALTER TABLE employees
DROP CONSTRAINT employees_pkey;
You can remove a primary key if you plan to redefine it or change your table structure.
However, be cautious — dropping a primary key may also affect FOREIGN KEY relationships referencing it.
Primary Key vs. Unique Constraint
Both ensure uniqueness, but they serve different purposes:
Feature                | PRIMARY KEY  | UNIQUE
-----------------------+--------------+--------------------------
NULL allowed?          | No           | Yes
Number per table       | One only     | Many allowed
Automatically indexed? | Yes          | Yes
Purpose                | Row identity | Prevent duplicate values
Viewing Primary Keys
-- Inside psql
\d employees
-- Query system catalog
SELECT
tc.table_name,
kc.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kc
ON kc.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY';
\d table_name shows primary keys directly in the psql terminal.
You can also query the system catalog for programmatic inspection of constraints.
Behavior with Foreign Keys
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id)
);
Primary keys are the natural targets of FOREIGN KEY relationships.
They guarantee that every foreign key in another table refers to a unique, existing row.
If you delete a primary key row, PostgreSQL will enforce the foreign key's action (CASCADE, SET NULL, etc.).
Choosing a Good Primary Key
Follow these best practices:
Keep it simple — use one column if possible.
Use numeric or UUID types for efficiency.
Never change primary key values after creation.
Choose a value that is unique and stable across the table's lifetime.
Examples of good candidates (a UUID sketch follows this list):
Auto-incrementing integers for internal identifiers.
UUIDs for distributed or external systems.
Composite keys for many-to-many link tables.
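As a sketch of the UUID option, here is a hypothetical api_tokens table (gen_random_uuid() is built in since PostgreSQL 13; earlier versions need the pgcrypto extension):
CREATE TABLE api_tokens (
    token_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    label TEXT
);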
The NOT NULL Constraint in PostgreSQL
Introduction
The NOT NULL constraint ensures that a column cannot contain a NULL value.
In PostgreSQL, NULL represents "unknown" or "missing" data — not the same as an empty string or zero.
By declaring a column as NOT NULL, you guarantee that every row must have a valid (non-NULL) value in that column.
Basic Syntax
CREATE TABLE table_name (
column_name data_type NOT NULL,
...
);
Attach NOT NULL right after the data type in the column definition.
If you try to insert a row without providing a value for that column, PostgreSQL will raise an error.
Example: Defining NOT NULL Columns
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
email TEXT,
hire_date DATE NOT NULL DEFAULT CURRENT_DATE
);
Here, first_name, last_name, and hire_date must always contain valid values.
email is optional — it can be left NULL.
If you omit a NOT NULL column in an INSERT statement, PostgreSQL will throw an error unless a DEFAULT value is provided.
Attempting to Insert NULL
INSERT INTO employees (first_name, last_name, hire_date)
VALUES (NULL, 'Smith', '2025-01-01');
This will fail with an error like:
ERROR: null value in column "first_name" of relation "employees" violates not-null constraint
DETAIL: Failing row contains (null, Smith, 2025-01-01).
PostgreSQL prevents the insertion because first_name is defined as NOT NULL.
Adding a NOT NULL Constraint to an Existing Table
ALTER TABLE employees
ALTER COLUMN email SET NOT NULL;
You can add NOT NULL later using ALTER TABLE, but only if all existing rows already have non-NULL values in that column.
If some rows contain NULL, you must first update or delete them before applying the constraint.
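One possible backfill-then-constrain sequence (the placeholder address is purely illustrative):
-- Replace existing NULLs first...
UPDATE employees SET email = 'unknown@example.com' WHERE email IS NULL;
-- ...then the constraint can be applied:
ALTER TABLE employees ALTER COLUMN email SET NOT NULL;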
Removing a NOT NULL Constraint
ALTER TABLE employees
ALTER COLUMN email DROP NOT NULL;
This makes the column nullable again, allowing future NULL values.
Dropping NOT NULL is safe and does not affect existing data.
Using DEFAULT with NOT NULL
CREATE TABLE tasks (
task_id SERIAL PRIMARY KEY,
description TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending'
);
Combining NOT NULL with a DEFAULT value ensures the column is always filled — even if the user doesn't explicitly provide a value.
Example insert:
INSERT INTO tasks (description) VALUES ('Review code');
-- status automatically becomes 'pending'
Checking Which Columns Are NOT NULL
SELECT
column_name,
is_nullable
FROM information_schema.columns
WHERE table_name = 'employees';
The is_nullable column will show NO for NOT NULL columns and YES for nullable ones.
Useful for verifying schema constraints in large databases.
NOT NULL vs. CHECK Constraints
You could technically write a CHECK (column IS NOT NULL) constraint, but NOT NULL is faster, simpler, and the standard method.
Both enforce the same rule, but NOT NULL is built into the column definition and optimized by PostgreSQL.
Behavior with Primary Keys
Primary key columns are automatically NOT NULL, so you don't need to specify it explicitly.
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
Even if you omit NOT NULL, the PRIMARY KEY constraint will enforce it internally.
The DEFAULT Constraint in PostgreSQL
Introduction
The DEFAULT constraint in PostgreSQL assigns an automatic value to a column when no value is specified during an INSERT operation.
It helps maintain data consistency and prevents NULL values in cases where a sensible fallback value exists.
Defaults can be constant values (like numbers or text), expressions, or even built-in PostgreSQL functions such as CURRENT_DATE or NOW().
The DEFAULT keyword is placed after the data type when defining the column.
If you omit the column in an INSERT statement, PostgreSQL automatically fills it with the default value.
Example: Simple DEFAULT Values
CREATE TABLE tasks (
task_id SERIAL PRIMARY KEY,
description TEXT NOT NULL,
status TEXT DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW()
);
When a new task is inserted without specifying status or created_at, PostgreSQL automatically assigns:
'pending' to status.
The current date and time (via NOW()) to created_at.
INSERT INTO tasks (description) VALUES ('Write documentation');
-- status → 'pending'
-- created_at → current timestamp
Using DEFAULT with Numeric and Boolean Columns
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10,2) DEFAULT 0.00,
available BOOLEAN DEFAULT TRUE
);
If you omit price or available when inserting, PostgreSQL will assign 0.00 and TRUE respectively.
This avoids the need to manually fill common defaults like 0, TRUE, or 'N/A' for each row.
Using DEFAULT with Expressions
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE DEFAULT CURRENT_DATE,
total NUMERIC(10,2) DEFAULT (0.0),
order_ref TEXT DEFAULT ('ORD-' || EXTRACT(YEAR FROM CURRENT_DATE))
);
Defaults can be expressions or function results — not just constants.
Here:
CURRENT_DATE fills in today's date automatically.
('ORD-' || EXTRACT(YEAR FROM CURRENT_DATE)) generates a dynamic prefix for each order (e.g., ORD-2025).
Inserting Explicit DEFAULT Values
INSERT INTO tasks (description, status)
VALUES ('Review code', DEFAULT);
Use the DEFAULT keyword explicitly in INSERT statements to apply the column's predefined default value.
This is especially useful when inserting into selected columns but still wanting defaults for others.
Adding a DEFAULT Constraint to an Existing Table
ALTER TABLE employees
ALTER COLUMN active SET DEFAULT TRUE;
You can add or modify default values using ALTER TABLE.
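A default can also be removed later without touching existing data:
ALTER TABLE employees
ALTER COLUMN active DROP DEFAULT;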
The CHECK Constraint in PostgreSQL
Introduction
The CHECK constraint restricts the values a column (or combination of columns) may hold by testing every inserted or updated row against a boolean condition.
You can apply CHECK directly to a single column or define it as a table-level constraint involving multiple columns.
The condition must evaluate to TRUE or NULL (not FALSE); otherwise the insert or update will fail.
Example: Simple CHECK Constraint
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
first_name TEXT NOT NULL,
salary NUMERIC(10,2) CHECK (salary >= 0)
);
This constraint guarantees that salary values must always be greater than or equal to zero.
Attempting to insert a negative salary will trigger an error.
ERROR: new row for relation "employees" violates check constraint "employees_salary_check"
DETAIL: Failing row contains (-500).
Multiple CHECK Constraints
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10,2) CHECK (price >= 0),
discount NUMERIC(5,2) CHECK (discount BETWEEN 0 AND 100)
);
Each CHECK is validated independently.
PostgreSQL allows multiple CHECK constraints on the same table, each with its own rule.
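Table-Level CHECK Constraints
A sketch of a table-level CHECK; the accounts table with balance and credit_limit columns is an illustrative assumption reconstructed from the description below:
CREATE TABLE accounts (
    account_id SERIAL PRIMARY KEY,
    balance NUMERIC(12,2) NOT NULL DEFAULT 0,
    credit_limit NUMERIC(12,2) NOT NULL,
    CHECK (balance <= credit_limit)
);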
This constraint compares two columns in the same row.
It ensures that the balance never exceeds the credit_limit.
Unlike column-level checks, this form enforces a condition across multiple columns.
Adding a CHECK Constraint to an Existing Table
ALTER TABLE employees
ADD CONSTRAINT salary_positive CHECK (salary >= 0);
You can add new named CHECK constraints later with ALTER TABLE.
PostgreSQL immediately validates all existing rows — if any violate the rule, the command fails.
Dropping a CHECK Constraint
ALTER TABLE employees
DROP CONSTRAINT salary_positive;
To remove a CHECK constraint, you must reference it by name.
Use \d table_name in psql or query pg_constraint to see its name.
Using CHECK with Logical Operators
CREATE TABLE students (
student_id SERIAL PRIMARY KEY,
age INT CHECK (age >= 5 AND age <= 100),
grade TEXT CHECK (grade IN ('A', 'B', 'C', 'D', 'F'))
);
AND, OR, and IN can be combined inside the condition for complex logic.
This example:
Ensures age stays within a reasonable range.
Limits grade values to a predefined set of letters.
Using Functions in CHECK Constraints
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
email TEXT CHECK (POSITION('@' IN email) > 1)
);
You can call PostgreSQL functions or expressions within CHECK.
Here, POSITION('@' IN email) ensures that the email column contains an "@" symbol.
However, for complex validations, application-level or trigger-based logic may be preferable.
Deferrable Constraints
A DEFERRABLE constraint can be checked at the end of a transaction instead of after each statement.
This is useful when temporary violations occur during intermediate steps of a multi-step update.
Note, however, that PostgreSQL only allows UNIQUE, PRIMARY KEY, EXCLUDE, and FOREIGN KEY constraints to be declared DEFERRABLE; CHECK and NOT NULL constraints are always checked immediately.
Viewing CHECK Constraints
SELECT
conname AS constraint_name,
convalidated,
condeferrable
FROM pg_constraint
WHERE contype = 'c' AND conrelid = 'employees'::regclass;
Use the pg_constraint system catalog to inspect existing CHECK constraints.
convalidated = true indicates that all rows currently satisfy the constraint.
CHECK vs. Application-Level Validation
CHECK constraints are enforced at the database level — they cannot be bypassed by client errors or inconsistent application code.
However:
They should remain simple and deterministic.
Complex, cross-table validations (e.g., involving subqueries) are better handled with triggers or application logic.
Foreign Keys in PostgreSQL
Introduction
A foreign key establishes a relationship between two tables by linking a column (or set of columns) in one table to the PRIMARY KEY or UNIQUE column of another table.
It enforces referential integrity — ensuring that a value in the referencing table must always match an existing value in the referenced table (or be NULL if allowed).
Foreign keys prevent "orphan" records, i.e., rows that refer to non-existent data in another table.
The REFERENCES keyword defines the link to the target table and column.
By default, PostgreSQL enforces that the referenced value must exist in the parent table before it can be inserted into the child table.
Example: Basic Foreign Key Relationship
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id),
order_date DATE NOT NULL DEFAULT CURRENT_DATE
);
Here:
customers is the parent table.
orders is the child table — each order must refer to a valid customer.
If you try to insert an order for a non-existent customer, PostgreSQL will reject it.
ERROR: insert or update on table "orders" violates foreign key constraint "orders_customer_id_fkey"
DETAIL: Key (customer_id)=(999) is not present in table "customers".
Foreign Key Constraint Naming
PostgreSQL automatically generates a name (e.g. orders_customer_id_fkey), but you can define your own for clarity:
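For example (the constraint name fk_orders_customer is illustrative):
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id)
        REFERENCES customers (customer_id)
);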
Deferrable Foreign Keys
Normally, PostgreSQL checks foreign key integrity immediately when you insert or update a row.
With DEFERRABLE INITIALLY DEFERRED, checks occur only when the transaction commits — allowing temporary inconsistencies within a transaction.
Example use case: cyclic relationships between tables that reference each other.
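A minimal sketch of the syntax, shown on a fresh orders table:
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers (customer_id)
        DEFERRABLE INITIALLY DEFERRED
);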
Checking Existing Foreign Keys
SELECT
conname AS constraint_name,
conrelid::regclass AS table_name,
confrelid::regclass AS referenced_table,
confupdtype AS on_update,
confdeltype AS on_delete
FROM pg_constraint
WHERE contype = 'f';
The pg_constraint system catalog stores all foreign key definitions.
The confdeltype and confupdtype columns contain codes for delete/update actions:
a — NO ACTION
r — RESTRICT
c — CASCADE
n — SET NULL
d — SET DEFAULT
Behavior When Dropping Referenced Tables
DROP TABLE customers;
-- ERROR: cannot drop table customers because other objects depend on it
PostgreSQL blocks the drop to preserve referential integrity.
You can force the removal (and automatically drop all dependent constraints) using CASCADE:
DROP TABLE customers CASCADE;
This will also remove all foreign key constraints in child tables that reference it, but the child tables themselves remain intact.
NULL and Foreign Keys
Foreign key columns can contain NULL unless defined as NOT NULL.
When NULL is present, PostgreSQL skips foreign key validation for that row (because "unknown" cannot be compared).
This allows optional relationships — e.g., an order that may not yet be linked to a customer.
Performance Considerations
PostgreSQL automatically creates indexes on the referenced columns in the parent table (if they are primary or unique keys).
However, you may also create indexes on the foreign key columns in the child table for faster joins and deletions:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
This improves query performance when filtering or joining on the foreign key.
Basics on Joining Tables in PostgreSQL
Introduction
A JOIN combines rows from two or more tables based on a related column between them.
In PostgreSQL, joins are performed using the JOIN keyword together with ON or USING clauses that specify matching columns.
Basic JOIN Syntax
SELECT columns
FROM table1
JOIN table2
ON table1.column_name = table2.column_name;
JOIN matches rows from both tables where the condition is true.
By default, JOIN means INNER JOIN.
Example Setup
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
city TEXT
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id),
order_date DATE DEFAULT CURRENT_DATE,
amount NUMERIC(10,2)
);
INSERT INTO customers (name, city)
VALUES ('Alice', 'Berlin'),
('Bob', 'Munich'),
('Charlie', 'Hamburg');
INSERT INTO orders (customer_id, amount)
VALUES (1, 120.00),
(1, 90.50),
(2, 45.00);
This setup lets us explore all join types using customers and orders.
INNER JOIN
SELECT
customers.name,
orders.order_id,
orders.amount
FROM customers
INNER JOIN orders
ON customers.customer_id = orders.customer_id;
INNER JOIN returns only rows with matching values in both tables.
If a customer has no orders, that customer will not appear in the result.
Result:
name | order_id | amount
--------+-----------+--------
Alice | 1 | 120.00
Alice | 2 | 90.50
Bob | 3 | 45.00
LEFT JOIN (or LEFT OUTER JOIN)
SELECT
customers.name,
orders.order_id,
orders.amount
FROM customers
LEFT JOIN orders
ON customers.customer_id = orders.customer_id;
LEFT JOIN returns all rows from the left table (customers), and the matching rows from the right table (orders).
If no match exists, PostgreSQL fills right-side columns with NULL.
Result:
name | order_id | amount
----------+-----------+--------
Alice | 1 | 120.00
Alice | 2 | 90.50
Bob | 3 | 45.00
Charlie | NULL | NULL
RIGHT JOIN (or RIGHT OUTER JOIN)
SELECT
customers.name,
orders.order_id,
orders.amount
FROM customers
RIGHT JOIN orders
ON customers.customer_id = orders.customer_id;
RIGHT JOIN returns all rows from the right table (orders) and matching rows from the left table (customers).
Unmatched rows from the left table will appear as NULL.
Typically less used than LEFT JOIN because you can just swap the table order.
FULL JOIN (or FULL OUTER JOIN)
SELECT
customers.name,
orders.order_id,
orders.amount
FROM customers
FULL JOIN orders
ON customers.customer_id = orders.customer_id;
FULL JOIN returns all rows from both tables, matching them when possible.
Unmatched rows from either table get NULL values in the missing columns.
Result example if some rows had no counterpart:
name | order_id | amount
----------+-----------+--------
Alice | 1 | 120.00
Alice | 2 | 90.50
Bob | 3 | 45.00
Charlie | NULL | NULL
NULL | 4 | 80.00 -- order without a known customer
CROSS JOIN
SELECT
customers.name,
orders.order_id
FROM customers
CROSS JOIN orders;
CROSS JOIN produces the Cartesian product — every row from the first table is combined with every row from the second.
Use with caution — the number of rows grows rapidly (e.g., 10 × 10 = 100).
This join type does not require an ON condition.
SELF JOIN
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
name TEXT,
manager_id INT REFERENCES employees(employee_id)
);
SELECT e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m
ON e.manager_id = m.employee_id;
A self join joins a table to itself — useful for hierarchical or tree-like data such as employees and managers.
Aliases (e, m) are necessary to distinguish between the two instances of the same table.
USING Clause
SELECT *
FROM customers
JOIN orders USING (customer_id);
The USING (column_name) syntax is a shortcut when both tables share a column with the same name.
It automatically merges the duplicate join column in the output, so you'll see only one customer_id.
Joining More Than Two Tables
CREATE TABLE order_items (
order_item_id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(order_id),
product_name TEXT,
quantity INT
);
SELECT
c.name AS customer,
o.order_id,
i.product_name,
i.quantity
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id
JOIN order_items i
ON o.order_id = i.order_id;
You can chain multiple JOIN clauses in one query.
Each JOIN connects one more table through a matching condition.
Filtering After Joins
SELECT
c.name, o.amount
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id
WHERE o.amount > 50;
You can use WHERE, ORDER BY, LIMIT, and other clauses after joining.
PostgreSQL first performs the join, then filters the results according to your conditions.
NATURAL JOIN
SELECT *
FROM customers
NATURAL JOIN orders;
NATURAL JOIN automatically joins tables using all columns with the same names.
While convenient, it can be dangerous — if new columns with the same name are added later, join behavior may change unexpectedly.
Use JOIN ... USING (...) instead for clarity and control.
Performance Considerations
Joins can be computationally expensive for large tables.
To optimize performance:
Ensure both sides of join conditions are indexed (especially foreign key columns).
Use EXPLAIN to analyze query plans and check for sequential scans.
Limit unnecessary joins — fetch only required columns.
EXPLAIN SELECT * FROM customers JOIN orders ON customers.customer_id = orders.customer_id;
This shows how PostgreSQL executes the join internally (e.g., nested loop, hash join, merge join).
Set Operations in PostgreSQL
Introduction
Set operations in PostgreSQL allow you to combine the results of two or more SELECT queries into a single result set.
They work on complete rows (as sets), not on individual columns.
Common operations include:
UNION – combine and remove duplicates.
UNION ALL – combine and keep duplicates.
INTERSECT – return only rows common to both queries.
EXCEPT – return rows from the first query that are not in the second.
Each query in a set operation MUST:
Select the same number of columns.
Have compatible data types for corresponding columns.
Example Setup
CREATE TABLE students_2024 (
name TEXT,
city TEXT
);
CREATE TABLE students_2025 (
name TEXT,
city TEXT
);
INSERT INTO students_2024 VALUES
('Alice', 'Berlin'),
('Bob', 'Munich'),
('Charlie', 'Hamburg');
INSERT INTO students_2025 VALUES
('Alice', 'Berlin'),
('David', 'Stuttgart'),
('Ella', 'Munich');
We now have two tables with overlapping and distinct rows.
1. UNION — Combine and Remove Duplicates
SELECT * FROM students_2024
UNION
SELECT * FROM students_2025;
UNION merges the two result sets and automatically removes duplicates.
The combined result contains only distinct rows; its order is not guaranteed unless you add ORDER BY.
Result:
name | city
------------+------------
Alice | Berlin
Bob | Munich
Charlie | Hamburg
David | Stuttgart
Ella | Munich
2. UNION ALL — Combine and Keep Duplicates
SELECT * FROM students_2024
UNION ALL
SELECT * FROM students_2025;
UNION ALL keeps duplicate rows, without sorting or removing them.
It is faster than UNION because PostgreSQL skips duplicate checking.
Result:
name | city
------------+------------
Alice | Berlin
Bob | Munich
Charlie | Hamburg
Alice | Berlin
David | Stuttgart
Ella | Munich
3. INTERSECT — Common Rows Only
SELECT * FROM students_2024
INTERSECT
SELECT * FROM students_2025;
INTERSECT returns only the rows that appear in both queries.
Duplicates are automatically removed.
Result:
name | city
------------+------------
Alice | Berlin
4. INTERSECT ALL — Common Rows, Keep Duplicates
SELECT * FROM students_2024
INTERSECT ALL
SELECT * FROM students_2025;
INTERSECT ALL keeps duplicates as they appear in both results.
If a row appears twice in both tables, it appears twice in the output.
Since our data has only one matching row, the result is identical to INTERSECT.
5. EXCEPT — Rows in First Query but Not in Second
SELECT * FROM students_2024
EXCEPT
SELECT * FROM students_2025;
EXCEPT returns all rows from the first query that do not appear in the second.
Duplicates are removed automatically.
Result:
name | city
------------+------------
Bob | Munich
Charlie | Hamburg
6. EXCEPT ALL — Keep Duplicates
SELECT * FROM students_2024
EXCEPT ALL
SELECT * FROM students_2025;
EXCEPT ALL behaves like EXCEPT, but keeps duplicates according to their frequency in the first query minus the second.
Example: if a row appears 3 times in the first query and once in the second, it appears twice in the result.
7. Matching Columns and Types
-- This will fail:
SELECT name FROM students_2024
UNION
SELECT name, city FROM students_2025;
All participating queries in a set operation must have:
The same number of columns.
Compatible data types in corresponding positions.
Otherwise PostgreSQL will raise an error:
ERROR: each UNION query must have the same number of columns
8. Sorting Combined Results
SELECT name, city FROM students_2024
UNION
SELECT name, city FROM students_2025
ORDER BY name;
Use ORDER BY only after the final set operation — not inside each SELECT.
PostgreSQL treats the combined output as one large table that can be sorted or filtered further.
9. Using Parentheses for Complex Set Operations
(SELECT * FROM students_2024
UNION
SELECT * FROM students_2025)
EXCEPT
SELECT * FROM expelled_students;
Parentheses control the evaluation order when mixing multiple set operators.
Without parentheses, PostgreSQL evaluates from left to right.
Use parentheses to make complex logic clear and unambiguous.
10. Combining with WHERE or JOIN Clauses
SELECT name, city FROM students_2024 WHERE city = 'Munich'
UNION
SELECT name, city FROM students_2025 WHERE city = 'Munich';
Each SELECT can include its own WHERE, JOIN, or GROUP BY clause before the set operation.
This allows you to filter and process each dataset independently before combining them.
11. Performance Considerations
UNION and INTERSECT remove duplicates — they require sorting or hashing internally, which can be expensive on large data sets.
UNION ALL and EXCEPT ALL skip this step and are generally faster.
When possible, use UNION ALL if you are sure duplicates are not an issue.
EXPLAIN SELECT * FROM students_2024 UNION SELECT * FROM students_2025;
EXPLAIN helps visualize how PostgreSQL performs the merge internally.
The BOOLEAN Type in PostgreSQL
Introduction
The BOOLEAN type in PostgreSQL represents truth values — logical true or false conditions.
It is often used in columns that represent yes/no, on/off, or active/inactive states.
Unlike databases that simulate booleans with integers (e.g. 0/1), PostgreSQL has a native boolean type with its own rules and accepted literals.
Declaration Syntax
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
active BOOLEAN DEFAULT TRUE
);
BOOLEAN columns can take values TRUE, FALSE, or NULL.
Accepted Literal Values
PostgreSQL accepts a variety of textual and numeric representations for boolean values:
Meaning | Accepted Values
--------+--------------------------------------------
TRUE    | TRUE, 't', 'true', 'y', 'yes', 'on', '1'
FALSE   | FALSE, 'f', 'false', 'n', 'no', 'off', '0'
The matching is case-insensitive, so 'True', 'YES', and 'On' all work the same.
Examples of Insertion
INSERT INTO users (name, active)
VALUES
('Alice', TRUE),
('Bob', 'yes'),
('Charlie', 'f'),
('Diana', '0');
SELECT * FROM users;
PostgreSQL automatically interprets 'yes', 'f', and '0' according to the truth table above (note that '0' must be quoted; a bare integer 0 is not accepted for a BOOLEAN column).
Result:
user_id | name | active
----------+----------+---------
1 | Alice | t
2 | Bob | t
3 | Charlie | f
4 | Diana | f
t and f are how PostgreSQL displays boolean values by default.
Boolean Display Formats
Although PostgreSQL stores booleans internally as true/false, by default it displays them as t and f.
This compact t/f display is a long-standing PostgreSQL convention.
You can cast or format them differently if needed:
SELECT active::text FROM users;
-- Outputs: 'true' / 'false' (the text cast spells values out; only psql's native boolean display abbreviates to t / f)
SELECT
CASE
WHEN active THEN 'Active'
ELSE 'Inactive'
END
FROM users;
-- Custom labels for boolean output
Using Booleans in Conditions
SELECT * FROM users WHERE active = TRUE;
-- Shorter equivalent:
SELECT * FROM users WHERE active;
-- Opposite:
SELECT * FROM users WHERE NOT active;
You can write conditions with or without explicitly comparing to TRUE or FALSE.
The expression WHERE active is shorthand for WHERE active = TRUE.
NULL with Boolean Columns
INSERT INTO users (name, active) VALUES ('Eve', NULL);
SELECT name,
CASE
WHEN active THEN 'True'
WHEN NOT active THEN 'False'
ELSE 'Unknown'
END AS status
FROM users;
Boolean columns can be NULL, representing "unknown" or "not specified."
This is different from FALSE — PostgreSQL treats NULL as neither true nor false.
The CASE statement (or COALESCE) can be used to handle NULL explicitly.
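For instance, COALESCE can map NULL to a fixed fallback value:
SELECT name, COALESCE(active, FALSE) AS active_or_default
FROM users;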
Boolean Operators
PostgreSQL supports standard logical operators on boolean expressions:
Operator | Description                          | Example        | Result
---------+--------------------------------------+----------------+--------
AND      | True if both operands are true       | TRUE AND FALSE | FALSE
OR       | True if at least one operand is true | TRUE OR FALSE  | TRUE
NOT      | Negates the boolean value            | NOT TRUE       | FALSE
Example usage inside a query:
SELECT * FROM users
WHERE active OR name = 'Alice';
Boolean Expressions in SELECT
SELECT name,
(active AND name = 'Alice') AS active_alice
FROM users;
Boolean expressions can appear directly in the SELECT list — returning t, f, or NULL per row.
This is useful for logical flags or derived states.
Indexing Boolean Columns
Boolean columns typically have low selectivity (few distinct values), so simple indexes are often not useful.
However, you can create partial indexes to speed up queries on a specific boolean value:
CREATE INDEX idx_users_active_true
ON users (user_id)
WHERE active = TRUE;
This makes WHERE active = TRUE queries faster without indexing inactive users.
Converting Between Boolean and Other Types
SELECT CAST(TRUE AS INT); -- returns 1
SELECT CAST(FALSE AS INT); -- returns 0
SELECT CAST(1 AS BOOLEAN); -- returns TRUE
SELECT CAST(0 AS BOOLEAN); -- returns FALSE
You can cast between BOOLEAN and INTEGER types if needed.
PostgreSQL follows standard semantics: TRUE → 1, FALSE → 0.
Character Types: CHAR, VARCHAR, and TEXT in PostgreSQL
Introduction
PostgreSQL provides several types for storing textual data: CHAR, VARCHAR, and TEXT.
Although they appear similar, they differ in how they handle length, padding, and validation.
All three are stored efficiently as variable-length strings internally, but their behavior and use cases differ slightly.
CHAR(n) — Fixed-Length Character Type
CREATE TABLE products_char (
code CHAR(5),
name TEXT
);
INSERT INTO products_char VALUES ('AB', 'Widget');
SELECT code, LENGTH(code) FROM products_char;
CHAR(n) (or CHARACTER(n)) stores exactly n characters.
If you insert a shorter string, PostgreSQL pads it with spaces to the fixed length.
If you insert a longer string, PostgreSQL raises an error, unless the extra characters are all spaces or you cast explicitly, in which case the value is truncated to n characters.
Result:
 code  | length
-------+--------
 AB    |      2
'AB' is physically stored and displayed as 'AB   ' (padded to 5 characters), yet LENGTH() reports 2, because trailing padding spaces are treated as semantically insignificant.
Useful when strings must have uniform length (e.g., fixed codes, legacy systems).
VARCHAR(n) — Variable-Length with Limit
CREATE TABLE products_varchar (
code VARCHAR(5),
name TEXT
);
INSERT INTO products_varchar VALUES ('AB', 'Widget');
INSERT INTO products_varchar VALUES ('ABCDE', 'Gadget');
VARCHAR(n) (or CHARACTER VARYING(n)) stores a string up to n characters long.
It does not pad shorter strings with spaces.
If the input exceeds n, PostgreSQL raises an error:
INSERT INTO products_varchar VALUES ('ABCDEF', 'Gizmo');
ERROR: value too long for type character varying(5)
This type is ideal when you need a length restriction (e.g., usernames ≤ 50 characters).
TEXT — Unlimited-Length Variable Text
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT
);
INSERT INTO articles (title, content)
VALUES ('Intro to PostgreSQL', 'PostgreSQL is a powerful open-source RDBMS...');
TEXT stores strings of any length (up to about 1 GB per field).
It does not require you to specify a limit and does not pad or truncate values.
Functionally equivalent to VARCHAR without a length limit — same performance, same behavior.
Comparison Between CHAR, VARCHAR, and TEXT
Type       | Behavior                   | Length Limit | Padded?      | Use Case
-----------+----------------------------+--------------+--------------+----------------------------------------------
CHAR(n)    | Fixed-length               | Yes          | Yes (spaces) | Fixed-size codes, legacy data
VARCHAR(n) | Variable-length, validated | Yes          | No           | Text with known maximum length (e.g., names)
TEXT       | Variable-length, unlimited | No           | No           | General-purpose text, documents, comments
CHAR vs VARCHAR vs TEXT in Practice
CREATE TABLE demo (
c1 CHAR(5),
c2 VARCHAR(5),
c3 TEXT
);
INSERT INTO demo VALUES ('AB', 'AB', 'AB');
SELECT c1, c2, c3, LENGTH(c1), LENGTH(c2), LENGTH(c3) FROM demo;
Result:
  c1   | c2 | c3 | length | length | length
-------+----+----+--------+--------+--------
 AB    | AB | AB |      2 |      2 |      2
CHAR(5) physically pads its value to 5 characters (visible in the displayed c1), but LENGTH() disregards the trailing padding; VARCHAR and TEXT store only what you insert.
Performance and Storage
Internally, PostgreSQL stores CHAR, VARCHAR, and TEXT using the same variable-length storage format (varlena).
There is no performance difference between TEXT and VARCHAR(n) for most workloads.
The only difference lies in input validation and space padding behavior — not storage or speed.
Integer Types in PostgreSQL
Introduction
PostgreSQL offers three integer types: SMALLINT (2 bytes, -32,768 to 32,767), INTEGER (4 bytes, -2,147,483,648 to 2,147,483,647), and BIGINT (8 bytes, roughly ±9.2 × 10^18).
PostgreSQL will automatically select appropriate storage and enforce the numeric range for each column.
If you try to insert a value beyond the range, it raises an error:
ERROR: integer out of range
Integer Aliases and Compatibility
INT and INTEGER are exact synonyms.
INT2, INT4, and INT8 are internal aliases for SMALLINT, INTEGER, and BIGINT.
You can use any version interchangeably — all mean the same data type.
Auto-Incrementing Integers
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
Internally, PostgreSQL expands this into:
user_id INTEGER NOT NULL DEFAULT nextval('users_user_id_seq'::regclass);
In modern PostgreSQL, the preferred syntax is the identity column:
CREATE TABLE users (
user_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name TEXT
);
GENERATED ALWAYS AS IDENTITY is SQL-standard and avoids manually managing sequences.
Working with Sequences
-- Create a table with SERIAL
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
amount NUMERIC(10,2)
);
-- Check the associated sequence
\d orders
-- Shows: "order_id" integer DEFAULT nextval('orders_order_id_seq'::regclass)
-- Get the next value manually:
SELECT nextval('orders_order_id_seq');
Each SERIAL or IDENTITY column uses a dedicated sequence object for generating new values.
You can control sequences directly using functions like nextval(), currval(), and setval().
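A short sketch of these functions (currval requires that nextval has already been called on the sequence in the current session):
SELECT nextval('orders_order_id_seq');      -- advance the sequence and return the new value
SELECT currval('orders_order_id_seq');      -- last value returned in this session
SELECT setval('orders_order_id_seq', 1000); -- the next nextval will return 1001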
Arithmetic and Overflow
SELECT 2147483647 + 1; -- overflow for INTEGER
SELECT 9223372036854775807 + 1; -- overflow for BIGINT
PostgreSQL detects overflows and raises an error — it does not wrap around silently.
To handle large numeric computations safely, use NUMERIC (arbitrary precision).
Type Casting
SELECT '42'::INT; -- 42
SELECT 3.7::INT; -- 4 (rounded to the nearest integer)
SELECT 3.7::BIGINT; -- 4 as well
PostgreSQL automatically casts compatible numeric and string literals to integer types when possible.
Casting fractional values to an integer type rounds to the nearest integer (for NUMERIC, halves round away from zero); use TRUNC() if you want truncation instead.
Comparison and Sorting
SELECT * FROM inventory ORDER BY quantity DESC;
Integers can be sorted and compared normally with >, <, =, etc.
Indexes on integer columns are highly efficient, making them ideal for primary keys and lookup columns.
Converting Between Integer Types
SELECT 500::SMALLINT; -- OK
SELECT 40000::SMALLINT; -- ERROR: smallint out of range
Conversions between integer types work as long as the value fits in the target range.
If it doesn't, PostgreSQL throws an overflow error.
Performance Tips
Use SMALLINT only when saving space truly matters, otherwise INTEGER is more convenient.
BIGINT should be used for IDs or counters that may exceed 2 billion (e.g., event logs, user IDs in large systems).
All integer types are fast to compare, sort, and index, since they are stored as fixed-size binary numbers.
The DECIMAL Type in PostgreSQL
Introduction
DECIMAL (or its synonym NUMERIC) is a fixed-point, exact numeric data type in PostgreSQL.
It stores numbers with user-defined precision and scale, ensuring no rounding errors in arithmetic operations.
This makes it ideal for use in financial, accounting, and scientific calculations where accuracy matters more than speed.
Declaration Syntax
NUMERIC(precision, scale)
-- or
DECIMAL(precision, scale)
precision: total number of significant digits (before + after decimal point).
scale: number of digits allowed after the decimal point.
For example: NUMERIC(8, 2) means:
Up to 8 digits total.
2 digits after the decimal (so max 999999.99).
If you omit (p, s), PostgreSQL allows arbitrary precision — only limited by memory.
Example: Defining DECIMAL Columns
CREATE TABLE transactions (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
amount DECIMAL(10, 2),
tax_rate NUMERIC(5, 3),
total NUMERIC(12, 4)
);
INSERT INTO transactions (amount, tax_rate, total)
VALUES (1234.56, 0.075, 1326.9300);
DECIMAL(10, 2) — up to 10 digits total, with 2 after the decimal (e.g., 99999999.99).
NUMERIC(5, 3) — up to 5 digits total, 3 of them after the decimal (e.g., 12.345).
NUMERIC(12, 4) — higher precision for cumulative totals.
Precision and Rounding
CREATE TABLE test_round (
value NUMERIC(6, 2)
);
INSERT INTO test_round VALUES (123.456); -- Automatically rounded
SELECT * FROM test_round;
PostgreSQL rounds values to the defined scale when inserting or updating.
Result:
value
--------
123.46
If a number exceeds the allowed precision, PostgreSQL raises an error instead of truncating silently.
ERROR: numeric field overflow
DETAIL: A field with precision 6, scale 2 must round to an absolute value less than 10^4.
Comparison with Floating-Point Types
SELECT
0.1::DOUBLE PRECISION + 0.2::DOUBLE PRECISION AS float_sum, -- 0.30000000000000004
CAST(0.1 AS NUMERIC) + CAST(0.2 AS NUMERIC) AS numeric_sum; -- 0.3
Floating-point types (REAL / DOUBLE PRECISION) use binary fractions and may produce tiny rounding errors; the explicit casts are needed because an unadorned literal like 0.1 is already NUMERIC in PostgreSQL.
NUMERIC / DECIMAL performs exact arithmetic and never introduces such artifacts.
Therefore, prefer DECIMAL for money, interest rates, or measurements requiring exactness.
Default Precision and Scale
CREATE TABLE flexible_numbers (
value NUMERIC
);
INSERT INTO flexible_numbers VALUES (12345678901234567890.123456789);
SELECT value FROM flexible_numbers;
If no precision and scale are specified, NUMERIC stores numbers with arbitrary length and precision.
This allows very large or very precise numbers, but may be slightly slower to process.
Output example:
value
---------------------
12345678901234567890.123456789
Arithmetic Operations
SELECT
amount,
tax_rate,
amount * tax_rate AS tax,
amount + (amount * tax_rate) AS total
FROM transactions;
NUMERIC values support all arithmetic operators: +, -, *, /.
PostgreSQL maintains full precision in intermediate steps and rounds only at output or storage.
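You can also round explicitly at any point with ROUND(); note that NUMERIC rounds halves away from zero:
SELECT ROUND(2.5), ROUND(-2.5), ROUND(123.4567, 2);
-- 3 | -3 | 123.46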
Date and Time Types in PostgreSQL
PostgreSQL provides DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, and INTERVAL for temporal data. TIMESTAMP WITH TIME ZONE can also be written as TIMESTAMPTZ (shorthand alias).
DATE Type
CREATE TABLE events (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
title TEXT,
event_date DATE
);
INSERT INTO events (title, event_date)
VALUES
('Conference', '2025-12-01'),
('Workshop', CURRENT_DATE);
DATE stores only the year, month, and day.
CURRENT_DATE returns the system's current date.
Arithmetic is supported:
SELECT
event_date + INTERVAL '7 days' AS next_week
FROM events;
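Subtracting two DATE values yields an integer number of days:
SELECT DATE '2025-12-01' - DATE '2025-11-11' AS days_between; -- 20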
TIME Type
CREATE TABLE shifts (
employee TEXT,
start_time TIME,
end_time TIME
);
INSERT INTO shifts
VALUES
('Alice', '09:00', '17:30'),
('Bob', '22:00', '06:00');
TIME stores the hour, minute, and second (optionally with microseconds).
It is independent of any date or timezone.
You can compute time differences as intervals:
SELECT
end_time - start_time AS duration
FROM shifts;
For an overnight shift like Bob's (22:00 to 06:00), the plain difference is negative (-16:00); add INTERVAL '24 hours' when end_time < start_time.
TIME WITH TIME ZONE (TIMETZ)
CREATE TABLE broadcasts (
show_name TEXT,
air_time TIME WITH TIME ZONE
);
INSERT INTO broadcasts
VALUES
('Morning News', '07:00+01'),
('Night Talk', '23:00-05');
TIME WITH TIME ZONE (or TIMETZ) stores a local time and its associated timezone offset.
Rarely used in modern applications, since most real-world scheduling uses full timestamps instead.
TIMESTAMP (Without Time Zone)
TIMESTAMP records both date and time but has no timezone awareness.
CURRENT_TIMESTAMP returns the local date and time when the row is inserted.
Useful when you want consistent "local" timestamps (e.g., system logs).
TIMESTAMP WITH TIME ZONE (TIMESTAMPTZ)
CREATE TABLE meetings (
meeting_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
title TEXT,
start_at TIMESTAMPTZ DEFAULT NOW()
);
INSERT INTO meetings (title, start_at) VALUES ('Team Sync', '2025-11-11 09:00:00+01');
TIMESTAMPTZ automatically normalizes values to UTC internally.
When displayed, PostgreSQL converts them to the current session timezone (SHOW TIMEZONE;).
This ensures consistent handling of international data and daylight saving changes.
SET TIMEZONE TO 'UTC';
SELECT start_at FROM meetings;
SET TIMEZONE TO 'America/New_York';
SELECT start_at FROM meetings;
The same instant in time displays differently under different timezones.
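For the meeting inserted above ('2025-11-11 09:00:00+01'), the two SELECTs would display roughly this (New York is UTC-5 in November):
-- under UTC: 2025-11-11 08:00:00+00
-- under America/New_York: 2025-11-11 03:00:00-05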
INTERVAL Type
CREATE TABLE rentals (
rental_id SERIAL PRIMARY KEY,
start_date TIMESTAMP,
duration INTERVAL
);
INSERT INTO rentals VALUES (1, '2025-11-11 10:00:00', INTERVAL '3 days 4 hours');
SELECT
start_date + duration AS return_date
FROM rentals;
INTERVAL represents a time duration or span, not a specific point in time.
You can add or subtract intervals to/from timestamps or dates.
Intervals can include multiple components (years, months, days, hours, minutes, seconds).
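Intervals also support arithmetic of their own, for example scaling:
SELECT INTERVAL '1 day 2 hours' * 3; -- 3 days 06:00:00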
Extracting Date and Time Parts
SELECT
EXTRACT(YEAR FROM start_at) AS year,
EXTRACT(MONTH FROM start_at) AS month,
EXTRACT(DAY FROM start_at) AS day,
EXTRACT(HOUR FROM start_at) AS hour
FROM meetings;
EXTRACT() retrieves individual components from a date or timestamp.
Other functions like DATE_PART() and TO_CHAR() provide similar functionality with formatting control.
SELECT TO_CHAR(start_at, 'YYYY-MM-DD HH24:MI') FROM meetings;
Date and Time Arithmetic
SELECT
NOW() AS now,
NOW() + INTERVAL '7 days' AS next_week,
NOW() - INTERVAL '1 month' AS last_month,
AGE('2025-12-31', '2025-01-01') AS difference;
+ and - can be used with intervals for addition and subtraction.
AGE() returns a human-readable interval difference between two timestamps.
Useful Built-in Functions
Function | Description
NOW() | Current date and time (same as CURRENT_TIMESTAMP)
CURRENT_DATE | Current date (no time)
CURRENT_TIME | Current time of day
LOCALTIME / LOCALTIMESTAMP | Local time/date-time without timezone conversion
AGE(timestamp1, timestamp2) | Difference between two timestamps as interval
DATE_TRUNC('field', source) | Truncates a timestamp to the specified precision (e.g., hour, day, month)
SELECT DATE_TRUNC('month', NOW()); -- first day of current month
Converting Between Date/Time Types
SELECT
'2025-11-11'::DATE + INTERVAL '1 day', -- date → timestamp
'13:45:00'::TIME + INTERVAL '2 hours', -- time + interval
NOW()::DATE AS current_day, -- timestamp → date
CAST('2025-11-11 09:00' AS TIMESTAMPTZ);
The UUID Type in PostgreSQL
Introduction
UUID stands for Universally Unique Identifier.
It is a 128-bit value used to uniquely identify rows, records, or entities across systems, databases, and even time — without relying on a central sequence generator.
PostgreSQL natively supports UUID as a built-in data type, compliant with RFC 4122.
UUIDs are often used as primary keys in distributed or large-scale applications where integer IDs might collide or reveal sequence information.
UUID Format
A UUID is typically written as a 36-character string in hexadecimal form, separated by hyphens:
550e8400-e29b-41d4-a716-446655440000
Internally, PostgreSQL stores it efficiently as a 16-byte binary value (not as text).
UUIDs are case-insensitive: 'A23B...' and 'a23b...' represent the same value.
Creating a Table with UUID Column
CREATE TABLE users (
user_id UUID PRIMARY KEY,
name TEXT NOT NULL
);
INSERT INTO users (user_id, name)
VALUES ('550e8400-e29b-41d4-a716-446655440000', 'Alice');
UUID columns can be manually assigned, or automatically generated using PostgreSQL's functions.
Each UUID guarantees uniqueness across space and time with an extremely low probability of collision.
Generating UUIDs Automatically
PostgreSQL provides UUID generation through the uuid-ossp and pgcrypto extensions.
To enable them:
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
Then you can generate UUIDs using built-in functions:
-- Using uuid-ossp:
SELECT uuid_generate_v1(); -- based on MAC + timestamp
SELECT uuid_generate_v4(); -- random-based (most common)
-- Using pgcrypto:
SELECT gen_random_uuid(); -- random version 4 UUID (faster, preferred)
In PostgreSQL 13 and later, gen_random_uuid() is also built into the core server, so no extension is needed for random (version 4) UUIDs; it is the recommended choice for simplicity and performance.
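For example, a table can use it as a column default; this sketch defines the api_tokens table queried in the examples below:
CREATE TABLE api_tokens (
token_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
INSERT INTO api_tokens (user_id) VALUES (gen_random_uuid());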
Here, gen_random_uuid() generates a new unique identifier automatically when no ID is provided.
This eliminates the need for numeric sequences and prevents ID collisions across distributed systems.
UUID-based keys are also non-sequential — they don't reveal record creation order or count.
UUID Versions Overview
Version | Source | Function (uuid-ossp) | Use Case
v1 | Timestamp + MAC address | uuid_generate_v1() | Predictable, contains machine/time info (not recommended for public use)
v3 | Namespace + MD5 hash | uuid_generate_v3(namespace, name) | Deterministic generation from names
v4 | Random numbers | uuid_generate_v4() or gen_random_uuid() | Most common, fast, and secure for general use
v5 | Namespace + SHA-1 hash | uuid_generate_v5(namespace, name) | Deterministic but safer than v3
Version 4 (random) UUIDs are the default choice for new projects.
Versions 3 and 5 are used when deterministic IDs are needed for given input data.
Comparing UUID vs SERIAL IDs
Aspect | SERIAL / IDENTITY | UUID
Uniqueness | Unique within a single database sequence | Globally unique across databases and servers
Predictability | Sequential, reveals insertion order | Non-sequential, unpredictable
Storage Size | 4–8 bytes | 16 bytes
Indexing Speed | Faster (smaller, sequential) | Slightly slower (larger, random order)
Replication / Merging | May collide between instances | Safe across distributed databases
Use Case | Simple local applications | Microservices, APIs, distributed systems
While UUID keys consume more storage and index space, their global uniqueness makes them ideal for modern scalable systems.
Querying UUID Columns
SELECT * FROM api_tokens
WHERE token_id = 'b36d6d3e-b41a-4d0c-b4e4-9985a93f79a7';
-- You can also use functions:
SELECT token_id::TEXT FROM api_tokens;
You can compare, cast, or index UUID columns just like integers.
UUID comparisons are efficient and type-safe — no need for string matching.
Indexing and Performance
CREATE INDEX idx_api_tokens_user_id ON api_tokens(user_id);
Indexing UUID columns works exactly like for integers.
However, because UUIDs are random, inserts can fragment B-tree indexes, slightly reducing performance for very large tables.
To optimize insert speed, consider a time-based generator such as uuid_generate_v1mc() (timestamp plus a random multicast MAC) instead of fully random UUIDs, so new keys land near each other in the index.
Common Table Expressions (CTE) in PostgreSQL
Introduction
A Common Table Expression (CTE) is a temporary, named result set defined within a query using the WITH clause.
CTEs make SQL queries easier to read, maintain, and debug, especially when dealing with subqueries or recursive data.
They exist ONLY for the duration of the SINGLE SQL statement and are not stored as permanent tables.
CTEs can be:
Non-recursive: similar to a named subquery.
Recursive: used to query hierarchical or tree-like data.
Basic Syntax
WITH cte_name (optional_columns) AS (
SQL_query
)
SELECT ...
FROM cte_name;
cte_name — the name of your temporary result set.
optional_columns — optional list of column names for the CTE result.
SQL_query — a normal SELECT (or even INSERT / UPDATE / DELETE) that defines the data.
Example: Basic Non-Recursive CTE
WITH top_sellers AS (
SELECT
seller_id,
SUM(amount) AS total_sales
FROM orders
GROUP BY seller_id
HAVING SUM(amount) > 5000
)
SELECT
s.seller_id,
s.total_sales,
u.name
FROM top_sellers AS s
JOIN users AS u ON u.id = s.seller_id;
The inner query creates a top_sellers CTE containing only sellers with total sales above 5000.
The main query then joins that result with users to retrieve seller details.
This improves readability compared to deeply nested subqueries.
Using Multiple CTEs
WITH
orders_per_customer AS (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
),
top_customers AS (
SELECT customer_id
FROM orders_per_customer
WHERE order_count > 10
)
SELECT c.name, opc.order_count
FROM top_customers tc
JOIN orders_per_customer opc ON tc.customer_id = opc.customer_id
JOIN customers c ON c.id = tc.customer_id;
You can define multiple CTEs in a single WITH clause — each separated by commas.
Later CTEs can reference earlier ones within the same statement.
Modifying Data with CTEs
WITH updated AS (
UPDATE products
SET price = price * 1.10
WHERE category = 'electronics'
RETURNING id, price
)
SELECT * FROM updated;
CTEs can be used with INSERT, UPDATE, or DELETE statements that return data using RETURNING.
This pattern is powerful for performing modifications and immediately reading back affected rows.
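For example, a sketch that moves stale rows into a hypothetical orders_archive table (assumed to have the same structure as orders) in one statement:
WITH archived AS (
DELETE FROM orders
WHERE order_date < NOW() - INTERVAL '1 year'
RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM archived;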
Recursive CTEs
Recursive CTEs are used to handle hierarchical data such as organizational charts, category trees, or dependency chains.
The general form consists of two parts:
Anchor member — the base query (starting rows).
Recursive member — a query that references the CTE itself, used to fetch related rows iteratively.
WITH RECURSIVE subordinates AS (
-- Anchor: start from the manager
SELECT employee_id, name, manager_id
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive step: find employees reporting to those above
SELECT e.employee_id, e.name, e.manager_id
FROM employees e
INNER JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;
This query builds an entire organization hierarchy starting from the top-level manager.
The recursion continues until no new rows are produced.
UNION ALL is used (instead of UNION) to include all recursive results efficiently.
Example: Folder Hierarchy
CREATE TABLE folders (
id SERIAL PRIMARY KEY,
name TEXT,
parent_id INT REFERENCES folders(id)
);
WITH RECURSIVE folder_tree AS (
SELECT id, name, parent_id, 1 AS depth
FROM folders
WHERE parent_id IS NULL
UNION ALL
SELECT f.id, f.name, f.parent_id, ft.depth + 1
FROM folders f
JOIN folder_tree ft ON f.parent_id = ft.id
)
SELECT * FROM folder_tree ORDER BY depth;
This recursive CTE prints all folders in a tree, showing their depth levels.
Very useful for hierarchical structures like menus, file systems, and organizational charts.
Performance Considerations
CTEs behave like optimization fences in older PostgreSQL versions (≤11):
The CTE is fully materialized before the main query executes.
This means it is computed once and stored temporarily, which may hurt performance for very large datasets.
Since PostgreSQL 12, non-recursive CTEs that are referenced only once and have no side effects can be automatically inlined by the optimizer.
To enforce materialization (useful for debugging or when side-effects matter):
WITH temp_data AS MATERIALIZED (...)
SELECT * FROM temp_data;
Or, to explicitly allow inlining for better performance:
WITH temp_data AS NOT MATERIALIZED (...)
SELECT * FROM temp_data;
Views in PostgreSQL
Introduction
A VIEW in PostgreSQL is a virtual table defined by a SQL query.
It behaves like a table but does not store data physically; instead, it runs the underlying query each time the view is accessed.
Views simplify complex queries, improve code readability, and help enforce security by exposing only specific columns or rows.
PostgreSQL also supports materialized views, which store the result physically for faster reads.
Basic Syntax
CREATE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;
view_name — the name of the virtual table.
The view can then be queried like any normal table using SELECT.
Example: Creating a Simple View
CREATE VIEW active_users AS
SELECT id, name, email
FROM users
WHERE is_active = true;
SELECT * FROM active_users;
This view returns only active users without requiring the filter in every query.
Views are computed dynamically: changes in the underlying tables appear automatically.
Using Views to Simplify Complex Queries
CREATE VIEW monthly_sales AS
SELECT
seller_id,
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS total_sales
FROM orders
GROUP BY seller_id, month;
SELECT * FROM monthly_sales
WHERE total_sales > 5000;
Views reduce duplication of complex logic (aggregations, joins, calculations).
Applications can query the view instead of repeating long SQL statements.
Updating Data Through Views
Some views can be updatable, which means you can run INSERT, UPDATE, or DELETE on them.
A view is updatable when it maps directly to a single table without set operations or aggregates.
CREATE VIEW user_public_info AS
SELECT id, name, email
FROM users;
UPDATE user_public_info
SET name = 'Alice Cooper'
WHERE id = 5;
This modifies data in the underlying users table.
Views involving joins, GROUP BY, DISTINCT, UNION, etc., are usually not updatable.
Security and Access Control
Views are often used to restrict sensitive data.
You can grant access to the view while keeping the underlying tables private.
CREATE VIEW user_safe_view AS
SELECT id, name
FROM users;
GRANT SELECT ON user_safe_view TO analyst_role;
REVOKE SELECT ON users FROM analyst_role;
This ensures the role can only see the non-sensitive columns.
Replacing or Dropping a View
DROP VIEW IF EXISTS view_name;
CREATE OR REPLACE VIEW view_name AS
SELECT ...
FROM ...;
CREATE OR REPLACE lets you modify the view without dropping permissions.
Dropping a view does not affect the underlying data.
Materialized Views
A materialized view stores the query result physically on disk.
Useful for performance when the underlying query is heavy (joins, aggregations, scanning large tables).
But unlike normal views, materialized views must be manually refreshed.
CREATE MATERIALIZED VIEW sales_summary AS
SELECT seller_id, SUM(amount) AS total_sales
FROM orders
GROUP BY seller_id;
-- Read from the materialized view
SELECT * FROM sales_summary;
-- Refresh when data changes:
REFRESH MATERIALIZED VIEW sales_summary;
Materialized views behave like cached query results.
They significantly speed up analytics on large datasets.
Materialized View with Indexes
CREATE MATERIALIZED VIEW monthly_stats AS
SELECT
DATE_TRUNC('month', created_at) AS month,
COUNT(*) AS total_users
FROM users
GROUP BY month;
CREATE INDEX idx_monthly_stats_month ON monthly_stats(month);
Indexes on materialized views improve read performance just like on regular tables.
Advanced: Refreshing Concurrently
To refresh a materialized view without blocking reads, PostgreSQL provides the CONCURRENTLY option:
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;
Allows reads during the refresh — ideal for production systems (requires a unique index on the materialized view).
Materialized Views in PostgreSQL
Introduction
A Materialized View (MV) is a database object that stores the physical result of a SQL query.
Unlike normal views (which run the underlying query every time), materialized views improve performance by caching the data.
Materialized views are ideal for heavy analytical queries, aggregations, and large datasets.
They must be manually refreshed to reflect changes in underlying tables.
PostgreSQL supports both normal refresh and CONCURRENT refresh, which avoids blocking reads.
Basic Syntax
CREATE MATERIALIZED VIEW view_name AS
SELECT ...
FROM ...
WHERE ...;
The materialized view stores the result of the SELECT at creation time.
The data inside the MV does NOT automatically update, so refresh is required.
Example: Creating a Materialized View
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT
seller_id,
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS total_sales
FROM orders
GROUP BY seller_id, month;
SELECT * FROM monthly_sales;
Query results are stored physically and retrieved instantly.
No recomputation happens until a refresh is triggered.
Refreshing Materialized Views
To update the stored data, call REFRESH MATERIALIZED VIEW:
REFRESH MATERIALIZED VIEW monthly_sales;
This command locks the MV for reads and writes until refresh is complete.
Use it when downtime during refresh is acceptable.
To avoid blocking readers, use the concurrent form:
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;
This allows users to still query the MV during refresh.
It requires a unique index on the MV:
CREATE UNIQUE INDEX idx_monthly_sales_unique ON monthly_sales(seller_id, month);
Use concurrent refresh for production systems where availability matters.
Materialized Views vs Normal Views
Feature | Normal View | Materialized View
Storage | No physical storage | Stores data physically
Performance | Runs query every time (slow for heavy logic) | Fast reads, cached data
Freshness | Always up-to-date | Must be refreshed manually
Use Case | Simple abstractions | Heavy queries, analytics
Blocking | No refresh needed | Refresh blocks reads unless CONCURRENTLY
Materialized Views with Indexes
CREATE MATERIALIZED VIEW user_summary AS
SELECT
country,
COUNT(*) AS user_count
FROM users
GROUP BY country;
CREATE INDEX idx_user_summary_country
ON user_summary(country);
Indexes on materialized views speed up SELECT queries.
Indexes persist across refresh operations.
Manually Refreshing Only When Needed
Materialized views are often refreshed:
Nightly (cron job)
Every hour for dashboard updates
On-demand when underlying tables change
After ETL/ELT batch loading
PostgreSQL does not auto-refresh — applications decide the refresh schedule.
Dropping a Materialized View
DROP MATERIALIZED VIEW IF EXISTS view_name;
Dropping an MV does not affect underlying tables.
Indexes on the MV are dropped automatically.
Advanced: Materialized Views with Joins
CREATE MATERIALIZED VIEW product_stats AS
SELECT
p.id AS product_id,
p.name,
COUNT(o.id) AS orders_count,
SUM(o.amount) AS total_revenue
FROM products p
LEFT JOIN orders o ON o.product_id = p.id
GROUP BY p.id, p.name;
-- Refresh when needed
REFRESH MATERIALIZED VIEW product_stats;
Materialized views shine when performing expensive joins repeatedly.
Advanced: Materialized View with WHERE Filters
CREATE MATERIALIZED VIEW recent_orders AS
SELECT *
FROM orders
WHERE order_date > NOW() - INTERVAL '30 days';
Useful for windowed datasets (recent data, daily snapshots, etc.).
Can be indexed for faster reporting.
The CASE Expression in PostgreSQL
Introduction
The CASE expression in PostgreSQL allows you to perform conditional logic directly inside SQL queries.
It works similarly to if-else statements in programming languages.
CASE returns different values depending on evaluated conditions.
General Syntax
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
ELSE default_result
END
WHEN clauses are evaluated from top to bottom.
The first true condition determines the returned value.
ELSE is optional; without it, missing matches return NULL.
Simple CASE Expression
Compares a single expression to several possible values.
CASE status
WHEN 'A' THEN 'Active'
WHEN 'I' THEN 'Inactive'
ELSE 'Unknown'
END
Useful when conditions depend on equality comparisons.
Searched CASE Expression
Allows complex logical conditions.
Conditions are evaluated independently.
CASE
WHEN amount > 1000 THEN 'High'
WHEN amount BETWEEN 500 AND 1000 THEN 'Medium'
WHEN amount > 0 THEN 'Low'
ELSE 'None'
END
This form is more flexible and widely used.
Example: Categorizing Orders
SELECT
order_id,
amount,
CASE
WHEN amount > 1000 THEN 'VIP Order'
WHEN amount > 100 THEN 'Standard Order'
ELSE 'Small Order'
END AS order_type
FROM orders;
This query adds a derived column order_type based on order amount.
Using CASE in ORDER BY
SELECT name, role
FROM employees
ORDER BY
CASE
WHEN role = 'manager' THEN 1
WHEN role = 'developer' THEN 2
ELSE 3
END;
CASE allows custom sorting based on business logic.
Here, managers appear first, then developers, then everyone else.
Using CASE in Aggregation
SELECT
seller_id,
SUM(CASE WHEN amount > 500 THEN 1 ELSE 0 END) AS large_orders,
SUM(CASE WHEN amount <= 500 THEN 1 ELSE 0 END) AS small_orders
FROM orders
GROUP BY seller_id;
Conditional aggregation is a powerful pattern in reporting and analytics.
Counts rows differently depending on CASE conditions.
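PostgreSQL also offers the aggregate FILTER clause, which expresses the same conditional counts more directly:
SELECT
seller_id,
COUNT(*) FILTER (WHERE amount > 500) AS large_orders,
COUNT(*) FILTER (WHERE amount <= 500) AS small_orders
FROM orders
GROUP BY seller_id;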
Handling NULL with CASE
SELECT
name,
CASE
WHEN phone IS NULL THEN 'No phone'
ELSE phone
END AS phone_display
FROM users;
CASE is often used as an alternative to COALESCE.
CASE with Multiple Conditions
SELECT
product,
stock,
CASE
WHEN stock = 0 THEN 'Out of Stock'
WHEN stock <= 10 THEN 'Low Stock'
WHEN stock <= 50 THEN 'In Stock'
ELSE 'High Stock'
END AS stock_level
FROM products;
Conditions are checked in order, so the order of WHEN clauses matters.
Using CASE to Avoid Division by Zero
SELECT
seller_id,
total_sales,
total_orders,
CASE
WHEN total_orders = 0 THEN 0
ELSE total_sales / total_orders
END AS avg_order_value
FROM seller_stats;
CASE can gracefully guard against runtime errors.
Nested CASE Expressions
SELECT
name,
salary,
CASE
WHEN salary > 70000 THEN 'Senior'
WHEN salary > 40000 THEN
CASE
WHEN salary > 55000 THEN 'Mid-High'
ELSE 'Mid'
END
ELSE 'Junior'
END AS level
FROM employees;
Nested CASE blocks allow complex conditional transformations.
CASE Expression Without ELSE
CASE
WHEN score >= 90 THEN 'A'
WHEN score >= 80 THEN 'B'
END
If no condition matches, result is NULL.
For safety, include ELSE in most production SQL.
Performance Considerations
The CASE expression is fast; it is evaluated per row and does not require special indexing.
However, CASE inside:
ORDER BY
WHERE
JOIN conditions
can affect query performance because they prevent index-only operations.
Use CASE mainly in SELECT or for computed fields when possible.
The COALESCE Function in PostgreSQL
Introduction
COALESCE is a SQL function that returns the first non-NULL value in a list of expressions.
It is commonly used to handle NULL values, provide defaults, clean data, or avoid NULL-related errors.
PostgreSQL evaluates expressions from left to right and returns the earliest non-NULL result.
COALESCE is equivalent to a simple CASE expression but is shorter and more readable.
Syntax
COALESCE(value1, value2, value3, ...)
value1, value2, value3, ... are evaluated in order.
The first value that is not NULL is returned.
If all values are NULL, the result is NULL.
Basic Example
SELECT COALESCE(NULL, NULL, 'Hello', 'World');
Returns 'Hello', since it is the first non-NULL value.
Using COALESCE to Replace NULL With a Default
SELECT
name,
COALESCE(phone, 'No phone number') AS phone_display
FROM users;
If phone is NULL, it displays a default string instead.
Useful for UI, reports, and exported data.
Using COALESCE With Numeric Values
SELECT
price,
COALESCE(discount_price, price) AS effective_price
FROM products;
If discount_price is NULL, fallback to original price.
Prevents NULL from propagating into calculations.
Using COALESCE in Calculations
SELECT
COALESCE(total_sales, 0) / COALESCE(order_count, 1) AS avg_order_value
FROM seller_stats;
Prevents division by NULL or NULL arithmetic results.
Often used in analytics pipelines.
COALESCE With Multiple Alternatives
SELECT
COALESCE(username, email, phone, 'Unknown User') AS identifier
FROM users;
The first available non-NULL contact method is selected.
If all fields are NULL, 'Unknown User' is returned.
COALESCE vs CASE Expression
CASE
WHEN value1 IS NOT NULL THEN value1
WHEN value2 IS NOT NULL THEN value2
ELSE value3
END
This CASE expression is equivalent to:
COALESCE(value1, value2, value3)
COALESCE is simpler and preferred for NULL-handling.
Using COALESCE With Aggregations
SELECT
seller_id,
COALESCE(SUM(amount), 0) AS total_sales
FROM orders
GROUP BY seller_id;
SUM() returns NULL when no rows exist.
COALESCE ensures numeric aggregates have meaningful defaults.
Handling NULL Dates
SELECT
name,
COALESCE(last_login, NOW()) AS effective_last_login
FROM users;
If a user never logged in, use the current timestamp.
Useful for audits and logs.
COALESCE in ORDER BY
SELECT title
FROM books
ORDER BY COALESCE(published_year, 0);
Allows ordering while gracefully handling NULL values.
In ascending order, PostgreSQL sorts NULLs last (as if larger than all non-NULL values) unless COALESCE or an explicit NULLS FIRST changes that.
COALESCE in JOIN Conditions
SELECT *
FROM orders o
JOIN customers c
ON COALESCE(o.customer_id, 0) = c.id;
Can be used to provide fallbacks in join logic.
Be careful: this may prevent index usage, lowering performance.
COALESCE vs NULLIF
COALESCE returns the first non-NULL value.
NULLIF(a, b) returns NULL if a = b, otherwise returns a.
SELECT
COALESCE(NULLIF(status, ''), 'unknown') AS clean_status
FROM tasks;
NULLIF is often paired with COALESCE to replace "empty" values with defaults.
Performance Considerations
COALESCE is extremely fast and evaluated inline per row.
However, avoid COALESCE on indexed columns in JOIN/WHERE/ORDER BY if possible, because:
It may disable index usage
It forces PostgreSQL to evaluate the expression for every row
Using COALESCE in SELECT is always safe and efficient.
The NULLIF Function in PostgreSQL
Introduction
NULLIF is a conditional function in PostgreSQL that returns NULL if two values are equal.
If the values are not equal, it returns the first value unchanged.
NULLIF is often used to avoid division-by-zero errors, normalize data, or convert meaningless values into actual NULLs.
NULLIF is frequently used together with COALESCE for advanced NULL-handling logic.
Syntax
NULLIF(value1, value2)
If value1 = value2, the result is NULL.
If value1 ≠ value2, the result is value1.
value1 is always the return value unless the values match exactly.
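The classic use is guarding against division by zero; using the seller_stats columns from earlier:
SELECT
total_sales / NULLIF(total_orders, 0) AS avg_order_value
FROM seller_stats;
-- when total_orders is 0, the divisor becomes NULL and the result is NULL instead of an error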
The LEAST and GREATEST Functions in PostgreSQL
LEAST returns the smallest value in a list of expressions; GREATEST returns the largest.
Unlike the SQL standard, PostgreSQL ignores NULL arguments: the result is NULL only if all arguments are NULL.
To treat NULL as a concrete value instead of ignoring it, use COALESCE:
SELECT LEAST(COALESCE(a, 0), COALESCE(b, 0)) FROM table;
Clamping Values (Setting Min/Max Boundaries)
LEAST and GREATEST can be combined to restrict a value within a range.
-- Clamp score to range 0–100
SELECT LEAST(100, GREATEST(score, 0)) AS clamped_score
FROM results;
Values < 0 become 0, values > 100 become 100.
This is a common technique for input validation.
Using LEAST / GREATEST for Conditional Logic
These functions often replace simple CASE expressions.
SELECT
salary,
GREATEST(salary, 2000) AS minimum_salary
FROM employees;
If salary is less than 2000, it becomes 2000.
Choosing the Earliest or Latest Timestamp
SELECT
LEAST(start_time, updated_at, created_at) AS earliest_timestamp,
GREATEST(start_time, updated_at, created_at) AS latest_timestamp
FROM logs;
Useful when records track multiple timestamps.
Comparing Columns Across Rows
SELECT
product,
LEAST(price_usd, price_eur, price_gbp) AS minimum_price,
GREATEST(price_usd, price_eur, price_gbp) AS maximum_price
FROM products;
Helps unify multiple currency columns or alternative pricing sources.
LEAST/GREATEST With Derived Expressions
SELECT
id,
LEAST(amount * 0.9, amount - 10) AS discounted_price
FROM orders;
You can freely mix arithmetic and functions within argument lists.
Using LEAST and GREATEST in ORDER BY
SELECT name
FROM employees
ORDER BY GREATEST(salary, bonus);
Sorting by "highest compensation component."
Be careful: expressions in ORDER BY may reduce index usage.
Performance Considerations
LEAST and GREATEST are extremely fast — evaluated per row, no special overhead.
However, using them in:
WHERE clauses
JOIN conditions
ORDER BY expressions
may prevent index-only scans because PostgreSQL must compute values for every row.
In SELECT lists, they are completely safe and efficient.
Details on the LIKE Operator in PostgreSQL
Introduction
LIKE is a PostgreSQL operator for pattern-based text comparison using wildcards.
It checks whether a string matches a specified pattern using:
% – matches zero or more characters
_ – matches exactly one character
LIKE is case-sensitive. PostgreSQL provides ILIKE for case-insensitive matching.
Syntax
<expression> LIKE <pattern>
<expression> NOT LIKE <pattern>
<expression> ILIKE <pattern>
<expression> NOT ILIKE <pattern>
expression is the text to match.
pattern is a string literal containing wildcards.
Returns TRUE, FALSE, or NULL (when either side is NULL).
Wildcard Rules: % and _
-- % wildcard (zero or more characters)
SELECT 'postgresql' LIKE 'post%'; -- TRUE
-- _ wildcard (exactly one character)
SELECT 'cat' LIKE 'c_t'; -- TRUE
% matches an arbitrary-length sequence of characters (including empty).
_ matches exactly one UTF-8 character – important when matching multi-byte Unicode characters.
Advanced: Matching Multi-byte Unicode Characters
PostgreSQL treats characters, not bytes. This means:
SELECT '你' LIKE '_'; -- TRUE (one Chinese character)
SELECT '你' LIKE '__'; -- FALSE
_ never splits a multi-byte UTF-8 character.
This makes LIKE safe for CJK languages.
Escaping Wildcards: The ESCAPE Clause
To match literal % or _, define an escape character.
SELECT '100% match' LIKE '100\% match' ESCAPE '\'; -- TRUE
SELECT 'file_1' LIKE 'file\_1' ESCAPE '\'; -- TRUE
The default escape character is already the backslash, so ESCAPE '\' is optional; use the ESCAPE clause to pick a different one.
Any character can be the escape character:
SELECT 'file_1' LIKE 'file#_1' ESCAPE '#'; -- TRUE
Pattern scanning follows these rules:
<escape>% → literal percent
<escape>_ → literal underscore
<escape><escape> → literal escape
Trailing Spaces and LIKE
PostgreSQL does not ignore trailing spaces in LIKE patterns.
SELECT 'abc ' LIKE 'abc'; -- FALSE
SELECT 'abc ' LIKE 'abc%'; -- TRUE
Use RTRIM() when comparing padded strings.
Locale and Collation Impact
LIKE follows SQL-standard collation rules.
Collation affects character comparison but not wildcard behavior.
ILIKE uses PostgreSQL's Unicode-aware case folding.
Useful for languages with case variants (German: ß → ss behavior, Turkish dotted/dotless i, etc.).
LIKE and NULL
SELECT NULL LIKE '%'; -- NULL
SELECT 'abc' LIKE NULL; -- NULL
If either operand is NULL, the result is NULL (unknown).
Use COALESCE() when needed.
Using LIKE With ANY / ALL
SELECT *
FROM logs
WHERE message LIKE ANY (ARRAY['%error%', '%fail%', '%fatal%']);
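The complementary form keeps only rows where none of the patterns match:
SELECT *
FROM logs
WHERE message NOT LIKE ALL (ARRAY['%debug%', '%trace%']);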
LIKE vs SIMILAR TO vs Regular Expressions
LIKE – Simple wildcard patterns (fast, minimal syntax)
SIMILAR TO – SQL-standard regex-like operator (inconsistent, rarely used)
~, ~* – POSIX regular expressions (powerful)
-- LIKE (simple)
WHERE name LIKE 'A%';
-- SIMILAR TO (regex with SQL syntax)
WHERE name SIMILAR TO '(A|B)%';
-- POSIX regex (preferred for complex patterns)
WHERE name ~ '^(A|B).*';
Prefer LIKE for prefix searches.
Use regex for complex logic.
Avoid SIMILAR TO unless required for SQL compliance.
Index Usage Rules
B-tree indexes are used only for left-anchored patterns:
-- Index used
WHERE name LIKE 'abc%';
Not used for:
WHERE name LIKE '%abc';
WHERE name LIKE '%abc%';
WHERE name LIKE '_bc%'; -- cannot use index
WHERE LOWER(name) LIKE 'abc%'; -- no index unless functional index exists
ILIKE never uses B-tree indexes unless pg_trgm is installed.
Using pg_trgm for Fast LIKE Searches
Install extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Add trigram index:
CREATE INDEX idx_users_name_trgm
ON users USING gin (name gin_trgm_ops);
This speeds up:
LIKE '%pattern%'
ILIKE '%pattern%'
Regex operations
Performance Considerations
LIKE 'abc%' is very fast with indexes.
LIKE '%abc' is slow, because it requires full scan.
Always avoid:
Leading wildcards
Patterns that require scanning all rows
Wrapping columns with LOWER() unless using functional index
Use pg_trgm for large text search workloads.
Practical Examples
-- Emails from Gmail
SELECT * FROM users WHERE email LIKE '%@gmail.com';
-- Names containing digits (LIKE has no character classes, so use a regex)
SELECT * FROM users WHERE name ~ '[0-9]';
-- Case-insensitive search
SELECT * FROM cities WHERE name ILIKE 'ber%';
-- Titles that start with a letter A–F
SELECT * FROM books WHERE title ~ '^[A-F]';
PostgreSQL Extensions
Introduction
A PostgreSQL extension is a package that adds extra functionality to the database server.
Extensions can contain:
SQL functions
Data types
Indexes and operator classes
Views, triggers, cast definitions, etc.
They are managed with commands like CREATE EXTENSION, ALTER EXTENSION, and DROP EXTENSION.
Available extension: its files are present on the server (in the PostgreSQL installation), but not yet enabled in the current database.
Installed (or "created") extension: registered in the current database via CREATE EXTENSION.
You can list all available extension packages with:
SELECT * FROM pg_available_extensions;
You can list all extensions currently installed in your database with:
SELECT * FROM pg_extension;
Creating an Extension (Enabling It in a Database)
CREATE EXTENSION pg_trgm;
This makes the pg_trgm functionality available in the current database.
Extensions are installed per database, not globally, so if you have multiple databases, you must run CREATE EXTENSION in each database where you need it.
You can specify a schema to hold the extension objects:
CREATE EXTENSION pg_trgm WITH SCHEMA extensions;
This is useful to keep extension objects separated from your own application schema.
Listing Extensions and Their Versions
-- Installed in current database
SELECT
extname,
extversion,
extnamespace::regnamespace AS schema_name
FROM
pg_extension
ORDER BY
extname;
-- Available (but maybe not installed)
SELECT
name,
default_version,
installed_version
FROM
pg_available_extensions
ORDER BY
name;
default_version is the version shipped with the server.
installed_version is NULL for extensions not yet created in the database.
Extension Files and Control Files (High-Level)
Each extension is described by:
A .control file – metadata: name, default version, required settings.
One or more .sql files – contain the SQL commands to create the extension objects.
These files live in the PostgreSQL installation directories (typically share/extension).
Database users normally do not touch these files directly, they just use CREATE EXTENSION.
Extension Schemas and search_path
When you run CREATE EXTENSION without WITH SCHEMA, the extension is created in the first schema on your search_path that is not a system schema.
To avoid cluttering public, many teams create a dedicated schema:
CREATE SCHEMA IF NOT EXISTS extensions;
CREATE EXTENSION IF NOT EXISTS hstore
WITH SCHEMA extensions;
When calling extension functions, you can:
Add the extension schema to search_path, or
Use fully qualified names, e.g. extensions.some_extension_function(...).
Updating Extensions (ALTER EXTENSION)
-- Update a single extension to latest version
ALTER EXTENSION pg_trgm UPDATE;
-- Update to a specific version (if supported)
ALTER EXTENSION some_ext UPDATE TO '1.2';
Removing Extensions (DROP EXTENSION)
DROP EXTENSION pg_trgm;
This removes all objects created by the extension from the current database.
If you have objects that depend on the extension (e.g., indexes using gin_trgm_ops), you might see dependency errors.
You can use CASCADE (careful!) to drop dependent objects:
DROP EXTENSION pg_trgm CASCADE;
Always verify what will be dropped before using CASCADE in production.
Common Built-In Extensions
pg_trgm – trigram indexes and similarity search, often used with LIKE / ILIKE.
hstore – key-value text storage in a single column (before JSONB was popular).
uuid-ossp – functions to generate UUIDs (uuid_generate_v4(), etc.).
citext – case-insensitive text type (e.g., for case-insensitive emails).
btree_gin, btree_gist – extra index operator classes.
postgres_fdw – foreign data wrapper for connecting to other PostgreSQL servers.
Extensions Requiring shared_preload_libraries
Some extensions hook deeper into the server and must be loaded at startup via shared_preload_libraries in postgresql.conf.
Examples:
pg_stat_statements (query statistics)
Some connection poolers, tracking or sampling modules
Typical flow:
-- In postgresql.conf (example)
shared_preload_libraries = 'pg_stat_statements'
-- After server restart, in each database where you want it:
CREATE EXTENSION pg_stat_statements;
You need file-level access to the server to change postgresql.conf, this is usually a DBA or admin task.
Permissions and Security
Creating or dropping an extension requires superuser or database owner privileges (depending on configuration).
Extensions can:
Run server-side C code
Access files and system resources
so installing untrusted third-party extensions can be a security risk.
Only install extensions that come from trusted sources or your platform provider.
Extensions and Backups
When you use pg_dump to export a database with extensions, the dump does not include the extension SQL source code.
Instead, the dump contains a CREATE EXTENSION command, assuming the extension is available on the target server.
Therefore:
Make sure the same extensions (and versions) are installed on the restore server.
Otherwise, restoring the dump may fail.
Checking Dependencies Between Extensions and Objects
To see which objects belong to an extension:
SELECT
objid::regclass,
classid::regclass
FROM pg_depend d
JOIN pg_extension e ON d.refobjid = e.oid
WHERE e.extname = 'pg_trgm';
This is advanced usage, mostly for DBAs who need to audit dependencies.
Practical Workflow Example
Example: enabling pg_trgm for fuzzy search on users(name):
-- 1. Create a dedicated schema for extensions (optional)
CREATE SCHEMA IF NOT EXISTS extensions;
-- 2. Install the pg_trgm extension into that schema
CREATE EXTENSION IF NOT EXISTS pg_trgm
WITH SCHEMA extensions;
-- 3. Create a trigram index for LIKE / ILIKE searches
CREATE INDEX idx_users_name_trgm
ON users
USING gin (name extensions.gin_trgm_ops);
-- 4. Now queries like this become much faster:
SELECT *
FROM users
WHERE name ILIKE '%wang%';
Creating Your Own PostgreSQL Extensions
Introduction
You can create your own PostgreSQL extensions to package functions, types, operators, and other database objects into a reusable module.
Extensions can be:
SQL-only – written purely in SQL and procedural languages (PL/pgSQL, PL/Perl, etc.).
C-based – contain compiled C code in a shared library (.so, .dll).
Once packaged as an extension, users only need CREATE EXTENSION my_ext; instead of running long SQL scripts manually.
High-Level Steps
Create a control file (my_ext.control) describing the extension.
Create one or more SQL script files (my_ext--1.0.sql, my_ext--1.0--1.1.sql, etc.).
(Optional) Build a shared library in C (my_ext.so) if you need low-level functions.
Install these files into PostgreSQL's share/extension (and lib for C code).
Enable the extension in a database using CREATE EXTENSION.
Directory Locations
To find where extension files should be placed, use pg_config on the server:
pg_config --sharedir # usually contains share/extension
pg_config --pkglibdir # usually contains shared libraries (.so/.dll)
SQL scripts and .control files go into $sharedir/extension.
Shared libraries go into $pkglibdir.
On managed services (cloud providers), you usually cannot install your own C extensions, but you can still create SQL-only extensions if they allow it.
Creating a Simple SQL-only Extension: Example
Suppose we want an extension hello_ext that adds a function hello(name).
-- File: hello_ext--1.0.sql
CREATE OR REPLACE FUNCTION hello(name text)
RETURNS text
LANGUAGE sql
AS
$$
SELECT 'Hello, ' || name || '!';
$$;
This SQL script defines the objects that will be created when the extension is installed.
The file name must follow the pattern <extname>--<version>.sql, here hello_ext--1.0.sql.
Writing the Control File (.control)
Create a file named hello_ext.control in share/extension:
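A minimal control file could look like this (the comment text is illustrative):
# File: hello_ext.control
comment = 'Example extension providing hello(name)'
default_version = '1.0'
relocatable = true
After both files are copied into share/extension, the extension can be enabled with CREATE EXTENSION hello_ext;.
Upgrading an Extension to a New Version
To ship version 1.1, add an upgrade script named hello_ext--1.0--1.1.sql: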
-- File: hello_ext--1.0--1.1.sql
-- Migration from 1.0 to 1.1
CREATE OR REPLACE FUNCTION hello_upper(name text)
RETURNS text
LANGUAGE sql
AS
$$
SELECT upper(hello(name));
$$;
To upgrade an existing installation:
ALTER EXTENSION hello_ext UPDATE TO '1.1';
PostgreSQL runs hello_ext--1.0--1.1.sql automatically when upgrading from version 1.0.
You can chain multiple upgrade scripts:
hello_ext--1.0--1.1.sql
hello_ext--1.1--1.2.sql
...
C-based Extensions (Overview Only)
For performance-sensitive logic or low-level features, you can write C functions and expose them as an extension.
Regular Expressions and Indexes
Regex cannot use B-tree indexes (except for patterns with a fixed prefix anchor such as ^text).
For fast substring / fuzzy search, use pg_trgm extension:
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_name_trgm
ON users USING gin (name gin_trgm_ops);
SELECT * FROM users WHERE name ~ 'wang';
Trigram indexes accelerate many regex operations automatically.
Regex vs LIKE vs SIMILAR TO
Feature | LIKE | SIMILAR TO | Regex (~)
Wildcard power | Low | Medium | High
Performs complex validation | No | Somewhat | Yes
Index support | Only prefix | No | No (except pg_trgm)
Syntax complexity | Very easy | Medium | High
Regex is the most powerful but most expensive.
The SIMILAR TO Operator in PostgreSQL
Introduction
SIMILAR TO is a SQL-standard pattern matching operator that combines aspects of LIKE and regular expressions.
It is more powerful than LIKE but less powerful (and less flexible) than PostgreSQL's regex operators (~, ~*).
SIMILAR TO uses a regex-like syntax but still follows SQL-standard rules, not POSIX rules.
It is rarely used in real-world PostgreSQL development because regex operators are more intuitive and consistent.
Basic Syntax
value SIMILAR TO pattern
value NOT SIMILAR TO pattern
Returns TRUE if value matches the SQL-standard regex-like pattern.
Returns FALSE otherwise.
Returns NULL if either operand is NULL.
Fundamental Rules of SIMILAR TO
% – matches zero or more characters (like LIKE).
_ – matches exactly one character.
| – regex alternation operator.
() – grouping for alternation.
[] – character classes.
ESCAPE clause can be used to escape metacharacters.
SELECT 'cat' SIMILAR TO '(cat|dog)'; -- TRUE
SELECT 'dog' SIMILAR TO '(cat|dog)'; -- TRUE
SELECT 'frog' SIMILAR TO '(cat|dog)'; -- FALSE
SIMILAR TO vs LIKE
Feature | LIKE | SIMILAR TO
Alternation | No | Yes
Grouping | No | Yes
Character classes | No | Yes
Complex validation | No | Possible
Case sensitivity | Sensitive | Sensitive only (no ILIKE equivalent)
In short: SIMILAR TO = LIKE + basic regex.
Alternation (|) and Grouping
SELECT 'apple' SIMILAR TO '(apple|banana|cherry)'; -- TRUE
SELECT 'guava' SIMILAR TO '(apple|banana|cherry)'; -- FALSE
Grouping with parentheses is required for alternation.
Character Classes
SELECT 'a' SIMILAR TO '[abc]'; -- TRUE
SELECT 'd' SIMILAR TO '[abc]'; -- FALSE
SELECT '5' SIMILAR TO '[0-9]'; -- TRUE
Similar to regex classes, but with SQL-standard semantics.
Wildcard Behavior
% = zero or more characters.
_ = exactly one character.
SELECT 'hello' SIMILAR TO 'h%'; -- TRUE
SELECT 'hello' SIMILAR TO 'h_llo'; -- TRUE
Anchoring Rules (Implicit Anchors)
Unlike POSIX regex, SIMILAR TO patterns act like the entire string is implicitly wrapped with ^ ... $.
Meaning: SIMILAR TO always matches the entire string. Partial matches are not allowed unless you use %.
SELECT 'hello' SIMILAR TO 'ell'; -- FALSE (must match whole string)
SELECT 'hello' SIMILAR TO '%ell%'; -- TRUE
Complex Validation Example
SELECT '2024-10-15' SIMILAR TO '[0-9]{4}-[0-9]{2}-[0-9]{2}'; -- TRUE
SELECT '2024/10/15' SIMILAR TO '[0-9]{4}-[0-9]{2}-[0-9]{2}'; -- FALSE
SIMILAR TO supports repetition with curly braces too.
Using SIMILAR TO With NOT
SELECT 'abc' NOT SIMILAR TO '[0-9]+'; -- TRUE
Behaves similarly to NOT LIKE and !~.
Escaping Special Characters
Special characters that require escaping:
%, _ (as in LIKE)
( )
|
[ ]
{ }
SELECT 'a+b' SIMILAR TO 'a\+b'; -- match literal +
You can specify a custom escape character:
SELECT 'a+b' SIMILAR TO 'a#+b' ESCAPE '#'; -- TRUE
SIMILAR TO is Fully Anchored
Unlike regular expressions, SIMILAR TO matches the entire string unless % or _ is used.
SELECT 'abc' SIMILAR TO 'a.*'; -- FALSE
SELECT 'abc' SIMILAR TO 'a%'; -- TRUE
This difference surprises many users!
Combining Conditions
SELECT *
FROM logs
WHERE message SIMILAR TO '%(ERROR|WARN|FATAL)%';
Good for basic multi-pattern filtering.
SIMILAR TO vs Regex Operators (~)
Feature | SIMILAR TO | Regex (~)
Power level | Medium | High
Performance | Medium | Lower (heavier)
Anchored matching | Always full string | Optional
Unicode support | Limited | Full POSIX regex support
Syntax | SQL-standard | POSIX regex syntax
Alternation | Yes | Yes
Complex patterns | Limited | Very strong
Regex (~) is almost always preferred for non-trivial pattern matching.
Performance Considerations
SIMILAR TO does not use indexes unless the pattern starts with a fixed prefix.
Like regex and LIKE with wildcards, it can be slow on large text columns.
To optimize substring search, use pg_trgm:
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_logs_msg_trgm
ON logs USING gin (message gin_trgm_ops);
SELECT * FROM logs
WHERE message SIMILAR TO '%fail%';
Trigram indexes dramatically improve performance.
Practical Examples
-- Validate YYYY-MM-DD
SELECT '2024-12-31' SIMILAR TO '[0-9]{4}-[0-9]{2}-[0-9]{2}';
-- Match emails ending in .com or .net
SELECT 'test@example.com' SIMILAR TO '%@%.(com|net)'; -- TRUE (note: . is a literal character in SIMILAR TO)
-- Validate integer or decimal
SELECT '123.45' SIMILAR TO '[0-9]+(\.[0-9]+)?';
-- Match strings starting with A–F
SELECT 'Apple' SIMILAR TO '[A-F]%';
Use SIMILAR TO for SQL-standard pattern matching where full regex is not required.
Basics on Full-Text Search (FTS) in PostgreSQL
Introduction
PostgreSQL provides a powerful, built-in full-text search (FTS) system for indexing and searching natural language text.
FTS understands linguistic structures:
tokenization
stemming
stop-words
ranking of results
It works with special data types such as tsvector and tsquery.
FTS is more advanced than LIKE, ILIKE, or regex because it understands word forms and relevance.
Core Concepts
tsvector – preprocessed document, normalized into lexemes.
tsquery – a structured query for searching tsvector documents.
@@ – match operator between tsvector and tsquery.
phraseto_tsquery() – builds phrase (adjacency) queries; the @@@ operator sometimes seen in older material is merely a deprecated synonym of @@, not a separate phrase operator.
SELECT 'a fat cat sat on a mat'::tsvector;
-- 'a' 'cat' 'fat' 'mat' 'on' 'sat'
SELECT to_tsvector('english', 'The cars are running');
-- 'car':2 'run':4
tsvector: The Document
A tsvector is the indexed representation of a text string.
It contains:
lexemes (normalized words)
positions
weights (optional)
SELECT to_tsvector('english', 'Programming program programs programmed');
-- 'program':1,2,3,4
PostgreSQL reduces all of these word forms to the single lexeme 'program' via stemming.
tsquery: The Query
A tsquery contains search terms, themselves normalized to lexemes, optionally combined with & (AND), | (OR), and ! (NOT):
SELECT
to_tsvector('english', 'The big car crashed'),
to_tsquery('english', 'car');
To check if text matches a query:
SELECT
to_tsvector('english', 'The big car crashed')
@@ to_tsquery('english', 'car'); -- TRUE
Phrase Search (word adjacency)
SELECT to_tsquery('english', 'big <-> car');
big <-> car means "big immediately before car".
SELECT to_tsvector('The big car crashed') @@
to_tsquery('big <-> car'); -- TRUE
Phrase Search (exact phrase)
Use phraseto_tsquery to turn plain text into an adjacency chain:
SELECT to_tsvector('The big red car') @@ phraseto_tsquery('big red car');
-- TRUE
Equivalent to the multi-word adjacency chain 'big <-> red <-> car'.
Indexing Full-Text Search
Use a GIN index for fast text search:
CREATE INDEX idx_docs_fts
ON documents
USING gin (to_tsvector('english', content));
But then queries must repeat the same expression:
SELECT *
FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'car');
Storing tsvector in a Column
You can store tsvector in a dedicated column and update it automatically:
ALTER TABLE documents
ADD COLUMN fts tsvector
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
CREATE INDEX idx_docs_fts ON documents USING gin (fts);
This avoids recomputing the vector on every query.
Composing Queries with plainto_tsquery and websearch_to_tsquery
plainto_tsquery – converts plain text into a query using AND logic.
websearch_to_tsquery – accepts web-search style input ("quoted phrases", OR, -exclusion) and never raises a syntax error on user input.
Weighting and Ranking Results
You can weight fields differently (A is the highest weight, D the lowest) and rank matches with ts_rank():
ALTER TABLE blog_posts
ADD COLUMN fts tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'D')
) STORED;
CREATE INDEX idx_blog_fts ON blog_posts USING gin (fts);
SELECT
id,
ts_headline(body, q) AS preview,
ts_rank(fts, q) AS rank
FROM blog_posts,
to_tsquery('english', 'postgres & search') q
WHERE fts @@ q
ORDER BY rank DESC;
User-Defined Functions (UDFs) in PostgreSQL
Introduction
PostgreSQL allows you to create your own functions using SQL, PL/pgSQL, PL/Python, PL/Perl, PL/Java, and even C.
User-defined functions (UDFs) are first-class citizens: they can be used in
SELECT, JOIN, WHERE, triggers, indexes, and extensions.
Functions can return:
scalar values (INT, TEXT, JSON, etc.)
composite types (ROW)
TABLE results (set-returning)
SETOF anytype
void
PostgreSQL optimizes SQL functions aggressively and often inlines them, making them very fast.
Basic SQL Functions
SQL functions are simple wrappers around SQL expressions.
CREATE OR REPLACE FUNCTION add_numbers(a INT, b INT)
RETURNS INT
LANGUAGE sql
AS
$$
SELECT a + b;
$$;
Executed as:
SELECT add_numbers(10, 20); -- 30
Functions Returning NULL, STRICT Functions
STRICT means the function returns NULL immediately if any argument is NULL.
Equivalent to manually checking for NULL.
CREATE FUNCTION multiply(a INT, b INT)
RETURNS INT
LANGUAGE sql
STRICT
AS
$$
SELECT a * b;
$$;
Now:
SELECT multiply(NULL, 5); -- NULL
IMMUTABLE, STABLE, VOLATILE
These declarations tell PostgreSQL how predictable your function is:
Category | Description | Examples
IMMUTABLE | Always returns the same output for the same input | Pure math, string functions
STABLE | Same result within a single query, but may change between queries | now(), current_setting()
VOLATILE | Can change anytime, cannot be optimized | random(), sequences, updates
CREATE FUNCTION greet(name TEXT)
RETURNS TEXT
LANGUAGE sql
IMMUTABLE
AS
$$
SELECT 'Hello ' || name;
$$;
PL/pgSQL Functions (Procedural Language)
PL/pgSQL is PostgreSQL's powerful procedural language: loops, variables, conditions, exceptions.
CREATE OR REPLACE FUNCTION factorial(n INT)
RETURNS INT
LANGUAGE plpgsql
AS
$$
DECLARE
result INT := 1;
BEGIN
IF n < 0 THEN
RAISE EXCEPTION 'Negative numbers not allowed';
END IF;
FOR i IN 1..n LOOP
result := result * i;
END LOOP;
RETURN result;
END;
$$;
SELECT factorial(5); -- 120
Using Variables, BEGIN...END Blocks
CREATE FUNCTION format_user(first TEXT, last TEXT)
RETURNS TEXT
LANGUAGE plpgsql
AS
$$
DECLARE
full_name TEXT;
BEGIN
full_name := initcap(first || ' ' || last);
RETURN full_name;
END;
$$;
Control Structures (IF, CASE, LOOP)
CREATE FUNCTION classify(score INT)
RETURNS TEXT
LANGUAGE plpgsql
AS
$$
BEGIN
IF score >= 90 THEN
RETURN 'A';
ELSIF score >= 80 THEN
RETURN 'B';
ELSE
RETURN 'C';
END IF;
END;
$$;
RETURN QUERY for Set-Returning Functions
You can return multiple rows (like a table).
CREATE FUNCTION get_users_with_gmail()
RETURNS TABLE(id INT, email TEXT)
LANGUAGE plpgsql
AS
$$
BEGIN
RETURN QUERY
SELECT u.id, u.email FROM users u WHERE u.email LIKE '%@gmail.com';
END;
$$;
SELECT * FROM get_users_with_gmail();
Returning Custom Composite Types
CREATE TYPE user_info AS (
id INT,
name TEXT,
created TIMESTAMP
);
CREATE FUNCTION get_user(uid INT)
RETURNS user_info
LANGUAGE sql
AS
$$
SELECT id, name, created_at
FROM users
WHERE id = uid;
$$;
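Calling it in the FROM clause expands the composite into columns:
SELECT * FROM get_user(1);
-- or access a single field:
SELECT (get_user(1)).name;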
Functions Returning JSON or JSONB
CREATE FUNCTION product_json(pid INT)
RETURNS JSONB
LANGUAGE sql
AS
$$
SELECT jsonb_build_object(
'id', id,
'name', name,
'price', price
)
FROM products
WHERE id = pid;
$$;
Security Definer Functions
SECURITY DEFINER makes the function run with the privileges of the creator, not the caller.
Useful for controlled data access.
Dangerous if misused – always restrict the search_path!
CREATE FUNCTION read_logs()
RETURNS SETOF logs
LANGUAGE sql
SECURITY DEFINER
SET search_path = public
AS
$$
SELECT * FROM logs;
$$;
Managing search_path Safely
To avoid SQL injection in SECURITY DEFINER functions, always set search_path explicitly:
ALTER FUNCTION read_logs() SET search_path = public;
Error Handling and Exception Blocks
CREATE OR REPLACE FUNCTION safe_div(a NUMERIC, b NUMERIC)
RETURNS NUMERIC
LANGUAGE plpgsql
AS
$$
BEGIN
RETURN a / b;
EXCEPTION
WHEN division_by_zero THEN
RETURN NULL;
END;
$$;
Function Overloading
PostgreSQL supports function overloading based on argument types.
CREATE FUNCTION info(x INT) RETURNS TEXT AS
$$ SELECT 'int' $$ LANGUAGE sql;
CREATE FUNCTION info(x TEXT) RETURNS TEXT AS
$$ SELECT 'text' $$ LANGUAGE sql;
SELECT info(42); -- 'int'
SELECT info('foo'); -- 'text'
C-Language Functions (Overview)
C functions provide maximum performance and flexibility.
They are compiled into .so libraries and loaded by PostgreSQL.
CREATE FUNCTION add_one(int)
RETURNS int
AS 'myext', 'add_one'
LANGUAGE c STRICT;
Dropping and Replacing Functions
DROP FUNCTION add_numbers(INT, INT);
CREATE OR REPLACE works only while the argument types and return type stay the same; if the signature changes, you must DROP the old function first.
Listing All User-Defined Functions
SELECT
proname,
pg_get_functiondef(p.oid)
FROM pg_proc p
JOIN pg_namespace n ON p.pronamespace = n.oid
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
AND p.prokind = 'f'; -- plain functions only; pg_get_functiondef fails on aggregates
Practical Real-World UDF Examples
-- Normalize email
CREATE FUNCTION clean_email(e TEXT)
RETURNS TEXT
IMMUTABLE
LANGUAGE sql
AS
$$ SELECT lower(trim(e)) $$;
-- Generate URL slug
CREATE FUNCTION slugify(s TEXT)
RETURNS TEXT
LANGUAGE plpgsql
AS
$$
BEGIN
RETURN regexp_replace(lower(s), '[^a-z0-9]+', '-', 'g');
END;
$$;
-- Increase stock safely
CREATE FUNCTION add_stock(pid INT, amount INT)
RETURNS VOID
LANGUAGE plpgsql
AS
$$
BEGIN
UPDATE products
SET stock = stock + amount
WHERE id = pid;
END;
$$;
Stored Procedures
Introduction
PostgreSQL distinguishes between functions and stored procedures.
Procedures were introduced in PostgreSQL 11 and allow:
transaction control inside procedure body (COMMIT, ROLLBACK),
autonomous units of work,
calls via CALL instead of SELECT.
Procedures do not return values (no RETURNS clause).
They are ideal for batch jobs, maintenance tasks, cleanup routines, ETL steps, or anything requiring transaction boundaries.
Basic Syntax
CREATE PROCEDURE procedure_name(arg1 type1, arg2 type2)
LANGUAGE plpgsql
AS
$$
BEGIN
-- logic
END;
$$;
No RETURNS clause is allowed.
Execution uses CALL:
CALL procedure_name(value1, value2);
Core Differences Between PROCEDURE and FUNCTION
Feature
FUNCTION
PROCEDURE
Returns values
Yes (scalar, composite, set)
No
Executed with
SELECT or PERFORM
CALL
Transaction control
No
Yes (COMMIT / ROLLBACK)
Volatility declarations
Declarative (IMMUTABLE / STABLE / VOLATILE)
Not applicable
Use cases
Reusable expressions, business logic
Batch jobs, maintenance tasks, ETL
Simple Stored Procedure Example
CREATE OR REPLACE PROCEDURE log_message(msg TEXT)
LANGUAGE plpgsql
AS
$$
BEGIN
INSERT INTO logs(message, created_at)
VALUES (msg, NOW());
END;
$$;
CALL log_message('Hello from procedure!');
Side effects (INSERT / UPDATE / DELETE) are typically the goal of a procedure.
Procedures With Transaction Control
This is the biggest reason to use procedures.
You can COMMIT / ROLLBACK inside the procedure body.
CREATE OR REPLACE PROCEDURE batch_insert()
LANGUAGE plpgsql
AS
$$
BEGIN
INSERT INTO events(data) VALUES ('step 1');
COMMIT; -- commit early
INSERT INTO events(data) VALUES ('step 2');
ROLLBACK; -- rollback this part
INSERT INTO events(data) VALUES ('step 3');
END;
$$;
CALL batch_insert();
NOTE: Transaction control is only allowed when the procedure is invoked by a top-level CALL, outside an explicit transaction block.
A procedure that issues COMMIT / ROLLBACK cannot be called from inside a function.
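Attempting to CALL such a procedure inside an explicit transaction block fails:
BEGIN;
CALL batch_insert();
-- ERROR: invalid transaction termination
ROLLBACK;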
Using IN, OUT, INOUT Parameters
Procedures may return data via OUT or INOUT parameters; this is the only way they can output values.
(OUT parameters in procedures require PostgreSQL 14 or later; earlier versions support only INOUT.)
CREATE PROCEDURE add_and_double(
IN a INT,
IN b INT,
OUT sum_val INT,
OUT double_sum INT
)
LANGUAGE plpgsql
AS
$$
BEGIN
sum_val := a + b;
double_sum := (a + b) * 2;
END;
$$;
CALL add_and_double(3, 5, NULL, NULL);
Client tools will show OUT values as a result set.
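From PL/pgSQL, for example in a DO block, the OUT values can be captured into variables; a sketch, assuming PostgreSQL 14+:
DO
$$
DECLARE
s INT;
d INT;
BEGIN
CALL add_and_double(3, 5, s, d); -- OUT values land in s and d
RAISE NOTICE 'sum=%, double=%', s, d;
END;
$$;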
Using INOUT Parameters
CREATE PROCEDURE increment(INOUT x INT)
LANGUAGE plpgsql
AS
$$
BEGIN
x := x + 1;
END;
$$;
CALL increment(10); -- returns 11
Procedures With Dynamic SQL (EXECUTE)
Useful for administrative tasks, partition management, dynamic operations.
CREATE PROCEDURE drop_table_if_exists(tname TEXT)
LANGUAGE plpgsql
AS
$$
BEGIN
EXECUTE 'DROP TABLE IF EXISTS ' || quote_ident(tname);
END;
$$;
CALL drop_table_if_exists('old_data');
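An equivalent formulation uses format() with the %I placeholder, which quotes identifiers for you (the _v2 name is just for illustration):
CREATE PROCEDURE drop_table_if_exists_v2(tname TEXT)
LANGUAGE plpgsql
AS
$$
BEGIN
EXECUTE format('DROP TABLE IF EXISTS %I', tname); -- %I quote-protects the identifier
END;
$$;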
Procedures and Exception Handling
Procedures support full PL/pgSQL exception blocks.
CREATE PROCEDURE safe_insert(val INT)
LANGUAGE plpgsql
AS
$$
BEGIN
INSERT INTO numbers(num) VALUES (val);
EXCEPTION
WHEN unique_violation THEN
RAISE NOTICE 'Duplicate ignored: %', val;
END;
$$;
CALL safe_insert(10);
Procedures Calling Other Procedures
CREATE PROCEDURE outer_proc()
LANGUAGE plpgsql
AS
$$
BEGIN
CALL inner_proc();
END;
$$;
Nested CALLs are allowed.
However, transaction control only works when the entire call chain consists of CALLs running outside an explicit transaction block.
Procedures vs DO Blocks
Feature
PROCEDURE
DO Block
Persistence
Stored permanently
Temporary (inline execution)
Transaction control
Yes
Yes (since PostgreSQL 11, when run outside a transaction block)
Parameters
Yes
No
Used in extensions
Yes
No
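For comparison, a one-off DO block runs the same kind of body without creating a persistent object; a minimal sketch, reusing the logs table from the examples above:
DO
$$
BEGIN
INSERT INTO logs(message, created_at) VALUES ('one-off maintenance note', NOW());
END;
$$;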
Replacing and Dropping Procedures
DROP PROCEDURE log_message(TEXT);
CREATE OR REPLACE PROCEDURE log_message(TEXT) ...
PostgreSQL identifies procedures by name + argument types.
Listing All Procedures
SELECT
p.proname,
pg_get_function_arguments(p.oid) AS args,
n.nspname AS schema
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.prokind = 'p'; -- 'p' = procedure
Security Considerations
SECURITY DEFINER works for procedures too.
Always set search_path explicitly to avoid SQL injection:
ALTER PROCEDURE log_message(TEXT)
SET search_path = public;
Be cautious with dynamic SQL inside SECURITY DEFINER procedures.
Real-World Example: ETL Batch Job
CREATE OR REPLACE PROCEDURE nightly_etl()
LANGUAGE plpgsql
AS
$$
BEGIN
-- Step 1: staging load
INSERT INTO staging SELECT * FROM raw_data;
COMMIT;
-- Step 2: transformations
UPDATE staging SET cleaned = TRUE WHERE data LIKE '%ok%';
COMMIT;
-- Step 3: merge into production
INSERT INTO production SELECT * FROM staging;
COMMIT;
-- Step 4: cleanup
TRUNCATE staging;
END;
$$;
CALL nightly_etl();
Each ETL phase is committed independently.
This pattern cannot be implemented with functions, because functions cannot commit.
Transactions, COMMIT, and ROLLBACK in PostgreSQL
Introduction
A transaction in PostgreSQL is a group of SQL statements executed as a single unit of work.
A transaction guarantees the 4 ACID properties:
Atomicity – all-or-nothing
Consistency – no broken constraints
Isolation – one transaction doesn't see another's partial changes
Durability – committed data is guaranteed to persist
You control a transaction with two commands:
COMMIT – permanently save changes
ROLLBACK – undo changes since the last BEGIN
Transactions isolate incomplete work from other users until finalized.
Starting a Transaction
BEGIN;
-- or: START TRANSACTION;
All subsequent SQL statements become part of this transaction until explicitly ended.
COMMIT: Saving All Changes
COMMIT finalizes the transaction and permanently writes changes to the database.
Once committed, changes cannot be undone (unless you start a new compensating transaction).
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- both updates permanently saved
ROLLBACK: Undoing All Changes
ROLLBACK cancels the entire transaction and reverts all statements back to the state before BEGIN.
No data is modified if rollback occurs.
BEGIN;
UPDATE products SET stock = stock - 10 WHERE id = 100;
UPDATE products SET stock = stock + 10 WHERE id = 200;
ROLLBACK; -- both updates undone
Even if multiple statements executed successfully, ROLLBACK erases all of them from the current transaction.
Autocommit Behavior
By default, PostgreSQL runs in autocommit mode:
each individual statement is its own transaction
and is automatically committed after execution
Example:
UPDATE users SET name = 'A'; -- auto-committed immediately
To use multi-statement transactions, you must explicitly begin them using BEGIN.
Combining Multiple Operations
Transactions allow complex multi-step logic to behave atomically.
BEGIN;
INSERT INTO orders(user_id, total) VALUES (7, 250);
INSERT INTO order_items(order_id, product_id, qty)
VALUES (currval('orders_id_seq'), 42, 3);
-- If anything goes wrong:
-- ROLLBACK;
COMMIT;
Either both INSERTs happen or neither does.
Savepoints: Partial Rollback
Savepoints allow you to roll back part of a transaction without canceling the whole thing.
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
SAVEPOINT s1;
UPDATE accounts SET balance = balance + 500 WHERE id = 99; -- oops, wrong account
ROLLBACK TO s1; -- undo only the second update
COMMIT;
ROLLBACK TO does not end the transaction, only rewinds to that savepoint.
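A savepoint you no longer need can be discarded with RELEASE SAVEPOINT, which keeps the work done since it was set:
BEGIN;
SAVEPOINT s1;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
RELEASE SAVEPOINT s1; -- keep the update, drop the savepoint
COMMIT;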
Transaction Behavior in Functions
Functions (CREATE FUNCTION) cannot use COMMIT or ROLLBACK.
They run inside the caller's transaction boundary.
-- INVALID inside functions:
COMMIT;
ROLLBACK;
Functions must behave atomically.
Transaction Behavior in Procedures
Procedures (CREATE PROCEDURE) can use transaction control.
They are invoked with CALL at the top level.
This is the main difference from functions.
CREATE PROCEDURE demo()
LANGUAGE plpgsql
AS
$$
BEGIN
INSERT INTO logs(msg) VALUES ('step 1');
COMMIT; -- allowed
INSERT INTO logs(msg) VALUES ('step 2');
ROLLBACK; -- undo step 2 only
INSERT INTO logs(msg) VALUES ('step 3'); -- part of new transaction
END;
$$;
CALL demo();
Transaction Isolation Levels
Transaction isolation levels define what data a transaction can see while other transactions are running concurrently.
PostgreSQL implements isolation using MVCC (Multi-Version Concurrency Control), which gives each transaction a consistent snapshot of the database.
The isolation level determines:
whether the transaction sees other transactions' updates,
whether rows can appear or disappear during the transaction,
whether reading the same row twice always yields the same value,
whether anomalies (dirty reads, write skew, phantom rows) are allowed.
You set the isolation level per transaction:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- work inside transaction
COMMIT;
PostgreSQL supports three standard isolation levels:
Isolation Level
Description (Detailed)
READ COMMITTED
This is the default isolation level in PostgreSQL.
Each individual SQL statement sees a snapshot of data committed before that statement began.
If another transaction commits in-between your statements, your next statement will see the new data.
Prevents dirty reads (you never see uncommitted data).
Allows non-repeatable reads:
The same row queried twice may return different values if another transaction modifies and commits it between your statements.
Allows phantom reads:
Repeated SELECT queries may return more or fewer rows.
Snapshot: one snapshot per statement.
REPEATABLE READ
The transaction sees a single consistent snapshot taken when the transaction begins.
No matter what other transactions commit during your transaction, your reads stay the same.
Prevents:
dirty reads: never see uncommitted data
non-repeatable reads: reading the same row twice always gives the same result
Allows phantom reads:
A repeated query on ranges may still return additional rows depending on the situation.
May allow write skew anomalies (a type of race condition) unless SERIALIZABLE is used.
Snapshot: one snapshot for the entire transaction.
SERIALIZABLE
The strongest isolation level — ensures the database behaves as if transactions were executed one by one, sequentially, even though they run concurrently.
Automatically prevents:
dirty reads
non-repeatable reads
phantom reads
write skew anomalies
Requires PostgreSQL to automatically detect dangerous conflicts.
If two concurrent transactions could lead to a serialization anomaly, PostgreSQL aborts one with:
ERROR: could not serialize access due to read/write dependencies among transactions
Your application must retry the entire transaction.
Snapshot: similar to REPEATABLE READ, but with extra checks using predicate locks to avoid anomalies.
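A two-session sketch of the difference, assuming an accounts row starting with balance 100: under READ COMMITTED the second read sees the concurrent commit; under REPEATABLE READ it would still return 100.
-- Session 1
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- 100
-- Session 2
UPDATE accounts SET balance = 200 WHERE id = 1;
COMMIT;
-- Session 1, same transaction
SELECT balance FROM accounts WHERE id = 1; -- 200 under READ COMMITTED (non-repeatable read)
COMMIT;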
What Happens During Errors?
Any unhandled SQL error causes the current transaction to become aborted.
You cannot continue until you ROLLBACK.
BEGIN;
INSERT INTO a VALUES (1);
INSERT INTO a VALUES (1); -- duplicate key error, transaction aborted
-- Now ANY statement will error until ROLLBACK is issued
ROLLBACK; -- reset state
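Wrapping the risky statement in a savepoint lets the rest of the transaction survive the error:
BEGIN;
INSERT INTO a VALUES (1);
SAVEPOINT before_risky;
INSERT INTO a VALUES (1); -- duplicate key error
ROLLBACK TO before_risky; -- only the failed statement is undone
INSERT INTO a VALUES (2); -- transaction continues
COMMIT;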
Real-World Example: Bank Transfer
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT; -- transfer completed
Guarantees that money is never lost or created.
Real-World Example: Repairing Data
BEGIN;
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days';
-- Oops, wrong condition?
ROLLBACK;
-- try again
BEGIN;
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '30 days';
COMMIT;
Sequences in PostgreSQL
A sequence in PostgreSQL is a special object that generates a sequence of numeric values, usually used for auto-incrementing primary keys.
Sequences are independent database objects and not tied to a table unless you explicitly connect them.
They are non-transactional counters:
Each call increases the counter immediately
Sequence increments cannot be rolled back
PostgreSQL automatically creates sequences for SERIAL, BIGSERIAL, and GENERATED AS IDENTITY columns.
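You can look up the sequence backing a serial or identity column with pg_get_serial_sequence:
SELECT pg_get_serial_sequence('users', 'id');
-- e.g. public.users_id_seq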
Creating a Sequence
CREATE SEQUENCE user_id_seq;
A minimal sequence starts at 1 and increments by 1.
You can customize start, increment, min, max, cache, and cycle behavior.
CREATE SEQUENCE order_seq
START WITH 1000
INCREMENT BY 5
MINVALUE 1
MAXVALUE 1000000
CACHE 50
CYCLE;
CYCLE: sequence wraps to MINVALUE when MAXVALUE reached.
Sequences Are Concurrency-Safe
Each call to nextval() hands out a distinct value, even across concurrent transactions:
-- Transaction A and B both run:
SELECT nextval('user_id_seq');
-- A gets 12
-- B gets 13
Resetting or Restarting a Sequence
ALTER SEQUENCE user_id_seq RESTART WITH 1;
You can also use setval:
SELECT setval('user_id_seq', 1, false);
NOTE: resetting sequences can break primary key uniqueness if old IDs still exist.
Dropping a Sequence
DROP SEQUENCE user_id_seq;
If it was OWNED BY a column, dropping the table or column also drops the sequence.
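You can attach an existing sequence to a column yourself, so the two are dropped together:
ALTER SEQUENCE user_id_seq OWNED BY users.id;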
Real-World Examples
-- Create a custom global invoice numbering generator
CREATE SEQUENCE invoice_seq START WITH 20240001;
INSERT INTO invoices(id, total)
VALUES (nextval('invoice_seq'), 99.99);
-- Use sequence to shard IDs by region
CREATE SEQUENCE region1_seq START 1 INCREMENT 10;
CREATE SEQUENCE region2_seq START 2 INCREMENT 10;
-- Regenerating sequence after manual data load
SELECT setval('user_id_seq', (SELECT MAX(id) FROM users));
Indexes in PostgreSQL
Introduction
An index in PostgreSQL is a separate data structure used to accelerate row lookups, filtering, sorting, joining, and uniqueness enforcement.
Indexes speed up WHERE clauses, JOIN conditions, ORDER BY, GROUP BY, and distinct queries.
Indexes are not free: maintaining them adds CPU, I/O, and storage overhead on INSERT / UPDATE / DELETE.
PostgreSQL maintains indexes automatically as data changes.
Basic Index Creation
CREATE INDEX idx_users_email
ON users(email);
Creates a default B-tree index.
PostgreSQL chooses B-tree unless you explicitly specify another type.
The Main Index Types
Index Type
Best For
Notes
B-tree
Equality (=) and range (<, >) queries
Default index, supports sorting and UNIQUE constraints
Hash
Equality lookups only
Rarely used, B-tree is usually faster
GIN
Arrays, JSONB, full-text search
Multivalued indexing, slower to update but very fast lookup
GiST
Geometric types, full-text search, ranges
Flexible balanced tree framework
SP-GiST
Partitioned search spaces: IP ranges, quadtrees
Specialized high-performance search structures
BRIN
Very large tables with natural ordering (timestamps, IDs)
Tiny index size, block ranges, extremely fast to build
Bloom extension
Many-column indexes
Probabilistic index; provided by the bloom extension
B-Tree Indexes (Default)
Suitable for equality and range operations:
WHERE email = 'a@example.com'
WHERE age > 18
WHERE price BETWEEN 100 AND 200
ORDER BY created_at DESC
They support unique constraints:
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
Expression Indexes (Function Indexes)
Indexes over expressions let PostgreSQL index computed values.
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));
Query must match the expression:
SELECT * FROM users WHERE LOWER(email) = 'a@example.com';
Partial Indexes (Filtered Indexes)
Indexes only part of a table for performance and space savings.
CREATE INDEX idx_active_users
ON users(email)
WHERE active = true;
Useful when most rows do not match the condition.
Example: indexing only WHERE deleted_at IS NULL.
Multi-Column Indexes
Indexes can contain multiple columns.
Column order matters a lot.
CREATE INDEX idx_users_last_first
ON users(last_name, first_name);
Index can be used for:
WHERE last_name = ... AND first_name = ...
WHERE last_name = ...
But NOT for:
WHERE first_name = ...
Index-Only Scans
If all required columns exist in the index, PostgreSQL can avoid accessing the table entirely.
This is faster because:
index pages are smaller than table pages
no heap tuple visibility checks needed if page is "all-visible"
CREATE INDEX idx_users_email_only
ON users(email);
SELECT email FROM users WHERE email LIKE '%@gmail.com';
-- The leading wildcard rules out a range search, but PostgreSQL can still
-- satisfy the query with a full scan of the (much smaller) index.
Requires visibility map bits to be set — VACUUM affects index-only scan performance.
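You can verify the plan with EXPLAIN; with an equality predicate on the indexed column, the expected plan (statistics and visibility map permitting) looks roughly like:
EXPLAIN SELECT email FROM users WHERE email = 'a@example.com';
-- Index Only Scan using idx_users_email_only on users ...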
GIN Indexes
Designed for multivalued data:
Array columns
JSONB documents
Full-text search
Supports containment operators:
CREATE INDEX idx_tags_gin
ON posts USING gin(tags);
Examples:
SELECT * FROM posts WHERE tags && '{sql}';
SELECT * FROM posts WHERE tags @> '{programming}';
SELECT * FROM data WHERE jsonb_data @> '{"user": {"id": 5}}';
GiST Indexes
Generalized Search Tree — supports many non-linear data types:
geometric types
ranges
full-text search
ltree hierarchical search
CREATE INDEX idx_ranges_gist
ON events USING gist(period);
SP-GiST Indexes
Used for partitioned search spaces such as:
IP addresses (radix trees)
geometric quadtrees
trie-like structures
CREATE INDEX idx_ip_spgist
ON ips USING spgist(ip_address);
BRIN Indexes
Block Range Index — extremely tiny and fast to build.
Best for very large tables where data is naturally ordered:
timestamps
auto-incrementing IDs
The index stores summary values (such as min/max) per block range, by default 128 pages of 8 kB each.
CREATE INDEX idx_logs_brin
ON logs USING brin(timestamp);
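The granularity is tunable via the pages_per_range storage parameter (default 128); smaller ranges mean a bigger but more selective index. A sketch:
CREATE INDEX idx_logs_brin_fine
ON logs USING brin(timestamp)
WITH (pages_per_range = 32);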
Index Bloat
PostgreSQL never updates index rows in-place.
Updates create new index entries.
Dead entries accumulate until vacuumed.
B-tree indexes are most sensitive to bloat.
REINDEX INDEX idx_users_email;
REINDEX TABLE users;
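On PostgreSQL 12+, REINDEX CONCURRENTLY rebuilds an index without blocking writes:
REINDEX INDEX CONCURRENTLY idx_users_email;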
Fillfactor
Defines how full index pages may become.
Useful for high-update workloads.
CREATE INDEX idx_items_price
ON items(price)
WITH (fillfactor = 70);
NULL Handling in Indexes
B-tree indexes store NULLs; by default NULLs sort last in ASC order and first in DESC order (adjustable with NULLS FIRST / NULLS LAST).
Use partial indexes to exclude NULLs:
CREATE INDEX idx_not_null_price
ON items(price)
WHERE price IS NOT NULL;
Collation-Aware Indexing
Index sorting depends on collation.
CREATE INDEX idx_names_collation
ON people(name COLLATE "de_DE");
Operator Classes
Index behavior changes depending on operator classes.
Example: text pattern ops optimize LIKE 'abc%'.
CREATE INDEX idx_users_email_pattern
ON users(email text_pattern_ops);
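Such an index can then serve anchored pattern searches:
SELECT * FROM users WHERE email LIKE 'alice%';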
Covering Indexes (INCLUDE)
Store extra columns in the index for index-only scans, without affecting index sorting.
CREATE INDEX idx_orders_user_date
ON orders(user_id)
INCLUDE (order_date);
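A query touching only the key and the included column can then be answered from the index alone:
SELECT user_id, order_date
FROM orders
WHERE user_id = 5;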
Dropping Indexes
DROP INDEX idx_users_email;
Real-World Index Examples
-- Search by email, case-insensitive
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));
-- Fast soft-deletes
CREATE INDEX idx_posts_not_deleted
ON posts(id)
WHERE deleted_at IS NULL;
-- Fast JSONB lookup
CREATE INDEX idx_events_jsonb
ON events USING gin(data);
-- Speed up sorting queries
CREATE INDEX idx_logs_ts_desc
ON logs(timestamp DESC);
-- Range queries on large time-based table
CREATE INDEX idx_sensor_brin
ON sensor_data USING brin(recorded_at);
Composite Types
Introduction
A composite type in PostgreSQL is a user-defined structured type, similar to a "record" or "struct".
Composite types allow grouping multiple fields into a single logical type.
They can be used:
as column types,
as function return types,
inside arrays, JSON, or other nested structures,
with row operations (ROW(...)),
with PL/pgSQL records and variables.
Composite types behave like anonymous rows (ROW(...)) but are reusable and schema-bound.
Creating a Composite Type
CREATE TYPE address AS (
street TEXT,
city TEXT,
postal_code TEXT,
country TEXT
);
This defines a reusable structured type named address.
You can use it anywhere a normal type is allowed.
Using Composite Types as Table Columns
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT,
addr address
);
Insert using ROW(...) or a literal record:
INSERT INTO users(name, addr)
VALUES (
'Alice',
ROW('Main Street 1', 'Berlin', '10115', 'Germany')
);
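Fields of a composite column are read with a parenthesized reference (the parentheses are required):
SELECT name, (addr).city, (addr).country
FROM users
WHERE (addr).city = 'Berlin';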
Altering Composite Types
ALTER TYPE address ADD ATTRIBUTE province TEXT;
ALTER TYPE address DROP ATTRIBUTE postal_code;
ALTER TYPE address RENAME ATTRIBUTE street TO street_name;
Composite types can usually be altered safely, but functions and views that reference the changed attributes may need to be updated.
Dropping Composite Types
DROP TYPE address;
Fails if dependent tables, functions, or types still reference it.
Use CASCADE with extreme caution.
Real-World Examples
-- Geographic point stored as composite
CREATE TYPE geo_point AS (
lat DOUBLE PRECISION,
lon DOUBLE PRECISION
);
CREATE TABLE events (
id SERIAL PRIMARY KEY,
title TEXT,
pos geo_point
);
INSERT INTO events(title, pos)
VALUES ('Concert', ROW(52.5200, 13.4050));
-- Composite type as function OUT parameters
CREATE TYPE product_summary AS (
product_id INT,
name TEXT,
price NUMERIC
);
CREATE FUNCTION summarize_product(pid INT)
RETURNS product_summary
LANGUAGE sql
AS
$$
SELECT id, name, price FROM products WHERE id = pid;
$$;
-- Composite type arrays for time-series sensors
CREATE TYPE measurement AS (
ts TIMESTAMP,
temp NUMERIC,
hum NUMERIC
);
CREATE TABLE devices (
id SERIAL PRIMARY KEY,
logs measurement[]
);
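Appending a reading to such an array works with the || operator and a cast composite literal; a sketch:
UPDATE devices
SET logs = logs || ROW(now()::timestamp, 21.5, 40.0)::measurement
WHERE id = 1;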
Arrays
PostgreSQL supports true multi-dimensional array types, allowing any base type (integer, text, composite types, enums, domains, etc.) to be represented as an array.
Arrays can be:
1-dimensional (INT[])
multi-dimensional (INT[][], TEXT[][][])
mixed with composite values (mytype[])
Declaring Array Columns
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
tags TEXT[],
ratings INT[],
matrix INT[][] -- multi-dimensional
);
Array syntax adds [] to any data type.
Arrays can store NULL values per element, and the array itself may also be NULL.
Array Literals
Values are written using curly braces {}.
INSERT INTO products(name, tags, ratings)
VALUES ('Keyboard', '{mechanical,usb}', '{5,4,5}');
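The ARRAY[...] constructor is an equivalent, often more readable syntax, and elements are accessed with 1-based subscripts:
INSERT INTO products(name, tags, ratings)
VALUES ('Mouse', ARRAY['wireless','usb'], ARRAY[4,4,5]);
SELECT name, tags[1] FROM products; -- first element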
ENUM Performance
ENUM equality (=) and comparison operations are faster than TEXT because values are stored internally as integers.
ENUMs index extremely well with B-tree indexes:
CREATE INDEX idx_orders_status ON orders(status);
Migrating ENUM Values Safely
ALTER TYPE order_status ADD VALUE 'in_transit';
You cannot remove ENUM values.
You cannot change their order after creation.
Best practice for major schema migration:
-- 1. Rename old enum
ALTER TYPE order_status RENAME TO order_status_old;
-- 2. Create new enum with correct values
CREATE TYPE order_status AS ENUM ('pending','processing','...');
-- 3. Convert columns
ALTER TABLE orders
ALTER COLUMN status TYPE order_status
USING status::TEXT::order_status;
-- 4. Drop old enum
DROP TYPE order_status_old;
Real-World Examples
-- User roles
CREATE TYPE user_role AS ENUM ('user','moderator','admin');
CREATE TABLE users (
id SERIAL,
name TEXT,
role user_role NOT NULL DEFAULT 'user'
);
-- Ticket priorities with ordering
CREATE TYPE priority AS ENUM ('low','medium','high','urgent');
SELECT * FROM tickets ORDER BY priority; -- correct severity order
-- Payment status usage
CREATE TYPE payment_status AS ENUM ('created','paid','refunded');
CREATE TABLE invoices (
id SERIAL,
amount NUMERIC,
status payment_status
);
The xml Type in PostgreSQL
PostgreSQL provides a native xml data type that stores well-formed XML documents or fragments.
It supports:
XML parsing and validation
XPath queries
XML construction functions
Well-formedness checking (full DTD/XSD schema validation is not built in)
Conversions between XML, TEXT, and JSON
The xml type stores the document in text form but performs structure validation according to XML standards.
Declaring XML Columns
CREATE TABLE books (
id SERIAL PRIMARY KEY,
title TEXT,
metadata XML
);
Insert XML using string literals:
INSERT INTO books(title, metadata)
VALUES (
'Dune',
'<book><author>Herbert</author><year>1965</year></book>'
);
Invalid XML raises an error:
INSERT INTO books(metadata)
VALUES ('<bad xml>'); -- ERROR: not well-formed XML
XML Parsing and Validation
The xml type checks well-formedness during inserts and casts.
Full schema validation (DTD, XSD) is not built into PostgreSQL; use external tooling if you need it.
SELECT *
FROM XMLTABLE(
'/book'
PASSING '<book><title>Dune</title><year>1965</year></book>'::xml
COLUMNS
title TEXT PATH 'title',
year INT PATH 'year'
);
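Individual values can also be pulled out with xpath(), which returns an array of matching XML fragments:
SELECT xpath('/book/author/text()', metadata)
FROM books;
-- {Herbert}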
Working with JSON and JSONB
Conversion and formatting functions:
SELECT to_json(ARRAY[1,2,3]);
SELECT jsonb_pretty(profile) FROM users;
Aggregating to JSON
SELECT json_agg(profile) FROM users;
SELECT jsonb_build_object('count', COUNT(*), 'users', jsonb_agg(id))
FROM users;
Building JSON Documents
SELECT jsonb_build_object(
'name', name,
'skills', skills,
'age', age
)
FROM employees;
JSONB Path Queries (SQL/JSON Path)
PostgreSQL supports SQL/JSON path expressions:
SELECT jsonb_path_query(profile, '$.skills[*] ? (@ == "sql")')
FROM users;
SELECT * FROM users
WHERE jsonb_path_exists(profile, '$.age ? (@ > 20)');
Performance Notes
jsonb is optimized for:
searching
indexing
containment
updates
json is faster for:
pure text I/O
storing raw API responses
GIN indexes may become large; consider jsonb_path_ops for containment-only indexes.
CREATE INDEX idx_users_profile_pathops
ON users USING gin(profile jsonb_path_ops);
Real-World Examples
-- Find users in a specific city
SELECT * FROM users
WHERE profile#>>'{address,city}' = 'Berlin';
-- Check if user has a skill
SELECT * FROM users
WHERE profile->'skills' ? 'postgresql';
-- Merge JSON objects
SELECT profile || '{"active":true}'::jsonb
FROM users;
-- Build a complete API response
SELECT jsonb_build_object(
'status', 'ok',
'timestamp', now(),
'data', jsonb_agg(profile)
)
FROM users;
GIN Indexes in PostgreSQL
GIN stands for Generalized Inverted Index.
GIN indexes are designed for indexing values that contain multiple components, such as:
arrays
JSONB objects
documents (full-text search)
key-value stores (hstore)
trigrams (pg_trgm)
range types
GIN is optimized for containment queries:
@> (contains)
&& (overlaps)
?, ?|, ?& (JSONB and hstore key checks)
full-text operators (@@, @@@)
GIN indexes are usually much larger than B-tree, but provide excellent query performance on complex data.
How GIN Works Internally
GIN indexes store a mapping from value component → row locations.
For example, if a row has tags = '{a,b,c}', GIN indexes it as:
a → row_id
b → row_id
c → row_id
This makes GIN extremely efficient for membership / containment operations.
GIN stores:
a posting tree for values appearing many times
a posting list for values appearing few times
PostgreSQL chooses dynamically whether to store a list or a tree.
Creating a GIN Index
CREATE INDEX idx_users_tags_gin
ON users USING gin(tags);
Default operator class depends on the column type.
GIN Indexes on Arrays
CREATE TABLE products (
id SERIAL,
tags TEXT[]
);
CREATE INDEX idx_products_tags_gin
ON products USING gin(tags);
Now these queries are extremely fast:
SELECT * FROM products WHERE tags @> '{sql}';
SELECT * FROM products WHERE tags && '{postgres,sql}';
How it works:
GIN indexes each element individually.
GIN Indexes on JSONB
CREATE TABLE events (
id SERIAL,
data JSONB
);
CREATE INDEX idx_events_data_gin
ON events USING gin(data);
SELECT * FROM events WHERE data @> '{"type":"login"}';
SELECT * FROM events WHERE data ? 'user_id';
SELECT * FROM events WHERE data @> '{"tags":["security"]}';
Supported operators include:
@> contains
? key
?| any_key
?& all_keys
jsonb_path_ops vs default jsonb_ops
jsonb_ops (default):
Indexes full JSON structure
Supports all JSONB operators
Larger index size
jsonb_path_ops:
Indexes only paths → great for @>
Smaller index
Does NOT support ?, ?|, ?&
CREATE INDEX idx_events_data_pathops
ON events USING gin(data jsonb_path_ops);
GIN for Full-text Search (TSVECTOR)
CREATE TABLE docs (
id SERIAL,
body TEXT
);
CREATE INDEX idx_docs_tsv
ON docs USING gin(to_tsvector('english', body));
SELECT * FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('security & data');
GIN indexes are usually faster than GiST for full-text search, but larger.
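On PostgreSQL 12+, a stored generated column avoids recomputing to_tsvector in every query; a sketch:
ALTER TABLE docs
ADD COLUMN tsv tsvector
GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
CREATE INDEX idx_docs_tsv_col ON docs USING gin(tsv);
SELECT * FROM docs WHERE tsv @@ to_tsquery('security & data');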
GIN for hstore
CREATE EXTENSION hstore;
CREATE TABLE settings (
id SERIAL,
attrs hstore
);
CREATE INDEX idx_settings_attrs
ON settings USING gin(attrs);
SELECT * FROM settings
WHERE attrs ? 'theme';
GIN with pg_trgm (trigram search)
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_name_trgm
ON users USING gin (name gin_trgm_ops);
Efficient for:
LIKE '%term%'
ILIKE '%term%'
Regex matching
Similarity search (% operator)
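A typical fuzzy lookup combines the % operator with similarity() for ranking:
SELECT name, similarity(name, 'jon') AS sml
FROM users
WHERE name % 'jon' -- matches above pg_trgm.similarity_threshold
ORDER BY sml DESC;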
GIN Index Options
CREATE INDEX idx_logs_data_gin
ON logs USING gin(data)
WITH (fastupdate = on);
fastupdate:
ON (default): inserts go to a pending list, flushed lazily
OFF: slower inserts, faster reads
gin_pending_list_limit:
Controls the size of the pending list
Flushed automatically when threshold reached
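The limit can also be set per index as a storage parameter, in kilobytes (the index name here is illustrative):
CREATE INDEX idx_events_data_big
ON events USING gin(data)
WITH (gin_pending_list_limit = 2048);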
Maintenance & Reindexing
VACUUM (ANALYZE) table_name;
REINDEX INDEX idx_name;
GIN indexes tend to grow over time because new keys get added but old keys are not deleted immediately.
Auto-vacuum manages GIN cleanup, but manual tuning may be needed for large workloads.
Performance Notes
GIN index size is large (sometimes 2–5× table size).
Inserts and updates are slower than B-tree because:
each indexed element is stored separately
Reads are extremely fast for supported operators.
Best for:
JSONB documents
arrays
hstore
text search
similarity search
Real-World Examples
-- Search logs by JSONB keys
SELECT * FROM logs
WHERE data @> '{"status":"error"}';
-- Find users with certain tags
SELECT * FROM users
WHERE tags && '{music,sql}';
-- Fast substring search with trigram indexing
SELECT * FROM documents
WHERE body ILIKE '%database%';
-- Multi-condition JSONB search
SELECT * FROM events
WHERE data @> '{"user":{"id":5}}';
The search_path in PostgreSQL
Introduction
The search_path is a PostgreSQL configuration that determines how unqualified names (tables, functions, types) are resolved.
Whenever you reference an object without a schema name, PostgreSQL searches schemas in the order defined by search_path.
The default value is usually:
SHOW search_path;
-- "$user", public
This means:
First: a schema named after the current user (rarely exists)
Then: the public schema
Why Search Path Matters
Prevents needing fully qualified names like myschema.mytable everywhere.
Affects how:
SELECT, INSERT, UPDATE, DELETE resolve tables
Functions are resolved
Operators are resolved
Types are resolved
Important for:
Multi-schema applications
Security (SQL injection risks)
Schema-based versioning
Extensions
Setting the Search Path
SET search_path TO myschema, public;
SET LOCAL search_path TO analytics;
-- Only for duration of current transaction
ALTER DATABASE mydb SET search_path = myschema, public;
ALTER USER myuser SET search_path = app, public;
Precedence:
SET (session-level) overrides user and DB settings
ALTER USER overrides ALTER DATABASE
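To see the effective resolution order, including implicitly searched schemas, use current_schemas:
SELECT current_schemas(true);
-- e.g. {pg_catalog,public}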
How Object Resolution Works
Given:
SET search_path TO sales, public;
SELECT customers.id FROM customers;
PostgreSQL looks for customers in:
1. sales.customers
2. public.customers
If both exist, the earlier schema wins.
SELECT sales.customers.id -- always resolved exactly
FROM sales.customers;
Fully qualified names always bypass search_path.
Search Path and Function Resolution
Function lookup also uses search_path.
This can lead to ambiguity:
SET search_path TO analytics, public;
SELECT format_date('2024-01-01');
PostgreSQL resolves format_date first in analytics, then public.
To force a specific function:
SELECT utils.format_date('2024-01-01');
Search Path and Security
This is one of the most important reasons to understand search_path.
If search_path includes a schema writable by untrusted users, they can override functions!
Danger example: function hijacking.
SET search_path TO malicious_schema, public;
-- A malicious user creates a function matching one your app calls unqualified
CREATE FUNCTION malicious_schema.format_date(d TEXT) RETURNS TEXT AS
$$
SELECT 'attacker-controlled result';
$$ LANGUAGE sql;
Now your application calling SELECT format_date('2024-01-01') executes attacker code!
(Built-ins such as now() are harder to hijack, because pg_catalog is searched implicitly first unless it is explicitly listed after the attacker's schema.)
Best practice:
Always put pg_catalog first.
Never put writable schemas before trusted ones.
For functions, explicitly schema-qualify when safety matters.
ALTER ROLE myapp SET search_path = pg_catalog, app, public;
Temporary Tables and Search Path
Temp tables live in a special schema: pg_temp.
PostgreSQL automatically searches pg_temp before the schemas listed in search_path whenever temp objects exist.
CREATE TEMP TABLE t(x int);
SELECT current_schemas(true);
-- e.g. {pg_temp_3,pg_catalog,public}
Unqualified names now resolve to the temp table first!
(SHOW search_path does not display the implicit pg_temp and pg_catalog entries.)
search_path for Extensions
CREATE EXTENSION postgis SCHEMA gis;
Functions from extensions are usually called without qualification.
Make sure the schema containing extension functions is on search_path.
Best Practices
Always include pg_catalog first.
For applications, use: pg_catalog, app_schema
Do not include schemas writable by untrusted users.
Always fully qualify objects inside migrations:
CREATE TABLE app.users (...);
CREATE FUNCTION app.calculate() ...;
This prevents migrations from breaking if search_path changes.
Real-World Examples
-- Application search path
ALTER ROLE myapp SET search_path = pg_catalog, app, public;
-- Safely executing migrations
SET search_path = app;
ALTER TABLE users ADD COLUMN last_login TIMESTAMP;
-- Using multiple schemas for versioning
SET search_path = v2, v1, public;
SELECT get_customer(5);
Casting & Type Conversion in PostgreSQL
Introduction
Casting (type conversion) converts a value from one data type to another.
PostgreSQL supports:
explicit casts (written by the user)
implicit casts (performed automatically)
function-based casts (custom cast definitions)
domain casts
PostgreSQL allows casting between:
numeric types
character types
arrays
JSON, XML
composite & domain types
user-defined types
Explicit Casts
SELECT '123'::INT;
SELECT CAST('123' AS INT);
SELECT CAST(5.8 AS NUMERIC(3,1));
Explicit casts always work the same way: the source is interpreted and validated for the target type.
If conversion is impossible, PostgreSQL raises an error:
SELECT 'abc'::INT;
-- ERROR: invalid input syntax for type integer: "abc"
To make such conversions safe, you can wrap the validation in a function and register it as a cast:
CREATE FUNCTION text_to_int_safe(text)
RETURNS int
LANGUAGE plpgsql AS
$$
BEGIN
RETURN CASE WHEN $1 ~ '^\d+$' THEN $1::int ELSE NULL END;
END;
$$;
CREATE CAST (text AS int)
WITH FUNCTION text_to_int_safe(text)
AS ASSIGNMENT;
AS ASSIGNMENT allows implicit casting when appropriate.
AS IMPLICIT creates a very permissive implicit cast (rarely recommended).
Create Your Own Type Casts
CREATE CAST (point2d AS text)
WITH FUNCTION point2d_to_text(point2d);
An explicitly defined cast takes precedence over PostgreSQL's built-in automatic conversion.
Real-World Examples
-- Read JSON number as integer
SELECT (data->>'count')::INT FROM metrics;
-- Convert text array to integer array
SELECT string_to_array('1,2,3', ',')::INT[];
-- Cast timestamp to date
SELECT created_at::DATE FROM orders;
-- Cast enum to text
SELECT status::TEXT FROM users;
-- Format a composite type
SELECT (ROW(5,6)::point2d).x;
Triggers in PostgreSQL
Introduction
A trigger is a special database object that automatically executes a function when a table event occurs.
Trigger events include:
INSERT
UPDATE
DELETE
TRUNCATE
Triggers can run:
BEFORE the statement
AFTER the statement
INSTEAD OF the statement (for views)
Triggers call a trigger function (a function with RETURNS trigger), written in SQL or PL/pgSQL.
Types of Triggers
Type
Description
Row-Level Trigger
Executed once per affected row. Has access to OLD and NEW records.
Statement-Level Trigger
Executed once per SQL statement, regardless of how many rows are modified.
BEFORE Trigger
Can validate or modify NEW data before writing to the table.
AFTER Trigger
Runs after the change has been applied to the row, still inside the same transaction; cannot modify NEW.
INSTEAD OF Trigger
Replaces default behavior on views.
Trigger Function Basics
A trigger function must return RETURNS trigger.
For BEFORE row-level triggers:
return NEW to continue
return NULL to silently skip the row
For BEFORE DELETE:
return OLD
For AFTER triggers:
the return value is ignored
CREATE OR REPLACE FUNCTION set_created_at()
RETURNS trigger AS
$$
BEGIN
NEW.created_at := now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Creating a Trigger
CREATE TRIGGER set_timestamp
BEFORE INSERT ON users
FOR EACH ROW
EXECUTE FUNCTION set_created_at();
When inserting into users, created_at is automatically set.
Accessing OLD and NEW
NEW.column_name -- the new row (for INSERT/UPDATE)
OLD.column_name -- the old row (for UPDATE/DELETE)
INSERT: only NEW exists
UPDATE: both NEW and OLD
DELETE: only OLD
Example validation inside a BEFORE trigger:
IF NEW.amount <= 0 THEN
RAISE EXCEPTION 'Amount must be positive';
END IF;
BEFORE vs AFTER
Type
Usage
BEFORE
Validate data
Modify NEW values
Reject rows with RAISE EXCEPTION
AFTER
Audit logging
Notification
Complex cross-table updates
Row-Level vs Statement-Level
FOR EACH ROW -- fires once per row
FOR EACH STATEMENT -- fires once per statement
Use statement triggers for:
audit counters
accumulation
constraint-like logic
INSTEAD OF Triggers (for Views)
CREATE VIEW user_view AS
SELECT id, name FROM users;
Normally a view is only writable when PostgreSQL can treat it as auto-updatable (a simple single-table view).
INSTEAD OF triggers allow custom insert/update/delete behavior.
CREATE OR REPLACE FUNCTION user_view_insert()
RETURNS trigger AS
$$
BEGIN
INSERT INTO users(name) VALUES(NEW.name);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER user_view_ins
INSTEAD OF INSERT ON user_view
FOR EACH ROW
EXECUTE FUNCTION user_view_insert();
Constraint Triggers vs Normal Triggers
Type
Characteristics
Normal Trigger
Fires immediately during statement execution
Constraint Trigger
Uses DEFERRABLE
Can fire at end of transaction
Useful for foreign-key-like logic
CREATE CONSTRAINT TRIGGER trg_name
AFTER INSERT ON orders
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE FUNCTION check_order();
Transition Tables (Statement-Level AFTER Triggers)
PostgreSQL allows capturing the set of changed rows:
REFERENCING NEW TABLE AS newrows
REFERENCING OLD TABLE AS oldrows
CREATE TRIGGER bulk_update_audit
AFTER UPDATE ON accounts
REFERENCING OLD TABLE AS oldset NEW TABLE AS newset
FOR EACH STATEMENT
EXECUTE FUNCTION audit_bulk_update();
The function can run:
SELECT * FROM newset;
SELECT * FROM oldset;
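The trigger above references an audit_bulk_update() function that is not shown; a minimal sketch, assuming the audit table used in the examples below:
CREATE OR REPLACE FUNCTION audit_bulk_update()
RETURNS trigger AS
$$
BEGIN
INSERT INTO audit(table_name, row_data, action, ts)
SELECT TG_TABLE_NAME, row_to_json(n), 'bulk_update', now()
FROM newset n; -- transition table declared in the trigger
RETURN NULL; -- return value is ignored for AFTER statement triggers
END;
$$ LANGUAGE plpgsql;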
Enabling and Disabling Triggers
ALTER TABLE users DISABLE TRIGGER set_timestamp;
ALTER TABLE users ENABLE TRIGGER set_timestamp;
ALTER TABLE users DISABLE TRIGGER ALL;
ALTER TABLE users ENABLE TRIGGER ALL;
Superusers may disable system triggers using ALTER TABLE ... DISABLE TRIGGER ALL.
Trigger Execution Order
If multiple triggers exist for the same event, PostgreSQL fires them in alphabetical order by trigger name.
CREATE TRIGGER a_before BEFORE INSERT ON t ...
CREATE TRIGGER z_before BEFORE INSERT ON t ...
-- a_before fires before z_before
Recursive Triggers & Avoiding Infinite Loops
CREATE OR REPLACE FUNCTION update_total()
RETURNS trigger AS
$$
BEGIN
UPDATE accounts SET total = total + NEW.amount
WHERE id = NEW.account_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
This can cause infinite recursion if accounts also has an UPDATE trigger.
-- Use WHEN clause to guard
CREATE TRIGGER safe_trigger
BEFORE UPDATE ON accounts
FOR EACH ROW
WHEN (OLD.total IS DISTINCT FROM NEW.total)
EXECUTE FUNCTION update_total();
Performance Considerations
Triggers can slow down:
large bulk inserts
ETL pipelines
high-throughput OLTP workloads
Prefer:
statement-level triggers when possible
batch processing instead of row-by-row logic
avoiding heavy work inside triggers (use queues, NOTIFY/LISTEN)
AFTER triggers see the table after the statement's changes have been applied.
BEFORE triggers must be fast since they block write operations.
Real-World Examples
1. Auto-updating updated_at
CREATE OR REPLACE FUNCTION touch_row()
RETURNS trigger AS
$$
BEGIN
NEW.updated_at := now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER users_touch
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION touch_row();
2. Audit log table
CREATE OR REPLACE FUNCTION audit_insert()
RETURNS trigger AS
$$
BEGIN
INSERT INTO audit(table_name, row_data, action, ts)
VALUES(TG_TABLE_NAME, row_to_json(NEW), 'insert', now());
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER audit_users
AFTER INSERT ON users
FOR EACH ROW
EXECUTE FUNCTION audit_insert();
3. Automatically maintain a denormalized counter
CREATE OR REPLACE FUNCTION inc_post_count()
RETURNS trigger AS
$$
BEGIN
UPDATE authors SET post_count = post_count + 1
WHERE id = NEW.author_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER posts_count_inc
AFTER INSERT ON posts
FOR EACH ROW
EXECUTE FUNCTION inc_post_count();
4. Prevent updates on archived rows
CREATE OR REPLACE FUNCTION prevent_update_archived()
RETURNS trigger AS
$$
BEGIN
IF OLD.archived THEN
RAISE EXCEPTION 'Cannot update archived row';
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER no_update_archived
BEFORE UPDATE ON documents
FOR EACH ROW
EXECUTE FUNCTION prevent_update_archived();