In PostgreSQL, a table is a structured collection of rows and columns where data is stored.
You define a table using the CREATE TABLE command, specifying the column names, data types, and optional constraints (like PRIMARY KEY, NOT NULL, or UNIQUE).
Tables belong to a schema within a database. By default, they are created inside the public schema unless another one is specified.
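Basic Syntax
A generic sketch of the command's shape, reconstructed from the element descriptions below (angle brackets mark placeholders, square brackets mark optional parts):
CREATE TABLE <table_name> (
    <column_name> <data_type> [constraint],
    ...
    [table_constraints]
);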
column_name: the name of each column in that table.
data_type: defines the kind of data stored (e.g., INTEGER, TEXT, BOOLEAN, DATE, etc.).
[constraint]: optional restrictions like NOT NULL, DEFAULT, etc.
[table_constraints]: rules applying to multiple columns, e.g. primary keys, foreign keys.
Example: Creating a Simple Table
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
hire_date DATE NOT NULL DEFAULT CURRENT_DATE,
active BOOLEAN DEFAULT TRUE
);
SERIAL automatically creates an auto-incrementing integer column (in modern PostgreSQL, use GENERATED ALWAYS AS IDENTITY instead).
PRIMARY KEY ensures each row has a unique, non-null identifier.
VARCHAR(50) defines a text field up to 50 characters.
DEFAULT CURRENT_DATE auto-fills with today's date when no value is given.
Modern Auto-Increment Syntax (Preferred)
CREATE TABLE employees (
employee_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
department TEXT DEFAULT 'General'
);
GENERATED ALWAYS AS IDENTITY is the SQL-standard version of SERIAL.
You can also use GENERATED BY DEFAULT AS IDENTITY if you want to allow manual insertion of custom IDs.
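A minimal sketch of the difference, using a hypothetical gadgets table:
CREATE TABLE gadgets (
    gadget_id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL
);
-- BY DEFAULT accepts a manually supplied ID:
INSERT INTO gadgets (gadget_id, name) VALUES (100, 'Custom ID');
-- Omitting the column falls back to the sequence:
INSERT INTO gadgets (name) VALUES ('Auto ID');
With GENERATED ALWAYS, the first insert would be rejected unless you add OVERRIDING SYSTEM VALUE.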
Adding Table-Level Constraints
CREATE TABLE orders (
order_id INT GENERATED ALWAYS AS IDENTITY,
customer_id INT NOT NULL,
order_total NUMERIC(10,2) NOT NULL CHECK (order_total >= 0),
order_date TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (order_id),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CHECK (order_total >= 0) ensures totals can't be negative.
PRIMARY KEY and FOREIGN KEY constraints can appear at the bottom as table-level definitions.
NUMERIC(10,2) means up to 10 digits total, 2 after the decimal point.
Common Data Types
Type                      | Example            | Description
--------------------------+--------------------+--------------------------------------------------
INTEGER, BIGINT, SMALLINT | 42                 | Whole numbers, various ranges.
NUMERIC(p, s)             | 1234.56            | Fixed-point precision numbers (money, totals).
TEXT, VARCHAR(n), CHAR(n) | 'Hello'            | Variable or fixed-length strings.
DATE, TIME, TIMESTAMP     | '2025-01-01'       | Date and time storage.
BOOLEAN                   | TRUE / FALSE       | Logical values.
JSON, JSONB               | '{"key": "value"}' | Structured JSON data (JSONB is binary-optimized).
Creating Tables in a Specific Schema
CREATE SCHEMA company;
CREATE TABLE company.departments (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
By default, PostgreSQL creates tables under the public schema.
Use CREATE SCHEMA to organize tables logically (e.g., per project or domain).
You can qualify table names like schema_name.table_name.
Viewing Tables
-- List all tables in current database
\dt
-- List tables in a specific schema
\dt company.*
-- Show table structure
\d employees
These commands are used inside the psql interactive terminal.
\dt lists tables, \d <table> describes structure, columns, and constraints.
Altering or Dropping Tables
ALTER TABLE employees ADD COLUMN email TEXT;
ALTER TABLE employees DROP COLUMN active;
DROP TABLE employees;
ALTER TABLE allows adding/removing columns or modifying constraints.
DROP TABLE permanently deletes the table (and all its data).
Understanding Schemas in PostgreSQL
What is a Schema?
A schema in PostgreSQL is like a folder or namespace inside a database.
Each schema contains its own set of tables, views, functions, and other objects — just like different folders can contain different files.
This allows you to:
Organize your database objects logically (e.g. hr.employees, sales.orders).
Have tables with the same name in different schemas without conflict (e.g. test.users vs prod.users).
Control access permissions at a schema level.
Database vs Schema
PostgreSQL databases are the top-level containers.
A schema lives inside a database, and tables live inside a schema.
Database → Schema → Table → Row
For example, you might have a database called company_db that contains two schemas: public and analytics.
This way, you can separate your operational data (public) from analytical data (analytics).
The Default Schema: public
When you first create a new database, PostgreSQL automatically creates a schema named public.
Unless you specify otherwise, every table, view, or function you create goes into the public schema.
-- This table is created inside the default "public" schema
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
Its full name is actually public.users, even if you normally omit the schema prefix.
Creating a Custom Schema
CREATE SCHEMA hr;
CREATE SCHEMA sales;
Now you have two additional namespaces: hr and sales.
You can create tables inside them using a schema-qualified name:
CREATE TABLE hr.employees (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
position TEXT
);
CREATE TABLE sales.orders (
id SERIAL PRIMARY KEY,
amount NUMERIC(10,2),
customer TEXT
);
Now hr.employees and sales.orders are two independent tables, even though both live in the same database.
Accessing Tables in Schemas
-- Access table using schema-qualified name
SELECT * FROM hr.employees;
-- Access table from the default "public" schema
SELECT * FROM public.users;
You always access tables by schema_name.table_name.
If you omit the schema, PostgreSQL looks for the table in the schemas listed in the current search_path (see below).
The Search Path
The search_path is the list of schemas that PostgreSQL looks through when you reference an object without a schema prefix.
By default, it is usually:
SHOW search_path;
-- Output example:
-- "$user", public
This means:
First, PostgreSQL looks for a schema with the same name as the current user (this entry is skipped if no such schema exists).
Then, it looks for the object in public.
You can change the search path to prioritize another schema:
SET search_path TO hr, public;
-- Now "employees" means "hr.employees"
SELECT * FROM employees;
When working on large projects, this helps you avoid writing hr. or sales. all the time.
Dropping a Schema
DROP SCHEMA hr;
-- Or remove all its objects automatically:
DROP SCHEMA hr CASCADE;
DROP SCHEMA hr; will fail if the schema is not empty.
CASCADE deletes the schema and all contained tables, views, and functions.
Use CASCADE with caution — this is irreversible!
Schema Permissions
You can grant or revoke access to schemas independently of table permissions.
-- Grant a user permission to create objects in a schema
GRANT CREATE ON SCHEMA sales TO alice;
-- Allow a user to use (read from) the schema
GRANT USAGE ON SCHEMA hr TO bob;
USAGE allows access to existing objects inside the schema.
CREATE allows creating new objects within that schema.
Combining these allows a user to both read and create inside that namespace.
When to Use Schemas
Organizing modules: for example, hr, finance, and sales schemas in a company database.
Separating environments: e.g., dev, test, prod schemas in the same database for testing purposes.
Multi-tenant applications: one schema per customer, while keeping all data in a single database.
Access control: different roles can have access to different schemas.
Summary
A schema is a namespace inside a PostgreSQL database that groups related tables and objects together.
By default, all objects go into the public schema, but you can create your own with CREATE SCHEMA.
Access tables as schema_name.table_name, or adjust search_path for convenience.
Use schemas to organize projects, control permissions, and keep your database clean and modular.
Inserting Data into Tables
Introduction
Once you've created a table, you can add rows of data using the INSERT INTO statement.
Each inserted row corresponds to one record in the table, containing values for one or more columns.
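A minimal example against the employees table defined earlier (the inserted names are placeholders):
INSERT INTO employees (first_name, last_name)
VALUES ('Anna', 'Schmidt')
RETURNING employee_id;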
The RETURNING clause immediately outputs data from the newly inserted row(s).
You can return multiple columns:
RETURNING employee_id, first_name, hire_date;
Handling Conflicts (Upserts)
INSERT INTO employees (employee_id, first_name, last_name, department)
VALUES (1, 'Ian', 'Miller', 'Engineering')
ON CONFLICT (employee_id) DO UPDATE SET department = EXCLUDED.department;
ON CONFLICT handles cases where an insert violates a unique or primary key constraint.
EXCLUDED refers to the new row that caused the conflict.
You can:
DO NOTHING to skip the row, or
DO UPDATE to modify the existing row.
Example: Full Insert Scenario
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO customers (name, email)
VALUES
('Alice Green', 'alice@example.com'),
('Bob White', 'bob@example.com');
INSERT INTO customers (name, email)
VALUES ('Alice Green', 'alice@example.com')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;
The first insert adds two new customers.
The second one updates an existing customer (instead of erroring) because of the ON CONFLICT rule.
NOW() automatically sets the timestamp.
Basics on Selecting Data from Tables
Introduction
The SELECT statement in PostgreSQL is used to query data from one or more tables.
PostgreSQL processes SELECT queries logically in this order:
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
This order matters when you design complex queries.
Basic Syntax
SELECT <column1>, <column2>, ...
FROM <table_name>;
You can select all columns using *, but it's better to specify only what you need.
Example: Select All Columns
SELECT * FROM employees;
Retrieves every column and every row from employees.
Useful for quick checks, but inefficient in production queries.
Filtering Rows with WHERE
SELECT first_name, last_name
FROM employees
WHERE department = 'Engineering';
WHERE filters rows based on a condition.
Operators include:
= equal
<> or != not equal
>, <, >=, <=
AND, OR, NOT for logical combinations
Filtering with Pattern Matching (LIKE)
SELECT first_name, last_name
FROM employees
WHERE last_name LIKE 'S%';
LIKE is used for pattern matching with wildcards:
%: matches any sequence of characters.
_: matches exactly one character.
Example: 'S%' finds all last names starting with "S".
Sorting Results with ORDER BY
SELECT first_name, last_name, hire_date
FROM employees
ORDER BY hire_date DESC;
ORDER BY sorts rows by one or more columns.
Use ASC (default) or DESC for descending order.
You can sort by multiple columns:
ORDER BY department ASC, last_name ASC;
Limiting Results with LIMIT and OFFSET
SELECT * FROM employees
ORDER BY employee_id
LIMIT 5 OFFSET 10;
LIMIT restricts how many rows are returned.
OFFSET skips the given number of rows before starting output.
Commonly used for pagination (e.g., "page 3, 10 results per page").
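For instance, page 3 at 10 results per page skips (3 - 1) * 10 = 20 rows:
SELECT * FROM employees
ORDER BY employee_id
LIMIT 10 OFFSET 20;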
Aliasing Columns and Tables
SELECT
first_name AS given,
last_name AS family
FROM employees AS e;
Aliases give temporary names to columns or tables for readability.
You can omit AS; both forms work:
SELECT first_name given, last_name family FROM employees e;
Combining Conditions
SELECT first_name, last_name, department
FROM employees
WHERE department = 'Sales'
AND hire_date >= '2024-01-01'
OR active = TRUE;
Use parentheses to control logical order:
WHERE (department = 'Sales' AND hire_date >= '2024-01-01') OR active = TRUE;
Distinct Rows
SELECT DISTINCT department
FROM employees;
DISTINCT removes duplicate rows from the result set.
You can apply it to one or multiple columns:
SELECT DISTINCT department, active FROM employees;
Computed Columns and Expressions
SELECT
first_name || ' ' || last_name AS full_name,
EXTRACT(YEAR FROM hire_date) AS hire_year
FROM employees;
You can create new columns by combining or transforming existing ones.
|| concatenates strings.
EXTRACT() extracts a specific part (like year, month, day) from a date or timestamp.
Aggregating Data with GROUP BY
SELECT
department,
COUNT(*) AS total_employees
FROM employees
GROUP BY department;
GROUP BY groups rows by unique combinations of column values.
Often used with aggregate functions (combined in the sketch after this list):
COUNT(): number of rows.
SUM(): total sum.
AVG(): average.
MIN() / MAX(): smallest / largest value.
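For instance, several aggregates can be combined in one grouped query (the salary column is an assumption for illustration):
SELECT
  department,
  COUNT(*)    AS total_employees,
  AVG(salary) AS avg_salary,
  MAX(salary) AS max_salary
FROM employees
GROUP BY department;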
Filtering Groups with HAVING
SELECT department, COUNT(*) AS total_employees
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
HAVING filters groups (after aggregation), while WHERE filters individual rows (before aggregation).
Example: only show departments with more than 5 employees.
Combining Results with UNION
SELECT first_name, last_name FROM employees
UNION
SELECT first_name, last_name FROM contractors;
UNION merges two result sets and removes duplicates.
Use UNION ALL to keep duplicates.
Each SELECT must have the same number and type of columns.
Concatenating Strings in PostgreSQL
Introduction
String concatenation means joining two or more strings together into a single text value.
In PostgreSQL, you can concatenate strings using:
The concatenation operator || (most common and SQL-standard).
The CONCAT() function.
The CONCAT_WS() function (concatenate with a separator).
Using the || Operator
SELECT 'Hello' || ' ' || 'World' AS greeting;
The || operator joins strings directly.
In this example, PostgreSQL returns Hello World.
You can concatenate columns and literals together as well:
SELECT
first_name || ' ' || last_name AS full_name
FROM employees;
This is the most widely used approach for text combination in PostgreSQL.
Handling NULLs with ||
SELECT 'Hello' || NULL || 'World' AS result;
If any operand is NULL, the entire concatenated result becomes NULL.
Example result: NULL.
To avoid this, use the COALESCE() function to replace NULL with an empty string:
SELECT 'Hello' || COALESCE(NULL, '') || 'World' AS safe_concat;
Result: HelloWorld.
Using CONCAT()
SELECT CONCAT('Hello', ' ', 'World') AS greeting;
CONCAT() joins any number of arguments together.
Unlike ||, it automatically treats NULL as an empty string.
Example:
SELECT CONCAT('Hello', NULL, 'World') AS greeting;
Result: HelloWorld (not NULL).
This makes CONCAT() safer when dealing with nullable columns.
Using CONCAT_WS() — Concatenate with Separator
SELECT CONCAT_WS('-', '2025', '11', '10') AS formatted_date;
CONCAT_WS(separator, arg1, arg2, ...) stands for "Concatenate With Separator."
The first argument is used as the separator between all other arguments.
NULL values are ignored automatically.
Example with NULL:
SELECT CONCAT_WS(' ', 'Alice', NULL, 'Johnson');
Result: Alice Johnson.
Concatenating Columns Example
SELECT
first_name || ' ' || last_name AS full_name,
department || ' Department' AS dept_name
FROM employees;
You can use concatenation when updating data, for example, to fill a derived column.
PostgreSQL evaluates the expressions on the right-hand side for each row.
Concatenation with Numbers and Other Types
SELECT
'Employee ID: ' || employee_id AS label
FROM employees;
PostgreSQL automatically casts numbers, booleans, and dates to text when concatenated with a string.
You can also use explicit conversion for clarity:
SELECT 'Employee ID: ' || CAST(employee_id AS TEXT) FROM employees;
Formatting Example: Full Address
SELECT
CONCAT_WS(', ',
street,
city,
postal_code,
country
) AS full_address
FROM customers;
This builds a clean formatted address string, skipping NULL parts automatically.
Example result: "Main Street 5, Berlin, 10115, Germany".
Dropping Tables in PostgreSQL
Introduction
The DROP TABLE statement permanently removes a table and all its data from a PostgreSQL database.
When you drop a table, PostgreSQL deletes:
All rows stored in the table.
Associated indexes and constraints.
Any dependent objects (if you use CASCADE).
Once dropped, the data cannot be recovered unless you have a backup.
Basic Syntax
DROP TABLE <table_name>;
table_name — the name of the table you want to delete.
If the table doesn't exist, PostgreSQL will raise an error unless you use DROP TABLE IF EXISTS.
Example: Dropping a Table
DROP TABLE employees;
Deletes the employees table completely from the database.
All its data, indexes, and constraints are removed permanently.
Safely Dropping a Table
DROP TABLE IF EXISTS employees;
Prevents an error if the table does not exist.
This is a safer way to clean up tables, especially in scripts or migrations.
Dropping Multiple Tables
DROP TABLE IF EXISTS employees, departments, projects;
You can drop multiple tables in one command by separating their names with commas.
All listed tables are removed in a single atomic command.
Using CASCADE and RESTRICT
DROP TABLE employees CASCADE;
CASCADE automatically drops any dependent objects (like foreign keys in other tables, views, or triggers).
Use carefully — this can remove more than you expect!
To prevent accidental cascade deletion, use RESTRICT (the default behavior):
DROP TABLE employees RESTRICT;
PostgreSQL will refuse to drop the table if anything depends on it.
Example: Dropping with Dependencies
-- Example: "customers" table references "orders" through a foreign key
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY
);
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(order_id)
);
-- Dropping the parent table with CASCADE
DROP TABLE orders CASCADE;
CASCADE tells PostgreSQL to remove not only the orders table itself, but also any objects that depend on it — such as foreign key constraints, views, or triggers.
In this example:
The table orders is permanently deleted.
The foreign key constraint (inside customers) is automatically removed, so customers.order_id is no longer validated against orders.
The customers table itself remains intact — only the relationship to orders is dropped.
PostgreSQL will also print a notice similar to:
NOTICE: drop cascades to constraint customers_order_id_fkey on table customers
DROP TABLE
Always double-check before using CASCADE, as it can remove multiple dependent objects automatically.
Dropping a Table in a Specific Schema
DROP TABLE company.employees;
If the table belongs to a non-default schema, include the schema name.
Example: company.employees drops the employees table inside the company schema.
Checking if a Table Exists
-- Using psql meta-command
\dt
-- Or query the catalog directly
SELECT tablename
FROM pg_catalog.pg_tables
WHERE tablename = 'employees';
\dt lists all tables in the current database (works in psql).
Alternatively, query the pg_catalog.pg_tables system table to check existence programmatically.
Practical Tip: Use Transactions
BEGIN;
DROP TABLE employees;
ROLLBACK;
During testing, wrap DROP TABLE inside a transaction.
ROLLBACK cancels the deletion if you change your mind before committing.
This helps prevent irreversible data loss while experimenting.
Filtering Rows with the WHERE Clause
Introduction
The WHERE clause in PostgreSQL filters rows returned by a SELECT, UPDATE, or DELETE statement based on one or more conditions.
It is evaluated before grouping or ordering results — only rows satisfying the condition(s) are kept.
You can use logical operators (AND, OR, NOT), comparison operators (=, <, >, etc.), and range or set operators like BETWEEN, IN, and LIKE.
Basic Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;
condition can include comparisons, logical connectors, or functions that evaluate to TRUE or FALSE.
Rows for which the condition is TRUE are returned; FALSE or NULL rows are excluded.
Comparison Operators
SELECT * FROM employees
WHERE department = 'Engineering';
Common operators:
Operator | Description              | Example
---------+--------------------------+---------------------------
=        | Equal to                 | salary = 50000
<> or != | Not equal to             | department <> 'IT'
>        | Greater than             | age > 30
<        | Less than                | salary < 40000
>=       | Greater than or equal to | age >= 18
<=       | Less than or equal to    | hire_date <= '2025-01-01'
Combining Conditions with AND, OR, NOT
SELECT * FROM employees
WHERE department = 'Sales'
AND hire_date >= '2024-01-01'
OR active = TRUE;
It is recommended to use parentheses for clarity:
WHERE (department = 'Sales' AND hire_date >= '2024-01-01') OR active = TRUE;
Filtering Ranges with BETWEEN
SELECT * FROM employees
WHERE salary BETWEEN 40000 AND 60000;
BETWEEN a AND b is equivalent to (salary >= a AND salary <= b).
It includes both boundary values.
You can use it with dates, numbers, or text:
WHERE hire_date BETWEEN '2024-01-01' AND '2024-12-31';
Filtering by a Set with IN
SELECT * FROM employees
WHERE department IN ('HR', 'IT', 'Finance');
IN checks whether a value matches any value in the provided list.
Equivalent to multiple OR conditions:
WHERE department = 'HR' OR department = 'IT' OR department = 'Finance';
You can also use a subquery:
WHERE department_id IN (SELECT id FROM departments WHERE active = TRUE);
Pattern Matching with LIKE and ILIKE
SELECT * FROM employees
WHERE last_name LIKE 'S%';
LIKE is case-sensitive pattern matching.
ILIKE is the case-insensitive version (PostgreSQL extension).
Examples:
WHERE name LIKE 'A%'; -- starts with A
WHERE name LIKE '%son'; -- ends with son
WHERE name ILIKE '%SMITH%'; -- contains 'smith' (case-insensitive)
Filtering NULL Values
SELECT * FROM employees
WHERE department IS NULL;
SELECT * FROM employees
WHERE department IS NOT NULL;
= NULL does not work — use IS NULL and IS NOT NULL.
This is because NULL represents "unknown", not an actual value.
Checking Membership with EXISTS
SELECT *
FROM employees e
WHERE EXISTS (
SELECT 1 FROM departments d
WHERE d.id = e.department_id
);
EXISTS checks whether a subquery returns at least one row.
If it does, the condition is TRUE, otherwise FALSE.
Often used for correlated subqueries and conditional joins.
Using ANY and ALL
SELECT * FROM employees
WHERE salary > ANY (SELECT salary FROM interns);
SELECT * FROM employees
WHERE salary > ALL (SELECT salary FROM interns);
ANY — condition is true if it holds for at least one value returned by the subquery.
ALL — condition must hold for every value returned by the subquery.
These are useful when comparing against ranges of values dynamically.
Combining Multiple Conditions
SELECT * FROM employees
WHERE department = 'Finance'
AND (salary BETWEEN 40000 AND 60000)
AND active = TRUE
AND last_name ILIKE 'J%';
Conditions can be chained together in flexible and readable ways.
Parentheses make the logic explicit and prevent ambiguity.
Database Design in PostgreSQL
Introduction
Database design is the process of organizing data into structured tables that represent real-world entities and their relationships.
Good design ensures:
Data integrity (accuracy and consistency).
Scalability (efficient growth with more data).
Performance (fast queries and updates).
Maintainability (easy schema evolution and clear relationships).
PostgreSQL, as a relational database, uses tables, keys, and relationships to represent structured data logically and efficiently.
Step 1: Identify Entities and Attributes
Begin by identifying entities — objects or concepts you need to store data about (e.g. customers, orders, products).
For each entity, determine its attributes — descriptive properties or fields.
Use the PRIMARY KEY constraint on a column to define it as the table's unique identifier.
PostgreSQL automatically makes that column NOT NULL and UNIQUE.
Example: Simple Primary Key
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL
);
SERIAL automatically creates an integer column with an internal sequence generator.
The primary key constraint guarantees that every employee_id value is unique and non-null.
Primary keys are often numeric because they are efficient for indexing and joining tables.
Using GENERATED AS IDENTITY (Modern SQL)
CREATE TABLE departments (
dept_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name TEXT NOT NULL
);
GENERATED ALWAYS AS IDENTITY is the SQL-standard replacement for SERIAL.
PostgreSQL manages the sequence automatically and ensures strict consistency.
You can also use GENERATED BY DEFAULT AS IDENTITY to allow manual insertion of custom IDs if necessary.
Composite (Multi-Column) Primary Keys
CREATE TABLE enrollments (
student_id INT NOT NULL,
course_id INT NOT NULL,
enrollment_date DATE DEFAULT CURRENT_DATE,
PRIMARY KEY (student_id, course_id)
);
Composite keys use multiple columns to uniquely identify a record.
Here, the combination of student_id and course_id must be unique — but each one individually may repeat.
Common in many-to-many relationship tables (also called junction tables).
Adding a Primary Key to an Existing Table
ALTER TABLE employees
ADD CONSTRAINT employees_pkey PRIMARY KEY (employee_id);
If the table was created without a primary key, you can add one later using ALTER TABLE.
The constraint name (here employees_pkey) can be chosen manually or left for PostgreSQL to assign automatically.
Dropping a Primary Key
ALTER TABLE employees
DROP CONSTRAINT employees_pkey;
You can remove a primary key if you plan to redefine it or change your table structure.
However, be cautious — dropping a primary key may also affect FOREIGN KEY relationships referencing it.
Primary Key vs. Unique Constraint
Both ensure uniqueness, but they serve different purposes:
Feature                | PRIMARY KEY  | UNIQUE
-----------------------+--------------+--------------------------
NULL allowed?          | No           | Yes
Number per table       | One only     | Many allowed
Automatically indexed? | Yes          | Yes
Purpose                | Row identity | Prevent duplicate values
Viewing Primary Keys
-- Inside psql
\d employees
-- Query system catalog
SELECT
tc.table_name,
kc.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kc
ON kc.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY';
\d table_name shows primary keys directly in the psql terminal.
You can also query the system catalog for programmatic inspection of constraints.
Behavior with Foreign Keys
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id)
);
Primary keys are the natural targets of FOREIGN KEY relationships.
They guarantee that every foreign key in another table refers to a unique, existing row.
If you delete a primary key row, PostgreSQL will enforce the foreign key's action (CASCADE, SET NULL, etc.).
Choosing a Good Primary Key
Follow these best practices:
Keep it simple — use one column if possible.
Use numeric or UUID types for efficiency.
Never change primary key values after creation.
Choose a value that is unique and stable across the table's lifetime.
Examples of good candidates (a UUID sketch follows this list):
Auto-incrementing integers for internal identifiers.
UUIDs for distributed or external systems.
Composite keys for many-to-many link tables.
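As a sketch of the UUID option, here is a hypothetical api_tokens table (gen_random_uuid() is built in since PostgreSQL 13; earlier versions need the pgcrypto extension):
CREATE TABLE api_tokens (
    token_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    label TEXT
);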
The NOT NULL Constraint in PostgreSQL
Introduction
The NOT NULL constraint ensures that a column cannot contain a NULL value.
In PostgreSQL, NULL represents "unknown" or "missing" data — not the same as an empty string or zero.
By declaring a column as NOT NULL, you guarantee that every row must have a valid (non-NULL) value in that column.
Basic Syntax
CREATE TABLE table_name (
column_name data_type NOT NULL,
...
);
Attach NOT NULL right after the data type in the column definition.
If you try to insert a row without providing a value for that column, PostgreSQL will raise an error.
Example: Defining NOT NULL Columns
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
email TEXT,
hire_date DATE NOT NULL DEFAULT CURRENT_DATE
);
Here, first_name, last_name, and hire_date must always contain valid values.
email is optional — it can be left NULL.
If you omit a NOT NULL column in an INSERT statement, PostgreSQL will throw an error unless a DEFAULT value is provided.
Attempting to Insert NULL
INSERT INTO employees (first_name, last_name, hire_date)
VALUES (NULL, 'Smith', '2025-01-01');
This will fail with an error like:
ERROR: null value in column "first_name" of relation "employees" violates not-null constraint
DETAIL: Failing row contains (null, Smith, 2025-01-01).
PostgreSQL prevents the insertion because first_name is defined as NOT NULL.
Adding a NOT NULL Constraint to an Existing Table
ALTER TABLE employees
ALTER COLUMN email SET NOT NULL;
You can add NOT NULL later using ALTER TABLE, but only if all existing rows already have non-NULL values in that column.
If some rows contain NULL, you must first update or delete them before applying the constraint.
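One possible backfill-then-constrain sequence (the placeholder address is purely illustrative):
-- Replace existing NULLs first...
UPDATE employees SET email = 'unknown@example.com' WHERE email IS NULL;
-- ...then the constraint can be applied:
ALTER TABLE employees ALTER COLUMN email SET NOT NULL;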
Removing a NOT NULL Constraint
ALTER TABLE employees
ALTER COLUMN email DROP NOT NULL;
This makes the column nullable again, allowing future NULL values.
Dropping NOT NULL is safe and does not affect existing data.
Using DEFAULT with NOT NULL
CREATE TABLE tasks (
task_id SERIAL PRIMARY KEY,
description TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending'
);
Combining NOT NULL with a DEFAULT value ensures the column is always filled — even if the user doesn't explicitly provide a value.
Example insert:
INSERT INTO tasks (description) VALUES ('Review code');
-- status automatically becomes 'pending'
Checking Which Columns Are NOT NULL
SELECT
column_name,
is_nullable
FROM information_schema.columns
WHERE table_name = 'employees';
The is_nullable column will show NO for NOT NULL columns and YES for nullable ones.
Useful for verifying schema constraints in large databases.
NOT NULL vs. CHECK Constraints
You could technically write a CHECK (column IS NOT NULL) constraint, but NOT NULL is faster, simpler, and the standard method.
Both enforce the same rule, but NOT NULL is built into the column definition and optimized by PostgreSQL.
Behavior with Primary Keys
Primary key columns are automatically NOT NULL, so you don't need to specify it explicitly.
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
Even if you omit NOT NULL, the PRIMARY KEY constraint will enforce it internally.
The DEFAULT Constraint in PostgreSQL
Introduction
The DEFAULT constraint in PostgreSQL assigns an automatic value to a column when no value is specified during an INSERT operation.
It helps maintain data consistency and prevents NULL values in cases where a sensible fallback value exists.
Defaults can be constant values (like numbers or text), expressions, or even built-in PostgreSQL functions such as CURRENT_DATE or NOW().
The DEFAULT keyword is placed after the data type when defining the column.
If you omit the column in an INSERT statement, PostgreSQL automatically fills it with the default value.
Example: Simple DEFAULT Values
CREATE TABLE tasks (
task_id SERIAL PRIMARY KEY,
description TEXT NOT NULL,
status TEXT DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW()
);
When a new task is inserted without specifying status or created_at, PostgreSQL automatically assigns:
'pending' to status.
The current date and time (via NOW()) to created_at.
INSERT INTO tasks (description) VALUES ('Write documentation');
-- status → 'pending'
-- created_at → current timestamp
Using DEFAULT with Numeric and Boolean Columns
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10,2) DEFAULT 0.00,
available BOOLEAN DEFAULT TRUE
);
If you omit price or available when inserting, PostgreSQL will assign 0.00 and TRUE respectively.
This avoids the need to manually fill common defaults like 0, TRUE, or 'N/A' for each row.
Using DEFAULT with Expressions
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE DEFAULT CURRENT_DATE,
total NUMERIC(10,2) DEFAULT (0.0),
order_ref TEXT DEFAULT ('ORD-' || EXTRACT(YEAR FROM CURRENT_DATE))
);
Defaults can be expressions or function results — not just constants.
Here:
CURRENT_DATE fills in today's date automatically.
('ORD-' || EXTRACT(YEAR FROM CURRENT_DATE)) generates a dynamic prefix for each order (e.g., ORD-2025).
Inserting Explicit DEFAULT Values
INSERT INTO tasks (description, status)
VALUES ('Review code', DEFAULT);
Use the DEFAULT keyword explicitly in INSERT statements to apply the column's predefined default value.
This is especially useful when inserting into selected columns but still wanting defaults for others.
Adding a DEFAULT Constraint to an Existing Table
ALTER TABLE employees
ALTER COLUMN active SET DEFAULT TRUE;
You can add or modify default values using ALTER TABLE.
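A default can also be removed later without touching existing data:
ALTER TABLE employees
ALTER COLUMN active DROP DEFAULT;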
The CHECK Constraint in PostgreSQL
Introduction
The CHECK constraint restricts the values a column (or combination of columns) may hold by testing every inserted or updated row against a boolean condition.
You can apply CHECK directly to a single column or define it as a table-level constraint involving multiple columns.
The condition must evaluate to TRUE or NULL (not FALSE); otherwise the insert or update will fail.
Example: Simple CHECK Constraint
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
first_name TEXT NOT NULL,
salary NUMERIC(10,2) CHECK (salary >= 0)
);
This constraint guarantees that salary values must always be greater than or equal to zero.
Attempting to insert a negative salary will trigger an error.
ERROR: new row for relation "employees" violates check constraint "employees_salary_check"
DETAIL: Failing row contains (-500).
Multiple CHECK Constraints
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10,2) CHECK (price >= 0),
discount NUMERIC(5,2) CHECK (discount BETWEEN 0 AND 100)
);
Each CHECK is validated independently.
PostgreSQL allows multiple CHECK constraints on the same table, each with its own rule.
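Table-Level CHECK Constraints
A sketch of a table-level CHECK; the accounts table with balance and credit_limit columns is an illustrative assumption reconstructed from the description below:
CREATE TABLE accounts (
    account_id SERIAL PRIMARY KEY,
    balance NUMERIC(12,2) NOT NULL DEFAULT 0,
    credit_limit NUMERIC(12,2) NOT NULL,
    CHECK (balance <= credit_limit)
);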
This constraint compares two columns in the same row.
It ensures that the balance never exceeds the credit_limit.
Unlike column-level checks, this form enforces a condition across multiple columns.
Adding a CHECK Constraint to an Existing Table
ALTER TABLE employees
ADD CONSTRAINT salary_positive CHECK (salary >= 0);
You can add new named CHECK constraints later with ALTER TABLE.
PostgreSQL immediately validates all existing rows — if any violate the rule, the command fails.
Dropping a CHECK Constraint
ALTER TABLE employees
DROP CONSTRAINT salary_positive;
To remove a CHECK constraint, you must reference it by name.
Use \d table_name in psql or query pg_constraint to see its name.
Using CHECK with Logical Operators
CREATE TABLE students (
student_id SERIAL PRIMARY KEY,
age INT CHECK (age >= 5 AND age <= 100),
grade TEXT CHECK (grade IN ('A', 'B', 'C', 'D', 'F'))
);
AND, OR, and IN can be combined inside the condition for complex logic.
This example:
Ensures age stays within a reasonable range.
Limits grade values to a predefined set of letters.
Using Functions in CHECK Constraints
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
email TEXT CHECK (POSITION('@' IN email) > 1)
);
You can call PostgreSQL functions or expressions within CHECK.
Here, POSITION('@' IN email) ensures that the email column contains an "@" symbol.
However, for complex validations, application-level or trigger-based logic may be preferable.
Deferrable Constraints
A DEFERRABLE constraint can be checked at the end of a transaction instead of after each statement.
This is useful when temporary violations occur during intermediate steps of a multi-step update.
Note, however, that PostgreSQL only allows UNIQUE, PRIMARY KEY, EXCLUDE, and FOREIGN KEY constraints to be declared DEFERRABLE; CHECK and NOT NULL constraints are always checked immediately.
Viewing CHECK Constraints
SELECT
conname AS constraint_name,
convalidated,
condeferrable
FROM pg_constraint
WHERE contype = 'c' AND conrelid = 'employees'::regclass;
Use the pg_constraint system catalog to inspect existing CHECK constraints.
convalidated = true indicates that all rows currently satisfy the constraint.
CHECK vs. Application-Level Validation
CHECK constraints are enforced at the database level — they cannot be bypassed by client errors or inconsistent application code.
However:
They should remain simple and deterministic.
Complex, cross-table validations (e.g., involving subqueries) are better handled with triggers or application logic.
Foreign Keys in PostgreSQL
Introduction
A foreign key establishes a relationship between two tables by linking a column (or set of columns) in one table to the PRIMARY KEY or UNIQUE column of another table.
It enforces referential integrity — ensuring that a value in the referencing table must always match an existing value in the referenced table (or be NULL if allowed).
Foreign keys prevent "orphan" records, i.e., rows that refer to non-existent data in another table.
The REFERENCES keyword defines the link to the target table and column.
By default, PostgreSQL enforces that the referenced value must exist in the parent table before it can be inserted into the child table.
Example: Basic Foreign Key Relationship
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id),
order_date DATE NOT NULL DEFAULT CURRENT_DATE
);
Here:
customers is the parent table.
orders is the child table — each order must refer to a valid customer.
If you try to insert an order for a non-existent customer, PostgreSQL will reject it.
ERROR: insert or update on table "orders" violates foreign key constraint "orders_customer_id_fkey"
DETAIL: Key (customer_id)=(999) is not present in table "customers".
Foreign Key Constraint Naming
PostgreSQL automatically generates a name (e.g. orders_customer_id_fkey), but you can define your own for clarity:
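For example (the constraint name fk_orders_customer is illustrative):
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id)
        REFERENCES customers (customer_id)
);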
Deferrable Foreign Keys
Normally, PostgreSQL checks foreign key integrity immediately when you insert or update a row.
With DEFERRABLE INITIALLY DEFERRED, checks occur only when the transaction commits — allowing temporary inconsistencies within a transaction.
Example use case: cyclic relationships between tables that reference each other.
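A minimal sketch of the syntax, shown on a fresh orders table:
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers (customer_id)
        DEFERRABLE INITIALLY DEFERRED
);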
Checking Existing Foreign Keys
SELECT
conname AS constraint_name,
conrelid::regclass AS table_name,
confrelid::regclass AS referenced_table,
confupdtype AS on_update,
confdeltype AS on_delete
FROM pg_constraint
WHERE contype = 'f';
The pg_constraint system catalog stores all foreign key definitions.
The confdeltype and confupdtype columns contain codes for delete/update actions:
a — NO ACTION
r — RESTRICT
c — CASCADE
n — SET NULL
d — SET DEFAULT
Behavior When Dropping Referenced Tables
DROP TABLE customers;
-- ERROR: cannot drop table customers because other objects depend on it
PostgreSQL blocks the drop to preserve referential integrity.
You can force the removal (and automatically drop all dependent constraints) using CASCADE:
DROP TABLE customers CASCADE;
This will also remove all foreign key constraints in child tables that reference it, but the child tables themselves remain intact.
NULL and Foreign Keys
Foreign key columns can contain NULL unless defined as NOT NULL.
When NULL is present, PostgreSQL skips foreign key validation for that row (because "unknown" cannot be compared).
This allows optional relationships — e.g., an order that may not yet be linked to a customer.
Performance Considerations
PostgreSQL automatically creates indexes on the referenced columns in the parent table (if they are primary or unique keys).
However, you may also create indexes on the foreign key columns in the child table for faster joins and deletions:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
This improves query performance when filtering or joining on the foreign key.
Basics on Joining Tables in PostgreSQL
Introduction
A JOIN combines rows from two or more tables based on a related column between them.
In PostgreSQL, joins are performed using the JOIN keyword together with ON or USING clauses that specify matching columns.
Basic JOIN Syntax
SELECT columns
FROM table1
JOIN table2
ON table1.column_name = table2.column_name;
JOIN matches rows from both tables where the condition is true.
By default, JOIN means INNER JOIN.
Example Setup
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
city TEXT
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id),
order_date DATE DEFAULT CURRENT_DATE,
amount NUMERIC(10,2)
);
INSERT INTO customers (name, city)
VALUES ('Alice', 'Berlin'),
('Bob', 'Munich'),
('Charlie', 'Hamburg');
INSERT INTO orders (customer_id, amount)
VALUES (1, 120.00),
(1, 90.50),
(2, 45.00);
This setup lets us explore all join types using customers and orders.
INNER JOIN
SELECT
customers.name,
orders.order_id,
orders.amount
FROM customers
INNER JOIN orders
ON customers.customer_id = orders.customer_id;
INNER JOIN returns only rows with matching values in both tables.
If a customer has no orders, that customer will not appear in the result.
Result:
name | order_id | amount
--------+-----------+--------
Alice | 1 | 120.00
Alice | 2 | 90.50
Bob | 3 | 45.00
LEFT JOIN (or LEFT OUTER JOIN)
SELECT
customers.name,
orders.order_id,
orders.amount
FROM customers
LEFT JOIN orders
ON customers.customer_id = orders.customer_id;
LEFT JOIN returns all rows from the left table (customers), and the matching rows from the right table (orders).
If no match exists, PostgreSQL fills right-side columns with NULL.
Result:
name | order_id | amount
----------+-----------+--------
Alice | 1 | 120.00
Alice | 2 | 90.50
Bob | 3 | 45.00
Charlie | NULL | NULL
RIGHT JOIN (or RIGHT OUTER JOIN)
SELECT
customers.name,
orders.order_id,
orders.amount
FROM customers
RIGHT JOIN orders
ON customers.customer_id = orders.customer_id;
RIGHT JOIN returns all rows from the right table (orders) and matching rows from the left table (customers).
Unmatched rows from the left table will appear as NULL.
Typically less used than LEFT JOIN because you can just swap the table order.
FULL JOIN (or FULL OUTER JOIN)
SELECT
customers.name,
orders.order_id,
orders.amount
FROM customers
FULL JOIN orders
ON customers.customer_id = orders.customer_id;
FULL JOIN returns all rows from both tables, matching them when possible.
Unmatched rows from either table get NULL values in the missing columns.
Result example if some rows had no counterpart:
name | order_id | amount
----------+-----------+--------
Alice | 1 | 120.00
Alice | 2 | 90.50
Bob | 3 | 45.00
Charlie | NULL | NULL
NULL | 4 | 80.00 -- order without a known customer
CROSS JOIN
SELECT
customers.name,
orders.order_id
FROM customers
CROSS JOIN orders;
CROSS JOIN produces the Cartesian product — every row from the first table is combined with every row from the second.
Use with caution — the number of rows grows rapidly (e.g., 10 × 10 = 100).
This join type does not require an ON condition.
SELF JOIN
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
name TEXT,
manager_id INT REFERENCES employees(employee_id)
);
SELECT e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m
ON e.manager_id = m.employee_id;
A self join joins a table to itself — useful for hierarchical or tree-like data such as employees and managers.
Aliases (e, m) are necessary to distinguish between the two instances of the same table.
USING Clause
SELECT *
FROM customers
JOIN orders USING (customer_id);
The USING (column_name) syntax is a shortcut when both tables share a column with the same name.
It automatically merges the duplicate join column in the output, so you'll see only one customer_id.
Joining More Than Two Tables
CREATE TABLE order_items (
order_item_id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(order_id),
product_name TEXT,
quantity INT
);
SELECT
c.name AS customer,
o.order_id,
i.product_name,
i.quantity
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id
JOIN order_items i
ON o.order_id = i.order_id;
You can chain multiple JOIN clauses in one query.
Each JOIN connects one more table through a matching condition.
Filtering After Joins
SELECT
c.name, o.amount
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id
WHERE o.amount > 50;
You can use WHERE, ORDER BY, LIMIT, and other clauses after joining.
PostgreSQL first performs the join, then filters the results according to your conditions.
NATURAL JOIN
SELECT *
FROM customers
NATURAL JOIN orders;
NATURAL JOIN automatically joins tables using all columns with the same names.
While convenient, it can be dangerous — if new columns with the same name are added later, join behavior may change unexpectedly.
Use JOIN ... USING (...) instead for clarity and control.
Performance Considerations
Joins can be computationally expensive for large tables.
To optimize performance:
Ensure both sides of join conditions are indexed (especially foreign key columns).
Use EXPLAIN to analyze query plans and check for sequential scans.
Limit unnecessary joins — fetch only required columns.
EXPLAIN SELECT * FROM customers JOIN orders ON customers.customer_id = orders.customer_id;
This shows how PostgreSQL executes the join internally (e.g., nested loop, hash join, merge join).
Set Operations in PostgreSQL
Introduction
Set operations in PostgreSQL allow you to combine the results of two or more SELECT queries into a single result set.
They work on complete rows (as sets), not on individual columns.
Common operations include:
UNION – combine and remove duplicates.
UNION ALL – combine and keep duplicates.
INTERSECT – return only rows common to both queries.
EXCEPT – return rows from the first query that are not in the second.
Each query in a set operation MUST:
Select the same number of columns.
Have compatible data types for corresponding columns.
Example Setup
CREATE TABLE students_2024 (
name TEXT,
city TEXT
);
CREATE TABLE students_2025 (
name TEXT,
city TEXT
);
INSERT INTO students_2024 VALUES
('Alice', 'Berlin'),
('Bob', 'Munich'),
('Charlie', 'Hamburg');
INSERT INTO students_2025 VALUES
('Alice', 'Berlin'),
('David', 'Stuttgart'),
('Ella', 'Munich');
We now have two tables with overlapping and distinct rows.
1. UNION — Combine and Remove Duplicates
SELECT * FROM students_2024
UNION
SELECT * FROM students_2025;
UNION merges the two result sets and automatically removes duplicates.
The combined result contains only distinct rows; its order is not guaranteed unless you add ORDER BY.
Result:
name | city
------------+------------
Alice | Berlin
Bob | Munich
Charlie | Hamburg
David | Stuttgart
Ella | Munich
2. UNION ALL — Combine and Keep Duplicates
SELECT * FROM students_2024
UNION ALL
SELECT * FROM students_2025;
UNION ALL keeps duplicate rows, without sorting or removing them.
It is faster than UNION because PostgreSQL skips duplicate checking.
Result:
name | city
------------+------------
Alice | Berlin
Bob | Munich
Charlie | Hamburg
Alice | Berlin
David | Stuttgart
Ella | Munich
3. INTERSECT — Common Rows Only
SELECT * FROM students_2024
INTERSECT
SELECT * FROM students_2025;
INTERSECT returns only the rows that appear in both queries.
Duplicates are automatically removed.
Result:
name | city
------------+------------
Alice | Berlin
4. INTERSECT ALL — Common Rows, Keep Duplicates
SELECT * FROM students_2024
INTERSECT ALL
SELECT * FROM students_2025;
INTERSECT ALL keeps duplicates as they appear in both results.
If a row appears twice in both tables, it appears twice in the output.
Since our data has only one matching row, the result is identical to INTERSECT.
5. EXCEPT — Rows in First Query but Not in Second
SELECT * FROM students_2024
EXCEPT
SELECT * FROM students_2025;
EXCEPT returns all rows from the first query that do not appear in the second.
Duplicates are removed automatically.
Result:
name | city
------------+------------
Bob | Munich
Charlie | Hamburg
6. EXCEPT ALL — Keep Duplicates
SELECT * FROM students_2024
EXCEPT ALL
SELECT * FROM students_2025;
EXCEPT ALL behaves like EXCEPT, but keeps duplicates according to their frequency in the first query minus the second.
Example: if a row appears 3 times in the first query and once in the second, it appears twice in the result.
7. Matching Columns and Types
-- This will fail:
SELECT name FROM students_2024
UNION
SELECT name, city FROM students_2025;
All participating queries in a set operation must have:
The same number of columns.
Compatible data types in corresponding positions.
Otherwise PostgreSQL will raise an error:
ERROR: each UNION query must have the same number of columns
8. Sorting Combined Results
SELECT name, city FROM students_2024
UNION
SELECT name, city FROM students_2025
ORDER BY name;
Use ORDER BY only after the final set operation — not inside each SELECT.
PostgreSQL treats the combined output as one large table that can be sorted or filtered further.
9. Using Parentheses for Complex Set Operations
(SELECT * FROM students_2024
UNION
SELECT * FROM students_2025)
EXCEPT
SELECT * FROM expelled_students;
Parentheses control the evaluation order when mixing multiple set operators.
Without parentheses, PostgreSQL evaluates from left to right.
Use parentheses to make complex logic clear and unambiguous.
10. Combining with WHERE or JOIN Clauses
SELECT name, city FROM students_2024 WHERE city = 'Munich'
UNION
SELECT name, city FROM students_2025 WHERE city = 'Munich';
Each SELECT can include its own WHERE, JOIN, or GROUP BY clause before the set operation.
This allows you to filter and process each dataset independently before combining them.
11. Performance Considerations
UNION and INTERSECT remove duplicates — they require sorting or hashing internally, which can be expensive on large data sets.
UNION ALL and EXCEPT ALL skip this step and are generally faster.
When possible, use UNION ALL if you are sure duplicates are not an issue.
EXPLAIN SELECT * FROM students_2024 UNION SELECT * FROM students_2025;
EXPLAIN helps visualize how PostgreSQL performs the merge internally.
The BOOLEAN Type in PostgreSQL
Introduction
The BOOLEAN type in PostgreSQL represents truth values — logical true or false conditions.
It is often used in columns that represent yes/no, on/off, or active/inactive states.
Unlike databases that simulate booleans with integers (e.g. 0/1), PostgreSQL has a native boolean type with its own rules and accepted literals.
Declaration Syntax
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
active BOOLEAN DEFAULT TRUE
);
BOOLEAN columns can take values TRUE, FALSE, or NULL.
Accepted Literal Values
PostgreSQL accepts a variety of textual and numeric representations for boolean values:
Meaning | Accepted Values
--------+--------------------------------------------
TRUE    | TRUE, 't', 'true', 'y', 'yes', 'on', '1'
FALSE   | FALSE, 'f', 'false', 'n', 'no', 'off', '0'
The matching is case-insensitive, so 'True', 'YES', and 'On' all work the same.
Examples of Insertion
INSERT INTO users (name, active)
VALUES
('Alice', TRUE),
('Bob', 'yes'),
('Charlie', 'f'),
('Diana', '0');
SELECT * FROM users;
PostgreSQL automatically interprets 'yes', 'f', and '0' according to the truth table above (note that '0' must be quoted; a bare integer 0 is not accepted for a BOOLEAN column).
Result:
user_id | name | active
----------+----------+---------
1 | Alice | t
2 | Bob | t
3 | Charlie | f
4 | Diana | f
t and f are how PostgreSQL displays boolean values by default.
Boolean Display Formats
Although PostgreSQL stores booleans internally as true/false, by default it displays them as t and f.
This compact t/f display is a long-standing PostgreSQL convention.
You can cast or format them differently if needed:
SELECT active::text FROM users;
-- Outputs: 'true' / 'false' (the text cast spells values out; only psql's native boolean display abbreviates to t / f)
SELECT
CASE
WHEN active THEN 'Active'
ELSE 'Inactive'
END
FROM users;
-- Custom labels for boolean output
Using Booleans in Conditions
SELECT * FROM users WHERE active = TRUE;
-- Shorter equivalent:
SELECT * FROM users WHERE active;
-- Opposite:
SELECT * FROM users WHERE NOT active;
You can write conditions with or without explicitly comparing to TRUE or FALSE.
The expression WHERE active is shorthand for WHERE active = TRUE.
NULL with Boolean Columns
INSERT INTO users (name, active) VALUES ('Eve', NULL);
SELECT name,
CASE
WHEN active THEN 'True'
WHEN NOT active THEN 'False'
ELSE 'Unknown'
END AS status
FROM users;
Boolean columns can be NULL, representing "unknown" or "not specified."
This is different from FALSE — PostgreSQL treats NULL as neither true nor false.
The CASE statement (or COALESCE) can be used to handle NULL explicitly.
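For instance, COALESCE can map NULL to a fixed fallback value:
SELECT name, COALESCE(active, FALSE) AS active_or_default
FROM users;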
Boolean Operators
PostgreSQL supports standard logical operators on boolean expressions:
Operator | Description                          | Example        | Result
---------+--------------------------------------+----------------+--------
AND      | True if both operands are true       | TRUE AND FALSE | FALSE
OR       | True if at least one operand is true | TRUE OR FALSE  | TRUE
NOT      | Negates the boolean value            | NOT TRUE       | FALSE
Example usage inside a query:
SELECT * FROM users
WHERE active OR name = 'Alice';
Boolean Expressions in SELECT
SELECT name,
(active AND name = 'Alice') AS active_alice
FROM users;
Boolean expressions can appear directly in the SELECT list — returning t, f, or NULL per row.
This is useful for logical flags or derived states.
Indexing Boolean Columns
Boolean columns typically have low selectivity (few distinct values), so simple indexes are often not useful.
However, you can create partial indexes to speed up queries on a specific boolean value:
CREATE INDEX idx_users_active_true
ON users (user_id)
WHERE active = TRUE;
This makes WHERE active = TRUE queries faster without indexing inactive users.
Converting Between Boolean and Other Types
SELECT CAST(TRUE AS INT); -- returns 1
SELECT CAST(FALSE AS INT); -- returns 0
SELECT CAST(1 AS BOOLEAN); -- returns TRUE
SELECT CAST(0 AS BOOLEAN); -- returns FALSE
You can cast between BOOLEAN and INTEGER types if needed.
PostgreSQL follows standard semantics: TRUE → 1, FALSE → 0.
Character Types: CHAR, VARCHAR, and TEXT in PostgreSQL
Introduction
PostgreSQL provides several types for storing textual data: CHAR, VARCHAR, and TEXT.
Although they appear similar, they differ in how they handle length, padding, and validation.
All three are stored efficiently as variable-length strings internally, but their behavior and use cases differ slightly.
CHAR(n) — Fixed-Length Character Type
CREATE TABLE products_char (
code CHAR(5),
name TEXT
);
INSERT INTO products_char VALUES ('AB', 'Widget');
SELECT code, LENGTH(code) FROM products_char;
CHAR(n) (or CHARACTER(n)) stores exactly n characters.
If you insert a shorter string, PostgreSQL pads it with spaces to the fixed length.
If you insert a longer string, PostgreSQL raises an error, unless the extra characters are all spaces or you cast explicitly, in which case the value is truncated to n characters.
Result:
 code  | length
-------+--------
 AB    |      2
'AB' is physically stored and displayed as 'AB   ' (padded to 5 characters), yet LENGTH() reports 2, because trailing padding spaces are treated as semantically insignificant.
Useful when strings must have uniform length (e.g., fixed codes, legacy systems).
VARCHAR(n) — Variable-Length with Limit
CREATE TABLE products_varchar (
code VARCHAR(5),
name TEXT
);
INSERT INTO products_varchar VALUES ('AB', 'Widget');
INSERT INTO products_varchar VALUES ('ABCDE', 'Gadget');
VARCHAR(n) (or CHARACTER VARYING(n)) stores a string up to n characters long.
It does not pad shorter strings with spaces.
If the input exceeds n, PostgreSQL raises an error:
INSERT INTO products_varchar VALUES ('ABCDEF', 'Gizmo');
ERROR: value too long for type character varying(5)
This type is ideal when you need a length restriction (e.g., usernames ≤ 50 characters).
TEXT — Unlimited-Length Variable Text
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT
);
INSERT INTO articles (title, content)
VALUES ('Intro to PostgreSQL', 'PostgreSQL is a powerful open-source RDBMS...');
TEXT stores strings of any length (up to about 1 GB per field).
It does not require you to specify a limit and does not pad or truncate values.
Functionally equivalent to VARCHAR without a length limit — same performance, same behavior.
Comparison Between CHAR, VARCHAR, and TEXT
Type       | Behavior                   | Length Limit | Padded?      | Use Case
-----------+----------------------------+--------------+--------------+----------------------------------------------
CHAR(n)    | Fixed-length               | Yes          | Yes (spaces) | Fixed-size codes, legacy data
VARCHAR(n) | Variable-length, validated | Yes          | No           | Text with known maximum length (e.g., names)
TEXT       | Variable-length, unlimited | No           | No           | General-purpose text, documents, comments
CHAR vs VARCHAR vs TEXT in Practice
CREATE TABLE demo (
c1 CHAR(5),
c2 VARCHAR(5),
c3 TEXT
);
INSERT INTO demo VALUES ('AB', 'AB', 'AB');
SELECT c1, c2, c3, LENGTH(c1), LENGTH(c2), LENGTH(c3) FROM demo;
Result:
  c1   | c2 | c3 | length | length | length
-------+----+----+--------+--------+--------
 AB    | AB | AB |      2 |      2 |      2
CHAR(5) physically pads its value to 5 characters (visible in the displayed c1), but LENGTH() disregards the trailing padding; VARCHAR and TEXT store only what you insert.
Performance and Storage
Internally, PostgreSQL stores CHAR, VARCHAR, and TEXT using the same variable-length storage format (varlena).
There is no performance difference between TEXT and VARCHAR(n) for most workloads.
The only difference lies in input validation and space padding behavior — not storage or speed.
Integer Types in PostgreSQL
Introduction
PostgreSQL offers three integer types: SMALLINT (2 bytes, -32,768 to 32,767), INTEGER (4 bytes, -2,147,483,648 to 2,147,483,647), and BIGINT (8 bytes, roughly ±9.2 × 10^18).
PostgreSQL will automatically select appropriate storage and enforce the numeric range for each column.
If you try to insert a value beyond the range, it raises an error:
ERROR: integer out of range
Integer Aliases and Compatibility
INT and INTEGER are exact synonyms.
INT2, INT4, and INT8 are internal aliases for SMALLINT, INTEGER, and BIGINT.
You can use any version interchangeably — all mean the same data type.
Auto-Incrementing Integers
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
Internally, PostgreSQL expands this into:
user_id INTEGER NOT NULL DEFAULT nextval('users_user_id_seq'::regclass);
In modern PostgreSQL, the preferred syntax is the identity column:
CREATE TABLE users (
user_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name TEXT
);
GENERATED ALWAYS AS IDENTITY is SQL-standard and avoids manually managing sequences.
Working with Sequences
-- Create a table with SERIAL
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
amount NUMERIC(10,2)
);
-- Check the associated sequence
\d orders
-- Shows: "order_id" integer DEFAULT nextval('orders_order_id_seq'::regclass)
-- Get the next value manually:
SELECT nextval('orders_order_id_seq');
Each SERIAL or IDENTITY column uses a dedicated sequence object for generating new values.
You can control sequences directly using functions like nextval(), currval(), and setval().
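A short sketch of these functions (currval requires that nextval has already been called on the sequence in the current session):
SELECT nextval('orders_order_id_seq');      -- advance the sequence and return the new value
SELECT currval('orders_order_id_seq');      -- last value returned in this session
SELECT setval('orders_order_id_seq', 1000); -- the next nextval will return 1001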
Arithmetic and Overflow
SELECT 2147483647 + 1; -- overflow for INTEGER
SELECT 9223372036854775807 + 1; -- overflow for BIGINT
PostgreSQL detects overflows and raises an error — it does not wrap around silently.
To handle large numeric computations safely, use NUMERIC (arbitrary precision).
Type Casting
SELECT '42'::INT; -- 42
SELECT 3.7::INT; -- 4 (rounded to the nearest integer)
SELECT 3.7::BIGINT; -- 4 as well
PostgreSQL automatically casts compatible numeric and string literals to integer types when possible.
Casting fractional values to an integer type rounds to the nearest integer (for NUMERIC, halves round away from zero); use TRUNC() if you want truncation instead.
Comparison and Sorting
SELECT * FROM inventory ORDER BY quantity DESC;
Integers can be sorted and compared normally with >, <, =, etc.
Indexes on integer columns are highly efficient, making them ideal for primary keys and lookup columns.
Converting Between Integer Types
SELECT 500::SMALLINT; -- OK
SELECT 40000::SMALLINT; -- ERROR: smallint out of range
Conversions between integer types work as long as the value fits in the target range.
If it doesn't, PostgreSQL throws an overflow error.
Performance Tips
Use SMALLINT only when saving space truly matters, otherwise INTEGER is more convenient.
BIGINT should be used for IDs or counters that may exceed 2 billion (e.g., event logs, user IDs in large systems).
All integer types are fast to compare, sort, and index, since they are stored as fixed-size binary numbers.
The DECIMAL Type in PostgreSQL
Introduction
DECIMAL (or its synonym NUMERIC) is a fixed-point, exact numeric data type in PostgreSQL.
It stores numbers with user-defined precision and scale, ensuring no rounding errors in arithmetic operations.
This makes it ideal for use in financial, accounting, and scientific calculations where accuracy matters more than speed.
Declaration Syntax
NUMERIC(precision, scale)
-- or
DECIMAL(precision, scale)
precision: total number of significant digits (before + after decimal point).
scale: number of digits allowed after the decimal point.
For example: NUMERIC(8, 2) means:
Up to 8 digits total.
2 digits after the decimal (so max 999999.99).
If you omit (p, s), PostgreSQL allows arbitrary precision — only limited by memory.
Example: Defining DECIMAL Columns
CREATE TABLE transactions (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
amount DECIMAL(10, 2),
tax_rate NUMERIC(5, 3),
total NUMERIC(12, 4)
);
INSERT INTO transactions (amount, tax_rate, total)
VALUES (1234.56, 0.075, 1326.9300);
DECIMAL(10, 2) — up to 10 digits total, with 2 after the decimal (e.g., 99999999.99).
NUMERIC(5, 3) — up to 5 digits total, 3 of them after the decimal (e.g., 12.345).
NUMERIC(12, 4) — higher precision for cumulative totals.
Precision and Rounding
CREATE TABLE test_round (
value NUMERIC(6, 2)
);
INSERT INTO test_round VALUES (123.456); -- Automatically rounded
SELECT * FROM test_round;
PostgreSQL rounds values to the defined scale when inserting or updating.
Result:
value
--------
123.46
If a number exceeds the allowed precision, PostgreSQL raises an error instead of truncating silently.
ERROR: numeric field overflow
DETAIL: A field with precision 6, scale 2 must round to an absolute value less than 10^4.
Comparison with Floating-Point Types
SELECT
0.1::DOUBLE PRECISION + 0.2::DOUBLE PRECISION AS float_sum, -- 0.30000000000000004
CAST(0.1 AS NUMERIC) + CAST(0.2 AS NUMERIC) AS numeric_sum; -- 0.3
Floating-point types (REAL / DOUBLE PRECISION) use binary fractions and may produce tiny rounding errors; the explicit casts are needed because an unadorned literal like 0.1 is already NUMERIC in PostgreSQL.
NUMERIC / DECIMAL performs exact arithmetic and never introduces such artifacts.
Therefore, prefer DECIMAL for money, interest rates, or measurements requiring exactness.
Default Precision and Scale
CREATE TABLE flexible_numbers (
value NUMERIC
);
INSERT INTO flexible_numbers VALUES (12345678901234567890.123456789);
SELECT value FROM flexible_numbers;
If no precision and scale are specified, NUMERIC stores numbers with arbitrary length and precision.
This allows very large or very precise numbers, but may be slightly slower to process.
Output example:
value
---------------------
12345678901234567890.123456789
Arithmetic Operations
SELECT
amount,
tax_rate,
amount * tax_rate AS tax,
amount + (amount * tax_rate) AS total
FROM transactions;
NUMERIC values support all arithmetic operators: +, -, *, /.
PostgreSQL maintains full precision in intermediate steps and rounds only at output or storage.
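You can also round explicitly at any point with ROUND(); note that NUMERIC rounds halves away from zero:
SELECT ROUND(2.5), ROUND(-2.5), ROUND(123.4567, 2);
-- 3 | -3 | 123.46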
Date and Time Types in PostgreSQL
PostgreSQL provides DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, and INTERVAL for temporal data. TIMESTAMP WITH TIME ZONE can also be written as TIMESTAMPTZ (shorthand alias).
DATE Type
CREATE TABLE events (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
title TEXT,
event_date DATE
);
INSERT INTO events (title, event_date)
VALUES
('Conference', '2025-12-01'),
('Workshop', CURRENT_DATE);
DATE stores only the year, month, and day.
CURRENT_DATE returns the system's current date.
Arithmetic is supported:
SELECT
event_date + INTERVAL '7 days' AS next_week
FROM events;
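Subtracting two DATE values yields an integer number of days:
SELECT DATE '2025-12-01' - DATE '2025-11-11' AS days_between; -- 20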
TIME Type
CREATE TABLE shifts (
employee TEXT,
start_time TIME,
end_time TIME
);
INSERT INTO shifts
VALUES
('Alice', '09:00', '17:30'),
('Bob', '22:00', '06:00');
TIME stores the hour, minute, and second (optionally with microseconds).
It is independent of any date or timezone.
You can compute time differences as intervals:
SELECT
end_time - start_time AS duration
FROM shifts;
For an overnight shift like Bob's (22:00 to 06:00), the plain difference is negative (-16:00); add INTERVAL '24 hours' when end_time < start_time.
TIME WITH TIME ZONE (TIMETZ)
CREATE TABLE broadcasts (
show_name TEXT,
air_time TIME WITH TIME ZONE
);
INSERT INTO broadcasts
VALUES
('Morning News', '07:00+01'),
('Night Talk', '23:00-05');
TIME WITH TIME ZONE (or TIMETZ) stores a local time and its associated timezone offset.
Rarely used in modern applications, since most real-world scheduling uses full timestamps instead.
TIMESTAMP (Without Time Zone)
TIMESTAMP records both date and time but has no timezone awareness.
CURRENT_TIMESTAMP returns the local date and time when the row is inserted.
Useful when you want consistent "local" timestamps (e.g., system logs).
TIMESTAMP WITH TIME ZONE (TIMESTAMPTZ)
CREATE TABLE meetings (
meeting_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
title TEXT,
start_at TIMESTAMPTZ DEFAULT NOW()
);
INSERT INTO meetings (title, start_at) VALUES ('Team Sync', '2025-11-11 09:00:00+01');
TIMESTAMPTZ automatically normalizes values to UTC internally.
When displayed, PostgreSQL converts them to the current session timezone (SHOW TIMEZONE;).
This ensures consistent handling of international data and daylight saving changes.
SET TIMEZONE TO 'UTC';
SELECT start_at FROM meetings;
SET TIMEZONE TO 'America/New_York';
SELECT start_at FROM meetings;
The same instant in time displays differently under different timezones.
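For the meeting inserted above ('2025-11-11 09:00:00+01'), the two SELECTs would display roughly this (New York is UTC-5 in November):
-- under UTC: 2025-11-11 08:00:00+00
-- under America/New_York: 2025-11-11 03:00:00-05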
INTERVAL Type
CREATE TABLE rentals (
rental_id SERIAL PRIMARY KEY,
start_date TIMESTAMP,
duration INTERVAL
);
INSERT INTO rentals VALUES (1, '2025-11-11 10:00:00', INTERVAL '3 days 4 hours');
SELECT
start_date + duration AS return_date
FROM rentals;
INTERVAL represents a time duration or span, not a specific point in time.
You can add or subtract intervals to/from timestamps or dates.
Intervals can include multiple components (years, months, days, hours, minutes, seconds).
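Intervals also support arithmetic of their own, for example scaling:
SELECT INTERVAL '1 day 2 hours' * 3; -- 3 days 06:00:00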
Extracting Date and Time Parts
SELECT
EXTRACT(YEAR FROM start_at) AS year,
EXTRACT(MONTH FROM start_at) AS month,
EXTRACT(DAY FROM start_at) AS day,
EXTRACT(HOUR FROM start_at) AS hour
FROM meetings;
EXTRACT() retrieves individual components from a date or timestamp.
Other functions like DATE_PART() and TO_CHAR() provide similar functionality with formatting control.
SELECT TO_CHAR(start_at, 'YYYY-MM-DD HH24:MI') FROM meetings;
Date and Time Arithmetic
SELECT
NOW() AS now,
NOW() + INTERVAL '7 days' AS next_week,
NOW() - INTERVAL '1 month' AS last_month,
AGE('2025-12-31', '2025-01-01') AS difference;
+ and - can be used with intervals for addition and subtraction.
AGE() returns a human-readable interval difference between two timestamps.
Useful Built-in Functions
Function | Description
NOW() | Current date and time (same as CURRENT_TIMESTAMP)
CURRENT_DATE | Current date (no time)
CURRENT_TIME | Current time of day
LOCALTIME / LOCALTIMESTAMP | Local time/date-time without timezone conversion
AGE(timestamp1, timestamp2) | Difference between two timestamps as interval
DATE_TRUNC('field', source) | Truncates a timestamp to the specified precision (e.g., hour, day, month)
SELECT DATE_TRUNC('month', NOW()); -- first day of current month
Converting Between Date/Time Types
SELECT
'2025-11-11'::DATE + INTERVAL '1 day', -- date → timestamp
'13:45:00'::TIME + INTERVAL '2 hours', -- time + interval
NOW()::DATE AS current_day, -- timestamp → date
CAST('2025-11-11 09:00' AS TIMESTAMPTZ);
The UUID Type in PostgreSQL
Introduction
UUID stands for Universally Unique Identifier.
It is a 128-bit value used to uniquely identify rows, records, or entities across systems, databases, and even time — without relying on a central sequence generator.
PostgreSQL natively supports UUID as a built-in data type, compliant with RFC 4122.
UUIDs are often used as primary keys in distributed or large-scale applications where integer IDs might collide or reveal sequence information.
UUID Format
A UUID is typically written as a 36-character string in hexadecimal form, separated by hyphens:
550e8400-e29b-41d4-a716-446655440000
Internally, PostgreSQL stores it efficiently as a 16-byte binary value (not as text).
UUIDs are case-insensitive: 'A23B...' and 'a23b...' represent the same value.
Creating a Table with UUID Column
CREATE TABLE users (
user_id UUID PRIMARY KEY,
name TEXT NOT NULL
);
INSERT INTO users (user_id, name)
VALUES ('550e8400-e29b-41d4-a716-446655440000', 'Alice');
UUID columns can be manually assigned, or automatically generated using PostgreSQL's functions.
Each UUID guarantees uniqueness across space and time with an extremely low probability of collision.
Generating UUIDs Automatically
PostgreSQL provides UUID generation through the uuid-ossp and pgcrypto extensions.
To enable them:
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
Then you can generate UUIDs using built-in functions:
-- Using uuid-ossp:
SELECT uuid_generate_v1(); -- based on MAC + timestamp
SELECT uuid_generate_v4(); -- random-based (most common)
-- Using pgcrypto:
SELECT gen_random_uuid(); -- random version 4 UUID (faster, preferred)
In PostgreSQL 13 and later, gen_random_uuid() is also built into the core server, so no extension is needed for random (version 4) UUIDs; it is the recommended choice for simplicity and performance.
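For example, a table can use it as a column default; this sketch defines the api_tokens table queried in the examples below:
CREATE TABLE api_tokens (
token_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
INSERT INTO api_tokens (user_id) VALUES (gen_random_uuid());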
Here, gen_random_uuid() generates a new unique identifier automatically when no ID is provided.
This eliminates the need for numeric sequences and prevents ID collisions across distributed systems.
UUID-based keys are also non-sequential — they don't reveal record creation order or count.
UUID Versions Overview
Version | Source | Function (uuid-ossp) | Use Case
v1 | Timestamp + MAC address | uuid_generate_v1() | Predictable, contains machine/time info (not recommended for public use)
v3 | Namespace + MD5 hash | uuid_generate_v3(namespace, name) | Deterministic generation from names
v4 | Random numbers | uuid_generate_v4() or gen_random_uuid() | Most common, fast, and secure for general use
v5 | Namespace + SHA-1 hash | uuid_generate_v5(namespace, name) | Deterministic but safer than v3
Version 4 (random) UUIDs are the default choice for new projects.
Versions 3 and 5 are used when deterministic IDs are needed for given input data.
Comparing UUID vs SERIAL IDs
Aspect | SERIAL / IDENTITY | UUID
Uniqueness | Unique within a single database sequence | Globally unique across databases and servers
Predictability | Sequential, reveals insertion order | Non-sequential, unpredictable
Storage Size | 4–8 bytes | 16 bytes
Indexing Speed | Faster (smaller, sequential) | Slightly slower (larger, random order)
Replication / Merging | May collide between instances | Safe across distributed databases
Use Case | Simple local applications | Microservices, APIs, distributed systems
While UUID keys consume more storage and index space, their global uniqueness makes them ideal for modern scalable systems.
Querying UUID Columns
SELECT * FROM api_tokens
WHERE token_id = 'b36d6d3e-b41a-4d0c-b4e4-9985a93f79a7';
-- You can also use functions:
SELECT token_id::TEXT FROM api_tokens;
You can compare, cast, or index UUID columns just like integers.
UUID comparisons are efficient and type-safe — no need for string matching.
Indexing and Performance
CREATE INDEX idx_api_tokens_user_id ON api_tokens(user_id);
Indexing UUID columns works exactly like for integers.
However, because UUIDs are random, inserts can fragment B-tree indexes, slightly reducing performance for very large tables.
To optimize insert speed, consider a time-based generator such as uuid_generate_v1mc() (timestamp plus a random multicast MAC) instead of fully random UUIDs, so new keys land near each other in the index.
Common Table Expressions (CTE) in PostgreSQL
Introduction
A Common Table Expression (CTE) is a temporary, named result set defined within a query using the WITH clause.
CTEs make SQL queries easier to read, maintain, and debug, especially when dealing with subqueries or recursive data.
They exist ONLY for the duration of the SINGLE SQL statement and are not stored as permanent tables.
CTEs can be:
Non-recursive: similar to a named subquery.
Recursive: used to query hierarchical or tree-like data.
Basic Syntax
WITH cte_name (optional_columns) AS (
SQL_query
)
SELECT ...
FROM cte_name;
cte_name — the name of your temporary result set.
optional_columns — optional list of column names for the CTE result.
SQL_query — a normal SELECT (or even INSERT / UPDATE / DELETE) that defines the data.
Example: Basic Non-Recursive CTE
WITH top_sellers AS (
SELECT
seller_id,
SUM(amount) AS total_sales
FROM orders
GROUP BY seller_id
HAVING SUM(amount) > 5000
)
SELECT
s.seller_id,
s.total_sales,
u.name
FROM top_sellers AS s
JOIN users AS u ON u.id = s.seller_id;
The inner query creates a top_sellers CTE containing only sellers with total sales above 5000.
The main query then joins that result with users to retrieve seller details.
This improves readability compared to deeply nested subqueries.
Using Multiple CTEs
WITH
orders_per_customer AS (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
),
top_customers AS (
SELECT customer_id
FROM orders_per_customer
WHERE order_count > 10
)
SELECT c.name, opc.order_count
FROM top_customers tc
JOIN orders_per_customer opc ON tc.customer_id = opc.customer_id
JOIN customers c ON c.id = tc.customer_id;
You can define multiple CTEs in a single WITH clause — each separated by commas.
Later CTEs can reference earlier ones within the same statement.
Modifying Data with CTEs
WITH updated AS (
UPDATE products
SET price = price * 1.10
WHERE category = 'electronics'
RETURNING id, price
)
SELECT * FROM updated;
CTEs can be used with INSERT, UPDATE, or DELETE statements that return data using RETURNING.
This pattern is powerful for performing modifications and immediately reading back affected rows.
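For example, a sketch that moves stale rows into a hypothetical orders_archive table (assumed to have the same structure as orders) in one statement:
WITH archived AS (
DELETE FROM orders
WHERE order_date < NOW() - INTERVAL '1 year'
RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM archived;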
Recursive CTEs
Recursive CTEs are used to handle hierarchical data such as organizational charts, category trees, or dependency chains.
The general form consists of two parts:
Anchor member — the base query (starting rows).
Recursive member — a query that references the CTE itself, used to fetch related rows iteratively.
WITH RECURSIVE subordinates AS (
-- Anchor: start from the manager
SELECT employee_id, name, manager_id
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive step: find employees reporting to those above
SELECT e.employee_id, e.name, e.manager_id
FROM employees e
INNER JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;
This query builds an entire organization hierarchy starting from the top-level manager.
The recursion continues until no new rows are produced.
UNION ALL is used (instead of UNION) to include all recursive results efficiently.
Example: Folder Hierarchy
CREATE TABLE folders (
id SERIAL PRIMARY KEY,
name TEXT,
parent_id INT REFERENCES folders(id)
);
WITH RECURSIVE folder_tree AS (
SELECT id, name, parent_id, 1 AS depth
FROM folders
WHERE parent_id IS NULL
UNION ALL
SELECT f.id, f.name, f.parent_id, ft.depth + 1
FROM folders f
JOIN folder_tree ft ON f.parent_id = ft.id
)
SELECT * FROM folder_tree ORDER BY depth;
This recursive CTE prints all folders in a tree, showing their depth levels.
Very useful for hierarchical structures like menus, file systems, and organizational charts.
Performance Considerations
CTEs behave like optimization fences in older PostgreSQL versions (≤11):
The CTE is fully materialized before the main query executes.
This means it is computed once and stored temporarily, which may hurt performance for very large datasets.
Since PostgreSQL 12, non-recursive CTEs that are referenced only once and have no side effects can be automatically inlined by the optimizer.
To enforce materialization (useful for debugging or when side-effects matter):
WITH temp_data AS MATERIALIZED (...)
SELECT * FROM temp_data;
Or, to explicitly allow inlining for better performance:
WITH temp_data AS NOT MATERIALIZED (...)
SELECT * FROM temp_data;
Views in PostgreSQL
Introduction
A VIEW in PostgreSQL is a virtual table defined by a SQL query.
It behaves like a table but does not store data physically; instead, it runs the underlying query each time the view is accessed.
Views simplify complex queries, improve code readability, and help enforce security by exposing only specific columns or rows.
PostgreSQL also supports materialized views, which store the result physically for faster reads.
Basic Syntax
CREATE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;
view_name — the name of the virtual table.
The view can then be queried like any normal table using SELECT.
Example: Creating a Simple View
CREATE VIEW active_users AS
SELECT id, name, email
FROM users
WHERE is_active = true;
SELECT * FROM active_users;
This view returns only active users without requiring the filter in every query.
Views are computed dynamically: changes in the underlying tables appear automatically.
Using Views to Simplify Complex Queries
CREATE VIEW monthly_sales AS
SELECT
seller_id,
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS total_sales
FROM orders
GROUP BY seller_id, month;
SELECT * FROM monthly_sales
WHERE total_sales > 5000;
Views reduce duplication of complex logic (aggregations, joins, calculations).
Applications can query the view instead of repeating long SQL statements.
Updating Data Through Views
Some views can be updatable, which means you can run INSERT, UPDATE, or DELETE on them.
A view is updatable when it maps directly to a single table without set operations or aggregates.
CREATE VIEW user_public_info AS
SELECT id, name, email
FROM users;
UPDATE user_public_info
SET name = 'Alice Cooper'
WHERE id = 5;
This modifies data in the underlying users table.
Views involving joins, GROUP BY, DISTINCT, UNION, etc., are usually not updatable.
Security and Access Control
Views are often used to restrict sensitive data.
You can grant access to the view while keeping the underlying tables private.
CREATE VIEW user_safe_view AS
SELECT id, name
FROM users;
GRANT SELECT ON user_safe_view TO analyst_role;
REVOKE SELECT ON users FROM analyst_role;
This ensures the role can only see the non-sensitive columns.
Replacing or Dropping a View
DROP VIEW IF EXISTS view_name;
CREATE OR REPLACE VIEW view_name AS
SELECT ...
FROM ...;
CREATE OR REPLACE lets you modify the view without dropping permissions.
Dropping a view does not affect the underlying data.
Materialized Views
A materialized view stores the query result physically on disk.
Useful for performance when the underlying query is heavy (joins, aggregations, scanning large tables).
But unlike normal views, materialized views must be manually refreshed.
CREATE MATERIALIZED VIEW sales_summary AS
SELECT seller_id, SUM(amount) AS total_sales
FROM orders
GROUP BY seller_id;
-- Read from the materialized view
SELECT * FROM sales_summary;
-- Refresh when data changes:
REFRESH MATERIALIZED VIEW sales_summary;
Materialized views behave like cached query results.
They significantly speed up analytics on large datasets.
Materialized View with Indexes
CREATE MATERIALIZED VIEW monthly_stats AS
SELECT
DATE_TRUNC('month', created_at) AS month,
COUNT(*) AS total_users
FROM users
GROUP BY month;
CREATE INDEX idx_monthly_stats_month ON monthly_stats(month);
Indexes on materialized views improve read performance just like on regular tables.
Advanced: Refreshing Concurrently
To refresh a materialized view without blocking reads, PostgreSQL provides the CONCURRENTLY option:
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;
Allows reads during the refresh — ideal for production systems (requires a unique index on the materialized view).
Materialized Views in PostgreSQL
Introduction
A Materialized View (MV) is a database object that stores the physical result of a SQL query.
Unlike normal views (which run the underlying query every time), materialized views improve performance by caching the data.
Materialized views are ideal for heavy analytical queries, aggregations, and large datasets.
They must be manually refreshed to reflect changes in underlying tables.
PostgreSQL supports both normal refresh and CONCURRENT refresh, which avoids blocking reads.
Basic Syntax
CREATE MATERIALIZED VIEW view_name AS
SELECT ...
FROM ...
WHERE ...;
The materialized view stores the result of the SELECT at creation time.
The data inside the MV does NOT automatically update, so refresh is required.
Example: Creating a Materialized View
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT
seller_id,
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS total_sales
FROM orders
GROUP BY seller_id, month;
SELECT * FROM monthly_sales;
Query results are stored physically and retrieved instantly.
No recomputation happens until a refresh is triggered.
Refreshing Materialized Views
To update the stored data, call REFRESH MATERIALIZED VIEW:
REFRESH MATERIALIZED VIEW monthly_sales;
This command locks the MV for reads and writes until refresh is complete.
Use it when downtime during refresh is acceptable.
To avoid blocking readers, use the concurrent form:
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;
This allows users to still query the MV during refresh.
It requires a unique index on the MV:
CREATE UNIQUE INDEX idx_monthly_sales_unique ON monthly_sales(seller_id, month);
Use concurrent refresh for production systems where availability matters.
Materialized Views vs Normal Views
Feature | Normal View | Materialized View
Storage | No physical storage | Stores data physically
Performance | Runs query every time (slow for heavy logic) | Fast reads, cached data
Freshness | Always up-to-date | Must be refreshed manually
Use Case | Simple abstractions | Heavy queries, analytics
Blocking | No refresh needed | Refresh blocks reads unless CONCURRENTLY
Materialized Views with Indexes
CREATE MATERIALIZED VIEW user_summary AS
SELECT
country,
COUNT(*) AS user_count
FROM users
GROUP BY country;
CREATE INDEX idx_user_summary_country
ON user_summary(country);
Indexes on materialized views speed up SELECT queries.
Indexes persist across refresh operations.
Manually Refreshing Only When Needed
Materialized views are often refreshed:
Nightly (cron job)
Every hour for dashboard updates
On-demand when underlying tables change
After ETL/ELT batch loading
PostgreSQL does not auto-refresh — applications decide the refresh schedule.
Dropping a Materialized View
DROP MATERIALIZED VIEW IF EXISTS view_name;
Dropping an MV does not affect underlying tables.
Indexes on the MV are dropped automatically.
Advanced: Materialized Views with Joins
CREATE MATERIALIZED VIEW product_stats AS
SELECT
p.id AS product_id,
p.name,
COUNT(o.id) AS orders_count,
SUM(o.amount) AS total_revenue
FROM products p
LEFT JOIN orders o ON o.product_id = p.id
GROUP BY p.id, p.name;
-- Refresh when needed
REFRESH MATERIALIZED VIEW product_stats;
Materialized views shine when performing expensive joins repeatedly.
Advanced: Materialized View with WHERE Filters
CREATE MATERIALIZED VIEW recent_orders AS
SELECT *
FROM orders
WHERE order_date > NOW() - INTERVAL '30 days';
Useful for windowed datasets (recent data, daily snapshots, etc.).
Can be indexed for faster reporting.
The CASE Expression in PostgreSQL
Introduction
The CASE expression in PostgreSQL allows you to perform conditional logic directly inside SQL queries.
It works similarly to if-else statements in programming languages.
CASE returns different values depending on evaluated conditions.
General Syntax
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
ELSE default_result
END
WHEN clauses are evaluated from top to bottom.
The first true condition determines the returned value.
ELSE is optional; without it, missing matches return NULL.
Simple CASE Expression
Compares a single expression to several possible values.
CASE status
WHEN 'A' THEN 'Active'
WHEN 'I' THEN 'Inactive'
ELSE 'Unknown'
END
Useful when conditions depend on equality comparisons.
Searched CASE Expression
Allows complex logical conditions.
Conditions are evaluated independently.
CASE
WHEN amount > 1000 THEN 'High'
WHEN amount BETWEEN 500 AND 1000 THEN 'Medium'
WHEN amount > 0 THEN 'Low'
ELSE 'None'
END
This form is more flexible and widely used.
Example: Categorizing Orders
SELECT
order_id,
amount,
CASE
WHEN amount > 1000 THEN 'VIP Order'
WHEN amount > 100 THEN 'Standard Order'
ELSE 'Small Order'
END AS order_type
FROM orders;
This query adds a derived column order_type based on order amount.
Using CASE in ORDER BY
SELECT name, role
FROM employees
ORDER BY
CASE
WHEN role = 'manager' THEN 1
WHEN role = 'developer' THEN 2
ELSE 3
END;
CASE allows custom sorting based on business logic.
Here, managers appear first, then developers, then everyone else.
Using CASE in Aggregation
SELECT
seller_id,
SUM(CASE WHEN amount > 500 THEN 1 ELSE 0 END) AS large_orders,
SUM(CASE WHEN amount <= 500 THEN 1 ELSE 0 END) AS small_orders
FROM orders
GROUP BY seller_id;
Conditional aggregation is a powerful pattern in reporting and analytics.
Counts rows differently depending on CASE conditions.
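PostgreSQL also offers the aggregate FILTER clause, which expresses the same conditional counts more directly:
SELECT
seller_id,
COUNT(*) FILTER (WHERE amount > 500) AS large_orders,
COUNT(*) FILTER (WHERE amount <= 500) AS small_orders
FROM orders
GROUP BY seller_id;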
Handling NULL with CASE
SELECT
name,
CASE
WHEN phone IS NULL THEN 'No phone'
ELSE phone
END AS phone_display
FROM users;
CASE is often used as an alternative to COALESCE.
CASE with Multiple Conditions
SELECT
product,
stock,
CASE
WHEN stock = 0 THEN 'Out of Stock'
WHEN stock <= 10 THEN 'Low Stock'
WHEN stock <= 50 THEN 'In Stock'
ELSE 'High Stock'
END AS stock_level
FROM products;
Conditions are checked in order, so the order of WHEN clauses matters.
Using CASE to Avoid Division by Zero
SELECT
seller_id,
total_sales,
total_orders,
CASE
WHEN total_orders = 0 THEN 0
ELSE total_sales / total_orders
END AS avg_order_value
FROM seller_stats;
CASE can gracefully guard against runtime errors.
Nested CASE Expressions
SELECT
name,
salary,
CASE
WHEN salary > 70000 THEN 'Senior'
WHEN salary > 40000 THEN
CASE
WHEN salary > 55000 THEN 'Mid-High'
ELSE 'Mid'
END
ELSE 'Junior'
END AS level
FROM employees;
Nested CASE blocks allow complex conditional transformations.
CASE Expression Without ELSE
CASE
WHEN score >= 90 THEN 'A'
WHEN score >= 80 THEN 'B'
END
If no condition matches, result is NULL.
For safety, include ELSE in most production SQL.
Performance Considerations
The CASE expression is fast; it is evaluated per row and does not require special indexing.
However, CASE inside:
ORDER BY
WHERE
JOIN conditions
can affect query performance because they prevent index-only operations.
Use CASE mainly in SELECT or for computed fields when possible.
The COALESCE Function in PostgreSQL
Introduction
COALESCE is a SQL function that returns the first non-NULL value in a list of expressions.
It is commonly used to handle NULL values, provide defaults, clean data, or avoid NULL-related errors.
PostgreSQL evaluates expressions from left to right and returns the earliest non-NULL result.
COALESCE is equivalent to a simple CASE expression but is shorter and more readable.
Syntax
COALESCE(value1, value2, value3, ...)
value1, value2, value3, ... are evaluated in order.
The first value that is not NULL is returned.
If all values are NULL, the result is NULL.
Basic Example
SELECT COALESCE(NULL, NULL, 'Hello', 'World');
Returns 'Hello', since it is the first non-NULL value.
Using COALESCE to Replace NULL With a Default
SELECT
name,
COALESCE(phone, 'No phone number') AS phone_display
FROM users;
If phone is NULL, it displays a default string instead.
Useful for UI, reports, and exported data.
Using COALESCE With Numeric Values
SELECT
price,
COALESCE(discount_price, price) AS effective_price
FROM products;
If discount_price is NULL, fallback to original price.
Prevents NULL from propagating into calculations.
Using COALESCE in Calculations
SELECT
COALESCE(total_sales, 0) / COALESCE(order_count, 1) AS avg_order_value
FROM seller_stats;
Prevents division by NULL or NULL arithmetic results.
Often used in analytics pipelines.
COALESCE With Multiple Alternatives
SELECT
COALESCE(username, email, phone, 'Unknown User') AS identifier
FROM users;
The first available non-NULL contact method is selected.
If all fields are NULL, 'Unknown User' is returned.
COALESCE vs CASE Expression
CASE
WHEN value1 IS NOT NULL THEN value1
WHEN value2 IS NOT NULL THEN value2
ELSE value3
END
This CASE expression is equivalent to:
COALESCE(value1, value2, value3)
COALESCE is simpler and preferred for NULL-handling.
Using COALESCE With Aggregations
SELECT
seller_id,
COALESCE(SUM(amount), 0) AS total_sales
FROM orders
GROUP BY seller_id;
SUM() returns NULL when no rows exist.
COALESCE ensures numeric aggregates have meaningful defaults.
Handling NULL Dates
SELECT
name,
COALESCE(last_login, NOW()) AS effective_last_login
FROM users;
If a user never logged in, use the current timestamp.
Useful for audits and logs.
COALESCE in ORDER BY
SELECT title
FROM books
ORDER BY COALESCE(published_year, 0);
Allows ordering while gracefully handling NULL values.
In ascending order, PostgreSQL sorts NULLs last (as if larger than all non-NULL values) unless COALESCE or an explicit NULLS FIRST changes that.
COALESCE in JOIN Conditions
SELECT *
FROM orders o
JOIN customers c
ON COALESCE(o.customer_id, 0) = c.id;
Can be used to provide fallbacks in join logic.
Be careful: this may prevent index usage, lowering performance.
COALESCE vs NULLIF
COALESCE returns the first non-NULL value.
NULLIF(a, b) returns NULL if a = b, otherwise returns a.
SELECT
COALESCE(NULLIF(status, ''), 'unknown') AS clean_status
FROM tasks;
NULLIF is often paired with COALESCE to replace "empty" values with defaults.
Performance Considerations
COALESCE is extremely fast and evaluated inline per row.
However, avoid COALESCE on indexed columns in JOIN/WHERE/ORDER BY if possible, because:
It may disable index usage
It forces PostgreSQL to evaluate the expression for every row
Using COALESCE in SELECT is always safe and efficient.
The NULLIF Function in PostgreSQL
Introduction
NULLIF is a conditional function in PostgreSQL that returns NULL if two values are equal.
If the values are not equal, it returns the first value unchanged.
NULLIF is often used to avoid division-by-zero errors, normalize data, or convert meaningless values into actual NULLs.
NULLIF is frequently used together with COALESCE for advanced NULL-handling logic.
Syntax
NULLIF(value1, value2)
If value1 = value2, the result is NULL.
If value1 ≠ value2, the result is value1.
value1 is always the return value unless the values match exactly.
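The classic use is guarding against division by zero; using the seller_stats columns from earlier:
SELECT
total_sales / NULLIF(total_orders, 0) AS avg_order_value
FROM seller_stats;
-- when total_orders is 0, the divisor becomes NULL and the result is NULL instead of an error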
The LEAST and GREATEST Functions in PostgreSQL
LEAST returns the smallest value in a list of expressions; GREATEST returns the largest.
Unlike the SQL standard, PostgreSQL ignores NULL arguments: the result is NULL only if all arguments are NULL.
To treat NULL as a concrete value instead of ignoring it, use COALESCE:
SELECT LEAST(COALESCE(a, 0), COALESCE(b, 0)) FROM table;
Clamping Values (Setting Min/Max Boundaries)
LEAST and GREATEST can be combined to restrict a value within a range.
-- Clamp score to range 0–100
SELECT LEAST(100, GREATEST(score, 0)) AS clamped_score
FROM results;
Values < 0 become 0, values > 100 become 100.
This is a common technique for input validation.
Using LEAST / GREATEST for Conditional Logic
These functions often replace simple CASE expressions.
SELECT
salary,
GREATEST(salary, 2000) AS minimum_salary
FROM employees;
If salary is less than 2000, it becomes 2000.
Choosing the Earliest or Latest Timestamp
SELECT
LEAST(start_time, updated_at, created_at) AS earliest_timestamp,
GREATEST(start_time, updated_at, created_at) AS latest_timestamp
FROM logs;
Useful when records track multiple timestamps.
Comparing Columns Across Rows
SELECT
product,
LEAST(price_usd, price_eur, price_gbp) AS minimum_price,
GREATEST(price_usd, price_eur, price_gbp) AS maximum_price
FROM products;
Helps unify multiple currency columns or alternative pricing sources.
LEAST/GREATEST With Derived Expressions
SELECT
id,
LEAST(amount * 0.9, amount - 10) AS discounted_price
FROM orders;
You can freely mix arithmetic and functions within argument lists.
Using LEAST and GREATEST in ORDER BY
SELECT name
FROM employees
ORDER BY GREATEST(salary, bonus);
Sorting by "highest compensation component."
Be careful: expressions in ORDER BY may reduce index usage.
Performance Considerations
LEAST and GREATEST are extremely fast — evaluated per row, no special overhead.
However, using them in:
WHERE clauses
JOIN conditions
ORDER BY expressions
may prevent index-only scans because PostgreSQL must compute values for every row.
In SELECT lists, they are completely safe and efficient.
Details on the LIKE Operator in PostgreSQL
Introduction
LIKE is a PostgreSQL operator for pattern-based text comparison using wildcards.
It checks whether a string matches a specified pattern using:
% – matches zero or more characters
_ – matches exactly one character
LIKE is case-sensitive. PostgreSQL provides ILIKE for case-insensitive matching.
Syntax
<expression> LIKE <pattern>
<expression> NOT LIKE <pattern>
<expression> ILIKE <pattern>
<expression> NOT ILIKE <pattern>
expression is the text to match.
pattern is a string literal containing wildcards.
Returns TRUE, FALSE, or NULL (when either side is NULL).
Wildcard Rules: % and _
-- % wildcard (zero or more characters)
SELECT 'postgresql' LIKE 'post%'; -- TRUE
-- _ wildcard (exactly one character)
SELECT 'cat' LIKE 'c_t'; -- TRUE
% matches an arbitrary-length sequence of characters (including empty).
_ matches exactly one UTF-8 character – important when matching multi-byte Unicode characters.
Advanced: Matching Multi-byte Unicode Characters
PostgreSQL treats characters, not bytes. This means:
SELECT '你' LIKE '_'; -- TRUE (one Chinese character)
SELECT '你' LIKE '__'; -- FALSE
_ never splits a multi-byte UTF-8 character.
This makes LIKE safe for CJK languages.
Escaping Wildcards: The ESCAPE Clause
To match literal % or _, define an escape character.
SELECT '100% match' LIKE '100\% match' ESCAPE '\'; -- TRUE
SELECT 'file_1' LIKE 'file\_1' ESCAPE '\'; -- TRUE
The default escape character is already the backslash, so ESCAPE '\' is optional; use the ESCAPE clause to pick a different one.
Any character can be the escape character:
SELECT 'file_1' LIKE 'file#_1' ESCAPE '#'; -- TRUE
Pattern scanning follows these rules:
<escape>% → literal percent
<escape>_ → literal underscore
<escape><escape> → literal escape
Trailing Spaces and LIKE
PostgreSQL does not ignore trailing spaces in LIKE patterns.
SELECT 'abc ' LIKE 'abc'; -- FALSE
SELECT 'abc ' LIKE 'abc%'; -- TRUE
Use RTRIM() when comparing padded strings.
Locale and Collation Impact
LIKE follows SQL-standard collation rules.
Collation affects character comparison but not wildcard behavior.
ILIKE uses PostgreSQL's Unicode-aware case folding.
Useful for languages with case variants (German: ß → ss behavior, Turkish dotted/dotless i, etc.).
LIKE and NULL
SELECT NULL LIKE '%'; -- NULL
SELECT 'abc' LIKE NULL; -- NULL
If either operand is NULL, the result is NULL (unknown).
Use COALESCE() when needed.
Using LIKE With ANY / ALL
SELECT *
FROM logs
WHERE message LIKE ANY (ARRAY['%error%', '%fail%', '%fatal%']);
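The complementary form keeps only rows where none of the patterns match:
SELECT *
FROM logs
WHERE message NOT LIKE ALL (ARRAY['%debug%', '%trace%']);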
LIKE vs SIMILAR TO vs Regular Expressions
LIKE – Simple wildcard patterns (fast, minimal syntax)
SIMILAR TO – SQL-standard regex-like operator (inconsistent, rarely used)
~, ~* – POSIX regular expressions (powerful)
-- LIKE (simple)
WHERE name LIKE 'A%';
-- SIMILAR TO (regex with SQL syntax)
WHERE name SIMILAR TO '(A|B)%';
-- POSIX regex (preferred for complex patterns)
WHERE name ~ '^(A|B).*';
Prefer LIKE for prefix searches.
Use regex for complex logic.
Avoid SIMILAR TO unless required for SQL compliance.
Index Usage Rules
B-tree indexes are used only for left-anchored patterns:
-- Index used
WHERE name LIKE 'abc%';
Not used for:
WHERE name LIKE '%abc';
WHERE name LIKE '%abc%';
WHERE name LIKE '_bc%'; -- cannot use index
WHERE LOWER(name) LIKE 'abc%'; -- no index unless functional index exists
ILIKE never uses B-tree indexes unless pg_trgm is installed.
Using pg_trgm for Fast LIKE Searches
Install extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Add trigram index:
CREATE INDEX idx_users_name_trgm
ON users USING gin (name gin_trgm_ops);
This speeds up:
LIKE '%pattern%'
ILIKE '%pattern%'
Regex operations
Performance Considerations
LIKE 'abc%' is very fast with indexes.
LIKE '%abc' is slow, because it requires full scan.
Always avoid:
Leading wildcards
Patterns that require scanning all rows
Wrapping columns with LOWER() unless using functional index
Use pg_trgm for large text search workloads.
Practical Examples
-- Emails from Gmail
SELECT * FROM users WHERE email LIKE '%@gmail.com';
-- Names containing digits (LIKE has no character classes, so use a regex)
SELECT * FROM users WHERE name ~ '[0-9]';
-- Case-insensitive search
SELECT * FROM cities WHERE name ILIKE 'ber%';
-- Titles that start with a letter A–F
SELECT * FROM books WHERE title ~ '^[A-F]';
PostgreSQL Extensions
Introduction
A PostgreSQL extension is a package that adds extra functionality to the database server.
Extensions can contain:
SQL functions
Data types
Indexes and operator classes
Views, triggers, cast definitions, etc.
They are managed with commands like CREATE EXTENSION, ALTER EXTENSION, and DROP EXTENSION.
Available extension: its files are present on the server (in the PostgreSQL installation), but not yet enabled in the current database.
Installed (or "created") extension: registered in the current database via CREATE EXTENSION.
You can list all available extension packages with:
SELECT * FROM pg_available_extensions;
You can list all extensions currently installed in your database with:
SELECT * FROM pg_extension;
Creating an Extension (Enabling It in a Database)
CREATE EXTENSION pg_trgm;
This makes the pg_trgm functionality available in the current database.
Extensions are installed per database, not globally, so if you have multiple databases, you must run CREATE EXTENSION in each database where you need it.
You can specify a schema to hold the extension objects:
CREATE EXTENSION pg_trgm WITH SCHEMA extensions;
This is useful to keep extension objects separated from your own application schema.
Listing Extensions and Their Versions
-- Installed in current database
SELECT
extname,
extversion,
extnamespace::regnamespace AS schema_name
FROM
pg_extension
ORDER BY
extname;
-- Available (but maybe not installed)
SELECT
name,
default_version,
installed_version
FROM
pg_available_extensions
ORDER BY
name;
default_version is the version shipped with the server.
installed_version is NULL for extensions not yet created in the database.
Extension Files and Control Files (High-Level)
Each extension is described by:
A .control file – metadata: name, default version, required settings.
One or more .sql files – contain the SQL commands to create the extension objects.
These files live in the PostgreSQL installation directories (typically share/extension).
Database users normally do not touch these files directly, they just use CREATE EXTENSION.
Extension Schemas and search_path
When you run CREATE EXTENSION without WITH SCHEMA, the extension is created in the first schema on your search_path that is not a system schema.
To avoid cluttering public, many teams create a dedicated schema:
CREATE SCHEMA IF NOT EXISTS extensions;
CREATE EXTENSION IF NOT EXISTS hstore
WITH SCHEMA extensions;
When calling extension functions, you can:
Add the extension schema to search_path, or
Use fully qualified names, e.g. extensions.some_extension_function(...).
Updating Extensions (ALTER EXTENSION)
-- Update a single extension to latest version
ALTER EXTENSION pg_trgm UPDATE;
-- Update to a specific version (if supported)
ALTER EXTENSION some_ext UPDATE TO '1.2';
Removing Extensions (DROP EXTENSION)
DROP EXTENSION pg_trgm;
This removes all objects created by the extension from the current database.
If you have objects that depend on the extension (e.g., indexes using gin_trgm_ops), you might see dependency errors.
You can use CASCADE (careful!) to drop dependent objects:
DROP EXTENSION pg_trgm CASCADE;
Always verify what will be dropped before using CASCADE in production.
Common Built-In Extensions
pg_trgm – trigram indexes and similarity search, often used with LIKE / ILIKE.
hstore – key-value text storage in a single column (before JSONB was popular).
uuid-ossp – functions to generate UUIDs (uuid_generate_v4(), etc.).
citext – case-insensitive text type (e.g., for case-insensitive emails).
btree_gin, btree_gist – extra index operator classes.
postgres_fdw – foreign data wrapper for connecting to other PostgreSQL servers.
Extensions Requiring shared_preload_libraries
Some extensions hook deeper into the server and must be loaded at startup via shared_preload_libraries in postgresql.conf.
Examples:
pg_stat_statements (query statistics)
Some connection poolers, tracking or sampling modules
Typical flow:
-- In postgresql.conf (example)
shared_preload_libraries = 'pg_stat_statements'
-- After server restart, in each database where you want it:
CREATE EXTENSION pg_stat_statements;
You need file-level access to the server to change postgresql.conf, this is usually a DBA or admin task.
Permissions and Security
Creating or dropping an extension requires superuser or database owner privileges (depending on configuration).
Extensions can:
Run server-side C code
Access files and system resources
so installing untrusted third-party extensions can be a security risk.
Only install extensions that come from trusted sources or your platform provider.
Extensions and Backups
When you use pg_dump to export a database with extensions, the dump does not include the extension SQL source code.
Instead, the dump contains a CREATE EXTENSION command, assuming the extension is available on the target server.
Therefore:
Make sure the same extensions (and versions) are installed on the restore server.
Otherwise, restoring the dump may fail.
Checking Dependencies Between Extensions and Objects
To see which objects belong to an extension:
SELECT
objid::regclass,
classid::regclass
FROM pg_depend d
JOIN pg_extension e ON d.refobjid = e.oid
WHERE e.extname = 'pg_trgm';
This is advanced usage, mostly for DBAs who need to audit dependencies.
Practical Workflow Example
Example: enabling pg_trgm for fuzzy search on users(name):
-- 1. Create a dedicated schema for extensions (optional)
CREATE SCHEMA IF NOT EXISTS extensions;
-- 2. Install the pg_trgm extension into that schema
CREATE EXTENSION IF NOT EXISTS pg_trgm
WITH SCHEMA extensions;
-- 3. Create a trigram index for LIKE / ILIKE searches
CREATE INDEX idx_users_name_trgm
ON users
USING gin (name extensions.gin_trgm_ops);
-- 4. Now queries like this become much faster:
SELECT *
FROM users
WHERE name ILIKE '%wang%';
Creating Your Own PostgreSQL Extensions
Introduction
You can create your own PostgreSQL extensions to package functions, types, operators, and other database objects into a reusable module.
Extensions can be:
SQL-only – written purely in SQL and procedural languages (PL/pgSQL, PL/Perl, etc.).
C-based – contain compiled C code in a shared library (.so, .dll).
Once packaged as an extension, users only need CREATE EXTENSION my_ext; instead of running long SQL scripts manually.
High-Level Steps
Create a control file (my_ext.control) describing the extension.
Create one or more SQL script files (my_ext--1.0.sql, my_ext--1.0--1.1.sql, etc.).
(Optional) Build a shared library in C (my_ext.so) if you need low-level functions.
Install these files into PostgreSQL's share/extension (and lib for C code).
Enable the extension in a database using CREATE EXTENSION.
Directory Locations
To find where extension files should be placed, use pg_config on the server:
pg_config --sharedir # usually contains share/extension
pg_config --pkglibdir # usually contains shared libraries (.so/.dll)
SQL scripts and .control files go into $sharedir/extension.
Shared libraries go into $pkglibdir.
On managed services (cloud providers), you usually cannot install your own C extensions, but you can still create SQL-only extensions if they allow it.
Creating a Simple SQL-only Extension: Example
Suppose we want an extension hello_ext that adds a function hello(name).
-- File: hello_ext--1.0.sql
CREATE OR REPLACE FUNCTION hello(name text)
RETURNS text
LANGUAGE sql
AS
$$
SELECT 'Hello, ' || name || '!';
$$;
This SQL script defines the objects that will be created when the extension is installed.
The file name must follow the pattern <extname>--<version>.sql, here hello_ext--1.0.sql.
Writing the Control File (.control)
Create a file named hello_ext.control in share/extension:
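A minimal control file could look like this (the comment text is illustrative):
# File: hello_ext.control
comment = 'Example extension providing hello(name)'
default_version = '1.0'
relocatable = true
After both files are copied into share/extension, the extension can be enabled with CREATE EXTENSION hello_ext;.
Upgrading an Extension to a New Version
To ship version 1.1, add an upgrade script named hello_ext--1.0--1.1.sql: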
-- File: hello_ext--1.0--1.1.sql
-- Migration from 1.0 to 1.1
CREATE OR REPLACE FUNCTION hello_upper(name text)
RETURNS text
LANGUAGE sql
AS
$$
SELECT upper(hello(name));
$$;
To upgrade an existing installation:
ALTER EXTENSION hello_ext UPDATE TO '1.1';
PostgreSQL runs hello_ext--1.0--1.1.sql automatically when upgrading from version 1.0.
You can chain multiple upgrade scripts:
hello_ext--1.0--1.1.sql
hello_ext--1.1--1.2.sql
...
C-based Extensions (Overview Only)
For performance-sensitive logic or low-level features, you can write C functions and expose them as an extension.
Regular Expressions and Indexes
Regex cannot use B-tree indexes (except for patterns with a fixed prefix anchor such as ^text).
For fast substring / fuzzy search, use pg_trgm extension:
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_name_trgm
ON users USING gin (name gin_trgm_ops);
SELECT * FROM users WHERE name ~ 'wang';
Trigram indexes accelerate many regex operations automatically.
Regex vs LIKE vs SIMILAR TO
Feature | LIKE | SIMILAR TO | Regex (~)
Wildcard power | Low | Medium | High
Performs complex validation | No | Somewhat | Yes
Index support | Only prefix | No | No (except pg_trgm)
Syntax complexity | Very easy | Medium | High
Regex is the most powerful but most expensive.
The SIMILAR TO Operator in PostgreSQL
Introduction
SIMILAR TO is a SQL-standard pattern matching operator that combines aspects of LIKE and regular expressions.
It is more powerful than LIKE but less powerful (and less flexible) than PostgreSQL's regex operators (~, ~*).
SIMILAR TO uses a regex-like syntax but still follows SQL-standard rules, not POSIX rules.
It is rarely used in real-world PostgreSQL development because regex operators are more intuitive and consistent.
Basic Syntax
value SIMILAR TO pattern
value NOT SIMILAR TO pattern
Returns TRUE if value matches the SQL-standard regex-like pattern.
Returns FALSE otherwise.
Returns NULL if either operand is NULL.
Fundamental Rules of SIMILAR TO
% – matches zero or more characters (like LIKE).
_ – matches exactly one character.
| – regex alternation operator.
() – grouping for alternation.
[] – character classes.
ESCAPE clause can be used to escape metacharacters.
SELECT 'cat' SIMILAR TO '(cat|dog)'; -- TRUE
SELECT 'dog' SIMILAR TO '(cat|dog)'; -- TRUE
SELECT 'frog' SIMILAR TO '(cat|dog)'; -- FALSE
SIMILAR TO vs LIKE
Feature | LIKE | SIMILAR TO
Alternation | No | Yes
Grouping | No | Yes
Character classes | No | Yes
Complex validation | No | Possible
Case sensitivity | Sensitive | Sensitive only (no ILIKE equivalent)
In short: SIMILAR TO = LIKE + basic regex.
Alternation (|) and Grouping
SELECT 'apple' SIMILAR TO '(apple|banana|cherry)'; -- TRUE
SELECT 'guava' SIMILAR TO '(apple|banana|cherry)'; -- FALSE
Grouping with parentheses is required for alternation.
Character Classes
SELECT 'a' SIMILAR TO '[abc]'; -- TRUE
SELECT 'd' SIMILAR TO '[abc]'; -- FALSE
SELECT '5' SIMILAR TO '[0-9]'; -- TRUE
Similar to regex classes, but with SQL-standard semantics.
Wildcard Behavior
% = zero or more characters.
_ = exactly one character.
SELECT 'hello' SIMILAR TO 'h%'; -- TRUE
SELECT 'hello' SIMILAR TO 'h_llo'; -- TRUE
Anchoring Rules (Implicit Anchors)
Unlike POSIX regex, SIMILAR TO patterns act like the entire string is implicitly wrapped with ^ ... $.
Meaning: SIMILAR TO always matches the entire string. Partial matches are not allowed unless you use %.
SELECT 'hello' SIMILAR TO 'ell'; -- FALSE (must match whole string)
SELECT 'hello' SIMILAR TO '%ell%'; -- TRUE
Complex Validation Example
SELECT '2024-10-15' SIMILAR TO '[0-9]{4}-[0-9]{2}-[0-9]{2}'; -- TRUE
SELECT '2024/10/15' SIMILAR TO '[0-9]{4}-[0-9]{2}-[0-9]{2}'; -- FALSE
SIMILAR TO supports repetition with curly braces too.
Using SIMILAR TO With NOT
SELECT 'abc' NOT SIMILAR TO '[0-9]+'; -- TRUE
Behaves similarly to NOT LIKE and !~.
Escaping Special Characters
Special characters that require escaping:
%, _ (as in LIKE)
( )
|
[ ]
{ }
SELECT 'a+b' SIMILAR TO 'a\+b'; -- match literal +
You can specify a custom escape character:
SELECT 'a+b' SIMILAR TO 'a#+b' ESCAPE '#'; -- TRUE
SIMILAR TO is Fully Anchored
Unlike regular expressions, SIMILAR TO matches the entire string unless % or _ is used.
SELECT 'abc' SIMILAR TO 'a.*'; -- FALSE
SELECT 'abc' SIMILAR TO 'a%'; -- TRUE
This difference surprises many users!
Combining Conditions
SELECT *
FROM logs
WHERE message SIMILAR TO '%(ERROR|WARN|FATAL)%';
Good for basic multi-pattern filtering.
SIMILAR TO vs Regex Operators (~)
Feature | SIMILAR TO | Regex (~)
Power level | Medium | High
Performance | Medium | Lower (heavier)
Anchored matching | Always full string | Optional
Unicode support | Limited | Full POSIX regex support
Syntax | SQL-standard | POSIX regex syntax
Alternation | Yes | Yes
Complex patterns | Limited | Very strong
Regex (~) is almost always preferred for non-trivial pattern matching.
Performance Considerations
SIMILAR TO does not use indexes unless the pattern starts with a fixed prefix.
Like regex and LIKE with wildcards, it can be slow on large text columns.
To optimize substring search, use pg_trgm:
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_logs_msg_trgm
ON logs USING gin (message gin_trgm_ops);
SELECT * FROM logs
WHERE message SIMILAR TO '%fail%';
Trigram indexes dramatically improve performance.
Practical Examples
-- Validate YYYY-MM-DD
SELECT '2024-12-31' SIMILAR TO '[0-9]{4}-[0-9]{2}-[0-9]{2}';
-- Match emails ending in .com or .net
SELECT 'test@example.com' SIMILAR TO '%@%.(com|net)'; -- TRUE (note: . is a literal character in SIMILAR TO)
-- Validate integer or decimal
SELECT '123.45' SIMILAR TO '[0-9]+(\.[0-9]+)?';
-- Match strings starting with A–F
SELECT 'Apple' SIMILAR TO '[A-F]%';
Use SIMILAR TO for SQL-standard pattern matching where full regex is not required.
Basics on Full-Text Search (FTS) in PostgreSQL
Introduction
PostgreSQL provides a powerful, built-in full-text search (FTS) system for indexing and searching natural language text.
FTS understands linguistic structures:
tokenization
stemming
stop-words
ranking of results
It works with special data types such as tsvector and tsquery.
FTS is more advanced than LIKE, ILIKE, or regex because it understands word forms and relevance.
Core Concepts
tsvector – preprocessed document, normalized into lexemes.
tsquery – a structured query for searching tsvector documents.
@@ – match operator between tsvector and tsquery.
phraseto_tsquery() – builds phrase (adjacency) queries; the @@@ operator sometimes seen in older material is merely a deprecated synonym of @@, not a separate phrase operator.
SELECT 'a fat cat sat on a mat'::tsvector;
-- 'a' 'cat' 'fat' 'mat' 'on' 'sat'
SELECT to_tsvector('english', 'The cars are running');
-- 'car':2 'run':4
tsvector: The Document
A tsvector is the indexed representation of a text string.
It contains:
lexemes (normalized words)
positions
weights (optional)
SELECT to_tsvector('english', 'Programming program programs programmed');
-- 'program':1,2,3,4
PostgreSQL reduces all of these word forms to the single lexeme 'program' via stemming.
tsquery: The Query
A tsquery contains search terms, themselves normalized to lexemes, optionally combined with & (AND), | (OR), and ! (NOT):
SELECT
to_tsvector('english', 'The big car crashed'),
to_tsquery('english', 'car');
To check if text matches a query:
SELECT
to_tsvector('english', 'The big car crashed')
@@ to_tsquery('english', 'car'); -- TRUE
Phrase Search (word adjacency)
SELECT to_tsquery('english', 'big <-> car');
big <-> car means "big immediately before car".
SELECT to_tsvector('The big car crashed') @@
to_tsquery('big <-> car'); -- TRUE
Phrase Search (exact phrase)
Use phraseto_tsquery to turn plain text into an adjacency chain:
SELECT to_tsvector('The big red car') @@ phraseto_tsquery('big red car');
-- TRUE
Equivalent to the multi-word adjacency chain 'big <-> red <-> car'.
Indexing Full-Text Search
Use a GIN index for fast text search:
CREATE INDEX idx_docs_fts
ON documents
USING gin (to_tsvector('english', content));
But then queries must repeat the same expression:
SELECT *
FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'car');
Storing tsvector in a Column
You can store tsvector in a dedicated column and update it automatically:
ALTER TABLE documents
ADD COLUMN fts tsvector
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
CREATE INDEX idx_docs_fts ON documents USING gin (fts);
This avoids recomputing the vector on every query.
Composing Queries with plainto_tsquery and websearch_to_tsquery
plainto_tsquery – converts plain text into a query using AND logic.
websearch_to_tsquery – accepts web-search style input ("quoted phrases", OR, -exclusion) and never raises a syntax error on user input.
Weighting and Ranking Results
You can weight fields differently (A is the highest weight, D the lowest) and rank matches with ts_rank():
ALTER TABLE blog_posts
ADD COLUMN fts tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'D')
) STORED;
CREATE INDEX idx_blog_fts ON blog_posts USING gin (fts);
SELECT
id,
ts_headline(body, q) AS preview,
ts_rank(fts, q) AS rank
FROM blog_posts,
to_tsquery('english', 'postgres & search') q
WHERE fts @@ q
ORDER BY rank DESC;
User-Defined Functions (UDFs) in PostgreSQL
Introduction
PostgreSQL allows you to create your own functions using SQL, PL/pgSQL, PL/Python, PL/Perl, PL/Java, and even C.
User-defined functions (UDFs) are first-class citizens: they can be used in
SELECT, JOIN, WHERE, triggers, indexes, and extensions.
Functions can return:
scalar values (INT, TEXT, JSON, etc.)
composite types (ROW)
TABLE results (set-returning)
SETOF anytype
void
PostgreSQL optimizes SQL functions aggressively and often inlines them, making them very fast.
Basic SQL Functions
SQL functions are simple wrappers around SQL expressions.
CREATE OR REPLACE FUNCTION add_numbers(a INT, b INT)
RETURNS INT
LANGUAGE sql
AS
$$
SELECT a + b;
$$;
Executed as:
SELECT add_numbers(10, 20); -- 30
Functions Returning NULL, STRICT Functions
STRICT means the function returns NULL immediately if any argument is NULL.
Equivalent to manually checking for NULL.
CREATE FUNCTION multiply(a INT, b INT)
RETURNS INT
LANGUAGE sql
STRICT
AS
$$
SELECT a * b;
$$;
Now:
SELECT multiply(NULL, 5); -- NULL
IMMUTABLE, STABLE, VOLATILE
These declarations tell PostgreSQL how predictable your function is:
Category | Description | Examples
IMMUTABLE | Always returns the same output for the same input | Pure math, string functions
STABLE | Same result within a single query, but may change between queries | now(), current_setting()
VOLATILE | Can change anytime, cannot be optimized | random(), sequences, updates
CREATE FUNCTION greet(name TEXT)
RETURNS TEXT
LANGUAGE sql
IMMUTABLE
AS
$$
SELECT 'Hello ' || name;
$$;
PL/pgSQL Functions (Procedural Language)
PL/pgSQL is PostgreSQL's powerful procedural language: loops, variables, conditions, exceptions.
CREATE OR REPLACE FUNCTION factorial(n INT)
RETURNS INT
LANGUAGE plpgsql
AS
$$
DECLARE
result INT := 1;
BEGIN
IF n < 0 THEN
RAISE EXCEPTION 'Negative numbers not allowed';
END IF;
FOR i IN 1..n LOOP
result := result * i;
END LOOP;
RETURN result;
END;
$$;
SELECT factorial(5); -- 120
Using Variables, BEGIN...END Blocks
CREATE FUNCTION format_user(first TEXT, last TEXT)
RETURNS TEXT
LANGUAGE plpgsql
AS
$$
DECLARE
full_name TEXT;
BEGIN
full_name := initcap(first || ' ' || last);
RETURN full_name;
END;
$$;
Control Structures (IF, CASE, LOOP)
CREATE FUNCTION classify(score INT)
RETURNS TEXT
LANGUAGE plpgsql
AS
$$
BEGIN
IF score >= 90 THEN
RETURN 'A';
ELSIF score >= 80 THEN
RETURN 'B';
ELSE
RETURN 'C';
END IF;
END;
$$;
RETURN QUERY for Set-Returning Functions
You can return multiple rows (like a table).
CREATE FUNCTION get_users_with_gmail()
RETURNS TABLE(id INT, email TEXT)
LANGUAGE plpgsql
AS
$$
BEGIN
RETURN QUERY
SELECT u.id, u.email FROM users u WHERE u.email LIKE '%@gmail.com';
END;
$$;
SELECT * FROM get_users_with_gmail();
Returning Custom Composite Types
CREATE TYPE user_info AS (
id INT,
name TEXT,
created TIMESTAMP
);
CREATE FUNCTION get_user(uid INT)
RETURNS user_info
LANGUAGE sql
AS
$$
SELECT id, name, created_at
FROM users
WHERE id = uid;
$$;
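Calling it in the FROM clause expands the composite into columns:
SELECT * FROM get_user(1);
-- or access a single field:
SELECT (get_user(1)).name;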
Functions Returning JSON or JSONB
CREATE FUNCTION product_json(pid INT)
RETURNS JSONB
LANGUAGE sql
AS
$$
SELECT jsonb_build_object(
'id', id,
'name', name,
'price', price
)
FROM products
WHERE id = pid;
$$;
Security Definer Functions
SECURITY DEFINER makes the function run with the privileges of the creator, not the caller.
Useful for controlled data access.
Dangerous if misused – always restrict the search_path!
CREATE FUNCTION read_logs()
RETURNS SETOF logs
LANGUAGE sql
SECURITY DEFINER
SET search_path = public
AS
$$
SELECT * FROM logs;
$$;
Managing search_path Safely
To avoid SQL injection in SECURITY DEFINER functions, always set search_path explicitly:
ALTER FUNCTION read_logs() SET search_path = public;
Error Handling and Exception Blocks
CREATE OR REPLACE FUNCTION safe_div(a NUMERIC, b NUMERIC)
RETURNS NUMERIC
LANGUAGE plpgsql
AS
$$
BEGIN
RETURN a / b;
EXCEPTION
WHEN division_by_zero THEN
RETURN NULL;
END;
$$;
Function Overloading
PostgreSQL supports function overloading based on argument types.
CREATE FUNCTION info(x INT) RETURNS TEXT AS
$$ SELECT 'int' $$ LANGUAGE sql;
CREATE FUNCTION info(x TEXT) RETURNS TEXT AS
$$ SELECT 'text' $$ LANGUAGE sql;
SELECT info(42); -- 'int'
SELECT info('foo'); -- 'text'
C-Language Functions (Overview)
C functions provide maximum performance and flexibility.
They are compiled into .so libraries and loaded by PostgreSQL.
CREATE FUNCTION add_one(int)
RETURNS int
AS 'myext', 'add_one'
LANGUAGE c STRICT;
Dropping and Replacing Functions
DROP FUNCTION add_numbers(INT, INT);
CREATE OR REPLACE works only while the argument types and return type stay the same; if the signature changes, you must DROP the old function first.
Listing All User-Defined Functions
SELECT
proname,
pg_get_functiondef(p.oid)
FROM pg_proc p
JOIN pg_namespace n ON p.pronamespace = n.oid
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
AND p.prokind = 'f'; -- plain functions only; pg_get_functiondef fails on aggregates
Practical Real-World UDF Examples
-- Normalize email
CREATE FUNCTION clean_email(e TEXT)
RETURNS TEXT
IMMUTABLE
LANGUAGE sql
AS
$$ SELECT lower(trim(e)) $$;
-- Generate URL slug
CREATE FUNCTION slugify(s TEXT)
RETURNS TEXT
LANGUAGE plpgsql
AS
$$
BEGIN
RETURN regexp_replace(lower(s), '[^a-z0-9]+', '-', 'g');
END;
$$;
-- Increase stock safely
CREATE FUNCTION add_stock(pid INT, amount INT)
RETURNS VOID
LANGUAGE plpgsql
AS
$$
BEGIN
UPDATE products
SET stock = stock + amount
WHERE id = pid;
END;
$$;
Stored Procedures
Introduction
PostgreSQL distinguishes between functions and stored procedures.
Procedures were introduced in PostgreSQL 11 and allow:
transaction control inside procedure body (COMMIT, ROLLBACK),
autonomous units of work,
calls via CALL instead of SELECT.
Procedures do not return values (no RETURNS clause).
They are ideal for batch jobs, maintenance tasks, cleanup routines, ETL steps, or anything requiring transaction boundaries.
Basic Syntax
CREATE PROCEDURE procedure_name(arg1 type1, arg2 type2)
LANGUAGE plpgsql
AS
$$
BEGIN
-- logic
END;
$$;
No RETURNS clause is allowed.
Execution uses CALL:
CALL procedure_name(value1, value2);
Core Differences Between PROCEDURE and FUNCTION
Feature
FUNCTION
PROCEDURE
Returns values
Yes (scalar, composite, set)
No
Executed with
SELECT or PERFORM
CALL
Transaction control
No
Yes (COMMIT / ROLLBACK)
Volatility declarations
Declarative (IMMUTABLE / STABLE / VOLATILE)
Not applicable
Use cases
Reusable expressions, business logic
Batch jobs, maintenance tasks, ETL
Simple Stored Procedure Example
CREATE OR REPLACE PROCEDURE log_message(msg TEXT)
LANGUAGE plpgsql
AS
$$
BEGIN
INSERT INTO logs(message, created_at)
VALUES (msg, NOW());
END;
$$;
CALL log_message('Hello from procedure!');
Side effects (INSERT / UPDATE / DELETE) are typically the goal of a procedure.
Procedures With Transaction Control
This is the biggest reason to use procedures.
You can COMMIT / ROLLBACK inside the procedure body.
CREATE OR REPLACE PROCEDURE batch_insert()
LANGUAGE plpgsql
AS
$$
BEGIN
INSERT INTO events(data) VALUES ('step 1');
COMMIT; -- commit early
INSERT INTO events(data) VALUES ('step 2');
ROLLBACK; -- rollback this part
INSERT INTO events(data) VALUES ('step 3');
END;
$$;
CALL batch_insert();
NOTE: Transaction control is only allowed when the procedure is invoked by a top-level CALL, outside an explicit transaction block.
A procedure that issues COMMIT / ROLLBACK cannot be called from inside a function.
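Attempting to CALL such a procedure inside an explicit transaction block fails:
BEGIN;
CALL batch_insert();
-- ERROR: invalid transaction termination
ROLLBACK;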
Using IN, OUT, INOUT Parameters
Procedures may return data via OUT or INOUT parameters; this is the only way they can output values.
(OUT parameters in procedures require PostgreSQL 14 or later; earlier versions support only INOUT.)
CREATE PROCEDURE add_and_double(
IN a INT,
IN b INT,
OUT sum_val INT,
OUT double_sum INT
)
LANGUAGE plpgsql
AS
$$
BEGIN
sum_val := a + b;
double_sum := (a + b) * 2;
END;
$$;
CALL add_and_double(3, 5, NULL, NULL);
Client tools will show OUT values as a result set.
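From PL/pgSQL, for example in a DO block, the OUT values can be captured into variables; a sketch, assuming PostgreSQL 14+:
DO
$$
DECLARE
s INT;
d INT;
BEGIN
CALL add_and_double(3, 5, s, d); -- OUT values land in s and d
RAISE NOTICE 'sum=%, double=%', s, d;
END;
$$;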
Using INOUT Parameters
CREATE PROCEDURE increment(INOUT x INT)
LANGUAGE plpgsql
AS
$$
BEGIN
x := x + 1;
END;
$$;
CALL increment(10); -- returns 11
Procedures With Dynamic SQL (EXECUTE)
Useful for administrative tasks, partition management, dynamic operations.
CREATE PROCEDURE drop_table_if_exists(tname TEXT)
LANGUAGE plpgsql
AS
$$
BEGIN
EXECUTE 'DROP TABLE IF EXISTS ' || quote_ident(tname);
END;
$$;
CALL drop_table_if_exists('old_data');
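An equivalent formulation uses format() with the %I placeholder, which quotes identifiers for you (the _v2 name is just for illustration):
CREATE PROCEDURE drop_table_if_exists_v2(tname TEXT)
LANGUAGE plpgsql
AS
$$
BEGIN
EXECUTE format('DROP TABLE IF EXISTS %I', tname); -- %I quote-protects the identifier
END;
$$;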
Procedures and Exception Handling
Procedures support full PL/pgSQL exception blocks.
CREATE PROCEDURE safe_insert(val INT)
LANGUAGE plpgsql
AS
$$
BEGIN
INSERT INTO numbers(num) VALUES (val);
EXCEPTION
WHEN unique_violation THEN
RAISE NOTICE 'Duplicate ignored: %', val;
END;
$$;
CALL safe_insert(10);
Procedures Calling Other Procedures
CREATE PROCEDURE outer_proc()
LANGUAGE plpgsql
AS
$$
BEGIN
CALL inner_proc();
END;
$$;
Nested CALLs are allowed.
However, transaction control only works when the entire call chain consists of CALLs running outside an explicit transaction block.
Procedures vs DO Blocks
Feature
PROCEDURE
DO Block
Persistence
Stored permanently
Temporary (inline execution)
Transaction control
Yes
Yes (since PostgreSQL 11, when run outside a transaction block)
Parameters
Yes
No
Used in extensions
Yes
No
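For comparison, a one-off DO block runs the same kind of body without creating a persistent object; a minimal sketch, reusing the logs table from the examples above:
DO
$$
BEGIN
INSERT INTO logs(message, created_at) VALUES ('one-off maintenance note', NOW());
END;
$$;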
Replacing and Dropping Procedures
DROP PROCEDURE log_message(TEXT);
CREATE OR REPLACE PROCEDURE log_message(TEXT) ...
PostgreSQL identifies procedures by name + argument types.
Listing All Procedures
SELECT
p.proname,
pg_get_function_arguments(p.oid) AS args,
n.nspname AS schema
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.prokind = 'p'; -- 'p' = procedure
Security Considerations
SECURITY DEFINER works for procedures too.
Always set search_path explicitly to avoid SQL injection:
ALTER PROCEDURE log_message(TEXT)
SET search_path = public;
Be cautious with dynamic SQL inside SECURITY DEFINER procedures.
Real-World Example: ETL Batch Job
CREATE OR REPLACE PROCEDURE nightly_etl()
LANGUAGE plpgsql
AS
$$
BEGIN
-- Step 1: staging load
INSERT INTO staging SELECT * FROM raw_data;
COMMIT;
-- Step 2: transformations
UPDATE staging SET cleaned = TRUE WHERE data LIKE '%ok%';
COMMIT;
-- Step 3: merge into production
INSERT INTO production SELECT * FROM staging;
COMMIT;
-- Step 4: cleanup
TRUNCATE staging;
END;
$$;
CALL nightly_etl();
Each ETL phase is committed independently.
This pattern cannot be implemented with functions, because functions cannot commit.
Transactions, COMMIT, and ROLLBACK in PostgreSQL
Introduction
A transaction in PostgreSQL is a group of SQL statements executed as a single unit of work.
A transaction guarantees the 4 ACID properties:
Atomicity – all-or-nothing
Consistency – no broken constraints
Isolation – one transaction doesn't see another's partial changes
Durability – committed data is guaranteed to persist
You control a transaction with two commands:
COMMIT – permanently save changes
ROLLBACK – undo changes since the last BEGIN
Transactions isolate incomplete work from other users until finalized.
Starting a Transaction
BEGIN;
-- or: START TRANSACTION;
All subsequent SQL statements become part of this transaction until explicitly ended.
COMMIT: Saving All Changes
COMMIT finalizes the transaction and permanently writes changes to the database.
Once committed, changes cannot be undone (unless you start a new compensating transaction).
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- both updates permanently saved
ROLLBACK: Undoing All Changes
ROLLBACK cancels the entire transaction and reverts all statements back to the state before BEGIN.
No data is modified if rollback occurs.
BEGIN;
UPDATE products SET stock = stock - 10 WHERE id = 100;
UPDATE products SET stock = stock + 10 WHERE id = 200;
ROLLBACK; -- both updates undone
Even if multiple statements executed successfully, ROLLBACK erases all of them from the current transaction.
Autocommit Behavior
By default, PostgreSQL runs in autocommit mode:
each individual statement is its own transaction
and is automatically committed after execution
Example:
UPDATE users SET name = 'A'; -- auto-committed immediately
To use multi-statement transactions, you must explicitly begin them using BEGIN.
Combining Multiple Operations
Transactions allow complex multi-step logic to behave atomically.
BEGIN;
INSERT INTO orders(user_id, total) VALUES (7, 250);
INSERT INTO order_items(order_id, product_id, qty)
VALUES (currval('orders_id_seq'), 42, 3);
-- If anything goes wrong:
-- ROLLBACK;
COMMIT;
Either both INSERTs happen or neither does.
Savepoints: Partial Rollback
Savepoints allow you to roll back part of a transaction without canceling the whole thing.
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
SAVEPOINT s1;
UPDATE accounts SET balance = balance + 500 WHERE id = 99; -- oops, wrong account
ROLLBACK TO s1; -- undo only the second update
COMMIT;
ROLLBACK TO does not end the transaction, only rewinds to that savepoint.
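A savepoint you no longer need can be discarded with RELEASE SAVEPOINT, which keeps the work done since it was set:
BEGIN;
SAVEPOINT s1;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
RELEASE SAVEPOINT s1; -- keep the update, drop the savepoint
COMMIT;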
Transaction Behavior in Functions
Functions (CREATE FUNCTION) cannot use COMMIT or ROLLBACK.
They run inside the caller's transaction boundary.
-- INVALID inside functions:
COMMIT;
ROLLBACK;
Functions must behave atomically.
Transaction Behavior in Procedures
Procedures (CREATE PROCEDURE) can use transaction control.
They are invoked with CALL at the top level.
This is the main difference from functions.
CREATE PROCEDURE demo()
LANGUAGE plpgsql
AS
$$
BEGIN
INSERT INTO logs(msg) VALUES ('step 1');
COMMIT; -- allowed
INSERT INTO logs(msg) VALUES ('step 2');
ROLLBACK; -- undo step 2 only
INSERT INTO logs(msg) VALUES ('step 3'); -- part of new transaction
END;
$$;
CALL demo();
Transaction Isolation Levels
Transaction isolation levels define what data a transaction can see while other transactions are running concurrently.
PostgreSQL implements isolation using MVCC (Multi-Version Concurrency Control), which gives each transaction a consistent snapshot of the database.
The isolation level determines:
whether the transaction sees other transactions' updates,
whether rows can appear or disappear during the transaction,
whether reading the same row twice always yields the same value,
whether anomalies (dirty reads, write skew, phantom rows) are allowed.
You set the isolation level per transaction:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- work inside transaction
COMMIT;
PostgreSQL supports three standard isolation levels:
Isolation Level
Description (Detailed)
READ COMMITTED
This is the default isolation level in PostgreSQL.
Each individual SQL statement sees a snapshot of data committed before that statement began.
If another transaction commits in-between your statements, your next statement will see the new data.
Prevents dirty reads (you never see uncommitted data).
Allows non-repeatable reads:
The same row queried twice may return different values if another transaction modifies and commits it between your statements.
Allows phantom reads:
Repeated SELECT queries may return more or fewer rows.
Snapshot: one snapshot per statement.
REPEATABLE READ
The transaction sees a single consistent snapshot taken when the transaction begins.
No matter what other transactions commit during your transaction, your reads stay the same.
Prevents:
dirty reads: never see uncommitted data
non-repeatable reads: reading the same row twice always gives the same result
Allows phantom reads:
A repeated query on ranges may still return additional rows depending on the situation.
May allow write skew anomalies (a type of race condition) unless SERIALIZABLE is used.
Snapshot: one snapshot for the entire transaction.
SERIALIZABLE
The strongest isolation level — ensures the database behaves as if transactions were executed one by one, sequentially, even though they run concurrently.
Automatically prevents:
dirty reads
non-repeatable reads
phantom reads
write skew anomalies
Requires PostgreSQL to automatically detect dangerous conflicts.
If two concurrent transactions could lead to a serialization anomaly, PostgreSQL aborts one with:
ERROR: could not serialize access due to read/write dependencies among transactions
Your application must retry the entire transaction.
Snapshot: similar to REPEATABLE READ, but with extra checks using predicate locks to avoid anomalies.
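A two-session sketch of the difference, assuming an accounts row starting with balance 100: under READ COMMITTED the second read sees the concurrent commit; under REPEATABLE READ it would still return 100.
-- Session 1
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- 100
-- Session 2
UPDATE accounts SET balance = 200 WHERE id = 1;
COMMIT;
-- Session 1, same transaction
SELECT balance FROM accounts WHERE id = 1; -- 200 under READ COMMITTED (non-repeatable read)
COMMIT;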
What Happens During Errors?
Any unhandled SQL error causes the current transaction to become aborted.
You cannot continue until you ROLLBACK.
BEGIN;
INSERT INTO a VALUES (1);
INSERT INTO a VALUES (1); -- duplicate key error, transaction aborted
-- Now ANY statement will error until ROLLBACK is issued
ROLLBACK; -- reset state
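Wrapping the risky statement in a savepoint lets the rest of the transaction survive the error:
BEGIN;
INSERT INTO a VALUES (1);
SAVEPOINT before_risky;
INSERT INTO a VALUES (1); -- duplicate key error
ROLLBACK TO before_risky; -- only the failed statement is undone
INSERT INTO a VALUES (2); -- transaction continues
COMMIT;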
Real-World Example: Bank Transfer
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT; -- transfer completed
Guarantees that money is never lost or created.
Real-World Example: Repairing Data
BEGIN;
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days';
-- Oops, wrong condition?
ROLLBACK;
-- try again
BEGIN;
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '30 days';
COMMIT;
Sequences in PostgreSQL
A sequence in PostgreSQL is a special object that generates a sequence of numeric values, usually used for auto-incrementing primary keys.
Sequences are independent database objects and not tied to a table unless you explicitly connect them.
They are non-transactional counters:
Each call increases the counter immediately
Sequence increments cannot be rolled back
PostgreSQL automatically creates sequences for SERIAL, BIGSERIAL, and GENERATED AS IDENTITY columns.
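You can look up the sequence backing a serial or identity column with pg_get_serial_sequence:
SELECT pg_get_serial_sequence('users', 'id');
-- e.g. public.users_id_seq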
Creating a Sequence
CREATE SEQUENCE user_id_seq;
A minimal sequence starts at 1 and increments by 1.
You can customize start, increment, min, max, cache, and cycle behavior.
CREATE SEQUENCE order_seq
START WITH 1000
INCREMENT BY 5
MINVALUE 1
MAXVALUE 1000000
CACHE 50
CYCLE;
CYCLE: sequence wraps to MINVALUE when MAXVALUE reached.
Sequences Are Concurrency-Safe
Each call to nextval() hands out a distinct value, even across concurrent transactions:
-- Transaction A and B both run:
SELECT nextval('user_id_seq');
-- A gets 12
-- B gets 13
Resetting or Restarting a Sequence
ALTER SEQUENCE user_id_seq RESTART WITH 1;
You can also use setval:
SELECT setval('user_id_seq', 1, false);
NOTE: resetting sequences can break primary key uniqueness if old IDs still exist.
Dropping a Sequence
DROP SEQUENCE user_id_seq;
If it was OWNED BY a column, dropping the table or column also drops the sequence.
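You can attach an existing sequence to a column yourself, so the two are dropped together:
ALTER SEQUENCE user_id_seq OWNED BY users.id;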
Real-World Examples
-- Create a custom global invoice numbering generator
CREATE SEQUENCE invoice_seq START WITH 20240001;
INSERT INTO invoices(id, total)
VALUES (nextval('invoice_seq'), 99.99);
-- Use sequence to shard IDs by region
CREATE SEQUENCE region1_seq START 1 INCREMENT 10;
CREATE SEQUENCE region2_seq START 2 INCREMENT 10;
-- Regenerating sequence after manual data load
SELECT setval('user_id_seq', (SELECT MAX(id) FROM users));
Indexes in PostgreSQL
Introduction
An index in PostgreSQL is a separate data structure used to accelerate row lookups, filtering, sorting, joining, and uniqueness enforcement.
Indexes speed up WHERE clauses, JOIN conditions, ORDER BY, GROUP BY, and distinct queries.
Indexes are not free: maintaining them adds CPU, I/O, and storage overhead on INSERT / UPDATE / DELETE.
PostgreSQL maintains indexes automatically as data changes.
Basic Index Creation
CREATE INDEX idx_users_email
ON users(email);
Creates a default B-tree index.
PostgreSQL chooses B-tree unless you explicitly specify another type.
The Main Index Types
Index Type
Best For
Notes
B-tree
Equality (=) and range (<, >) queries
Default index, supports sorting and UNIQUE constraints
Hash
Equality lookups only
Rarely used, B-tree is usually faster
GIN
Arrays, JSONB, full-text search
Multivalued indexing, slower to update but very fast lookup
GiST
Geometric types, full-text search, ranges
Flexible balanced tree framework
SP-GiST
Partitioned search spaces: IP ranges, quadtrees
Specialized high-performance search structures
BRIN
Very large tables with natural ordering (timestamps, IDs)
Tiny index size, block ranges, extremely fast to build
Bloom extension
Many-column indexes
Probabilistic index; provided by the bloom extension
B-Tree Indexes (Default)
Suitable for equality and range operations:
WHERE email = 'a@example.com'
WHERE age > 18
WHERE price BETWEEN 100 AND 200
ORDER BY created_at DESC
They support unique constraints:
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
Expression Indexes (Function Indexes)
Indexes over expressions let PostgreSQL index computed values.
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));
Query must match the expression:
SELECT * FROM users WHERE LOWER(email) = 'a@example.com';
Partial Indexes (Filtered Indexes)
Indexes only part of a table for performance and space savings.
CREATE INDEX idx_active_users
ON users(email)
WHERE active = true;
Useful when most rows do not match the condition.
Example: indexing only WHERE deleted_at IS NULL.
Multi-Column Indexes
Indexes can contain multiple columns.
Column order matters a lot.
CREATE INDEX idx_users_last_first
ON users(last_name, first_name);
Index can be used for:
WHERE last_name = ... AND first_name = ...
WHERE last_name = ...
But NOT for:
WHERE first_name = ...
Index-Only Scans
If all required columns exist in the index, PostgreSQL can avoid accessing the table entirely.
This is faster because:
index pages are smaller than table pages
no heap tuple visibility checks needed if page is "all-visible"
CREATE INDEX idx_users_email_only
ON users(email);
SELECT email FROM users WHERE email LIKE '%@gmail.com';
-- The leading wildcard rules out a range search, but PostgreSQL can still
-- satisfy the query with a full scan of the (much smaller) index.
Requires visibility map bits to be set — VACUUM affects index-only scan performance.
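You can verify the plan with EXPLAIN; with an equality predicate on the indexed column, the expected plan (statistics and visibility map permitting) looks roughly like:
EXPLAIN SELECT email FROM users WHERE email = 'a@example.com';
-- Index Only Scan using idx_users_email_only on users ...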
GIN Indexes
Designed for multivalued data:
Array columns
JSONB documents
Full-text search
Supports containment operators:
CREATE INDEX idx_tags_gin
ON posts USING gin(tags);
Examples:
SELECT * FROM posts WHERE tags && '{sql}';
SELECT * FROM posts WHERE tags @> '{programming}';
SELECT * FROM data WHERE jsonb_data @> '{"user": {"id": 5}}';
GiST Indexes
Generalized Search Tree — supports many non-linear data types:
geometric types
ranges
full-text search
ltree hierarchical search
CREATE INDEX idx_ranges_gist
ON events USING gist(period);
SP-GiST Indexes
Used for partitioned search spaces such as:
IP addresses (radix trees)
geometric quadtrees
trie-like structures
CREATE INDEX idx_ip_spgist
ON ips USING spgist(ip_address);
BRIN Indexes
Block Range Index — extremely tiny and fast to build.
Best for very large tables where data is naturally ordered:
timestamps
auto-incrementing IDs
The index stores summary values (such as min/max) per block range, by default 128 pages of 8 kB each.
CREATE INDEX idx_logs_brin
ON logs USING brin(timestamp);
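The granularity is tunable via the pages_per_range storage parameter (default 128); smaller ranges mean a bigger but more selective index. A sketch:
CREATE INDEX idx_logs_brin_fine
ON logs USING brin(timestamp)
WITH (pages_per_range = 32);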
Index Bloat
PostgreSQL never updates index rows in-place.
Updates create new index entries.
Dead entries accumulate until vacuumed.
B-tree indexes are most sensitive to bloat.
REINDEX INDEX idx_users_email;
REINDEX TABLE users;
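On PostgreSQL 12+, REINDEX CONCURRENTLY rebuilds an index without blocking writes:
REINDEX INDEX CONCURRENTLY idx_users_email;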
Fillfactor
Defines how full index pages may become.
Useful for high-update workloads.
CREATE INDEX idx_items_price
ON items(price)
WITH (fillfactor = 70);
NULL Handling in Indexes
B-tree indexes store NULLs; by default NULLs sort last in ASC order and first in DESC order (adjustable with NULLS FIRST / NULLS LAST).
Use partial indexes to exclude NULLs:
CREATE INDEX idx_not_null_price
ON items(price)
WHERE price IS NOT NULL;
Collation-Aware Indexing
Index sorting depends on collation.
CREATE INDEX idx_names_collation
ON people(name COLLATE "de_DE");
Operator Classes
Index behavior changes depending on operator classes.
Example: text pattern ops optimize LIKE 'abc%'.
CREATE INDEX idx_users_email_pattern
ON users(email text_pattern_ops);
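Such an index can then serve anchored pattern searches:
SELECT * FROM users WHERE email LIKE 'alice%';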
Covering Indexes (INCLUDE)
Store extra columns in the index for index-only scans, without affecting index sorting.
CREATE INDEX idx_orders_user_date
ON orders(user_id)
INCLUDE (order_date);
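A query touching only the key and the included column can then be answered from the index alone:
SELECT user_id, order_date
FROM orders
WHERE user_id = 5;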
Dropping Indexes
DROP INDEX idx_users_email;
Real-World Index Examples
-- Search by email, case-insensitive
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));
-- Fast soft-deletes
CREATE INDEX idx_posts_not_deleted
ON posts(id)
WHERE deleted_at IS NULL;
-- Fast JSONB lookup
CREATE INDEX idx_events_jsonb
ON events USING gin(data);
-- Speed up sorting queries
CREATE INDEX idx_logs_ts_desc
ON logs(timestamp DESC);
-- Range queries on large time-based table
CREATE INDEX idx_sensor_brin
ON sensor_data USING brin(recorded_at);
Composite Types
Introduction
A composite type in PostgreSQL is a user-defined structured type, similar to a "record" or "struct".
Composite types allow grouping multiple fields into a single logical type.
They can be used:
as column types,
as function return types,
inside arrays, JSON, or other nested structures,
with row operations (ROW(...)),
with PL/pgSQL records and variables.
Composite types behave like anonymous rows (ROW(...)) but are reusable and schema-bound.
Creating a Composite Type
CREATE TYPE address AS (
street TEXT,
city TEXT,
postal_code TEXT,
country TEXT
);
This defines a reusable structured type named address.
You can use it anywhere a normal type is allowed.
Using Composite Types as Table Columns
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT,
addr address
);
Insert using ROW(...) or a literal record:
INSERT INTO users(name, addr)
VALUES (
'Alice',
ROW('Main Street 1', 'Berlin', '10115', 'Germany')
);
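Fields of a composite column are read with a parenthesized reference (the parentheses are required):
SELECT name, (addr).city, (addr).country
FROM users
WHERE (addr).city = 'Berlin';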
Altering Composite Types
ALTER TYPE address ADD ATTRIBUTE province TEXT;
ALTER TYPE address DROP ATTRIBUTE postal_code;
ALTER TYPE address RENAME ATTRIBUTE street TO street_name;
Composite types can usually be altered safely, but functions and views that reference the changed attributes may need to be updated.
Dropping Composite Types
DROP TYPE address;
Fails if dependent tables, functions, or types still reference it.
Use CASCADE with extreme caution.
Real-World Examples
-- Geographic point stored as composite
CREATE TYPE geo_point AS (
lat DOUBLE PRECISION,
lon DOUBLE PRECISION
);
CREATE TABLE events (
id SERIAL PRIMARY KEY,
title TEXT,
pos geo_point
);
INSERT INTO events(title, pos)
VALUES ('Concert', ROW(52.5200, 13.4050));
-- Composite type as function OUT parameters
CREATE TYPE product_summary AS (
product_id INT,
name TEXT,
price NUMERIC
);
CREATE FUNCTION summarize_product(pid INT)
RETURNS product_summary
LANGUAGE sql
AS
$$
SELECT id, name, price FROM products WHERE id = pid;
$$;
-- Composite type arrays for time-series sensors
CREATE TYPE measurement AS (
ts TIMESTAMP,
temp NUMERIC,
hum NUMERIC
);
CREATE TABLE devices (
id SERIAL PRIMARY KEY,
logs measurement[]
);
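Appending a reading to such an array works with the || operator and a cast composite literal; a sketch:
UPDATE devices
SET logs = logs || ROW(now()::timestamp, 21.5, 40.0)::measurement
WHERE id = 1;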
Arrays
PostgreSQL supports true multi-dimensional array types, allowing any base type (integer, text, composite types, enums, domains, etc.) to be represented as an array.
Arrays can be:
1-dimensional (INT[])
multi-dimensional (INT[][], TEXT[][][])
mixed with composite values (mytype[])
Declaring Array Columns
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
tags TEXT[],
ratings INT[],
matrix INT[][] -- multi-dimensional
);
Array syntax adds [] to any data type.
Arrays can store NULL values per element, and the array itself may also be NULL.
Array Literals
Values are written using curly braces {}.
INSERT INTO products(name, tags, ratings)
VALUES ('Keyboard', '{mechanical,usb}', '{5,4,5}');
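The ARRAY[...] constructor is an equivalent, often more readable syntax, and elements are accessed with 1-based subscripts:
INSERT INTO products(name, tags, ratings)
VALUES ('Mouse', ARRAY['wireless','usb'], ARRAY[4,4,5]);
SELECT name, tags[1] FROM products; -- first element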
ENUM Performance
ENUM equality (=) and comparison operations are faster than TEXT because values are stored internally as integers.
ENUMs index extremely well with B-tree indexes:
CREATE INDEX idx_orders_status ON orders(status);
Migrating ENUM Values Safely
ALTER TYPE order_status ADD VALUE 'in_transit';
You cannot remove ENUM values.
You cannot change their order after creation.
Best practice for major schema migration:
-- 1. Rename old enum
ALTER TYPE order_status RENAME TO order_status_old;
-- 2. Create new enum with correct values
CREATE TYPE order_status AS ENUM ('pending','processing','...');
-- 3. Convert columns
ALTER TABLE orders
ALTER COLUMN status TYPE order_status
USING status::TEXT::order_status;
-- 4. Drop old enum
DROP TYPE order_status_old;
Real-World Examples
-- User roles
CREATE TYPE user_role AS ENUM ('user','moderator','admin');
CREATE TABLE users (
id SERIAL,
name TEXT,
role user_role NOT NULL DEFAULT 'user'
);
-- Ticket priorities with ordering
CREATE TYPE priority AS ENUM ('low','medium','high','urgent');
SELECT * FROM tickets ORDER BY priority; -- correct severity order
-- Payment status usage
CREATE TYPE payment_status AS ENUM ('created','paid','refunded');
CREATE TABLE invoices (
id SERIAL,
amount NUMERIC,
status payment_status
);
The xml Type in PostgreSQL
PostgreSQL provides a native xml data type that stores well-formed XML documents or fragments.
It supports:
XML parsing and validation
XPath queries
XML construction functions
Well-formedness checking (full DTD/XSD schema validation is not built in)
Conversions between XML, TEXT, and JSON
The xml type stores the document in text form but performs structure validation according to XML standards.
Declaring XML Columns
CREATE TABLE books (
id SERIAL PRIMARY KEY,
title TEXT,
metadata XML
);
Insert XML using string literals:
INSERT INTO books(title, metadata)
VALUES (
'Dune',
'<book><author>Herbert</author><year>1965</year></book>'
);
Invalid XML raises an error:
INSERT INTO books(metadata)
VALUES ('<bad xml>'); -- ERROR: not well-formed XML
XML Parsing and Validation
The xml type checks well-formedness during inserts and casts.
Full schema validation (DTD, XSD) is not built into PostgreSQL; use external tooling if you need it.
SELECT *
FROM XMLTABLE(
'/book'
PASSING '<book><title>Dune</title><year>1965</year></book>'::xml
COLUMNS
title TEXT PATH 'title',
year INT PATH 'year'
);
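Individual values can also be pulled out with xpath(), which returns an array of matching XML fragments:
SELECT xpath('/book/author/text()', metadata)
FROM books;
-- {Herbert}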
Working with JSON and JSONB
Conversion and formatting functions:
SELECT to_json(ARRAY[1,2,3]);
SELECT jsonb_pretty(profile) FROM users;
Aggregating to JSON
SELECT json_agg(profile) FROM users;
SELECT jsonb_build_object('count', COUNT(*), 'users', jsonb_agg(id))
FROM users;
Building JSON Documents
SELECT jsonb_build_object(
'name', name,
'skills', skills,
'age', age
)
FROM employees;
JSONB Path Queries (SQL/JSON Path)
PostgreSQL supports SQL/JSON path expressions:
SELECT jsonb_path_query(profile, '$.skills[*] ? (@ == "sql")')
FROM users;
SELECT * FROM users
WHERE jsonb_path_exists(profile, '$.age ? (@ > 20)');
Performance Notes
jsonb is optimized for:
searching
indexing
containment
updates
json is faster for:
pure text I/O
storing raw API responses
GIN indexes may become large; consider jsonb_path_ops for containment-only indexes.
CREATE INDEX idx_users_profile_pathops
ON users USING gin(profile jsonb_path_ops);
Real-World Examples
-- Find users in a specific city
SELECT * FROM users
WHERE profile#>>'{address,city}' = 'Berlin';
-- Check if user has a skill
SELECT * FROM users
WHERE profile->'skills' ? 'postgresql';
-- Merge JSON objects
SELECT profile || '{"active":true}'::jsonb
FROM users;
-- Build a complete API response
SELECT jsonb_build_object(
'status', 'ok',
'timestamp', now(),
'data', jsonb_agg(profile)
)
FROM users;
GIN Indexes in PostgreSQL
GIN stands for Generalized Inverted Index.
GIN indexes are designed for indexing values that contain multiple components, such as:
arrays
JSONB objects
documents (full-text search)
key-value stores (hstore)
trigrams (pg_trgm)
range types
GIN is optimized for containment queries:
@> (contains)
&& (overlaps)
?, ?|, ?& (JSONB and hstore key checks)
full-text operators (@@, @@@)
GIN indexes are usually much larger than B-tree, but provide excellent query performance on complex data.
How GIN Works Internally
GIN indexes store a mapping from value component → row locations.
For example, if a row has tags = '{a,b,c}', GIN indexes it as:
a → row_id
b → row_id
c → row_id
This makes GIN extremely efficient for membership / containment operations.
GIN stores:
a posting tree for values appearing many times
a posting list for values appearing few times
PostgreSQL chooses dynamically whether to store a list or a tree.
Creating a GIN Index
CREATE INDEX idx_users_tags_gin
ON users USING gin(tags);
Default operator class depends on the column type.
GIN Indexes on Arrays
CREATE TABLE products (
id SERIAL,
tags TEXT[]
);
CREATE INDEX idx_products_tags_gin
ON products USING gin(tags);
Now these queries are extremely fast:
SELECT * FROM products WHERE tags @> '{sql}';
SELECT * FROM products WHERE tags && '{postgres,sql}';
How it works:
GIN indexes each element individually.
GIN Indexes on JSONB
CREATE TABLE events (
id SERIAL,
data JSONB
);
CREATE INDEX idx_events_data_gin
ON events USING gin(data);
SELECT * FROM events WHERE data @> '{"type":"login"}';
SELECT * FROM events WHERE data ? 'user_id';
SELECT * FROM events WHERE data @> '{"tags":["security"]}';
Supported operators include:
@> contains
? key
?| any_key
?& all_keys
jsonb_path_ops vs default jsonb_ops
jsonb_ops (default):
Indexes full JSON structure
Supports all JSONB operators
Larger index size
jsonb_path_ops:
Indexes only paths → great for @>
Smaller index
Does NOT support ?, ?|, ?&
CREATE INDEX idx_events_data_pathops
ON events USING gin(data jsonb_path_ops);
GIN for Full-text Search (TSVECTOR)
CREATE TABLE docs (
id SERIAL,
body TEXT
);
CREATE INDEX idx_docs_tsv
ON docs USING gin(to_tsvector('english', body));
SELECT * FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('security & data');
GIN indexes are usually faster than GiST for full-text search, but larger.
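On PostgreSQL 12+, a stored generated column avoids recomputing to_tsvector in every query; a sketch:
ALTER TABLE docs
ADD COLUMN tsv tsvector
GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
CREATE INDEX idx_docs_tsv_col ON docs USING gin(tsv);
SELECT * FROM docs WHERE tsv @@ to_tsquery('security & data');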
GIN for hstore
CREATE EXTENSION hstore;
CREATE TABLE settings (
id SERIAL,
attrs hstore
);
CREATE INDEX idx_settings_attrs
ON settings USING gin(attrs);
SELECT * FROM settings
WHERE attrs ? 'theme';
GIN with pg_trgm (trigram search)
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_name_trgm
ON users USING gin (name gin_trgm_ops);
Efficient for:
LIKE '%term%'
ILIKE '%term%'
Regex matching
Similarity search (% operator)
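A typical fuzzy lookup combines the % operator with similarity() for ranking:
SELECT name, similarity(name, 'jon') AS sml
FROM users
WHERE name % 'jon' -- matches above pg_trgm.similarity_threshold
ORDER BY sml DESC;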
GIN Index Options
CREATE INDEX idx_logs_data_gin
ON logs USING gin(data)
WITH (fastupdate = on);
fastupdate:
ON (default): inserts go to a pending list, flushed lazily
OFF: slower inserts, faster reads
gin_pending_list_limit:
Controls the size of the pending list
Flushed automatically when threshold reached
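The limit can also be set per index as a storage parameter, in kilobytes (the index name here is illustrative):
CREATE INDEX idx_events_data_big
ON events USING gin(data)
WITH (gin_pending_list_limit = 2048);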
Maintenance & Reindexing
VACUUM (ANALYZE) table_name;
REINDEX INDEX idx_name;
GIN indexes tend to grow over time because new keys get added but old keys are not deleted immediately.
Auto-vacuum manages GIN cleanup, but manual tuning may be needed for large workloads.
Performance Notes
GIN index size is large (sometimes 2–5× table size).
Inserts and updates are slower than B-tree because:
each indexed element is stored separately
Reads are extremely fast for supported operators.
Best for:
JSONB documents
arrays
hstore
text search
similarity search
Real-World Examples
-- Search logs by JSONB keys
SELECT * FROM logs
WHERE data @> '{"status":"error"}';
-- Find users with certain tags
SELECT * FROM users
WHERE tags && '{music,sql}';
-- Fast substring search with trigram indexing
SELECT * FROM documents
WHERE body ILIKE '%database%';
-- Multi-condition JSONB search
SELECT * FROM events
WHERE data @> '{"user":{"id":5}}';
The search_path in PostgreSQL
Introduction
The search_path is a PostgreSQL configuration that determines how unqualified names (tables, functions, types) are resolved.
Whenever you reference an object without a schema name, PostgreSQL searches schemas in the order defined by search_path.
The default value is usually:
SHOW search_path;
-- "$user", public
This means:
First: a schema named after the current user (rarely exists)
Then: the public schema
Why Search Path Matters
Prevents needing fully qualified names like myschema.mytable everywhere.
Affects how:
SELECT, INSERT, UPDATE, DELETE resolve tables
Functions are resolved
Operators are resolved
Types are resolved
Important for:
Multi-schema applications
Security (SQL injection risks)
Schema-based versioning
Extensions
Setting the Search Path
SET search_path TO myschema, public;
SET LOCAL search_path TO analytics;
-- Only for duration of current transaction
ALTER DATABASE mydb SET search_path = myschema, public;
ALTER USER myuser SET search_path = app, public;
Precedence:
SET (session-level) overrides user and DB settings
ALTER USER overrides ALTER DATABASE
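To see the effective resolution order, including implicitly searched schemas, use current_schemas:
SELECT current_schemas(true);
-- e.g. {pg_catalog,public}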
How Object Resolution Works
Given:
SET search_path TO sales, public;
SELECT customers.id FROM customers;
PostgreSQL looks for customers in:
1. sales.customers
2. public.customers
If both exist, the earlier schema wins.
SELECT sales.customers.id -- always resolved exactly
FROM sales.customers;
Fully qualified names always bypass search_path.
Search Path and Function Resolution
Function lookup also uses search_path.
This can lead to ambiguity:
SET search_path TO analytics, public;
SELECT format_date('2024-01-01');
PostgreSQL resolves format_date first in analytics, then public.
To force a specific function:
SELECT utils.format_date('2024-01-01');
Search Path and Security
This is one of the most important reasons to understand search_path.
If search_path includes a schema writable by untrusted users, they can override functions!
Danger example: function hijacking.
SET search_path TO malicious_schema, public;
-- A malicious user creates a function matching one your app calls unqualified
CREATE FUNCTION malicious_schema.format_date(d TEXT) RETURNS TEXT AS
$$
SELECT 'attacker-controlled result';
$$ LANGUAGE sql;
Now your application calling SELECT format_date('2024-01-01') executes attacker code!
(Built-ins such as now() are harder to hijack, because pg_catalog is searched implicitly first unless it is explicitly listed after the attacker's schema.)
Best practice:
Always put pg_catalog first.
Never put writable schemas before trusted ones.
For functions, explicitly schema-qualify when safety matters.
ALTER ROLE myapp SET search_path = pg_catalog, app, public;
Temporary Tables and Search Path
Temp tables live in a special schema: pg_temp.
PostgreSQL automatically searches pg_temp before the schemas listed in search_path whenever temp objects exist.
CREATE TEMP TABLE t(x int);
SELECT current_schemas(true);
-- e.g. {pg_temp_3,pg_catalog,public}
Unqualified names now resolve to the temp table first!
(SHOW search_path does not display the implicit pg_temp and pg_catalog entries.)
search_path for Extensions
CREATE EXTENSION postgis SCHEMA gis;
Functions from extensions are usually called without qualification.
Make sure the schema containing extension functions is on search_path.
Best Practices
Always include pg_catalog first.
For applications, use: pg_catalog, app_schema
Do not include schemas writable by untrusted users.
Always fully qualify objects inside migrations:
CREATE TABLE app.users (...);
CREATE FUNCTION app.calculate() ...;
This prevents migrations from breaking if search_path changes.
Real-World Examples
-- Application search path
ALTER ROLE myapp SET search_path = pg_catalog, app, public;
-- Safely executing migrations
SET search_path = app;
ALTER TABLE users ADD COLUMN last_login TIMESTAMP;
-- Using multiple schemas for versioning
SET search_path = v2, v1, public;
SELECT get_customer(5);
Casting & Type Conversion in PostgreSQL
Introduction
Casting (type conversion) converts a value from one data type to another.
PostgreSQL supports:
explicit casts (written by the user)
implicit casts (performed automatically)
function-based casts (custom cast definitions)
domain casts
PostgreSQL allows casting between:
numeric types
character types
arrays
JSON, XML
composite & domain types
user-defined types
Explicit Casts
SELECT '123'::INT;
SELECT CAST('123' AS INT);
SELECT CAST(5.8 AS NUMERIC(3,1));
Explicit casts always work the same way: the source is interpreted and validated for the target type.
If conversion is impossible, PostgreSQL raises an error:
SELECT 'abc'::INT;
-- ERROR: invalid input syntax for type integer: "abc"
To make such conversions safe, you can wrap the validation in a function and register it as a cast:
CREATE FUNCTION text_to_int_safe(text)
RETURNS int
LANGUAGE plpgsql AS
$$
BEGIN
RETURN CASE WHEN $1 ~ '^\d+$' THEN $1::int ELSE NULL END;
END;
$$;
CREATE CAST (text AS int)
WITH FUNCTION text_to_int_safe(text)
AS ASSIGNMENT;
AS ASSIGNMENT allows implicit casting when appropriate.
AS IMPLICIT creates a very permissive implicit cast (rarely recommended).
Create Your Own Type Casts
CREATE CAST (point2d AS text)
WITH FUNCTION point2d_to_text(point2d);
An explicitly defined cast takes precedence over PostgreSQL's built-in automatic conversion.
Real-World Examples
-- Read JSON number as integer
SELECT (data->>'count')::INT FROM metrics;
-- Convert text array to integer array
SELECT string_to_array('1,2,3', ',')::INT[];
-- Cast timestamp to date
SELECT created_at::DATE FROM orders;
-- Cast enum to text
SELECT status::TEXT FROM users;
-- Format a composite type
SELECT (ROW(5,6)::point2d).x;
Triggers in PostgreSQL
Introduction
A trigger is a special database object that automatically executes a function when a table event occurs.
Trigger events include:
INSERT
UPDATE
DELETE
TRUNCATE
Triggers can run:
BEFORE the statement
AFTER the statement
INSTEAD OF the statement (for views)
Triggers call a trigger function (a function with RETURNS trigger), written in SQL or PL/pgSQL.
Types of Triggers
Type
Description
Row-Level Trigger
Executed once per affected row. Has access to OLD and NEW records.
Statement-Level Trigger
Executed once per SQL statement, regardless of how many rows are modified.
BEFORE Trigger
Can validate or modify NEW data before writing to the table.
AFTER Trigger
Runs after the change has been applied to the row, still inside the same transaction; cannot modify NEW.
INSTEAD OF Trigger
Replaces default behavior on views.
Trigger Function Basics
A trigger function must return RETURNS trigger.
For BEFORE row-level triggers:
return NEW to continue
return NULL to silently skip the row
For BEFORE DELETE:
return OLD
For AFTER triggers:
the return value is ignored
CREATE OR REPLACE FUNCTION set_created_at()
RETURNS trigger AS
$$
BEGIN
NEW.created_at := now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Creating a Trigger
CREATE TRIGGER set_timestamp
BEFORE INSERT ON users
FOR EACH ROW
EXECUTE FUNCTION set_created_at();
When inserting into users, created_at is automatically set.
Accessing OLD and NEW
NEW.column_name -- the new row (for INSERT/UPDATE)
OLD.column_name -- the old row (for UPDATE/DELETE)
INSERT: only NEW exists
UPDATE: both NEW and OLD
DELETE: only OLD
Example validation inside a BEFORE trigger:
IF NEW.amount <= 0 THEN
RAISE EXCEPTION 'Amount must be positive';
END IF;
BEFORE vs AFTER
Type
Usage
BEFORE
Validate data
Modify NEW values
Reject rows with RAISE EXCEPTION
AFTER
Audit logging
Notification
Complex cross-table updates
Row-Level vs Statement-Level
FOR EACH ROW -- fires once per row
FOR EACH STATEMENT -- fires once per statement
Use statement triggers for:
audit counters
accumulation
constraint-like logic
INSTEAD OF Triggers (for Views)
CREATE VIEW user_view AS
SELECT id, name FROM users;
Normally a view is only writable when PostgreSQL can treat it as auto-updatable (a simple single-table view).
INSTEAD OF triggers allow custom insert/update/delete behavior.
CREATE OR REPLACE FUNCTION user_view_insert()
RETURNS trigger AS
$$
BEGIN
INSERT INTO users(name) VALUES(NEW.name);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER user_view_ins
INSTEAD OF INSERT ON user_view
FOR EACH ROW
EXECUTE FUNCTION user_view_insert();
Constraint Triggers vs Normal Triggers
Type
Characteristics
Normal Trigger
Fires immediately during statement execution
Constraint Trigger
Uses DEFERRABLE
Can fire at end of transaction
Useful for foreign-key-like logic
CREATE CONSTRAINT TRIGGER trg_name
AFTER INSERT ON orders
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE FUNCTION check_order();
Transition Tables (Statement-Level AFTER Triggers)
PostgreSQL allows capturing the set of changed rows:
REFERENCING NEW TABLE AS newrows
REFERENCING OLD TABLE AS oldrows
CREATE TRIGGER bulk_update_audit
AFTER UPDATE ON accounts
REFERENCING OLD TABLE AS oldset NEW TABLE AS newset
FOR EACH STATEMENT
EXECUTE FUNCTION audit_bulk_update();
The function can run:
SELECT * FROM newset;
SELECT * FROM oldset;
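The trigger above references an audit_bulk_update() function that is not shown; a minimal sketch, assuming the audit table used in the examples below:
CREATE OR REPLACE FUNCTION audit_bulk_update()
RETURNS trigger AS
$$
BEGIN
INSERT INTO audit(table_name, row_data, action, ts)
SELECT TG_TABLE_NAME, row_to_json(n), 'bulk_update', now()
FROM newset n; -- transition table declared in the trigger
RETURN NULL; -- return value is ignored for AFTER statement triggers
END;
$$ LANGUAGE plpgsql;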
Enabling and Disabling Triggers
ALTER TABLE users DISABLE TRIGGER set_timestamp;
ALTER TABLE users ENABLE TRIGGER set_timestamp;
ALTER TABLE users DISABLE TRIGGER ALL;
ALTER TABLE users ENABLE TRIGGER ALL;
Superusers may disable system triggers using ALTER TABLE ... DISABLE TRIGGER ALL.
Trigger Execution Order
If multiple triggers exist for the same event, PostgreSQL fires them in alphabetical order by trigger name.
CREATE TRIGGER a_before BEFORE INSERT ON t ...
CREATE TRIGGER z_before BEFORE INSERT ON t ...
-- a_before fires before z_before
Recursive Triggers & Avoiding Infinite Loops
CREATE OR REPLACE FUNCTION update_total()
RETURNS trigger AS
$$
BEGIN
UPDATE accounts SET total = total + NEW.amount
WHERE id = NEW.account_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
This can cause infinite recursion if accounts also has an UPDATE trigger.
-- Use WHEN clause to guard
CREATE TRIGGER safe_trigger
BEFORE UPDATE ON accounts
FOR EACH ROW
WHEN (OLD.total IS DISTINCT FROM NEW.total)
EXECUTE FUNCTION update_total();
Performance Considerations
Triggers can slow down:
large bulk inserts
ETL pipelines
high-throughput OLTP workloads
Prefer:
statement-level triggers when possible
batch processing instead of row-by-row logic
avoiding heavy work inside triggers (use queues, NOTIFY/LISTEN)
AFTER triggers see the table after the statement's changes have been applied.
BEFORE triggers must be fast since they block write operations.
Real-World Examples
1. Auto-updating updated_at
CREATE OR REPLACE FUNCTION touch_row()
RETURNS trigger AS
$$
BEGIN
NEW.updated_at := now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER users_touch
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION touch_row();
2. Audit log table
CREATE OR REPLACE FUNCTION audit_insert()
RETURNS trigger AS
$$
BEGIN
INSERT INTO audit(table_name, row_data, action, ts)
VALUES(TG_TABLE_NAME, row_to_json(NEW), 'insert', now());
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER audit_users
AFTER INSERT ON users
FOR EACH ROW
EXECUTE FUNCTION audit_insert();
3. Automatically maintain a denormalized counter
CREATE OR REPLACE FUNCTION inc_post_count()
RETURNS trigger AS
$$
BEGIN
UPDATE authors SET post_count = post_count + 1
WHERE id = NEW.author_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER posts_count_inc
AFTER INSERT ON posts
FOR EACH ROW
EXECUTE FUNCTION inc_post_count();
4. Prevent updates on archived rows
CREATE OR REPLACE FUNCTION prevent_update_archived()
RETURNS trigger AS
$$
BEGIN
IF OLD.archived THEN
RAISE EXCEPTION 'Cannot update archived row';
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER no_update_archived
BEFORE UPDATE ON documents
FOR EACH ROW
EXECUTE FUNCTION prevent_update_archived();