Introduction
The 2024 Stack Overflow Developer Survey reports that PostgreSQL is used by 49% of developers, making it the most popular database for the second consecutive year. Its success can be attributed to its advanced features, extensibility, and strong performance. In my view, however, one of the most crucial factors is its open-source model, collaboratively driven by both academia and industry. This model is more open than one steered by a single company, and it fosters a vibrant, innovative community.
Thanks to this collaborative approach and extensibility, numerous extensions and plugins have been developed, enhancing PostgreSQL's capabilities and making it a versatile database suitable for many different scenarios. The thriving ecosystem provides the support and tools developers need to build applications conveniently. As Vonng puts it in Postgres is eating the database world: "The trend of 'Using Postgres for Everything' is no longer limited to a few elite teams but is becoming a mainstream best practice."
This blog dives in and explores how PostgreSQL can replace other databases in suitable scenarios.

Use PostgreSQL as a Document Store
Developers love using document stores like MongoDB for their flexibility in storing JSON. However, PostgreSQL can also serve as a JSON store via the JSONB data type. JSONB is a binary representation of JSON that allows for faster indexing and querying. The following is an example of how to use JSONB in PostgreSQL; for more detail, see the documentation on JSON Functions and Operators.
If you have a JSON document like this:
{
  "name": "John Doe",
  "age": 30,
  "email": "john.doe@gmail.com",
  "address": {
    "street": "123 Main St",
    "city": "New York",
    "state": "NY",
    "zip": "10001"
  },
  "phone": [
    {
      "type": "home",
      "number": "212-555-1234"
    },
    {
      "type": "work",
      "number": "646-555-1234"
    }
  ]
}
You can create a table with a JSONB column to store such documents.
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    data JSONB NOT NULL
);
-- Insert the JSON document from above
INSERT INTO users (data)
VALUES ('{"name": "John Doe", "age": 30, "email": "john.doe@gmail.com", "address": {"street": "123 Main St", "city": "New York", "state": "NY", "zip": "10001"}, "phone": [{"type": "home", "number": "212-555-1234"}, {"type": "work", "number": "646-555-1234"}]}');
-- Query by a top-level field
SELECT * FROM users
WHERE data->>'email' = 'john.doe@gmail.com';
-- Query a nested JSON document with the containment operator
SELECT * FROM users
WHERE data->'phone' @> '[{"number": "212-555-1234"}]'::JSONB;
-- Update a field inside a JSON document
UPDATE users
SET data = jsonb_set(data, '{email}', '"new-email@gmail.com"', true)
WHERE data->>'email' = 'john.doe@gmail.com';
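To keep these lookups fast as the table grows, you can add a GIN index on the JSONB column; this is standard PostgreSQL, though the index name here is illustrative.
-- GIN index to accelerate containment (@>) and existence queries on data
CREATE INDEX idx_users_data ON users USING GIN (data);
If you only ever use the @> operator, the jsonb_path_ops operator class (GIN (data jsonb_path_ops)) yields a smaller and usually faster index, at the cost of supporting fewer operators.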
Use PostgreSQL as a Cache
PostgreSQL can be used as a temporary cache via unlogged tables, which skip write-ahead log (WAL) generation. While unlogged tables don't guarantee durability in case of a crash, they can be highly effective in scenarios where data persistence isn't critical. To create an unlogged table, simply add the UNLOGGED keyword when you define the table. You can include a timestamp column to manage cache expiration and use the pg_cron extension to schedule jobs that periodically clean up expired records. For an eviction policy, you can add last_read_ts and read_count columns to track the last read time and read count, respectively, and write stored procedures that evict the least used records (a sketch follows the code below).
CREATE UNLOGGED TABLE cache (
    id SERIAL PRIMARY KEY,
    key VARCHAR(255) NOT NULL,
    value JSONB NOT NULL,                              /* or any other data type */
    inserted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,   /* expiration policy */
    last_read_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  /* for the eviction policy, optional */
    read_count INT DEFAULT 0                           /* for the eviction policy, optional */
);
-- Unique index so each cache key maps to exactly one row (also enables upserts)
CREATE UNIQUE INDEX idx_cache_key ON cache (key);
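Reads and writes are then ordinary SQL. A minimal sketch, with hypothetical key and value literals; the upsert relies on the unique index above:
-- Set (upsert) a cache entry
INSERT INTO cache (key, value)
VALUES ('user:42', '{"name": "John Doe"}')
ON CONFLICT (key) DO UPDATE
SET value = EXCLUDED.value, inserted_at = CURRENT_TIMESTAMP;
-- Get a cache entry while updating the stats used by the eviction policy
UPDATE cache
SET read_count = read_count + 1, last_read_ts = CURRENT_TIMESTAMP
WHERE key = 'user:42'
RETURNING value;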
CREATE OR REPLACE PROCEDURE expire_rows (retention_period INTERVAL) AS
$$
BEGIN
    DELETE FROM cache
    WHERE inserted_at < NOW() - retention_period;
    COMMIT;
END;
$$ LANGUAGE plpgsql;
-- Clean up the expired records
CALL expire_rows('24 hours');
-- Schedule the job to clean up the expired records
SELECT cron.schedule('0 0 * * *', 'CALL expire_rows(''24 hours'')');
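For the eviction policy mentioned above, here is a minimal sketch (the procedure name evict_rows and the row cap are hypothetical): it keeps the max_rows most frequently and most recently read entries and deletes the rest.
CREATE OR REPLACE PROCEDURE evict_rows (max_rows INT) AS
$$
BEGIN
    DELETE FROM cache
    WHERE id IN (
        -- rows ranked past the top max_rows by usage are evicted
        SELECT id FROM cache
        ORDER BY read_count DESC, last_read_ts DESC
        OFFSET max_rows
    );
    COMMIT;
END;
$$ LANGUAGE plpgsql;
-- Keep at most 10,000 entries
CALL evict_rows(10000);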
Unlogged tables can improve write performance by roughly 2-3x thanks to the absence of WAL writes. For more performance details, check the blog PostgreSQL Unlogged Tables - Look Ma, No WAL!.
PostgreSQL as a Full-Text Search Engine
Full-text search is a common requirement for many applications. For small and medium-sized applications, we can use the full-text search functionality built into PostgreSQL, which reduces system complexity compared with operating an external search engine like Elasticsearch or Solr. PostgreSQL has a built-in tsvector field type: a tsvector is a sorted list of distinct lexemes, i.e., words normalized so that different variants of the same word are merged.
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    content TEXT NOT NULL,
    search_vector TSVECTOR
);
-- Create a GIN index on the search_vector
CREATE INDEX idx_search_vector ON documents USING GIN(search_vector);
-- Create a trigger to update the search_vector
CREATE OR REPLACE FUNCTION documents_search_vector_trigger() RETURNS TRIGGER AS $$
BEGIN
    -- Weight title matches ('A') above content matches ('B')
    new.search_vector :=
        setweight(to_tsvector('english', coalesce(new.title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(new.content, '')), 'B');
    RETURN new;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON documents FOR EACH ROW EXECUTE FUNCTION documents_search_vector_trigger();
-- Insert a document
INSERT INTO documents (title, content)
VALUES ('Document 1', 'This is the content of document 1');
-- Query the documents
SELECT * FROM documents
WHERE search_vector @@ to_tsquery('english', 'keyword1 & keyword2');
-- Order the result by rank
SELECT id, title, ts_rank(search_vector, to_tsquery('english', 'keyword1 & keyword2')) AS rank
FROM documents
WHERE search_vector @@ to_tsquery('english', 'keyword1 & keyword2')
ORDER BY rank DESC;
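If the query string comes from end users, websearch_to_tsquery (built into PostgreSQL since version 11) is a convenient alternative to to_tsquery: it accepts free-form web-search syntax instead of raw tsquery operators.
-- Quoted phrases, OR, and -exclusion are parsed automatically
SELECT id, title FROM documents
WHERE search_vector @@ websearch_to_tsquery('english', '"full text" search -solr');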
PostgreSQL as a Vector Database
With the rise of AI applications, vector databases are becoming more and more popular. PostgreSQL can be used as a vector database via the pgvector extension.
-- Enable the extension (the pgvector extension is named "vector" in SQL)
CREATE EXTENSION vector;
-- Create a vector column with 3 dimensions
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
-- Insert a vector
INSERT INTO items (embedding) VALUES ('[1.0, 2.0, 3.0]');
-- Query the 5 nearest vectors by L2 distance
SELECT * FROM items
ORDER BY embedding <-> '[1.0, 2.0, 3.0]'::vector(3)
LIMIT 5;
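For non-trivial table sizes you will usually want an approximate nearest-neighbor index. A minimal sketch using pgvector's HNSW index (available in pgvector 0.5.0+) with the L2-distance operator class:
-- HNSW index for fast approximate nearest-neighbor search by L2 distance
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);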
PostgreSQL as a Cron Job Scheduler
PostgreSQL can be used as a cron job scheduler via the pg_cron extension. The schedule uses standard cron syntax; you can use crontab.guru to validate your cron expressions.
┌───────────── min (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (0 to 6 are Sunday to
│ │ │ │ │               Saturday, or use names; 7 is also Sunday)
│ │ │ │ │
* * * * *
The following is an example of how to use the pg_cron extension to schedule a job. For more detail, check the pg_cron documentation.
-- Install the pg_cron extension
CREATE EXTENSION pg_cron;
-- Schedule a job
SELECT cron.schedule(
    'daily_job',   -- job name
    '0 0 * * *',   -- cron schedule: at 00:00 every day
    'DELETE FROM logs WHERE created_at < NOW() - INTERVAL ''1 month'''
);
-- List all the scheduled jobs
SELECT * FROM cron.job;
-- Unschedule a job
SELECT cron.unschedule('daily_job');
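pg_cron also records the outcome of each run in the cron.job_run_details table (in recent pg_cron versions), which is handy for checking whether jobs succeeded:
-- Inspect the most recent job runs
SELECT jobid, command, status, start_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 5;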
PostgreSQL as a Message Broker
PostgreSQL can also act as a simple message broker: producers insert rows into a table, and consumers poll for PENDING rows, process them, and update their status.
-- Create a message broker table
CREATE TABLE message_broker (
    id SERIAL PRIMARY KEY,
    topic VARCHAR(255) NOT NULL,
    data JSONB NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(255) DEFAULT 'PENDING' /* PENDING, PROCESSED, FAILED, SUCCESS */
);
-- Insert a message
INSERT INTO message_broker (topic, data)
VALUES ('topic1', '{"key1": "value1"}');
-- Batch insert messages
INSERT INTO message_broker (topic, data)
VALUES ('topic1', '{"key1": "value1"}'), ('topic2', '{"key2": "value2"}');
-- Select an unprocessed message
SELECT * FROM message_broker
WHERE status = 'PENDING' ORDER BY created_at ASC LIMIT 1;
-- Batch select unprocessed messages
SELECT * FROM message_broker
WHERE status = 'PENDING' ORDER BY created_at ASC LIMIT 10;
-- Update a message status
UPDATE message_broker
SET status = 'PROCESSED', updated_at = CURRENT_TIMESTAMP
WHERE id = 1;
-- Delete the processed messages
DELETE FROM message_broker
WHERE status = 'PROCESSED' AND updated_at < CURRENT_TIMESTAMP - INTERVAL '1 day';
For the cleanup DELETE, we can again use the pg_cron extension to schedule the job.
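One caveat: the plain SELECT above is unsafe with multiple concurrent consumers, since two workers can pick up the same PENDING row. The standard PostgreSQL idiom is FOR UPDATE SKIP LOCKED, which lets each worker claim a row without blocking or duplicating the others. A minimal sketch:
-- Atomically claim and mark the oldest pending message;
-- rows locked by other workers are skipped rather than waited on
WITH next_message AS (
    SELECT id FROM message_broker
    WHERE status = 'PENDING'
    ORDER BY created_at ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE message_broker m
SET status = 'PROCESSED', updated_at = CURRENT_TIMESTAMP
FROM next_message
WHERE m.id = next_message.id
RETURNING m.*;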
References
- Postgres is eating the database world
- PostgreSQL is Enough, a list of various use cases for replacing other databases with PostgreSQL
- You Don’t Need a Dedicated Cache Service - PostgreSQL as a Cache
- 2024 Stack Overflow Developer Survey