Selecting for Share and Update in PostgreSQL

Ref: http://shiroyasha.io/selecting-for-share-and-update-in-postgresql.html
https://docs.oracle.com/cd/E17952_01/mysql-5.1-en/innodb-locking-reads.html
A regular select statement does not give you enough protection if you want to query data and make a change in the database related to it. Other transactions can update or delete the data you just queried. PostgreSQL offers additional select statements that lock on read and provide an extra layer of safety.

This article explores the select for share and select for update statements, locks that are created with these statements, and provide examples for using these two select statements.

Safely Updating Data

Sometimes, applications read data from the database, process the data, and save the result back in the database. This is a classic example where the select for update can provide additional safety.

Let’s consider the following example:

BEGIN;
SELECT * FROM purchases WHERE processed = false;
-- * application is now processing the purchases *

UPDATE purchases SET ...;
COMMIT;

The above code snippet can be a victim of a nasty race condition. The problem is that some other part of the application can update the unprocessed data. Changes to those rows will be then overwritten when the data processing finishes.

Here is an example scenario in which the data suffers from an intrusive race condition.

process A: SELECT * FROM purchases WHERE processed = false;
--- process B updates the data while process A is processing it
process B: SELECT * FROM purchases;
process B: UPDATE purchases SET ...;
process A: UPDATE purchases SET ...;

To mitigate this issue, we can select the data for updating. Here is an example how we would do it:

BEGIN;
SELECT * FROM purchases WHERE processed = false FOR UPDATE;
-- * application is now processing the purchases *

UPDATE purchases SET ...;
COMMIT;

The select ... for update acquires a ROW SHARE LOCK on a table. This lock conflicts with the EXCLUSIVE lock needed for an update statement, and prevents any changes that could happen concurrently.

process A: SELECT * FROM purchases WHERE processed = false FOR UPDATE;
process B: SELECT * FROM purchases FOR UDPATE;
--- process B blocks blocks and waits process A to finish

process A: UPDATE purchases SET ...;
process B: UPDATE purchases SET ...;

All the locks will be released when the transaction ends.

Non-blocking Select for Update Statements

When the applications selects some rows for update, other processes are forced to wait for the transaction to end before they can get a hold of that lock.

If the processing takes too long to complete, for whatever reason, other parts of the system might be blocked. This can be undesirable. We can use the select ... for update nowait statement to prevent blocking calls to our database. This query will error out if the rows are not available for selection.

process A: SELECT * FROM purchases WHERE processed = false;
--- process B tries to select the data, but fails
process B: SELECT * FROM purchases FOR UPDATE NOWAIT;
process B: ERROR could not obtain lock on row in relation "purchases"
process A: UPDATE purchases SET ...;

Processing Non-Locked Database Rows

Select for update can be a rigid lock on your table. Concurrent processes can be blocked and starved out. Waiting is the slowest form of concurrent processing. If only one CPU can be active at a time, it is pointless to scale your servers. For this purpose, in PostgreSQL there is a mechanism for selecting only rows that are not locked.

The select ... for update skip locked is a statement that allows you to query rows that have no locks. Let’s observe the following scenario to grasp its use case:

process A: SELECT * FROM purchases
process A:   WHERE processed = false FOR UPDATE SKIP LOCKED;
process B: SELECT * FROM purchases
process B:   WHERE created_at < now()::date - interval '1w';
process B:   FOR UPDATE SKIP LOCKED;
-- process A selects and locks all unprocess rows
-- process B selects all non locked purchases older than a week

process A: UPDATE purchases SET ...;
process B: UPDATE purchases SET ...;

Both Process A and Process B can process data concurrently.

The Effect of Select For Update on Foreign Keys

One thing that we need to keep in mind while working with select for update statements is its effect on foreign keys. More specifically, we can’t forget that the referenced rows from other tables are also locked.

Let’s look at an example with two tables — users and purchases — with the notion that users have many purchases.

\d purchases
              Table "public.purchases"
 Column  |  Type   | Collation | Nullable | Default
---------+---------+-----------+----------+---------
 id      | integer |           |          |
 payload | jsonb   |           |          |
 user_id | integer |           |          |
Foreign-key constraints:
    "purchases_user_id_fkey" FOREIGN KEY (user_id) REFERENCES
    users(id) ON UPDATE CASCADE ON DELETE CASCADE

When selecting data from the purchases table with select for update, users will be locked as well. This is necessary because otherwise there is a chance of breaking the foreign-key constraint.

process A: SELECT * FROM purchases FOR UPDATE;
process B: UPDATE users SET id = 3 WHERE id = 1;
-- process B is blocked and is waiting for process A to finish
-- its transaction

In bigger systems, a select for share can have huge negative consequences if it locks a widely used table. Keep in mind that other processes will only need to wait if they want to update the referenced field. If the other process wants to update some unrelated data, no blocking will occur.

process A: SELECT * FROM purchases FOR UPDATE;
process B: UPDATE users SET name = 'Peter' WHERE id = 1;
-- process B is completed without blocking because it does not change
-- the id field

Safely Creating Related Records With Select for Share

A weaker form of select for update is the select for share query. It is an ideal for ensuring referential integrity when creating child records for a parent.

Let’s use the users and purchases tables to demonstrate a use case for the select for share query. Suppose that we want to create a new purchase for a user. First, we would select the user from the database and then insert a new record in the purchases database. Can we safely insert a new purchase into the database? With a regular select statement we can’t. Other processes could delete the user in the moments between selecting the user and inserting the purchase.

One way to avoid potential issues is to query for the user with the FOR SHARE locking clause.

process A: BEGIN;
process A: SELECT * FROM users WHERE id = 1 FOR SHARE;
process B: DELETE FROM users WHERE id = 1;
-- process B blocks and must wait for process A to finish

process A: INSERT INTO purchases (id, user_id) VALUES (1, 1);
process A: COMMIT;
-- process B now unblocks and deletes the user

Select for share prevented other processes from deleting the user, but does not prevent concurrent processes from selecting users. This is the major difference between select for share and select for update.

The select for share prevents updates and deletes of rows, but doesn’t prevent other processes from acquiring a select for share. On the other hand, select for update also blocks updates and deletes, but it also prevents other processes from acquiring a select for update lock.

The Select For No Key Updates and Select For Key Share

There are two more locking clauses in PostgreSQL introduces from version 9.3. The select for no key updates and select for key share.

The select for no key updates behaves similarly to the select for update locking clause but it does not block the select for share. It is ideal if you are performing processing on the rows but don’t want to block the creation of child records.

The select key share is the weakest form of the with lock clause, and behaves similarly to the select for share locking clause. It prevents the deletion of the rows, but unlike select for share it does not prevent updates to the rows that do not modify key values.

Search This Blog

Keep Learning, Keep Growing