Database Transaction Isolation Levels

Database Transaction Isolation Levels

Exploring different read issues and choosing appropriate isolation levels

In the world of application development, ensuring data integrity and reliability is paramount. One critical aspect of this is understanding database transactions and isolation levels. In this blog post, we'll delve into what a database transaction is, discuss various read issues possible, and explore the different isolation levels offered by databases to handle read issues during transactions.

Transaction

Let's start by understanding what a database transaction is. A transaction is a set of queries that need to be considered as a unit of work. Either all queries in the transaction are completed successfully, or everything fails (atomicity).

Isolation

In real-world distributed applications, multiple transactions can be executed concurrently in a database. The database isolation should ensure that when two or more transactions are executed at the same time, queries executed by one transaction remain unaffected by the other transaction's queries.

Before exploring different isolation levels, we will first understand potential read issues, so that we can relate the importance of different isolation levels.

Different Read Issues

To maintain data integrity and consistency, it's essential to understand the potential read issues that may arise during concurrent transactions.

Let's dive in to understand the different read issues with examples.

Example Schema

Assume there is a table of Sales as shown below

CREATE TABLE sales (
    pid integer not null,
    qty integer,
    price numeric,
    PRIMARY KEY (pid)
);

-- Initial values
INSERT INTO sales (pid, qty, price) VALUES (1, 20, 10), (2, 25, 10);

Dirty Read

  • Dirty reads occur when one transaction reads data that has been modified by another transaction but has not yet been committed.

  • It involves seeing the uncommitted values of other transactions, which can lead to data inconsistency and unreliability.

Example:

Assume, Client 1 starts a transaction to take a sales report, while Client 2 updates the sales quantity for product 1 concurrently.

-- Client 1
-- At time t0
-- start a transaction to take a sales report
> BEGIN;
BEGIN
-- Client 2
-- At time t0
-- start a transaction to update sales qty for pid 1
> BEGIN;
BEGIN

-- At time t1
-- Updates the quantity for product 1
> UPDATE sales SET qty = qty + 5 WHERE pid = 1;
UPDATE 1

-- Not committed yet
-- Client 1 (Contd)

-- At time t2
-- selects total price for all products
> SELECT pid, qty * price AS total FROM sales;
 pid | total
-----+-------
   1 |   250
   2 |   250
(2 rows)

-- Note the total for pid 1, it is 250 
-- because the quantity is updated to 25 by client 2
-- Client 1 read an Uncommited value (dirty read)
-- Client 2 (Contd)

-- At time t3
-- For some reasons, client 2 decides not to continue the update and rollsback the transaction
> ROLLBACK;
ROLLBACK

-- Now, the quantity of pid 1 is reversed back to 20

In this case, Client 1 has read the quantity as 25, which is not committed to the table and is not present now due to rollback. Dirty reads should be avoided to maintain data consistency.

Non-Repeatable Read

  • Non-repeatable reads occur when a transaction sees different values for the same row when queried multiple times in the same transaction.

  • It happens when data changes between read operations due to other concurrent transactions committing updates.

Example:

Assume, Client 1 starts a transaction to take a sales report, while Client 2 updates the sales quantity for product 1 concurrently.

-- Client 1
-- At time t0
-- start a transaction to take a sales report
> BEGIN;
BEGIN

-- At time t1
-- selects total price for all products
> SELECT pid, qty * price AS total FROM sales;
 pid | total
-----+-------
   1 |   200
   2 |   250
(2 rows)
-- Client 2
-- At time t1
-- start a transaction to update sales qty for pid 1
> BEGIN;
BEGIN

-- At time t2
-- Updates the quantity for product 1
> UPDATE sales SET qty = qty + 5 WHERE pid = 1;
UPDATE 1

-- At time t3
-- Commits the transaction
> COMMIT;
COMMIT

-- The quantity is updated to 25 and it is committed
-- Client 1 (Contd)

-- At time t4
-- Issues another query to select sum of all product to get total sales amount
> SELECT SUM(qty * price) AS total_sales FROM sales;
 total_sales
-------------
         500
(1 row)

-- The total sales is 500 as the quantity is updated to 25 by Client 2
-- The difference in quantity causes mismatch (450 and 500) in the sales report

In this case, The quantity is updated by Client 2 while Client 1 is in the process of generating a sales report. Client 1 has seen 2 different values (20 and 25) for the quantity, causing discrepancies in the report.

Note, Client 1 has read a committed value but still encountered an issue in the report due to repeated queries in the same transaction.

Phantom Reads

  • Phantom reads are a bit different from non-repeatable reads.

  • It occurs when a transaction executes multiple similar range queries, and another transaction inserts new records in the query range concurrently, causing the first transaction to see different data in subsequent queries.

Example:

Assume, Client 1 starts a transaction to take a sales report, while Client 2 inserts a new sales record for product 3 concurrently.

-- Client 1
-- At time t0
-- start a transaction to take a sales report
> BEGIN;
BEGIN

-- At time t1
-- selects total price for all products
> SELECT pid, qty * price AS total FROM sales;
 pid | total
-----+-------
   1 |   200
   2 |   250
(2 rows)
-- Client 2
-- At time t1
-- starts a transaction to add a new sales entry for product 3
> BEGIN;
BEGIN

-- At time t2
-- Inserts product 3 with quantity 50 and price 10
> INSERT INTO sales VALUES(3, 50, 10);
INSERT 0 1

-- At time t3
-- Commits the transaction
> COMMIT;
COMMIT

-- The new record is updated and it is committed
-- Client 1 (Contd)

-- At time t4
-- Issues another query to select sum of all product to get total sales amount
> SELECT SUM(qty * price) AS total_sales FROM sales;
 total_sales
-------------
         950
(1 row)

-- The total sales is 950 as new sales entry is added by Client 2
-- The difference in quantity causes mismatch (450 and 950) in the sale report

In this example, Client 1 has seen two different total sales (450 and 950) while generating a sales report, causing discrepancies in the report due to the new product inserted by Client 2.


Isolation levels

There are different levels of isolation offered by databases to effectively handle the read issues discussed above.

Let's dive in to understand various isolation levels and learn how these isolations help in preventing the above-discussed issues.

Read Uncommitted

  • Depending on the implementation, This level could be very efficient with respect to performance as there is no extra effort needed to maintain isolation.

  • However, it could cause all read issues discussed above - dirty reads, non-repeatable reads, and phantom reads.

Read Committed

  • This level allows a transaction to see only the committed values. Any uncommitted values will not be visible to all other transactions.

  • It is the default isolation level for most databases, and all the databases would be optimized for reading committed values.

  • However, it could cause non-repeatable reads and phantom reads.

Repeatable Read

  • This level prevents non-repeatable read issues by making sure that when a query reads a row, that row will remain unaffected by other ongoing transactions.

  • Each transaction will maintain its own copy of the row data being used in a transaction.

  • The row data will be updated once the transaction is committed.

  • A concurrent control mechanism needs to be maintained if two different transactions try to update the same row. One of the transactions must be rolled back.

  • Depending on the implementation, it could cause phantom reads.

Serializable

  • This is the highest isolation level but is less performant too.

  • It is the only isolation level that will guarantee to prevent phantom reads.

  • All the dependent transactions would be queued, and each transaction will be executed serially in order to maintain consistent updates between transactions.


Setting the isolation level

Most relational databases support the following isolation levels:

  • READ COMMITTED,

  • REPEATABLE READ, and

  • SERIALIZABLE.

Although READ UNCOMMITTED is supported by some databases (eg. Postgres), it is not recommended to use.

In Postgres, the default is READ COMMITTED. In MySQL / MariaDB, the default isolation level is REPEATABLE READ.

We can set the isolation level at the beginning of a transaction as shown below. For example, the below transaction runs in isolation level Repeatable Read

BEGIN;
-- to set REPEATABLE READ
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- queries here
COMMIT;

Please note that the availability of isolation levels and the syntax may vary depending on the database version and configuration. Additionally, some database systems have different default isolation levels, so it's essential to be aware of the default behavior of your chosen database system.


To conclude, understanding the various isolation levels and their implications is essential for designing robust and efficient database-driven applications. The appropriate isolation level should be carefully chosen based on the specific requirements of the application and the trade-offs between performance and data consistency.

By selecting the optimal isolation level, you can strike the right balance between ensuring data integrity and maintaining system performance. Remember, each isolation level offers distinct benefits and challenges, so it's vital to evaluate the needs of your application and the level of data isolation required.