MySQL JSON fragmentation
We are seeing a lot of storage fragmentation when appending to the same JSON column many times. The documentation does not make it clear whether this is expected behavior, so we would like to understand what is going wrong here.
The reproduction is fairly simple:
CREATE TABLE IF NOT EXISTS Documents (
    id          VARCHAR(255) PRIMARY KEY,
    custom_data JSON
);
Then run the following query many times (say 1000), using a different JSON key on each run:
INSERT INTO Documents (id, custom_data)
VALUES ('id', JSON_OBJECT('random_key', 'val'))  -- 'random_key' differs on every run
ON DUPLICATE KEY UPDATE
    custom_data = IF(custom_data IS NOT NULL,
                     JSON_SET(custom_data, '$.random_key', 'val'),
                     JSON_OBJECT('random_key', 'val')),
    id = 'id';
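For reference, the whole loop can also be driven from the server side with a stored procedure. This is our own sketch, not part of the original report: the procedure name and the key_0 ... key_999 key pattern are made up for illustration, but the upsert is the same one as above.
DELIMITER //
CREATE PROCEDURE repro_json_fragmentation()
BEGIN
    DECLARE i INT DEFAULT 0;
    -- Run the same upsert 1000 times, each time touching a new JSON key,
    -- so the stored JSON document grows a little on every update.
    WHILE i < 1000 DO
        INSERT INTO Documents (id, custom_data)
        VALUES ('id', JSON_OBJECT(CONCAT('key_', i), 'val'))
        ON DUPLICATE KEY UPDATE
            custom_data = IF(custom_data IS NOT NULL,
                             JSON_SET(custom_data, CONCAT('$.key_', i), 'val'),
                             JSON_OBJECT(CONCAT('key_', i), 'val'));
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;
CALL repro_json_fragmentation();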
After these queries, the table file has grown to 14 MB:
mysql> select * from INFORMATION_SCHEMA.INNODB_TABLESPACES where NAME="document/Documents" limit 1;
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
| SPACE | NAME               | FLAG  | ROW_FORMAT | PAGE_SIZE | ZIP_PAGE_SIZE | SPACE_TYPE | FS_BLOCK_SIZE | FILE_SIZE | ALLOCATED_SIZE | AUTOEXTEND_SIZE | SERVER_VERSION | SPACE_VERSION | ENCRYPTION | STATE  |
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
|    17 | document/documents | 16417 | Dynamic    |     16384 |             0 | Single     |          4096 |  14680064 |       14680064 |               0 | 8.0.31         |             1 | N          | normal |
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
1 row in set (0.01 sec)
while the actual data is fairly small:
mysql> select OCTET_LENGTH(custom_data) from Documents;
+---------------------------+
| OCTET_LENGTH(custom_data) |
+---------------------------+
|                    333020 |
+---------------------------+
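A convenient way to quantify how much of that 14 MB is reclaimable is to compare the data size against DATA_FREE in information_schema.TABLES. This query is our addition, not part of the original report; it assumes the table lives in the document schema shown in the tablespace name above.
SELECT TABLE_NAME,
       DATA_LENGTH,  -- bytes used by the clustered index (the actual rows)
       DATA_FREE     -- bytes allocated to the tablespace but currently unused
FROM   information_schema.TABLES
WHERE  TABLE_SCHEMA = 'document'
  AND  TABLE_NAME   = 'Documents';
A DATA_FREE value close to the FILE_SIZE above would point at fragmentation rather than live data.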
Running OPTIMIZE TABLE also reduces the size considerably:
mysql> optimize table Documents;
+--------------------+----------+----------+-------------------------------------------------------------------+
| Table              | Op       | Msg_type | Msg_text                                                          |
+--------------------+----------+----------+-------------------------------------------------------------------+
| document.documents | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| document.documents | optimize | status   | OK                                                                |
+--------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.06 sec)
mysql> select * from INFORMATION_SCHEMA.INNODB_TABLESPACES where NAME="document/Documents" limit 1;
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
| SPACE | NAME               | FLAG  | ROW_FORMAT | PAGE_SIZE | ZIP_PAGE_SIZE | SPACE_TYPE | FS_BLOCK_SIZE | FILE_SIZE | ALLOCATED_SIZE | AUTOEXTEND_SIZE | SERVER_VERSION | SPACE_VERSION | ENCRYPTION | STATE  |
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
|    18 | document/documents | 16417 | Dynamic    |     16384 |             0 | Single     |          4096 |    475136 |         475136 |               0 | 8.0.31         |             1 | N          | normal |
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
1 row in set (0.01 sec)
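As the Msg_text above notes, OPTIMIZE TABLE on an InnoDB table is implemented as a table rebuild. If reclaiming the space is the only goal, an explicit null rebuild is the documented equivalent; this is a sketch, and whether it can run online depends on your version and settings:
ALTER TABLE Documents ENGINE=InnoDB;  -- rebuilds the table in place, compacting the tablespace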
Thanks in advance for any help.