Hi
If you're using MySQL 8, you can use RECURSIVE CTEs to "shred" the contents of your JSON document into rows, which you can then normally parse and filter.
Like this:
WITH RECURSIVE author_books (id, name, one_book, rest) AS
(
SELECT authors.id, authors.name, JSON_EXTRACT(authors.published_books, CONCAT('$.', JSON_EXTRACT(JSON_KEYS(authors.published_books), '$[0]'))), JSON_REMOVE(JSON_KEYS(authors.published_books), '$[0]')
FROM authors
UNION ALL
SELECT author_books.id, author_books.name, JSON_EXTRACT(published_books, CONCAT('$.', JSON_EXTRACT(rest, '$[0]'))), JSON_REMOVE(rest, '$[0]')
FROM author_books JOIN authors ON author_books.id = authors.id WHERE JSON_LENGTH(rest) > 0
)
SELECT * FROM author_books WHERE JSON_CONTAINS(one_book->'$.tags', '["Social"]') AND one_book->'$.language' = 'English'
The WITH (...) block creates a derived table that iterates each ISBN in the published_books document and extracts the corresponding book, one row per nested book. The whole shredded table is then filtered by the outermost SELECT using the conditions you wanted.
If I run it (I changed the ISBNs a little because they were duplicated):
mysql> WITH RECURSIVE author_books (id, name, one_book, rest) AS
-> (
-> SELECT authors.id, authors.name, JSON_EXTRACT(authors.published_books, CONCAT('$.', JSON_EXTRACT(JSON_KEYS(authors.published_books), '$[0]'))), JSON_REMOVE(JSON_KEYS(authors.published_books), '$[0]')
-> FROM authors
-> UNION ALL
-> SELECT author_books.id, author_books.name, JSON_EXTRACT(published_books, CONCAT('$.', JSON_EXTRACT(rest, '$[0]'))), JSON_REMOVE(rest, '$[0]')
-> FROM author_books JOIN authors ON author_books.id = authors.id WHERE JSON_LENGTH(rest) > 0
-> )
-> SELECT * FROM author_books ;
+------+-------+----------------------------------------------------------------------------------------------------------+--------------+
| id | name | one_book | rest |
+------+-------+----------------------------------------------------------------------------------------------------------+--------------+
| 1 | Tina | {"name": "Book 1", "tags": ["self Help", "Social"], "language": "English", "release_date": "2017-05-01"} | ["2e8b2470"] |
| 3 | Keith | {"name": "Book 6", "tags": ["Social", "Life"], "language": "English", "release_date": "2017-05-01"} | [] |
+------+-------+----------------------------------------------------------------------------------------------------------+--------------+
2 rows in set (0.00 sec)
You can probably move the filter to inside the CTE as an optimization, but I left it out in the hope that it's easier to understand.
Another possible optimization would be to remove the JOIN ON authors by returning published_books as a field in the first CTE query.
You can read more about CTEs here:
https://dev.mysql.com/doc/refman/8.0/en/with.html
Now, if you're using 5.7, you have a couple of options (that I can think of). The easiest is to fetch the whole JSON document from matching rows and leaving extraction of the individual books to the application.
If you really want to do the extraction at the server side, you can probably write a UDF that does something similar to the CTE, but with a more conventional FOR/WHILE loop.
Regards,
Alfredo
--
Alfredo Kojima
MySQL Developer Tools