Re: MyISAM using 'wrong' index
Hi Rick,
Sorry for my late response on this one, I never received an email about your reply.
Hereby the create statements:
CREATE TABLE `insertion_order_domain_size_viewability` (
`vie_Date` date NOT NULL COMMENT 'The date',
`vie_Provider` int(10) unsigned NOT NULL DEFAULT '1' COMMENT 'Provider enum',
`vie_AgencyId` int(10) NOT NULL COMMENT 'The agency ID',
`vie_BusinessUnitId` int(10) NOT NULL COMMENT 'The business unit ID',
`vie_AdvertiserId` int(10) NOT NULL COMMENT 'The advertiser ID',
`vie_InsertionOrderId` int(10) NOT NULL COMMENT 'The insertion order ID',
`vie_DomainId` int(10) NOT NULL COMMENT 'The domain ID',
`vie_SizeId` int(10) NOT NULL COMMENT 'The size ID',
`vie_Revenue` decimal(14,6) DEFAULT NULL COMMENT 'The revenue',
`vie_RevenueAdvCurrency` decimal(14,6) DEFAULT NULL COMMENT 'The revenue in advertiser currency',
`vie_RevenueInView` decimal(14,6) DEFAULT NULL COMMENT 'The revenue of the inview impressions',
`vie_RevenueInViewAdvCurrency` decimal(14,6) DEFAULT NULL COMMENT 'The revenue of the inview impressions in advertiser currency',
`vie_Cost` decimal(14,6) DEFAULT NULL COMMENT 'The cost',
`vie_CostAdvCurrency` decimal(14,6) NOT NULL DEFAULT '0.000000' COMMENT 'Cost in adv. currency',
`vie_AdvCurrency` varchar(3) NOT NULL COMMENT 'The advertiser currency',
`vie_Imps` bigint(20) unsigned DEFAULT NULL COMMENT 'Total number of imps',
`vie_ImpsInView` bigint(20) unsigned DEFAULT NULL COMMENT 'Number of imps in view',
`vie_ImpsInviewBucket1_3` bigint(20) unsigned DEFAULT NULL COMMENT 'Number of imps inview between 1 and 3 seconds',
`vie_ImpsInViewBucket3_5` bigint(20) unsigned DEFAULT NULL COMMENT 'Number of imps inview between 3 and 5 seconds',
`vie_ImpsInViewBucket5_15` bigint(20) unsigned DEFAULT NULL COMMENT 'Number of imps inview between 5 and 15 seconds',
`vie_ImpsInViewBucket15_30` bigint(20) unsigned DEFAULT NULL COMMENT 'Number of imps inview between 15 and 30 seconds',
`vie_ImpsInViewBucket30` bigint(20) unsigned DEFAULT NULL COMMENT 'Number of imps inview more then 30 seconds',
`vie_ImpsInViewTilBucket0_2` bigint(20) unsigned DEFAULT NULL COMMENT 'Number of imps that we in view after between 0 and 2 seconds',
`vie_ImpsInViewTilBucket2_5` bigint(20) unsigned DEFAULT NULL COMMENT 'Number of imps that we in view after between 2 and 5 seconds',
`vie_ImpsInViewTilBucket5` bigint(20) unsigned DEFAULT NULL COMMENT 'Number of imps that we in view after more then 5 seconds',
`vie_OriginalRevenue` decimal(14,6) NOT NULL COMMENT 'The original amount of revenue',
`vie_OriginalRevenueAdvCurrency` decimal(14,6) NOT NULL COMMENT 'The original amount of revenue in adv. currency',
`vie_NettoNettoRevenue` decimal(14,6) NOT NULL DEFAULT '0.000000' COMMENT 'Original AN rev',
`vie_NettoNettoRevenueAdvCurrency` decimal(14,6) NOT NULL DEFAULT '0.000000' COMMENT 'Original AN rev adv curr',
PRIMARY KEY (`vie_Date`,`vie_Provider`,`vie_AgencyId`,`vie_BusinessUnitId`,`vie_AdvertiserId`,`vie_InsertionOrderId`,`vie_DomainId`,`vie_SizeId`,`vie_AdvCurrency`) USING BTREE,
KEY `idx_AdvertiserId` (`vie_AdvertiserId`),
KEY `idx_viewability_io_date` (`vie_InsertionOrderId`,`vie_Date`,`vie_Provider`) USING BTREE,
KEY `idx_DomainId` (`vie_DomainId`),
KEY `idx_provider` (`vie_Provider`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='The viewability data per domain / size'
/*!50100 PARTITION BY HASH (month(vie_Date))
PARTITIONS 24 */
CREATE TABLE `ref_domain` (
`dom_Id` int(10) unsigned NOT NULL COMMENT 'Link to dom_domain',
`dom_DomainTypeId` int(10) unsigned NOT NULL COMMENT 'Link to dom_domain_type',
`dom_ParentId` int(10) unsigned DEFAULT NULL COMMENT 'Link to parent domain of this domain',
`dom_Url` varchar(255) NOT NULL COMMENT 'The domain url',
`dom_ContentCategoryId` int(10) DEFAULT NULL,
`dom_UrlParent` varchar(255) DEFAULT NULL,
`dom_UpdatedTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`dom_Id`) USING HASH,
KEY `idx_DomainTypeId` (`dom_DomainTypeId`) USING BTREE,
KEY `idx_ParentId` (`dom_ParentId`) USING BTREE,
KEY `idx_Domain` (`dom_Url`(10)),
KEY `idx_ContentCategoryId` (`dom_ContentCategoryId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Table containing reference data for domains'
Unfortunately I have to do a LEFT JOIN instead of INNER JOIN.
Most of the time data is selected using a single "insertion order id" and date range, hence the index: KEY `idx_InsertionOrder` (`rcd_InsertionOrderId`,`rcd_Date`)
l
For the following example, I used a smaller table than I refered to before (insertion_order_domain_size_viewability contains 8.6 million records, ref_domain 3.1 million).
explain ing the following query:
explain SELECT distinct vie_InsertionOrderId
FROM `insertion_order_domain_size_viewability`;
# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
'1', 'SIMPLE', 'insertion_order_domain_size_viewability', 'range', NULL, 'idx_viewability_io_date', '4', NULL, '1047', 'Using index for group-by'
Has the strange result that there's no possible key, but it does use the rigth index after all.
The actual select takes 4.4 sec to return 894 rows, which is quite slow in my opinion when it is able to use the index.
Now, going to join query:
SELECT SQL_NO_CACHE SUM(vie_Imps)
FROM `insertion_order_domain_size_viewability`
LEFT JOIN `ref_domain`
ON vie_DomainId = dom_Id
WHERE vie_InsertionOrderId = 198352
AND vie_Date BETWEEN '2015-06-23' AND '2015-06-29';
takes 0.562sec to complete.
The explain result:
# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
'1', 'SIMPLE', 'insertion_order_domain_size_viewability', 'range', 'PRIMARY,idx_viewability_io_date', 'PRIMARY', '3', NULL, '2', 'Using where'
'1', 'SIMPLE', 'ref_domain', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'bc-cockpit-reporting-34.insertion_order_domain_size_viewability.vie_DomainId', '1', 'Using index'
So the select on the large table should use index idx_viewability_io_date instead of the PK (in my opinion)
When I force the index idx_viewability_io_date, it completes wihtin 0.031 sec. The explain:
# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
'1', 'SIMPLE', 'insertion_order_domain_size_viewability', 'range', 'idx_viewability_io_date', 'idx_viewability_io_date', '7', NULL, '6', 'Using where'
'1', 'SIMPLE', 'ref_domain', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'bc-cockpit-reporting-3.insertion_order_domain_size_viewability.vie_DomainId', '1', 'Using index'
Even after running the ANALYZE it still used the PK. I also do daily an OPTIMIZE on the table, which should do the same I believe as analyze.
Regarding the percentage of data;
number of records
total --> 8,643,863 rows
WHERE vie_Date BETWEEN '2015-06-23' AND '2015-06-29' --> 151,188.
WHERE vie_InsertionOrderId = 198352 --> 40,408
combining both filters --> 2,077
Last, the show variables:
bulk_insert_buffer_size 8388608
innodb_buffer_pool_instances 1
innodb_buffer_pool_size 21474836480
innodb_change_buffering all
innodb_log_buffer_size 33554432
join_buffer_size 1048576
key_buffer_size 16106127360
myisam_sort_buffer_size 17179869184
net_buffer_length 16384
preload_buffer_size 32768
read_buffer_size 4194304
read_rnd_buffer_size 4194304
sort_buffer_size 524288
sql_buffer_result OFF
Please let me know when you need something else.
Best,
Danny