Re: Join by date range unacceptably slow
Hi Øystein,
For my specific issue regarding associating overlapping time intervals, I have developed a work-around, but it is based on the assumption that a particular interval can be treated as a unique event.
In my case, I am comparing the start and end times of a video being displayed with the start and end times that zero or more viewers are looking at the screen where the video is playing. The assumption I am able to make is that more than 90% of the videos run for 15 seconds, with the remainder running longer, in one or more 15 minute increments, up to 2 1/2 minutes. The second assumption is that each video begins near a 15 second boundary, i.e. the 0, 15, 30 or 45 second value. I then give each video interval record a normalized starting period id, based on a unix timestamp value divided by 15:
cast(round(unix_timestamp(start_time)/15) as unsigned)
I then give the record a value for the number of 15 second periods of duration:
cast(ceiling((unix_timestamp(end_time) - unix_timestamp(start_time))/15) as unsigned)
I do similar computations for the viewer records, placing them in a temp table, and create additional records with period+1,...,period+n when the viewing time exceeds 15 seconds.
Next, I create an association table where the viewing period matches the video initial period. I add more association records when the video exceeds 1 period for viewer period = video period+1, etc.
Lastly, I remove duplications where the same viewer watches more than one period of a given video that exceeds 15 seconds.
I put all of the above in a stored procedure.
For a sample of about 200k video records and 90k viewer records, which generated about 2.2m associations, the process ran in under 10 minutes, much better than the hours taken for the interval start/end comparison query.