MySQL Forums
Forum List  »  Newbie

Need help calculating a moving average using JOINED data sources
Posted by: Amanda J
Date: October 24, 2017 10:19PM

I have two ocean data sets (time series) in a MySQL DB:

Table A - Capture

cap_id| yr_c| sp_c| iso_c| area_c| qty_c | 5yr_avg_c |
     3| 2015|  TRR|    54|      8|   120 | 555.0     |
   678| 2015|  BOM|    62|     27|   0.0 | 30        |
    20| 2015|  TRR|    54|     27|   0.0 | 555.0     |
    45| 2015|  FRC|     7|     15| 86800 | 72000     |

Table B - Aquaculture

aqua_id| yr_a| sp_a| iso_a| area_a| qty_a | 5yr_avg_a |
     78| 2015|  OTS|    32|     27|  6868 | 7590
    333| 2015|  FRC|     7|     15|   550 | 634
    789| 2015|  TRR|    54|     27| 45000 | 47780
    987| 2015|  TRR|    32|     27|    40 | 47780

(n.b.: data for other years in the tables follows the same pattern)

I need to calculate total values and a moving average for a single species in a given year.

So first I combined Table A and B using a simulated FULL OUTER JOIN (left-right-union all) to generate Table C.

I then summed the qty_c and qty_a columns to get the global total (qty_t)

UPDATE global
SET qty_t = IFNULL(qty_c, 0) + IFNULL(qty_a, 0)
WHERE id > 0;

Table C - Global

id| cap_id| aqua_id| yr_c| yr_a| sp_c| sp_a| iso_c| iso_a|area_c|area_a| qty_c| qty_a | qty_t | 5yr_avg_t |

1 |     20|     789| 2015| 2015|  TRR|  TRR|    54|    54|    27|    27|   0.0| 45000 | 45000 | 
2 |     45|     333| 2015| 2015|  FRC|  FRC|     7|     7|    15|    15| 86800| 550   | 87350 |
3 |    678|    NULL| 2015| NULL|  BOM| NULL|    62|  NULL|    27|  NULL|   0.0| NULL  | 0.0   |
4 |      3|    NULL| 2015| NULL|  TRR| NULL|    54|  NULL|     8|  NULL|   120| NULL  | 120   |
5 |   NULL|      78| NULL| 2015| NULL|  OTS|  NULL|    32|  NULL|    27|  NULL| 6868  | 6868  |
6 |   NULL|     987| NULL| 2015| NULL|  TRR|  NULL|    32|  NULL|    27|  NULL| 40    | 40    |

Where I get bogged is in calculating the moving average

A year and species value exists for every record, but when suming across five year sub-sets for a given species the effect of the variability in the data is that every species may not have a concurrent 5 year set of quantity values.

To calculate the 5-yr moving average for Table A and Table B I used:

Table A:

SELECT
x.cap_id,
x.sp_c,
x.yr_c,
AVG(y.qty_c) AS `5yr_avg_c`
FROM capture AS x
JOIN capture AS y ON x.sp_c = y.sp_c
AND x.yr_c BETWEEN y.yr_c AND y.yr_c + 5
AND yr_c > 0
AND x.sp_c <> 'NULL'
GROUP BY x.sp_c, x.yr_c DESC;

Table B:

SELECT
x.aqua_id,
x.sp_a,
x.yr_a,
AVG(y.qty_a) AS `5yr_avg_a`
FROM aquaculture AS x
JOIN aquculture AS y ON x.sp_a = y.sp_a
AND x.yr_a BETWEEN y.yr_a AND y.yr_a + 5
AND yr_a > 0
AND x.sp_a <> 'NULL'
GROUP BY x.sp_a, x.yr_a DESC;

My challenge now is that the five year moving average query I used for Tables A and B does not work for Table C

Initially I substituted qty_t for qty_c and qty_a respectively in the two queries

But the moving average calculated for the qty_t is very different for the same species in the same year depending on if I use (yr_c and sp_c) or (yr_a and sp_a) columns in Table C.

So then I tried:

SELECT
x.id,
x.sp_c,
x.yr_c,
AVG(y.qty_t) AS `5yr_avg_t`
FROM global AS x
JOIN global AS y ON x.sp_c = y.sp_c
AND x.yr_c BETWEEN y.yr_c AND y.yr_c + 5
AND yr_c > 0
AND x.sp_a <> 'NULL'
WHERE x.yr_c = x.yr_a
OR x.yr_a IS NULL
GROUP BY x.sp_c, x.yr_c DESC;

This gets me an average for every Table A record in Table C consistent with the five year average calculated in Table A

To get the average for the remaining Table B records in Table C (but avoid calculating inconsistent averages) I thought I could use the query:

SELECT
x.id,
x.sp_c,
x.yr_c,
AVG(y.qty_t) AS `5yr_avg_t`
FROM global AS x
JOIN global AS y ON x.sp_c = y.sp_c
AND x.yr_c BETWEEN y.yr_c AND y.yr_c + 5
AND yr_c > 0
AND x.sp_c <> 'NULL'
WHERE x.yr_c <> x.yr_a
AND x.yr_a IS NOT NULL
AND x.yr_c IS NULL
GROUP BY x.sp_c, x.yr_c DESC;

But really not confident I have the right solution for the combined Table C data?

Or if a different table structure is what I need?

My other challenge is that I do not exactly understand the syntax "year_c > 0" and "year_a > 0"

Is it saying:

"if there is no quantity record for a given species in a given year in the five year sequence assume the catch value = 0.

Or is it saying:

"assume there is no catch value"?

I need to understand this because the demoninator matters.

If the average operation is dividing by 5 irrespective of whether there are 5 sequential years - the average will be lower.

If it is dividing by the number of years in the five year sequence where there is a year value (e.g. only 3 of the 5 years in a five year sequence have a quantity value) the 5 year average is actually a 3 year one - in which case the average will be higher.

Help very welcomed, Thanks

Options: ReplyQuote


Subject
Written By
Posted
Need help calculating a moving average using JOINED data sources
October 24, 2017 10:19PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.