Ok perfect :)
Peter Brawley Wrote:
-------------------------------------------------------
> All right, that script works. Now please clarify
> ...
>
> 1 Doubles and floats carry rounding errors. Why
> are you not using decimal type instead?
This was not a conscious decision on my part. I imported the data with a python script where the "act_gen_raw" table was created according to the script. This is however very good to know and I will keep that in mind for any further manipulaions. If a conversion for the new table is possible however, I would prefer that. I posted the script I used in here:
https://gitlab.ethz.ch/kaeserd/mysql-data.git
I think however for the time beeing the double type should be sufficient, as I am more interested in numbers with higher orders.
> 2 why is dt.txt a text column rather than a
> timestamp!?
I collected all the unique datetimes the "act_gen_raw" table and stored it there. As I saw that the data type was txt in the original I kept it that way for a better comparison. I would however prefer the data in the new table to be in the corresponding "date" and "time" data type which I have done for the "d" and "t" column. To summarize I thought that it makes sence to do where both data are in txt data. So that I can do this comparison:
where dt.text = act_gen_raw.DateTime
> 3 Your query ...
>
>
> select dt.d as Date, dt.t as Time,
> act_gen_raw.ActualGenerationOutput from
> plant_information
> inner join act_gen_raw on
> plant_information.PowerSystemResourceName =
> act_gen_raw.PowerSystemResourceName
> inner join dt on dt.txt = act_gen_raw.`DateTime`
> order by act_gen_raw.`DateTime`
>
>
> ... returns zero rows from this dataset because no
> rows match on either join. You'll need to rethink
> the WHERE clauses of the dumps.
Ok I see the problem. However I am not quite sure how to solve it using sql syntax... In my mind I need to solve something similiar like this pseudocode:
FOR row x in plant_information.PowerSystemResourceName :
(select dt.d as Date, dt.t as Time,
act_gen_raw.ActualGenerationOutput from
plant_information
Where x = act_gen_raw.PowerSystemResourceName &&
where act_gen_raw.DateTime in dt.txt)
However I am aware that loops are generally not a good idea in sql... So that is also kind of the reason why I am stuck...
> 4 Once you've done that, you'll need to specify
> exactly what result this query should produce from
> the dataset.
I want to produce the following table:
dt.d [date]| dt.t [time] | power_plants_raw.plant_information.PowerSystemResourceName.ROW_1 [double]| power_plants_raw.plant_information.PowerSystemResourceName.ROW_2 [double]| power_plants_raw.plant_information.PowerSystemResourceName.ROW_3 [double]| ..|power_plants_raw.plant_information.PowerSystemResourceName.ROW_LAST_ROW [double]
Where power_plants_raw.plant_information.PowerSystemResourceName.ROW_1 and so on correspondend to the each name listed if you run:
SELECT PowerSystemResourceName
FROM `power_plants_raw`.`plant_information`;
My [] correspond to the data type I wish to have in the columns. As already stated in 1) for the data type double a conversion which is most sensible I would do here.
Each data in the collumns of power_plants_raw.plant_information.PowerSystemResourceName.ROW_X should correspond to the data in act_gen_raw.ActualGenerationOutput where act_gen_raw.PowerSystemResourceName matches power_plants_raw.plant_information.PowerSystemResourceName.ROW_X. For each row in the new table the date from act_gen_raw.DateTime should match one of the dates stored in dt.
So in the end I would like to have a table such the header is:
Date [txt]| Time [txt]| power_plants_raw.plant_information.PowerSystemResourceName.ROW_1 [txt]| power_plants_raw.plant_information.PowerSystemResourceName.ROW_2 [txt]| power_plants_raw.plant_information.PowerSystemResourceName.ROW_3 [txt]| ..|power_plants_raw.plant_information.PowerSystemResourceName.ROW_LAST_ROW [txt]
and the rows are:
dt.d [Date] | dt.t [Time] | act_gen_raw.ActualGenerationOutput [double] | ... | act_gen_raw.ActualGenerationOutput [double]
where each entry for act_gen_raw.ActualGenerationOutput correspond to the specific date and act_gen_raw.PowerSystemResourceName in this row and column respectively.In the end it should look similiar to this:
Date Time Stalon G1 Miranda - G3 CSP_GT1
2017-08-05 12:00:00 118.9 21.4 59.4
2014-12-06 23:00:00 119.5 50 29.2
2017-08-05 12:15:00 120.5 0 0.1
I hope this clarifies a bit my goal. I would also like to thank you again for all your time :) I really appreciate it :D
best,
David