> Here the ID 44 row should not be present as the b.cc>0 is selected to create the table.
So you say, but there it is.
And if it's present in a deliberately crafted test dataset, something like it is likely to turn up in production too.
For suppose this is the case then the requirement will be
if b.cc==0 or b.cc=m.cc then
else if b.cc <> 0 then
All right. I had the condition as b.cc=0 and
b.cc <> m.cc. I take it from the above it should be b.cc <> m.cc.
> So the row 53 will be updated as b.cc=m.cc (i.e after update we get b.cc=3) instead of its current value b.cc=1
my another confusion here is :
The master table is just the same records of base except it can have some variation in m.cc value.
for all the b.cc<>0 we make a temp table and from manual inspection we update or set the value of m.cc. That means if the b.title <> b.title_sci (also corresponding issn from base does not match ) then cc>0 in base so we analyse the title and title_sci in temp table and set the cc to be either 0 or 1 or 2 and push this in master as a repository,So for the next run we use this table to check instead of manual inspection for b.cc<> 0.
If it depends on human eyes and hands and brains, errors are inevitable. Unless you wish to proceed by guesswork, you need an airtight algorithm.
My point here is we have all the same rows and same columns for base and master expect cc values. So the join on (title,title_sci,issn1,issn2,issn,eissn ) should give the same number of rows as we have in base report. select count(*) from base_report give 145 rows. but the join from ur post gives few records.
We have some missing rows from the base_report.
You sent 105 base_report_gt_zero rows, 145 master_data rows. The differences are not restricted to missing rows. The following exclusion join from master to base on the columns you say must be identical ...
select m.master_data_id, m.cc, m.title, m.title_sci, m.issn1, m.issn2, m.issn, m.eissn
from master_data m
left join base_report_gt_zero b using (title_sci, title, issn1, issn2, issn, eissn, databas)
where b.cc is null
... returns 92 rows. The test dataset does not meet the assumptions you're working from.
So it seems we're here ...
(i) we've shown that a relatively simple query can do the job that started as a sproc design problem.
(ii) before that query can be finalised, you need to bulletproof the process that generates the base_report table that the query will use.
(iii) we've got way past the point where issues we're discussing are of any interest to forum readers, so I think this should go to private communication for here on in. You can reach me at pb AT artfulsoftware.com.