I am open to using Load Data. But I am most definitely not an ETL programmer, so I was taking the easiest path.
The input member file I sent only had 4 data rows + one header row. But the error above lists a row 5. What am I missing there?
Prob'ly a trailing EOL here.
I noticed on the Server Status window in workbench that it says the "general" log file is turned off. Only the error one and the one for slow queries is enabled. Would they be in the general one if I had
No, errors will be written to the MySQL error log.
Then I am not getting those errors that you posted.
We get the files in as .txt then I import them in Excel in order to add the key column and the three or four columns to the far right.
The references are in other spreadsheet cells/pages?
The reference for the practice is in another spreadsheet. I use Xlookup to populate it and then copy/paste special values to get rid of the formula.
Then, I save as .csv from Excel. My assumption was that types would be "neutral" once saved as .csv and that MySQL would bring them in as the type of the column. I guess not.
Spreadsheet rules < RDBMS rules!
Should I be saving them as text instead? I have to be able to use excel to add the columns. One of the columns for the member file includes using xlookup to assign practice.
Again, the claims table DDL is outlandishly wrong for an RDBMS. If there are going to be queries on such a table, its major enumerating columns need to be broken out to child tables, more like this...
... and I've not even done full 3rd Normal Form, what else needs breaking out depends on how this table's data and its child data are to be queried.
Normally, this is done at input time, eg with a Trigger. With thousands of inout rows, probably the best solution for you would to load the Excel data into a MyISAM table; that'll solve the input speed problem, it'll be fast with Load Data
Infile. Then write a routine to populate the InnODB claims table and its InnoDB child tables from the input table. Then you have queryable data. With the table as it is, queries would be extremely difficult to write, debug and maintain (ie the rules of normalisation aren't cosmetic, they're intrinsic to RDBMS logic) and would perform like comatose pigs.
I know. This isn't meant to be a "proper" RDBMS. I am doing this in my "spare" time for work. I am not going to be deduping claims that were included in more than one file and I am not going to be taking into account claims that were reversed later and resubmitted.
I just need a way to find certain members that have certain types of claims. And I want to track members when they enroll, drop, re-enroll, etc. (all are medicaid members who have a month to month enrollment). We have a population health system where all this data plus data from EHRs is sent to, but the vendor is not very, not sure the right word, enlightened? They do not provide a report writing tool so that's why I am having to do this. And they did not foresee some of the things we need to lookup / track.
For example, from a different system, I will pull a list of all our diabetics and put that together with yet a different system that says which diabetics are not meeting a quality measure that we have contracted to be paid based on performance. A quality measure is something like for a diabetic they need to have a retinal eye exam. This other report tells us who has not. Using the subscriber IDs there, I will pull all the claims for each that will be helpful to determine where the breakdown in care is.
Another thing we want to look for is members that are seeing a PCP who is not one of our providers. If they are seeing one outside, the payer has committed to re-assigning them to a different PCP. So, just different one off types of look-sees. We get a pair of files from this particular payer each month. And there are two others that send files as well. Theirs are more complicated in that they send more files and you have to cross walk each one to find what you want. I haven't even started the source to target map for those.
Another use case is people with mental health issues that have an inpatient admit. We want to see who they are seeing for regular mental health checkins and if those are our providers.
When I query the claims data, I will not be pulling all columns.
I looked at the columns shown in the error messages:
COMPANY_CODE VARCHAR(5) - always equal to UHGNY
MemberSubscriberID VARCHAR(14) - largest is 121602544
MemberMI CHAR(1) - all blank or one character
MemberGender CHAR(1) - all M or F, no blanks
MemberDOB DATE - all dates formatted yyyy-mm-dd, no single quotes or quotes in input file
MemberState VARCHAR(6) - all values two characters
ProviderNPI VARCHAR(10) - largest is 1992972269
ProviderTIN VARCHAR(9) - largest is 510567466
EligStart DATE - all dates formatted yyyy-mm-dd, no single quotes or quotes in input file
EligEnd DATE - all dates formatted yyyy-mm-dd, no single quotes or quotes in input file
Many, perhaps all such errors can got round by surrounding output values with quotes.
Like I said, I did not get any errors when I imported the files using the table import wizard.
I've got a claims file importing now - 101K rows so far no errors. I've loaded two member files.
I will eventually teach myself how to load the files using load data. Hey, I'm using to having someone else do all the ETL, etc and then I just get to play with the data. I don't have that luxury at this job. Your help has been priceless.