need help with DB design - trying to reduce duplication
Posted by: Chris Bassett
Date: January 28, 2014 01:00AM
I am in need of some help in designing a database with the future in mind... meaning that I want to create my database so that it can be maintained in the future as needed (ie. prune or remove information that is no longer needed).
I am in the process of designing a database for the company I work for that will be a shipment tracking system (where someone can look up a customer ID or an order number and get the tracking number associated with it, as well as additional information about the shipment).
What I'm having a problem with is how to associate the Shipping Addresses (the receipiants of the shipments) to the shipments themselves in the database. For the most part I have this figured out, and it's not a complicated.
I have a the following structure for the Contacts table and the Shipments table (I've only included the PK fields, since the other fields
aree not significant for this issue):
ContactID varchar(16) primary key
ShippingAddressID varchar(16) primary key
ContactID varchar(16) not null (FK)
ShipmentID bigint primary key
ShippingAddressID varchar(16) not null (FK)
(In theory, any contact could have an unlimited number of shipping addresses, to the limit of the storage on the databse server, but the concern is speed of retreval and queries when the ShippingAddress table starts to amass large numbers of rows.)
What I have a concern about is maintenance, but I want to try to build this thought process into my design so I don't have to worry about redesigning it later. My concern is that what happens when there is a large number of shipping addresses that are no longer used? Technically, once a shipment is shipped, the address information associated with it never really changes, but it is needed for archiving and reference purposes.
As indicated in my design summary above, the Shipments table references the ShippingAddressID. But how can I handle instances where the ShippingAddress may be removed? (I created the Contact and ShippingAddress tables so that a contact can have multiple shipping addresses).
I toyed with the idea of just having the application copy the shipping address information into similar fields located in the Shipments table, but that would be data duplication, which is frowned upon in database design best practices. I thought about doing that so that way, if a shipping address was removed from the ShippingAddress table, the information would still be present in the Shipments table, since once a shipment is shipped, the address information doesn't change, and that information is still needed for archival and reference purposes (possibly in the future).
I also thought about just hiding the shipping addresses (creating a boolean field that indicates whether the shipping address is visible to the application or not), but I am not sure that this would even be the correct answer, because the data would physically still be there, and the DB engine would still have to parse those rows anyway.
I'm just looking for ways that the addresses can be maintained, but still maintain integrity in the database, and minimize data duplication. I'm thinking the copying of the fields may be the only solution, especially since that data won't change after it's written to the DB (more than likely).
I'm trying to figure this out before I continue with implementing my design because this is the only area that has been giving me trouble.
How else can overcome this problem, or is my two ideas about the only thing I can do, aside from having two separate databases ?(a Master database and an Archive database....)
(Sorry about the length of the post, but I had to make sure that my scenario was clear.)
Edited 4 time(s). Last edit at 01/28/2014 01:07AM by Chris Bassett.