We hold files at a remote storage service (Google Storage).
I am going to store the list of all these files in a local MySQL DB. This is to query about filenames faster (without contacting Google's server).
Files are classified by specifying both a "service" (one of the root directories on the Google's server) and by Group ID (the direct subdirectory of a service).
Group ID determines grouping of files. For example we may need to remove all files sharing the same Group ID and the same service.
I have in mind two different ways of the database design. Please help to choose the best one:
1. One table which contains the service, file ID and the part of the file path which does not include the service.
2. Two tables: a. the table containing the Group ID and the corresponding subdirectory of a service; b. the table linked to the first table and also containing the ID and the partial path to the file (excluding the service and the part determined by Group ID).
Note that "2" does allow quick querying for a group of files sharing the same group ID because the filename field can be indexed (even UNIQUE-indexes) and thus searching by a prefix specifying the "group" part of the part is quick.
Please help me to evaluate pros and cons of "1" and "2" to decide which variant to choose.
One reason to choose "1" is that it probably requires less disk space and group path is not repeated for each file in the group (that is the set of files which share the same Group ID).
--
http://freesoft.portonvictor.org