Ok, guess I should give up on fetching fewer than all of the rows in the second table. (Second in the EXPLAIN, that is.) So, off to Plan B...
Schema change, something like this:
ALTER TABLE webfiles
ADD COLUMN filename VARCHAR(255),
ADD INDEX(filename);
UPDATE webfiles
SET filename = SUBSTRING_INDEX(filepath, '/', -1);
SELECT ...
WHERE webfiles.filename= product_files.filename;
Never mind; that probably won't help (in your case).
Plan C:
The filehash is an MD5 (or similar), right? So it is UNIQUE? (Your index did not say so.) Well, assuming it is reasonably unique, and assuming that "merge index" does not apply in your case:
http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html
let's simulate index merge, sort of...
SELECT DISTINCT REPLACE(webfiles.filepath, x.filename, '') AS path
FROM webfiles,
(
(
SELECT filehash,
SUBSTRING_INDEX(webfiles.filepath, '/', -1) AS filename
FROM webfiles
UNION ALL
SELECT filehash,
filename
FROM product_files
)
GROUP BY filehash, filename
HAVING COUNT(*) = 2
) x
WHERE webfiles.filehash = x.filehash
AND webfiles.filename = x.filename;
Yikes! That turned out to be a mess. Here is what I was hoping to do (maybe it is not quite right.
1. Fetch stuff from each table -- this should be a fast, simple table scan (maybe index scan).
2. UNION ALL them, but keep only those that occurred in each table (assuming that neither table has dups of filehash and filename!).
3. Use those results (hopefully only one row?) to finish up the query with webfiles.
Optional: An index on product_files of (filehash, filename) would make that run a little faster by "Using index".
Reading 15K rows in a table scan is a lot faster than randomly reaching into the table 15K times. That's the secret sauce. But it is messy to write and understand.