Java-mysql highload application crash
Posted by: Dmitry Kislicyn
Date: January 17, 2012 06:40AM

I have a problem with my html-scraper. Html-scraper is multithreading application written on Java, by default it run with 128 threads. Shortly, it works as follows: it takes a site url from big text file, ping url and if it is accessible - parse site, find specific html blocks, save all url and blocks info including html code into corresponding tables in database and go to the next site. Database is mysql 5.1, there are 4 InnoDb tables and 4 views. Tables have numeric indexes for fields used in table joining. I also has a web-interface for browsing and searching parsed data (for searching I use Sphinx with delta indexes), written on CodeIgniter.

Server configuration:
CPU: Type Xeon Quad Core X3440 2.53GHz
RAM: 4 GB
HDD: 1TB SATA
OS: Ubuntu Server 10.04

Some mysql config:
key_buffer = 256M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 128
max_connections = 400
table_cache = 64
query_cache_limit = 2M
query_cache_size = 128M

When database was empty, scrapper process 18 urls in second and was stable enough. But after 2 weaks, when urls table contains 384929 records (~25% of all processed urls) and takes 8.2Gb, java application begun work very slowly and crash every 1-2 minutes. I guess the reason is mysql, that can not handle growing loading (parser, which perform 2+4*BLOCK_NUMBER queries every processed url; sphinx, which updating delta indexes every 10 minutes; I don't consider web-interface, because it's used by only one person), maybe it rebuild indexes very slowly? But mysql and scraper logs (which also contain all uncaught exceptions) are empty. What do you think about it?

Options: ReplyQuote


Subject
Written By
Posted
Java-mysql highload application crash
January 17, 2012 06:40AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.