MySQL Forums

Re: MySQL + NAS(NFS) - Possible solution?
Posted by: JinNo Kim
Date: September 23, 2005 08:16AM

Mr. Fisk/Any MySQL devel,

After your post, I went back to the documents I read before I decided I could probably make the space on the NetApps work for this application. I guess my confusion on the NFS/NetApps+MySQL issue stems from the incredible performance we get from our Oracle+Sun solution that hangs off of the same filer. Looking around at some of the Oracle+NetApps Filer docs on the net, I noticed one that mentions directio() in conjunction with the Oracle+Solaris environment ( http://www.netapp.com/library/tr/3322.pdf - Sections 5.1.2 + 5.2.1 - this document also contains a decent comparison of local disk vs. NFS for Oracle DBMS using a NetApps FAS 8x0 or 9x0 filer).

I knew we ran Oracle from a Slowlaris (intentional typo) box with everything hanging off of the NetApps. I believe in my heart that MySQL is a superior product for most applications (there are some places where Oracle may be a better fit, I just can't really think of one). I know Linux is superior to Solaris in most of the scenarios I have encountered, especially the 2.6 kernels and newer.


So, I started looking at directio (O_DIRECT in Linux) to see if I might be able to identify the source of my pain...

In at least the newer 2.6 kernels (including the one we are running), there is a kernel config option of "CONFIG_NFS_DIRECTIO", marked as "experimental". The help for the option states:

"This option enables applications to perform uncached I/O on files in NFS file systems using the O_DIRECT open() flag. When O_DIRECT is set for a file, its data is not cached in the system's page cache. Data is moved to and from user-level application buffers directly. Unlike local disk-based file systems, NFS O_DIRECT has no alignment restrictions.

Unless your program is designed to use O_DIRECT properly, you are much better off allowing the NFS client to manage data caching for you. Misusing O_DIRECT can cause poor server performance or network storms. This kernel build option defaults OFF to avoid exposing system administrators unwittingly to a potentially hazardous feature.

For more details on NFS O_DIRECT, see fs/nfs/direct.c.

If unsure, say N. This reduces the size of the NFS client, and causes open() to return EINVAL if a file residing in NFS is opened with the O_DIRECT flag.

Symbol: NFS_DIRECTIO [=n]
│ Prompt: Allow direct I/O on NFS files (EXPERIMENTAL)
│ Defined at fs/Kconfig:1413
│ Depends on: NET && NFS_FS && EXPERIMENTAL
│ Location:
│ -> File systems
│ -> Network File Systems
│ -> NFS file system support (NFS_FS [=y]) "
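
The EINVAL behaviour described in that help text can be probed from user space. What follows is a minimal sketch, assuming Linux with glibc, where O_DIRECT is exposed by <fcntl.h> only when _GNU_SOURCE is defined. The name probe_o_direct is a hypothetical helper and not part of MySQL or the kernel.

```c
#define _GNU_SOURCE   /* glibc exposes O_DIRECT in <fcntl.h> only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from MySQL or the kernel).
 * Returns  1 if open() accepts O_DIRECT for the given path,
 *          0 if the file system rejects the flag with EINVAL,
 *         -1 on any other error (errno is left set by open()). */
int probe_o_direct(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return (errno == EINVAL) ? 0 : -1;
}
```

Calling probe_o_direct() on an existing file under the NFS mount should return 0 on a kernel built without CONFIG_NFS_DIRECTIO, going by the help text above. A result of 0 alone does not prove the file is on NFS, since some local file systems may reject O_DIRECT as well.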


The fs/nfs/direct.c file says:

/*
* linux/fs/nfs/direct.c
*
* Copyright (C) 2003 by Chuck Lever <cel@netapp.com>
*
* High-performance uncached I/O for the Linux NFS client
*
* There are important applications whose performance or correctness
* depends on uncached access to file data. Database clusters
* (multiple copies of the same instance running on separate hosts)
* implement their own cache coherency protocol that subsumes file
* system cache protocols. Applications that process datasets
* considerably larger than the client's memory do not always benefit
* from a local cache. A streaming video server, for instance, has no
* need to cache the contents of a file.
*
* When an application requests uncached I/O, all read and write requests
* are made directly to the server; data stored or fetched via these
* requests is not cached in the Linux page cache. The client does not
* correct unaligned requests from applications. All requested bytes are
* held on permanent storage before a direct write system call returns to
* an application.
*
* Solaris implements an uncached I/O facility called directio() that
* is used for backups and sequential I/O to very large files. Solaris
* also supports uncaching whole NFS partitions with "-o forcedirectio,"
* an undocumented mount option.
*
* Designed by Jeff Kimmel, Chuck Lever, and Trond Myklebust, with
* help from Andrew Morton.
*
* 18 Dec 2001 Initial implementation for 2.4 --cel
* 08 Jul 2002 Version for 2.4.19, with bug fixes --trondmy
* 08 Jun 2003 Port to 2.5 APIs --cel
* 31 Mar 2004 Handle direct I/O without VFS support --cel
* 15 Sep 2004 Parallel async reads --cel
*
*/


Later in the file (line numbers given):

721 * Note that O_APPEND is not supported for NFS direct writes, as there
722 * is no atomic O_APPEND write facility in the NFS protocol.
723 */


A grep of the MySQL 4.0 source tree (4.0.25-r2.ebuild being the Gentoo ebuild marked as "stable") only shows references to O_DIRECT in the innobase/ subdirectory. A check of 5.1 shows references only in innobase, bdb, ndb.

I can see where O_DIRECT support was added to Inno, but have yet to find where one would #define O_DIRECT so that the #ifdef is true (I may not be seeing the whole picture).

2003/07/13 00:16:42+03:00 heikki@hundin.mysql.fi +4 -0
Allow also O_DIRECT as innodb_flush_method; it only affects writing to data files

Just working from innobase/os/os0file.c (4.0 sources):

#ifdef O_DIRECT
    /* We let O_DIRECT only affect data files */
    if (type != OS_LOG_FILE
        && srv_unix_file_flush_method == SRV_UNIX_O_DIRECT) {

        /* fprintf(stderr, "Using O_DIRECT for file %s\n", name); */

        create_flag = create_flag | O_DIRECT;
    }
#endif
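
The fragment above can be read in isolation as a small flag-building function. This is a hedged sketch only: the enum names and add_direct_flag are stand-ins invented for illustration, not InnoDB's own types. It also assumes Linux with glibc, where O_DIRECT is a macro supplied by <fcntl.h> once _GNU_SOURCE is defined, so the #ifdef is satisfied by the system headers and not by anything the application defines itself.

```c
#define _GNU_SOURCE   /* makes <fcntl.h> define O_DIRECT on Linux/glibc */
#include <fcntl.h>

/* Illustrative stand-ins for the file type and flush method checked in
 * the InnoDB fragment; these names are assumptions, not MySQL source. */
enum file_type    { DATA_FILE, LOG_FILE };
enum flush_method { FLUSH_FSYNC, FLUSH_O_DIRECT };

/* Add O_DIRECT to the open() flags only for data files, and only when
 * the configured flush method asks for it. Log files are left alone. */
int add_direct_flag(int create_flag, enum file_type type,
                    enum flush_method method)
{
#ifdef O_DIRECT
    if (type != LOG_FILE && method == FLUSH_O_DIRECT) {
        create_flag |= O_DIRECT;
    }
#endif
    return create_flag;
}
```

With these stand-ins, add_direct_flag(O_RDWR, DATA_FILE, FLUSH_O_DIRECT) yields O_RDWR | O_DIRECT, while a log file or any other flush method leaves the flags unchanged.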

Now, taken together...

IFF I have enabled CONFIG_NFS_DIRECTIO in my kernel configuration (implementation written by Network Appliance - cel@netapp.com)

AND I can convince MySQL to pass O_DIRECT to open() calls

AND none of the open calls I need require O_APPEND

To any developers out there:

Could this be the solution to my issue? Is there a compelling reason _not_ to pass O_DIRECT to table types other than inno, bdb, and ndb - specifically MyISAM when accessing the files through an NFS mount? Would it be better to approach this from an nfs-utils perspective, or within MySQL? Would it be possible to provide a configuration option or perform a check for fstype+my.cnf setting prior to performing an open? Am I way off track here?

I'd really like to understand _why_ this is happening. Non-scientific experimentation told me that Inno tolerated the NFS mount far better than MyISAM did. I guess I'm just looking for feedback before I rip the hell out of the code to add a feature that is useless, will have to be done again at the next major revision, and (given my programming skills) will probably be implemented poorly.

Begging for feedback and thanks for still looking at this. I look forward to any comments...

-JNK



Edited 1 time(s). Last edit at 09/23/2005 06:56PM by JinNo Kim.
