
Veritas File System (VxFS) tuning

Preamble

In addition to Veritas Volume Manager (VxVM), Symantec also offers Veritas File System (VxFS), which is most of the time used in combination with VxVM. Symantec claims that the highest benefit is obtained when using both together. This document has been written using Red Hat Enterprise Linux Server release 5.5 (Tikanga) and the VxFS/VxVM releases below:
[root@server1 ~]# rpm -qa | grep -i vrts
VRTSvlic-3.02.51.010-0
VRTSfssdk-5.1.100.000-SP1_GA_RHEL5
VRTSob-3.4.289-0
VRTSsfmh-3.1.429.0-0
VRTSspt-5.5.000.005-GA
VRTSlvmconv-5.1.100.000-SP1_RHEL5
VRTSodm-5.1.100.000-SP1_GA_RHEL5
VRTSvxvm-5.1.101.100-SP1RP1P1_RHEL5
VRTSvxfs-5.1.100.000-SP1_GA_RHEL5
VRTSatClient-5.0.32.0-0
VRTSaslapm-5.1.100.000-SP1_RHEL5
VRTSatServer-5.0.32.0-0
VRTSperl-5.10.0.7-RHEL5.3
VRTSdbed-5.1.100.000-SP1_RHEL5
To avoid putting the real server name in your document, think of something like:
export PS1="[\u@server1 \W]# "

VxFS file system physical parameters

When creating a file system there are two important characteristics to choose (both can be set at creation time, as sketched after the remark below):
  • Block size (cannot be changed once the file system has been created)
  • Intent log size (can be changed after file system creation with fsadm; VxFS usually performs better with larger log sizes)
Remark:
When using VxVM with VxFS, Symantec recommends using vxresize (instead of vxassist and fsadm) to combine the volume and file system shrink or grow into a single operation.
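As a minimal sketch, both characteristics can be set at creation time (the VxVM disk group and volume names below are hypothetical):
# 8KB block size and a 32768-block intent log, combined in one -o option list
mkfs -t vxfs -o bsize=8192,logsize=32768 /dev/vx/rdsk/datadg/datavol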

Block size

From the mkfs_vxfs man page (-o bsize=bsize), the default block size depends on the file system size:
File system size    Default block size
----------------    ------------------
0 TB to 1 TB        1k
>1 TB               8k
Similarly, the block size determines the maximum possible file system size, as given in the following table:
Block size    Maximum file system size
----------    ------------------------
1k            32 TB
2k            64 TB
4k            128 TB
8k            256 TB
Recommended VxFS block sizes for Oracle file systems (assuming your Oracle database block size is equal to or bigger than 8KB, which is the usual case):
File system                                                   Block size
-----------                                                   ----------
Oracle software and dump/diagnostic directories               1KB
Redo log directory                                            512 bytes for Solaris, AIX, Windows and Linux; 1KB for HP-UX
Archived log directory                                        1KB
Control files directory                                       8KB (control file block size is 16KB starting with Oracle 10g)
Data, index, undo, system/sysaux and temporary directories    8KB
You can check the control file block size with (Linux RedHat 5.5 and Oracle 11.2.0.3):
SQL> select cfbsz from x$kcccf;

     CFBSZ
----------
     16384
     16384
Remark:
For Oracle releases lower than 10g the control file block size was equal to the Oracle initialization parameter db_block_size; starting with 10g the control file block size is 16KB whatever the value of db_block_size.
You can check the redo log block size with (Linux RedHat 5.5 and Oracle 11.2.0.3):
SQL> select lebsz from x$kccle;

     LEBSZ
----------
       512
       512
       512
Remark:
As disks with a 4KB sector size are slowly coming onto the market, Oracle 11gR2 now offers the capability to create redo log files with the block size you like…
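For illustration, a minimal sketch of the 11gR2 syntax (the group number, file path and sizes are hypothetical; on conventional 512-byte sector disks Oracle may refuse a 4KB redo block size):
SQL> alter database add logfile group 4 ('/ora_prisma/log/prisma/redo04.log') size 512m blocksize 4096;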
Once the file system has been created, the command to see which block size has been chosen is fstyp. On HP-UX 11iv3 (11.31) the parameter to look at is f_frsize:
server1{root}# fstyp -v /dev/vg1d1c/lvol3
vxfs
version: 6
f_bsize: 8192
f_frsize: 1024
f_blocks: 8388608
f_bfree: 2126170
f_bavail: 1993285
f_files: 576852
f_ffree: 531540
f_favail: 531540
f_fsid: 1073872899
f_basetype: vxfs
f_namemax: 254
f_magic: a501fcf5
f_featurebits: 0
f_flag: 0
f_fsindex: 9
f_size: 8388608
On Linux (RedHat 6.3) the parameter to look at is bsize:
[root@server2 ~]# fstyp -v /dev/vx/dsk/vgp3316/lvol10
vxfs
magic a501fcf5  version 9  ctime Wed 05 Feb 2014 12:18:11 PM CET
logstart 0  logend 0
bsize  1024 size  205520896 dsize  205520896  ninode 0  nau 0
defiextsize 0  ilbsize 0  immedlen 96  ndaddr 10
aufirst 0  emap 0  imap 0  iextop 0  istart 0
bstart 0  femap 0  fimap 0  fiextop 0  fistart 0  fbstart 0
nindir 2048  aulen 32768  auimlen 0  auemlen 8
auilen 0  aupad 0  aublocks 32768  maxtier 15
inopb 4  inopau 0  ndiripau 0  iaddrlen 8   bshift 10
inoshift 2  bmask fffffc00  boffmask 3ff  checksum f7f795e4
oltext1 32  oltext2 1282  oltsize 1  checksum2 0
free 192386052  ifree 0
efree  2 1 2 3 4 3 3 1 4 5 3 5 5 4 2 14 19 10 6 2 6 6 2 1 1 0 2 0 0 0 0 0
Remark:
While from a fragmentation point of view a bigger file system block size is an obvious win, from a performance perspective it is not so straightforward. See the article Analyzing the impact of the Vxfs filesystem block size on Oracle, which almost overturned what I had had in mind for years…

Intent log size

From the mkfs_vxfs man page (-o logsize=n), the minimum and maximum log sizes depend on the block size:
Block size    Minimum log size    Maximum log size
----------    ----------------    ----------------
1k            256 blocks          262,144 blocks
2k            128 blocks          131,072 blocks
4k            64 blocks           65,536 blocks
8k            32 blocks           32,768 blocks
The default log size increases with the file system size, as shown in the following table:
File system size    Default log size
----------------    ----------------
0 MB to 8 MB        256k
8 MB to 512 MB      1 MB
512 MB to 16 GB     16 MB
16 GB to 512 GB     64 MB
512+ GB             256 MB
To display the current intent log size:
[root@server1 ~]# /opt/VRTS/bin/fsadm -t vxfs -L /ora_prisma/rbs
UX:vxfs fsadm: INFO: V-3-25669:  logsize=16384 blocks, logvol=""
Remark:
If the fsadm command complains with something like:
fsadm: Wrong argument "-t". (see: fsadm --help)
then you are using the OS fsadm; look for the VxFS binaries in the /opt/VRTS/bin directory. Be careful if changing the PATH, because simple tools like df will not behave the same since Symantec has rewritten them.
So in my 6GB file system example:
[root@server1 ~]# df -P /ora_prisma/rbs
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/vx/dsk/vgp1417/lvol9   6291456   5264042    963208      85% /ora_prisma/rbs
The default block size (1KB) has been chosen, hence the default intent log size of 16384 blocks, i.e. 16MB.
So which intent log size should you choose? Symantec says recovery time increases with a larger intent log, while VxFS performs better with larger intent log sizes. As you obviously want to tune for the 99.99% of the time when your system is up and running, you should consider creating a large intent log, keeping in mind that the behavior must be controlled while the application is running (there is no clear Oracle recommendation)…
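A hedged sketch of growing the intent log of the mounted file system above (the logsize option name follows the fsadm -L output shown earlier; verify it in fsadm_vxfs on your release):
# Grow the intent log from 16384 to 65536 blocks (64MB with a 1KB block size)
/opt/VRTS/bin/fsadm -t vxfs -o logsize=65536 /ora_prisma/rbs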

File extents

In the same way as Oracle table extents, you can change the default extent allocation policy and/or preallocate space for a file:
[root@server1 prisma]# getext undotbs01.dbf
undotbs01.dbf:  Bsize  1024  Reserve       0  Extent Size       0
Remark:
An extent size of 0 uses the default extent allocation policy. See vxtunefs for the policy description (the parameters are initial_extent_size and max_seqio_extent_size).
Small example with an empty file:
[root@server1 data]# df -P .
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/vx/dsk/vgp4118/lvol4  83886080  28377085  52039685      36% /ora_iedbre/data
[root@server1 data]# touch yannick
[root@server1 data]# getext yannick
yannick:        Bsize  1024  Reserve       0  Extent Size       0
[root@server1 data]# df -P .
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/vx/dsk/vgp4118/lvol4  83886080  28377085  52039684      36% /ora_iedbre/data
Now changing its reservation and then its fixed extent size:
[root@server1 data]# setext -t vxfs -r 30g -f chgsize yannick
[root@server1 data]# df -P .
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/vx/dsk/vgp4118/lvol4  83886080  59834365  22548484      73% /ora_iedbre/data
[root@server1 data]# ll yannick
-rw-r----- 1 root root 32212254720 Jun 22 14:45 yannick
[root@server1 data]# getext yannick
yannick:        Bsize  1024  Reserve 31457280  Extent Size       0
[root@server1 data]# setext -t vxfs -e 1g -r 30g  yannick
[root@server1 data]# ll yannick
-rw-r----- 1 root root 32212254720 Jul  4 12:55 yannick
[root@server1 data]# getext yannick
yannick:        Bsize  1024  Reserve 31457280  Extent Size 1048576
Please note it takes a bit of time to recover free space when deleting this test file.
Fixed extent sizes and Oracle? I would say they are beneficial for Oracle datafiles as they avoid fragmentation, but if like me you work with the autoextend feature, then do not set too small a next extent and you will achieve the same behavior.

VxFS file system tuning

Tunable filesystem parameters

[root@server1 ~]# vxtunefs -p /ora_prisma/rbs
Filesystem i/o parameters for /ora_prisma/rbs
read_pref_io = 65536
read_nstream = 1
read_unit_io = 65536
write_pref_io = 65536
write_nstream = 1
write_unit_io = 65536
pref_strength = 10
buf_breakup_size = 1048576
discovered_direct_iosz = 262144
max_direct_iosz = 1048576
default_indir_size = 8192
odm_cache_enable = 0
write_throttle = 0
max_diskq = 1048576
initial_extent_size = 8
max_seqio_extent_size = 2048
max_buf_data_size = 8192
hsm_write_prealloc = 0
read_ahead = 1
inode_aging_size = 0
inode_aging_count = 0
fcl_maxalloc = 195225600
fcl_keeptime = 0
fcl_winterval = 3600
fcl_ointerval = 600
oltp_load = 0
delicache_enable = 1
thin_friendly_alloc = 0
If the file systems are used with VxVM it is suggested to keep the default values, so test carefully when changing anything (a sketch of changing a tunable follows the remark below)…
Remark:
When using VxFS with VxVM, VxVM by default breaks up I/O requests larger than 256K.
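If you do need to change a value, here is a hedged sketch of adjusting one tunable at runtime and making it persistent (the value is arbitrary for illustration; /etc/vx/tunefstab lines consist of the device path followed by tunable=value pairs):
# Raise the threshold above which VxFS switches to discovered direct I/O
vxtunefs -o discovered_direct_iosz=1048576 /ora_prisma/rbs
# Persist the setting across remounts (device path taken from the df output earlier)
echo "/dev/vx/dsk/vgp1417/lvol9 discovered_direct_iosz=1048576" >> /etc/vx/tunefstab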

File system fragmentation

To display directory fragmentation, issue:
[root@server1 ~]# /opt/VRTS/bin/fsadm -t vxfs -D /ora_prisma/data
 
  Directory Fragmentation Report
             Dirs        Total      Immed    Immeds   Dirs to   Blocks to
             Searched    Blocks     Dirs     to Add   Reduce    Reduce
  total             3         1         2         0         0           0
Remark:
Symantec does recommend performing regular file system defragmentation (!!):
In general, VxFS works best if the percentage of free space in the file system does not get below 10 percent. This is because file systems with 10 percent or more free space have less fragmentation and better extent allocation. Regular use of the Veritas df command (not the default OS df) to monitor free space is desirable (man df_vxfs).
[root@server1 ~]# /opt/VRTS/bin/df -o s /ora_prisma/data/
/ora_prisma/data   (/dev/vx/dsk/vgp1417/lvol8):  14065752 blocks  1875431 files
Free Extents by Size
          1:          2            2:          2            4:          1
          8:          1           16:          1           32:          0
         64:          0          128:          1          256:          1
        512:          1         1024:          1         2048:          0
       4096:          1         8192:          1        16384:          1
      32768:          0        65536:          0       131072:          1
     262144:          0       524288:          0      1048576:          1
    2097152:          1      4194304:          1      8388608:          0
   16777216:          0     33554432:          0     67108864:          0
  134217728:          0    268435456:          0    536870912:          0
 1073741824:          0   2147483648:          0
An unfragmented file system has the following characteristics:
  • Less than 1 percent of free space in extents of less than 8 blocks in length
  • Less than 5 percent of free space in extents of less than 64 blocks in length
  • More than 5 percent of the total file system size available as free extents in lengths of 64 or more blocks
A badly fragmented file system has one or more of the following characteristics:
  • Greater than 5 percent of free space in extents of less than 8 blocks in length
  • More than 50 percent of free space in extents of less than 64 blocks in length
  • Less than 5 percent of the total file system size available as free extents in lengths of 64 or more blocks 
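If your report matches the badly fragmented profile, here is a hedged defragmentation sketch (per the fsadm_vxfs man page, -d reorganizes directories and -e defragments extents, while -D and -E print the reports; verify the options on your release):
# Defragment directories and extents on a mounted file system, printing reports
# (schedule this during a low-activity window)
/opt/VRTS/bin/fsadm -t vxfs -d -e -D -E /ora_prisma/data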

Mount options

Suggested mount options for Oracle databases:
  • Oracle software and dump/diagnostic directories
    Normal: delaylog,datainlog,nolargefiles
    Advanced: delaylog,nodatainlog,nolargefiles
  • Redo log directory
    Normal: delaylog,datainlog,largefiles
    Advanced: delaylog,nodatainlog,convosync=direct,mincache=direct,largefiles
  • Archived log directory
    Normal: delaylog,datainlog,nolargefiles
    Advanced: delaylog,nodatainlog,convosync=direct,mincache=direct,nolargefiles
  • Control files directory
    Normal: delaylog,datainlog,nolargefiles
    Advanced: delaylog,datainlog,nolargefiles
  • Data, index, undo, system/sysaux and temporary directories
    Normal: delaylog,datainlog,largefiles
    Advanced: delaylog,nodatainlog,convosync=direct,mincache=direct,largefiles
Remark:
The licensed Concurrent I/O (CIO) product should also be considered when looking for I/O performance while running an Oracle database.
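For example, a hedged /etc/fstab line applying the advanced options for a datafile file system (device and mount point are hypothetical):
# Oracle datafiles on VxFS with direct I/O mount options
/dev/vx/dsk/datadg/oradatavol /oradata vxfs delaylog,nodatainlog,convosync=direct,mincache=direct,largefiles 0 0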

Database accelerators

Veritas Extension for Oracle Disk Manager

I found this in the Veritas File System Administrator’s Guide, and in more depth in Veritas Storage Foundation: Storage and Availability Management for Oracle Databases, and it feels like rediscovering something that has long existed. So what is this Oracle Disk Manager (ODM)?
From Symantec documentation:
The benefits of using Oracle Disk Manager are as follows:
  • True kernel asynchronous I/O for files and raw devices
  • Reduced system call overhead
  • Improved file system layout by preallocating contiguous files on a VxFS file system
  • Performance on file system files that is equivalent to raw devices
  • Transparent to users
Oracle Disk Manager improves database I/O performance to VxFS file systems by:
  • Supporting kernel asynchronous I/O
  • Supporting direct I/O and avoiding double buffering
  • Avoiding kernel write locks on database files
  • Supporting many concurrent I/Os in one system call
  • Avoiding duplicate opening of files per Oracle instance
  • Allocating contiguous datafiles
From Oracle documentation:
Oracle has developed a new disk and file management API called odmlib, which is marketed under the feature Oracle Disk Manager (ODM). ODM is fundamentally a file management and I/O interface that allows DBAs to manage larger and more complex databases, whilst maintaining the total cost of ownership.
Oracle Disk Manager (ODM) is packaged as part of Oracle9i and above; however, you’ll need a third party vendor’s ODM driver to fully implement Oracle’s interface. For example, Veritas’ VRTSodm package (in Database Edition V3.5) provides an ODM library. Other vendors such as HP and Network Appliance (DAFS) have also announced support and integration of ODM.
A bit of history can be found in this Veritas slide:
[Image: Veritas slide (vxfs1)]
Remark:
ODM is an integrated solution and is considered the replacement for Quick I/O.
Let’s confirm the option is available and usable:
[root@server1 ~]# rpm -qa | grep VRTSodm
VRTSodm-5.1.100.000-SP1_GA_RHEL5
[root@server1 ~]# /sbin/vxlictest -n "VERITAS Database Edition for Oracle" -f "ODM"
ODM feature is licensed
[root@server1 ~]# /opt/VRTS/bin/vxlicrep | grep ODM
   ODM                                 = Enabled
[root@server1 ~]# lsmod | grep odm
vxodm                 164224  1
fdd                    83552  2 vxodm
[root@server1 ~]# ll /dev/odm
total 0
-rw-rw-rw- 1 root root 0 Jul  3 17:27 cluster
-rw-rw-rw- 1 root root 0 Jul  3 17:27 ctl
-rw-rw-rw- 1 root root 0 Jul  3 17:27 fid
-rw-rw-rw- 1 root root 0 Jul  3 17:27 ktrace
-rw-rw-rw- 1 root root 0 Jul  3 17:27 stats
Looking at the documentation on how to configure it, I had the surprise of seeing that it is already there:
[root@server1 ~]# /etc/init.d/vxodm status
vxodm is running...
[orapris@server1 ~]$ ll $ORACLE_HOME/lib/libodm*
-rw-r--r-- 1 orapris dba  7442 Aug 14  2009 /ora_prisma/software/lib/libodm11.a
lrwxrwxrwx 1 orapris dba    12 Nov 12  2011 /ora_prisma/software/lib/libodm11.so -> libodmd11.so
-rw-r--r-- 1 orapris dba 12331 Aug 14  2009 /ora_prisma/software/lib/libodmd11.so
So then what is the difference between this library and the one from the Veritas package? I have the feeling that the Oracle one is a stub library shipped for link consistency; in any case you must use the one coming from the Veritas package.
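A hedged sketch of switching to the Veritas library (the library path is the one installed by the VRTSodm package on Linux x86_64; verify it on your system and stop the instance first):
# Run as the Oracle software owner with the instance shut down
cd $ORACLE_HOME/lib
mv libodm11.so libodm11.so.orig
ln -s /opt/VRTSodm/lib64/libodm.so libodm11.so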
Once the database is restarted you should see this appearing in the alert log file located in the ADR (Automatic Diagnostic Repository):
Oracle instance running with ODM: Veritas 5.1.100.00 ODM Library, Version 2.0
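A quick hedged way to check it (the alert log path depends on your ADR home; the one below is hypothetical):
grep -i "running with ODM" /ora_prisma/software/diag/rdbms/prisma/prisma/trace/alert_prisma.log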
Once activated (/dev/odm/fid file not empty, File Identification Descriptor) you can find usage statistics in:
[root@server1 data]# cat /dev/odm/stats
     abort:                      0
    cancel:                      0
    commit:                      0
    create:                      0
    delete:                      0
  identify:                      0
        io:                      0
reidentify:                      0
    resize:                      0
unidentify:                      0
     mname:                      0
     vxctl:                      0
    vxvers:                      0
    mname2:                      0
  protvers:                      0
   sethint:                      0
   gethint:                    660
 resethint:                      0
    io req:                      0
  io calls:                      0
  comp req:                      0
comp calls:                      0
io mor cmp:                      0
io zro cmp:                      0
io nop cmp:                      0
cl receive:                      0
  cl ident:                      0
cl reserve:                      0
 cl delete:                      0
 cl resize:                      0
   cl join:                      0
cl same op:                      0
cl opt idn:                      0
cl opt rsv:                      0
And using odmstat:
[root@server1 ~]# odmstat -i 10 -c 5 /ora_prisma/log/prisma/redo01.log
                   OPERATIONS          FILE BLOCKS    AVG TIME(ms)
 
FILE NAME                    NREADS   NWRITES     RBLOCKS     WBLOCKS   RTIME  WTIME
 
 
Wed 04 Jul 2012 06:06:49 PM CEST
/ora_prisma/log/prisma/redo01.log       601      4842    614401    106126    0.0  111.2
 
Wed 04 Jul 2012 06:06:59 PM CEST
/ora_prisma/log/prisma/redo01.log         0         6         0        30    0.0   53.3
 
Wed 04 Jul 2012 06:07:09 PM CEST
/ora_prisma/log/prisma/redo01.log         0         5         0        11    0.0   22.0
 
Wed 04 Jul 2012 06:07:19 PM CEST
/ora_prisma/log/prisma/redo01.log         0         6         0       121    0.0   23.3
 
Wed 04 Jul 2012 06:07:29 PM CEST
/ora_prisma/log/prisma/redo01.log         0         6         0        65    0.0   88.3
Once ODM is activated you no longer have to bother with mount options and file system properties, as ODM performs direct I/O (raw-like) and works in kernelized asynchronous I/O (KAIO) mode.
Remark:
It is strongly suggested to back up your database files before deactivating ODM.

Veritas Cached Oracle Disk Manager

As we have seen, ODM bypasses the file system cache and so performs direct I/O with no read-ahead (and as we know, read-intensive databases can suffer from this). Cached ODM (CODM) implements selective cached I/O. What does that mean? Better than a long explanation, here is the Symantec documentation:
ODM I/O bypasses the file system cache and directly reads from and writes to disk. Cached ODM enables selected I/O to use caching (file system buffering) and read ahead, which can improve overall Oracle DB I/O performance. Cached ODM performs a conditional form of caching that is based on per-I/O hints from Oracle. The hints indicate what Oracle will do with the data. ODM uses these hints to perform caching and read ahead for some reads, but ODM avoids caching other reads, possibly even for the same file.
CODM is an extension of ODM (which must be installed as a prerequisite); check that the CODM package is installed with:
[root@server1 ~]# rpm -qa | grep VRTSdbed
VRTSdbed-5.1.100.000-SP1_RHEL5
Activate it on a file system using the following (add the setting to /etc/vx/tunefstab to make it persistent across reboots):
[root@server1 ~]# vxtunefs -o odm_cache_enable=1 /ora_prisma/log
Then use the setcachefile and getcachefile parameters of odmadm to manage individual files:
[root@server1 ~]# odmadm getcachefile /ora_prisma/data/prisma/mndata01.dbf
/ora_prisma/data/prisma/mndata01.dbf,DEF
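For instance, a hedged sketch forcing caching on for one datafile and checking the result (the on/off/def values are per the odmadm documentation, DEF meaning follow the cachemap):
# Force caching ON for this datafile, then verify
odmadm setcachefile /ora_prisma/data/prisma/mndata01.dbf=on
odmadm getcachefile /ora_prisma/data/prisma/mndata01.dbf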
The cachemap maps combinations of file type and I/O type to caching advisories. You can tune it using the setcachemap and getcachemap parameters of odmadm. List of available parameters:
[root@server1 ~]# odmadm  getcachemap
ctl/redolog_write             none
ctl/redolog_read              none
ctl/archlog_read              none
ctl/media_recovery_write      none
ctl/mirror_read               none
ctl/resilvering_write         none
ctl/ctl_file_read             none
ctl/ctl_file_write            none
ctl/flash_read                none
ctl/flash_write               none
.
.
On top of the complexity of understanding which files can benefit from caching, the cachemap has so many values to tune that it becomes impossible to tune CODM manually without guidance. Please note that cachemap settings are not persistent across reboots; use the /etc/vx/odmadm file to make them so. So how do you proceed?
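A hedged sketch of one cachemap change and how to persist it (the file type/I/O type pair and advisories follow the getcachemap output format; verify the exact names on your release):
# Enable caching and read ahead for sequential datafile reads (runtime only)
odmadm setcachemap data/data_read_seq=cache,readahead
# The /etc/vx/odmadm file replays such commands at boot (assumed format)
echo "setcachemap data/data_read_seq=cache,readahead" >> /etc/vx/odmadm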
It is advised not to change the default cachemap, to avoid drawbacks like double caching between the file system cache and the Oracle SGA. To understand which files can benefit from CODM you have two options:
  • Use a Veritas tool called Cached ODM Manager (dbed_codm_adm) that can be used by DBAs.
  • Generate AWR reports (Oracle 10g and above) and order tablespaces/datafiles by reads; the datafiles with the highest physical reads would benefit from CODM.
Putting it all together, it starts to be a bit complex, with three caching layers:
  • Oracle SGA
  • File system cache
  • CODM (ODM)
So where should you put the available memory? The added value of CODM is dynamic allocation, while the SGA is not dynamic (bounded by SGA_MAX_SIZE / MEMORY_MAX_TARGET). And CODM versus the file system cache? CODM has much better granularity as a per-file cache, so you can activate it only for the files where it is really needed (identified using AWR and/or dbed_codm_adm).
