
Veritas File System (VxFS) tuning

Preamble

In addition to Veritas Volume Manager (VxVM), Symantec also offers Veritas File System (VxFS), which is most of the time used in combination with VxVM. Symantec claims that the highest benefit is obtained when using both together. This document was written using Red Hat Enterprise Linux Server release 5.5 (Tikanga) and the following VxFS/VxVM releases:
[root@server1 ~]# rpm -qa | grep -i vrts
VRTSvlic-3.02.51.010-0
VRTSfssdk-5.1.100.000-SP1_GA_RHEL5
VRTSob-3.4.289-0
VRTSsfmh-3.1.429.0-0
VRTSspt-5.5.000.005-GA
VRTSlvmconv-5.1.100.000-SP1_RHEL5
VRTSodm-5.1.100.000-SP1_GA_RHEL5
VRTSvxvm-5.1.101.100-SP1RP1P1_RHEL5
VRTSvxfs-5.1.100.000-SP1_GA_RHEL5
VRTSatClient-5.0.32.0-0
VRTSaslapm-5.1.100.000-SP1_RHEL5
VRTSatServer-5.0.32.0-0
VRTSperl-5.10.0.7-RHEL5.3
VRTSdbed-5.1.100.000-SP1_RHEL5
To avoid putting a real server name in your document, think of something like:
export PS1="[\u@server1 \W]# "

VxFS file system physical parameters

When creating a file system there are two important characteristics to choose:
  • Block size (cannot be changed once the file system has been created)
  • Intent log size (can be changed after file system creation with fsadm; VxFS usually performs better with larger log sizes)
Remark:
When using VxVM with VxFS, Symantec recommends using vxresize (instead of vxassist and fsadm) to grow or shrink the volume and the file system in a single operation.
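For illustration, a minimal vxresize sketch reusing the disk group and volume names from the examples below; this grows the volume and its file system together by 10GB:
[root@server1 ~]# /etc/vx/bin/vxresize -g vgp1417 lvol9 +10g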

Block size

From the mkfs_vxfs man page (-o bsize=bsize), the default block size depends on the file system size:
File system size   Default block size
----------------   ------------------
0 TB to 1 TB       1k
>1 TB              8k
In turn, the block size determines the maximum possible file system size, as given in the following table:
Block size   Maximum file system size
----------   ------------------------
1k           32 TB
2k           64 TB
4k           128 TB
8k           256 TB
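The block size is chosen at file system creation time with the -o bsize option of mkfs; a minimal sketch reusing a device name from the examples below (mkfs would of course destroy any existing data on it):
[root@server1 ~]# mkfs -t vxfs -o bsize=8192 /dev/vx/rdsk/vgp1417/lvol9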
Recommended Oracle file system block sizes (assuming your Oracle database block size is equal to or bigger than 8KB, which is the default):
File System                                                  Block Size
-----------                                                  ----------
Oracle software and dump/diagnostic directories              1KB
Redo log directory                                           512 bytes for Solaris, AIX, Windows, Linux and 1KB for HP-UX
Archived log directory                                       1KB
Control files directory                                      8KB (control files block size is 16KB starting with Oracle 10g)
Data, index, undo, system/sysaux and temporary directories   8KB
You can check control file block size with (Linux RedHat 5.5 and Oracle 11.2.0.3):
SQL> select cfbsz from x$kcccf;

     CFBSZ
----------
     16384
     16384
Remark:
For Oracle releases lower than 10g the control file block size was equal to the Oracle initialization parameter db_block_size; starting with 10g the block size is 16KB whatever the value of db_block_size.
You can check redo log block size with (Linux RedHat 5.5 and Oracle 11.2.0.3):
SQL> select lebsz from x$kccle;

     LEBSZ
----------
       512
       512
       512
Remark:
As 4KB-sector disks are slowly coming onto the market, Oracle 11gR2 now offers the capability to create redo log files with the block size you like…
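For example, with 11gR2 the redo log block size can be specified at creation time. The group number and file name below are hypothetical, and on disks without 4KB sectors Oracle may reject a non-default block size:
SQL> alter database add logfile group 4 ('/ora_prisma/log/prisma/redo04.log') size 512M blocksize 4096;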
Once the file system has already been created, the command to see which block size has been chosen is fstyp. On HP-UX 11iv3 (11.31) the parameter to look at is f_frsize:
server1{root}# fstyp -v /dev/vg1d1c/lvol3
vxfs
version: 6
f_bsize: 8192
f_frsize: 1024
f_blocks: 8388608
f_bfree: 2126170
f_bavail: 1993285
f_files: 576852
f_ffree: 531540
f_favail: 531540
f_fsid: 1073872899
f_basetype: vxfs
f_namemax: 254
f_magic: a501fcf5
f_featurebits: 0
f_flag: 0
f_fsindex: 9
f_size: 8388608
On Linux (RedHat 6.3) the parameter to look at is bsize:
[root@server2 ~]# fstyp -v /dev/vx/dsk/vgp3316/lvol10
vxfs
magic a501fcf5  version 9  ctime Wed 05 Feb 2014 12:18:11 PM CET
logstart 0  logend 0
bsize  1024 size  205520896 dsize  205520896  ninode 0  nau 0
defiextsize 0  ilbsize 0  immedlen 96  ndaddr 10
aufirst 0  emap 0  imap 0  iextop 0  istart 0
bstart 0  femap 0  fimap 0  fiextop 0  fistart 0  fbstart 0
nindir 2048  aulen 32768  auimlen 0  auemlen 8
auilen 0  aupad 0  aublocks 32768  maxtier 15
inopb 4  inopau 0  ndiripau 0  iaddrlen 8   bshift 10
inoshift 2  bmask fffffc00  boffmask 3ff  checksum f7f795e4
oltext1 32  oltext2 1282  oltsize 1  checksum2 0
free 192386052  ifree 0
efree  2 1 2 3 4 3 3 1 4 5 3 5 5 4 2 14 19 10 6 2 6 6 2 1 1 0 2 0 0 0 0 0
Remark:
While from a fragmentation point of view a bigger file system block size is an obvious win, from a performance perspective it is not so straightforward. See the "Analyzing the impact of the VxFS filesystem block size on Oracle" article in the reference section, which almost breaks what I had had in mind for multiple years…

Intent log size

From the mkfs_vxfs man page (-o logsize=n):
Block size   Minimum log size   Maximum log size
----------   ----------------   ----------------
1k           256 blocks         262,144 blocks
2k           128 blocks         131,072 blocks
4k           64 blocks          65,536 blocks
8k           32 blocks          32,768 blocks
The default log size increases with the file system size, as shown in the following table:
File system size   Default log size
----------------   ----------------
0 MB to 8 MB       256k
8 MB to 512 MB     1 MB
512 MB to 16 GB    16 MB
16 GB to 512 GB    64 MB
512+ GB            256 MB
You can display the current intent log size with:
[root@server1 ~]# /opt/VRTS/bin/fsadm -t vxfs -L /ora_prisma/rbs
UX:vxfs fsadm: INFO: V-3-25669:  logsize=16384 blocks, logvol=""
Remark:
If the fsadm command complains with something like:
fsadm: Wrong argument "-t". (see: fsadm --help)
then look for the VxFS binaries in the /opt/VRTS/bin directory. Be careful if changing the PATH, because a simple tool like df will not behave the same, as Symantec has re-written it.
So in my 6GB file system example:
[root@server1 ~]# df -P /ora_prisma/rbs
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/vx/dsk/vgp1417/lvol9   6291456   5264042    963208      85% /ora_prisma/rbs
The default block size (1KB) has been chosen, and so the default intent log size of 16384 blocks, i.e. 16MB.
So which intent log size should you choose? Symantec says recovery time increases with a larger intent log, while VxFS performs better with a larger intent log size. As you obviously want to tune for the 99.99% of the time when your system is up and running, you should consider creating a large intent log, keeping in mind that the behavior must be checked while the application is running (there is no clear Oracle recommendation)…
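Per man fsadm_vxfs, the intent log can also be resized after creation; a hedged sketch doubling it on the file system above:
[root@server1 ~]# /opt/VRTS/bin/fsadm -t vxfs -o logsize=32768 /ora_prisma/rbs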

File extents

In the same way as Oracle table extents, you can change the default extent allocation policy and/or preallocate space for a file:
[root@server1 prisma]# getext undotbs01.dbf
undotbs01.dbf:  Bsize  1024  Reserve       0  Extent Size       0
Remark:
An extent size of 0 uses the default extent allocation policy. See vxtunefs for the policy description (the parameters are initial_extent_size and max_seqio_extent_size).
Small example with an empty file:
[root@server1 data]# df -P .
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/vx/dsk/vgp4118/lvol4  83886080  28377085  52039685      36% /ora_iedbre/data
[root@server1 data]# touch yannick
[root@server1 data]# getext yannick
yannick:        Bsize  1024  Reserve       0  Extent Size       0
[root@server1 data]# df -P .
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/vx/dsk/vgp4118/lvol4  83886080  28377085  52039684      36% /ora_iedbre/data
Now changing its reservation (initial allocation) and then its fixed extent size:
[root@server1 data]# setext -t vxfs -r 30g -f chgsize yannick
[root@server1 data]# df -P .
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/vx/dsk/vgp4118/lvol4  83886080  59834365  22548484      73% /ora_iedbre/data
[root@server1 data]# ll yannick
-rw-r----- 1 root root 32212254720 Jun 22 14:45 yannick
[root@server1 data]# getext yannick
yannick:        Bsize  1024  Reserve 31457280  Extent Size       0
[root@server1 data]# setext -t vxfs -e 1g -r 30g  yannick
[root@server1 data]# ll yannick
-rw-r----- 1 root root 32212254720 Jul  4 12:55 yannick
[root@server1 data]# getext yannick
yannick:        Bsize  1024  Reserve 31457280  Extent Size 1048576
Please note it takes a bit of time to recover free space when deleting this test file.
Fixed extent sizes and Oracle? I would say they are beneficial for Oracle datafiles as they avoid fragmentation, but if, like me, you work with the autoextend feature, then do not set too small a next extent and you will achieve the same behavior.
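For instance, a reasonably large next extent can be set on a datafile like this (the file name and sizes are illustrative only):
SQL> alter database datafile '/ora_prisma/data/prisma/mndata01.dbf' autoextend on next 256m maxsize unlimited;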

VxFS file system tuning

Tunable filesystem parameters

[root@server1 ~]# vxtunefs -p /ora_prisma/rbs
Filesystem i/o parameters for /ora_prisma/rbs
read_pref_io = 65536
read_nstream = 1
read_unit_io = 65536
write_pref_io = 65536
write_nstream = 1
write_unit_io = 65536
pref_strength = 10
buf_breakup_size = 1048576
discovered_direct_iosz = 262144
max_direct_iosz = 1048576
default_indir_size = 8192
odm_cache_enable = 0
write_throttle = 0
max_diskq = 1048576
initial_extent_size = 8
max_seqio_extent_size = 2048
max_buf_data_size = 8192
hsm_write_prealloc = 0
read_ahead = 1
inode_aging_size = 0
inode_aging_count = 0
fcl_maxalloc = 195225600
fcl_keeptime = 0
fcl_winterval = 3600
fcl_ointerval = 600
oltp_load = 0
delicache_enable = 1
thin_friendly_alloc = 0
If the file systems are used with VxVM, it is suggested to keep the default values, so do test carefully when changing anything…
Remark:
When using VxFS with VxVM, VxVM by default breaks up I/O requests larger than 256K.
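As an illustrative sketch (the values are examples, not recommendations), a parameter can be changed online with vxtunefs and made persistent across reboots in /etc/vx/tunefstab:
[root@server1 ~]# vxtunefs -o read_pref_io=131072,read_nstream=4 /ora_prisma/rbs
[root@server1 ~]# cat /etc/vx/tunefstab
/dev/vx/dsk/vgp1417/lvol9 read_pref_io=131072,read_nstream=4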

File system fragmentation

To display a directory fragmentation report, issue:
[root@server1 ~]# /opt/VRTS/bin/fsadm -t vxfs -D /ora_prisma/data
 
  Directory Fragmentation Report
             Dirs        Total      Immed    Immeds   Dirs to   Blocks to
             Searched    Blocks     Dirs     to Add   Reduce    Reduce
  total             3         1         2         0         0           0
Remark:
Symantec does recommend performing regular file system defragmentation (!!); a sample command is shown after the lists below:
In general, VxFS works best if the percentage of free space in the file system does not get below 10 percent. This is because file systems with 10 percent or more free space have less fragmentation and better extent allocation. Regular use of the Veritas df command (not the default OS df) to monitor free space is desirable (man df_vxfs).
[root@server1 ~]# /opt/VRTS/bin/df -o s /ora_prisma/data/
/ora_prisma/data   (/dev/vx/dsk/vgp1417/lvol8):  14065752 blocks  1875431 files
Free Extents by Size
          1:          2            2:          2            4:          1
          8:          1           16:          1           32:          0
         64:          0          128:          1          256:          1
        512:          1         1024:          1         2048:          0
       4096:          1         8192:          1        16384:          1
      32768:          0        65536:          0       131072:          1
     262144:          0       524288:          0      1048576:          1
    2097152:          1      4194304:          1      8388608:          0
   16777216:          0     33554432:          0     67108864:          0
  134217728:          0    268435456:          0    536870912:          0
 1073741824:          0   2147483648:          0
An unfragmented file system has the following characteristics:
  • Less than 1 percent of free space in extents of less than 8 blocks in length
  • Less than 5 percent of free space in extents of less than 64 blocks in length
  • More than 5 percent of the total file system size available as free extents in lengths of 64 or more blocks
A badly fragmented file system has one or more of the following characteristics:
  • Greater than 5 percent of free space in extents of less than 8 blocks in length
  • More than 50 percent of free space in extents of less than 64 blocks in length
  • Less than 5 percent of the total file system size available as free extents in lengths of 64 or more blocks 
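As mentioned in the remark above, here is a hedged defragmentation sketch. Per man fsadm_vxfs, -d reorganizes directories, -e reorganizes extents and -s prints a summary of the activity; running it during a quiet period is preferable:
[root@server1 ~]# /opt/VRTS/bin/fsadm -t vxfs -d -e -s /ora_prisma/data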

Mount options

Suggested mount options for Oracle databases:
File System                                                  Normal Mount Options (VxFS)       Advanced Mount Options (VxFS)
-----------                                                  ---------------------------       -----------------------------
Oracle software and dump/diagnostic directories              delaylog,datainlog,nolargefiles   delaylog,nodatainlog,nolargefiles
Redo log directory                                           delaylog,datainlog,largefiles     delaylog,nodatainlog,convosync=direct,mincache=direct,largefiles
Archived log directory                                       delaylog,datainlog,nolargefiles   delaylog,nodatainlog,convosync=direct,mincache=direct,nolargefiles
Control files directory                                      delaylog,datainlog,nolargefiles   delaylog,datainlog,nolargefiles
Data, index, undo, system/sysaux and temporary directories   delaylog,datainlog,largefiles     delaylog,nodatainlog,convosync=direct,mincache=direct,largefiles
Remark:
Licensed product Concurrent I/O (CIO) should also be considered when looking for I/O performance and running an Oracle database.
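A hedged /etc/fstab sketch for a datafile file system using the advanced options (the device and mount point reuse the examples above):
/dev/vx/dsk/vgp1417/lvol8  /ora_prisma/data  vxfs  delaylog,nodatainlog,convosync=direct,mincache=direct,largefiles  0 2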

Database accelerators

Veritas Extension for Oracle Disk Manager

I found this in the Veritas File System Administrator’s Guide, and in more depth in Veritas Storage Foundation: Storage and Availability Management for Oracle Databases, and it is like re-discovering hot water. So what is this Oracle Disk Manager (ODM)?
From Symantec documentation:
The benefits of using Oracle Disk Manager are as follows:
  • True kernel asynchronous I/O for files and raw devices
  • Reduced system call overhead
  • Improved file system layout by preallocating contiguous files on a VxFS file system
  • Performance on file system files that is equivalent to raw devices
  • Transparent to users
Oracle Disk Manager improves database I/O performance to VxFS file systems by:
  • Supporting kernel asynchronous I/O
  • Supporting direct I/O and avoiding double buffering
  • Avoiding kernel write locks on database files
  • Supporting many concurrent I/Os in one system call
  • Avoiding duplicate opening of files per Oracle instance
  • Allocating contiguous datafiles
From Oracle documentation:
Oracle has developed a new disk and file management API called odmlib, which is marketed under the feature Oracle Disk Manager (ODM). ODM is fundamentally a file management and I/O interface that allows DBAs to manage larger and more complex databases, whilst maintaining the total cost of ownership.
Oracle Disk Manager (ODM) is packaged as part of Oracle9i and above; however, you’ll need a third party vendor’s ODM driver to fully implement Oracle’s interface. For example, Veritas’ VRTSodm package (in Database Edition V3.5) provides an ODM library. Other vendors such as HP and Network Appliance (DAFS) have also announced support and integration of ODM.
A bit of history can be found in a Veritas slide (image vxfs1, not reproduced here).
Remark:
ODM is an integrated solution and is considered a replacement for Quick I/O.
Let’s confirm the option is available and usable:
[root@server1 ~]# rpm -qa | grep VRTSodm
VRTSodm-5.1.100.000-SP1_GA_RHEL5
[root@server1 ~]# /sbin/vxlictest -n "VERITAS Database Edition for Oracle" -f "ODM"
ODM feature is licensed
[root@server1 ~]# /opt/VRTS/bin/vxlicrep | grep ODM
   ODM                                 = Enabled
[root@server1 ~]# lsmod | grep odm
vxodm                 164224  1
fdd                    83552  2 vxodm
[root@server1 ~]# ll /dev/odm
total 0
-rw-rw-rw- 1 root root 0 Jul  3 17:27 cluster
-rw-rw-rw- 1 root root 0 Jul  3 17:27 ctl
-rw-rw-rw- 1 root root 0 Jul  3 17:27 fid
-rw-rw-rw- 1 root root 0 Jul  3 17:27 ktrace
-rw-rw-rw- 1 root root 0 Jul  3 17:27 stats
Looking at the documentation on how to configure it, I was surprised to see that it is already there:
[root@server1 ~]# /etc/init.d/vxodm status
vxodm is running...
[orapris@server1 ~]$ ll $ORACLE_HOME/lib/libodm*
-rw-r--r-- 1 orapris dba  7442 Aug 14  2009 /ora_prisma/software/lib/libodm11.a
lrwxrwxrwx 1 orapris dba    12 Nov 12  2011 /ora_prisma/software/lib/libodm11.so -> libodmd11.so
-rw-r--r-- 1 orapris dba 12331 Aug 14  2009 /ora_prisma/software/lib/libodmd11.so
So then, what is the difference between this library and the one from the Veritas package? I have the feeling that the Oracle one is a stub library provided for link consistency; in any case you must use the one coming from the Veritas package.
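For reference, a hedged sketch of the usual relink as described in the Veritas documentation (the /opt/VRTSodm/lib64 path assumes a 64-bit Linux VRTSodm package; the original library is kept as a backup):
[orapris@server1 ~]$ cd $ORACLE_HOME/lib
[orapris@server1 lib]$ mv libodm11.so libodm11.so.oracle
[orapris@server1 lib]$ ln -s /opt/VRTSodm/lib64/libodm.so libodm11.so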
Once the database is restarted, you should see this appearing in the alert log file located in the ADR (Automatic Diagnostic Repository):
Oracle instance running with ODM: Veritas 5.1.100.00 ODM Library, Version 2.0
Once activated (the /dev/odm/fid file, File Identification Descriptor, is not empty) you can find usage statistics in:
[root@server1 data]# cat /dev/odm/stats
     abort:                      0
    cancel:                      0
    commit:                      0
    create:                      0
    delete:                      0
  identify:                      0
        io:                      0
reidentify:                      0
    resize:                      0
unidentify:                      0
     mname:                      0
     vxctl:                      0
    vxvers:                      0
    mname2:                      0
  protvers:                      0
   sethint:                      0
   gethint:                    660
 resethint:                      0
    io req:                      0
  io calls:                      0
  comp req:                      0
comp calls:                      0
io mor cmp:                      0
io zro cmp:                      0
io nop cmp:                      0
cl receive:                      0
  cl ident:                      0
cl reserve:                      0
 cl delete:                      0
 cl resize:                      0
   cl join:                      0
cl same op:                      0
cl opt idn:                      0
cl opt rsv:                      0
And using odmstat:
[root@server1 ~]# odmstat -i 10 -c 5 /ora_prisma/log/prisma/redo01.log
                   OPERATIONS          FILE BLOCKS    AVG TIME(ms)
 
FILE NAME                    NREADS   NWRITES     RBLOCKS     WBLOCKS   RTIME  WTIME
 
 
Wed 04 Jul 2012 06:06:49 PM CEST
/ora_prisma/log/prisma/redo01.log       601      4842    614401    106126    0.0  111.2
 
Wed 04 Jul 2012 06:06:59 PM CEST
/ora_prisma/log/prisma/redo01.log         0         6         0        30    0.0   53.3
 
Wed 04 Jul 2012 06:07:09 PM CEST
/ora_prisma/log/prisma/redo01.log         0         5         0        11    0.0   22.0
 
Wed 04 Jul 2012 06:07:19 PM CEST
/ora_prisma/log/prisma/redo01.log         0         6         0       121    0.0   23.3
 
Wed 04 Jul 2012 06:07:29 PM CEST
/ora_prisma/log/prisma/redo01.log         0         6         0        65    0.0   88.3
Once ODM is activated you do not have to bother anymore with mount options and file system properties, as ODM performs direct I/O (raw-like) and works in kernelized asynchronous I/O (KAIO) mode.
Remark:
It is strongly suggested to back up your database files if you deactivate ODM.

Veritas Cached Oracle Disk Manager

As we have seen, ODM bypasses the file system cache and so performs direct I/O with no read ahead (and, as we know, read-intensive databases can suffer from this). Cached ODM (CODM) implements selective cached I/O. What does that mean? Better than a long explanation, here is the Symantec documentation:
ODM I/O bypasses the file system cache and directly reads from and writes to disk. Cached ODM enables selected I/O to use caching (file system buffering) and read ahead, which can improve overall Oracle DB I/O performance. Cached ODM performs a conditional form of caching that is based on per-I/O hints from Oracle. The hints indicate what Oracle will do with the data. ODM uses these hints to perform caching and read ahead for some reads, but ODM avoids caching other reads, possibly even for the same file.
CODM is an ODM extension (ODM must be installed as a prerequisite); check that the CODM package is installed with:
[root@server1 ~]# rpm -qa | grep VRTSdbed
VRTSdbed-5.1.100.000-SP1_RHEL5
Activate it on a file system using (add it to /etc/vx/tunefstab to make it persistent across reboots):
[root@server1 ~]# vxtunefs -o odm_cache_enable=1 /ora_prisma/log
Then use the setcachefile and getcachefile odmadm subcommands to change and display the setting for individual files:
[root@server1 ~]# odmadm getcachefile /ora_prisma/data/prisma/mndata01.dbf
/ora_prisma/data/prisma/mndata01.dbf,DEF
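A hedged sketch forcing caching on for a single file (per the Veritas documentation the accepted values are on, off and def):
[root@server1 ~]# odmadm setcachefile /ora_prisma/data/prisma/mndata01.dbf=on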
The cachemap maps file type and I/O type combinations to caching advisories. You can tune it using the setcachemap and getcachemap odmadm subcommands. List of available parameters:
[root@server1 ~]# odmadm  getcachemap
ctl/redolog_write             none
ctl/redolog_read              none
ctl/archlog_read              none
ctl/media_recovery_write      none
ctl/mirror_read               none
ctl/resilvering_write         none
ctl/ctl_file_read             none
ctl/ctl_file_write            none
ctl/flash_read                none
ctl/flash_write               none
…
On top of the complexity of understanding which files can benefit from caching, the cachemap has so many values to tune that it becomes impossible to tune CODM manually without any guidance. Please note that cachemap settings are not persistent across reboots; use the /etc/vx/odmadm file to achieve this. So how do you proceed?
It is advised not to change the default cachemap, to avoid drawbacks like double caching in both the file system cache and the Oracle SGA. To understand which files can benefit from CODM you have two options (a query sketch follows the list):
  • Use a Veritas tool called Cached ODM Manager (dbed_codm_adm) that can be used by DBAs.
  • Generate AWR reports (Oracle 10g and above) and order tablespaces/files by reads; the datafiles with the highest physical reads would benefit from CODM.
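For the second option, a hedged query sketch outside AWR, ranking datafiles by physical reads with the classic v$filestat view:
SQL> select df.name, fs.phyrds from v$datafile df, v$filestat fs where df.file# = fs.file# order by fs.phyrds desc;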
Putting it all together, it starts to be a bit complex:
  • Oracle SGA
  • File system cache
  • CODM (ODM)
So where should you put the available memory? The added value of CODM is dynamic allocation, whereas the SGA is not dynamic (bounded by SGA_MAX_SIZE / MEMORY_MAX_TARGET). Then CODM versus the file system cache? CODM has much better granularity as a per-file cache, so you can activate it only for the files where it is really needed (using AWR and/or dbed_codm_adm).

Source: http://blog.yannickjaquier.com/

References

  • Analyzing the impact of the VxFS filesystem block size on Oracle (article cited in the block size section above)