
Wednesday, 29 January 2014

OPatch failed with error code 73; UtilSession failed: oracle.crs

While applying an Oracle patch to the RDBMS home, we received the following error:


[oracle@ppcmswgc-db01 ~]$ OPatch/opatch napply custom/server/ -local -oh $ORACLE_HOME -id 13440962
Invoking OPatch 11.2.0.1.7

Oracle Interim Patch Installer version 11.2.0.1.7
Copyright (c) 2011, Oracle Corporation.  All rights reserved.

UTIL session

Oracle Home       : /u02/app/oracle/product/11.2.0/db_1
Central Inventory : /u01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.7
OUI version       : 11.2.0.3.0
Log file location : /u02/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2014-01-29_10-42-40AM.log

Verifying environment and performing prerequisite checks...
UtilSession failed: Patch 13440962 requires component(s) that are not installed in OracleHome. These not-installed components are oracle.crs:11.2.0.3.0,
Log file location: /u02/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2014-01-29_10-42-40AM.log

OPatch failed with error code 73

SOLUTION

Change the directory to "custom/server" and run opatch from there:


[oracle@ppcmswgc-db01 ~]$ ls -ltr
total 160536
drwxrwxrwx 5 oracle oinstall      4096 Dec 28  2011 13440962



[oracle@ppcmswgc-db01 ~]$ cd 13440962/custom/server
[oracle@ppcmswgc-db01 server]$ ls -tlr
total 4
drwxrwxrwx 5 oracle oinstall 4096 Dec 28  2011 13440962

[oracle@ppcmswgc-db01 server]$ ~/OPatch/opatch napply -local -oh $ORACLE_HOME -id 13440962
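
Once the apply completes, a quick way to confirm the patch is registered in the database home's inventory (a sketch, reusing the same OPatch copy as above):

[oracle@ppcmswgc-db01 server]$ ~/OPatch/opatch lsinventory -oh $ORACLE_HOME | grep 13440962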




Tuesday, 26 June 2012

Oracle 10g RAC Fast Application Notification - Server side callouts


On the database tier, you can implement FAN server-side callouts.


FAN events are generated in response to a state change, which can include the following:
• Starting or stopping the database
• Starting or stopping an instance
• Starting or stopping a service
• A node leaving the cluster


You can also implement server-side callouts, which can be configured to perform actions such as logging fault tickets or notifying the database administrator.

When a state change occurs, the RAC HA framework posts a FAN event to ONS immediately. When a node receives an event through ONS, it will asynchronously execute all executables in the
server-side callouts directory, which is $ORA_CRS_HOME/racg/usrco. Server-side callouts must be stored in this directory; otherwise, they will not be executed.

Create the following sample callout script as the oracle user in $ORA_CRS_HOME/racg/usrco on
node 1 (coltdb01):

[oracle@coltdb01 usrco]$ cat callout.sh
#! /bin/ksh
# Append each FAN event, together with the time it was received, to a per-node log file
FAN_LOGFILE=/home/oracle/log/rac_`hostname`.log
echo $* "reported="`date` >> $FAN_LOGFILE &


All events are recorded and appended to the log file /home/oracle/log/rac_<hostname>.log (the /home/oracle/log directory must already exist).

You must set execute permission on the callout script:

[oracle@coltdb01 usrco]$ chmod 744 /u01/crs/oracle/product/10.2.0/crs/racg/usrco/callout.sh

In addition, you must remember to copy the script to each node in the cluster. The script will automatically be executed whenever the HA framework generates a RAC event.
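
A callout can also filter on the event payload and notify the DBA only for specific events, along the lines of the fault-ticket and notification use mentioned above. A minimal sketch, assuming a working mail transfer agent on the node and a hypothetical dba@example.com address:

#! /bin/ksh
# Log every FAN event, and mail the DBA when a component goes down because of a failure
FAN_LOGFILE=/home/oracle/log/rac_`hostname`.log
echo $* "reported="`date` >> $FAN_LOGFILE

# $* carries name=value pairs such as status=down reason=failure
case "$*" in
  *status=down*reason=failure*)
    echo "FAN event on `hostname`: $*" | mail -s "RAC component down" dba@example.com
    ;;
esac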

Example - 1

From node 1 (coltdb01), shut down the node 1 (coltdb01) instance:

[oracle@coltdb01 ~]$ srvctl stop instance -d sendb -i sendb1

Now check the log file referenced in the callout script on node 1:

[oracle@coltdb01 log]$ pwd
/home/oracle/log
[oracle@coltdb01 log]$ ls -ltr
total 4
-rw-rw-r-- 1 oracle oinstall 182 Jun 26 17:20 rac_coltdb01.log
[oracle@coltdb01 log]$ more  rac_coltdb01.log
INSTANCE VERSION=1.0 service=sendb.cms.colt database=sendb instance=sendb1 host=coltdb01 status=down reason=user timestamp=26-Jun-2012 17:20:59 reported=Tue Jun 26 17:20:59 IST 2012

On node 2 and node 3 no log file is created, as we shut down only the node 1 instance.


Example - 2

From node 1 (coltdb01), shut down the node 2 (coltdb02) instance:

[oracle@coltdb01 ~]$ srvctl stop instance -d sendb -i sendb2

Check the log file on node 1. Nothing is appended, as we shut down the node 2 instance.

Check the log file on node 2:
[oracle@coltdb02 log]$ more rac_coltdb02.log
INSTANCE VERSION=1.0 service=sendb.cms.colt database=sendb instance=sendb2 host=coltdb02 status=down reason=user timestamp=26-Jun-2012 17:26:48 reported=Tue Jun 26 17:26:48 IST 2012


On node 3 no log file is created, as we shut down only the node 2 instance.


Example - 3

From node 1 (coltdb01), start the database:

[oracle@coltdb01 ~]$ srvctl start database -d sendb
The log files on all nodes are updated with the event:

Node 1
[oracle@coltdb01 log]$ more  rac_coltdb01.log
INSTANCE VERSION=1.0 service=sendb.cms.colt database=sendb instance=sendb1 host=coltdb01 status=up reason=user timestamp=26-Jun-2012 17:32:44 reported=Tue Jun 26 17:32:44 IST 2012

Node 2
[oracle@coltdb02 log]$ more rac_coltdb02.log
INSTANCE VERSION=1.0 service=sendb.cms.colt database=sendb instance=sendb2 host=coltdb02 status=up reason=user timestamp=26-Jun-2012 17:33:11 reported=Tue Jun 26 17:33:11 IST 2012

Node 3
[oracle@coltdb03 log]$ more rac_coltdb03.log
INSTANCE VERSION=1.0 service=sendb.cms.colt database=sendb instance=sendb3 host=coltdb03 status=up reason=user timestamp=26-Jun-2012 17:33:07 reported=Tue Jun 26 17:33:07 IST 2012

Friday, 15 June 2012

Adding a new Node to Oracle RAC cluster


Environment
------------------
DB Version - 10.2.0.5
OS Version - RHEL 5.7 64bit
Existing Nodes - coltdb01, coltdb02
New Node - coltdb03

Pre-requisites
----------------
1. Install the same OS version as on the other two nodes
2. Copy /etc/sysctl.conf from an existing node to the new node for the kernel parameters
3. Create the same user and group as on the other nodes (oracle user and dba group)
4. Copy the .bash_profile from the oracle user's home on an existing node to the new node and edit the ORACLE_SID
5. Configure ssh user equivalence between the existing nodes and the new node (a minimal sketch follows this list)
6. If you are not using DNS, add the IP addresses of all other nodes in the cluster to /etc/hosts,
   including the public and private network addresses and the VIP address
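
The following is a minimal sketch of setting up ssh user equivalence for the oracle user on the new node; the hostnames are the ones used in this environment and the commands assume a default OpenSSH setup:

[oracle@coltdb03 ~]$ ssh-keygen -t rsa                                    # generate a key pair if one does not exist
[oracle@coltdb03 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[oracle@coltdb03 ~]$ ssh coltdb01 cat .ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[oracle@coltdb03 ~]$ ssh coltdb02 cat .ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[oracle@coltdb03 ~]$ chmod 600 ~/.ssh/authorized_keys
# Repeat the key exchange from coltdb01 and coltdb02 so every node can reach every other node,
# then verify that no password prompt appears:
[oracle@coltdb03 ~]$ ssh coltdb01 date
[oracle@coltdb03 ~]$ ssh coltdb02 date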

Adding Storage and ASM
----------------------------

1. Identify the Worldwide Name (WWN) for the HBA on the server. On the Storage Area Network (SAN), associate each LUN used by the database with the WWN of the server.
   - To identify the WWPN, use the following command (replace n with the HBA host number):
       cat /sys/class/scsi_host/hostn/device/fc_host:hostn/port_name

2. Once the storage admin assigns the LUNs to the new node, execute fdisk -l to verify that they are visible.

3. If you are using ASM, configure Oracle ASM using ASMLib (a consolidated sketch follows this list)
    - Install the following RPMs in this order and configure ASM:
        oracleasm-support-2.1.7-1.el5.x86_64.rpm
        oracleasm-2.6.18-274.el5-2.0.5-1.el5.x86_64.rpm
        oracleasmlib-2.0.4-1.el5.x86_64.rpm
4. For the new node to access existing ASM disk groups, you must issue the following command:
    - /etc/init.d/oracleasm scandisks

5. To list the ASM disks
    - oracleasm listdisks
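
Putting the ASMLib pieces together, a minimal sketch of the commands run as root on the new node (package versions are the ones listed above; your disk labels will differ):

# Install the ASMLib packages in order
rpm -Uvh oracleasm-support-2.1.7-1.el5.x86_64.rpm \
         oracleasm-2.6.18-274.el5-2.0.5-1.el5.x86_64.rpm \
         oracleasmlib-2.0.4-1.el5.x86_64.rpm

# Configure the driver (interactive: default user oracle, group dba, load on boot)
/etc/init.d/oracleasm configure

# Make the ASM disks labelled from the existing nodes visible here, then list them
/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks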


Adding OCR and Voting Disk in New Node
-------------------------------------------------------
1. Copy /etc/sysconfig/rawdevices from an existing node to the new node
 [root@coltdb03 raw]# cat /etc/sysconfig/rawdevices
# raw device bindings
# format:  <rawdev> <major> <minor>
#          <rawdev> <blockdev>
# example: /dev/raw/raw1 /dev/sda1
#          /dev/raw/raw2 8 5
#OCR
/dev/raw/raw1 /dev/sdb1
/dev/raw/raw2 /dev/sdb2
# Voting
/dev/raw/raw3 /dev/sdb5
/dev/raw/raw4 /dev/sdb6
/dev/raw/raw5 /dev/sdb7

2. Provide the necessary privileges
OCR==>
chown root:oinstall /dev/raw/raw[12]
chmod 640 /dev/raw/raw[12]

Voting disk==>
chown oracle:oinstall /dev/raw/raw[345]
chmod 640 /dev/raw/raw[345]


* Add these commands to /etc/rc.d/rc.local so that they are re-applied at boot time (a sketch follows step 3)

3. Restart the raw devices service 
     [root@coltdb03 ~]# service rawdevices restart
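
A minimal sketch of the lines to append to /etc/rc.d/rc.local so the ownership and permissions above survive a reboot (device names are the ones used in this environment):

# Re-apply OCR and voting disk raw device permissions at boot
chown root:oinstall /dev/raw/raw[12]
chmod 640 /dev/raw/raw[12]
chown oracle:oinstall /dev/raw/raw[345]
chmod 640 /dev/raw/raw[345]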

Install Oracle Clusterware using x-windows
-------------------------------------------------
1.  Install the Clusterware from the existing node "coltdb01" as the oracle user:
    - [oracle@coltdb01 bin]$ cd $CRS_HOME/oui/bin
      [oracle@coltdb01 bin]$ ./addNode.sh


2. Add the Node-Specific Interface Configuration (cd $CRS_HOME/bin)
   - Obtain the Oracle Notification Service (ONS) remote port number, which is specified in the file $CRS_HOME/opmn/conf/ons.config (a quick check is shown after the commands below)
   - [oracle@coltdb01 ~]$ cd $CRS_HOME/bin
     [oracle@coltdb01 bin]$ racgons add_config coltdb03:6200
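
A quick way to confirm the ONS remote port before running racgons (a sketch; 6200 is the value used in this environment):

[oracle@coltdb01 ~]$ grep remoteport $CRS_HOME/opmn/conf/ons.config
remoteport=6200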

3. Before proceeding with the installation of the Oracle Database software, verify the Oracle Clusterware installation using cluvfy
    - cluvfy comp clumgr -n all
    - cluvfy comp clu
    - cluvfy stage -post crsinst -n all

4. Install Oracle Database Software ( cd $ORACLE_HOME/oui/bin)
    - ./addNode.sh

5. If you have a separate ASM home, install the ASM software (cd $ASM_HOME/oui/bin)
    - ./addNode.sh

6. Add the instance by executing dbca from one of the existing nodes (coltdb01)
    [oracle@coltdb01 bin]$ ./dbca
     - It will automatically detect that you use ASM and will configure your ASM instance and then your database instance.
     - Your listener will also be configured in your ASM home

7. Verify using crs_stat -t

Thursday, 14 June 2012

CRSD, CSSD and EVMD daemons are not starting after system reboot


Environment
--------------

OS - RHEL 5.7
Cluster  version - 10.2.0.5

Steps to check
-------------

1. Run the prerequisite check with 'init.cssd startcheck'

Before launching the clusterware processes, the clusterware scripts run a check script to determine whether the clusterware is startable, that is, whether the basic prerequisites are met and the clusterware can be permitted to start:

[root@coltdb03 init.d]# /etc/init.d/init.cssd startcheck
       If the command hangs, check the /tmp/crsctl.XXXX file for the error:

[root@coltdb03 tmp]# cat crsctl.4315
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Permission denied] [13]

In this case the permissions on the OCR files were incorrect. After granting the required permissions and privileges, execute the init.cssd startcheck command once again; this time it should return to the command prompt immediately (a sketch follows).
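
As a reference, a minimal sketch of restoring the expected ownership and permissions on the OCR and voting disk raw devices, followed by re-running the check (device names match the rawdevices layout shown in the node-addition post above; adjust to your environment):

chown root:oinstall /dev/raw/raw1 /dev/raw/raw2                    # OCR devices
chmod 640 /dev/raw/raw1 /dev/raw/raw2
chown oracle:oinstall /dev/raw/raw3 /dev/raw/raw4 /dev/raw/raw5    # voting disks
chmod 640 /dev/raw/raw3 /dev/raw/raw4 /dev/raw/raw5
/etc/init.d/init.cssd startcheck                                   # should now return immediately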

Wednesday, 13 June 2012

REINSTALL CRS ON A RAC ENVIRONMENT


If your CRS installation is corrupted and you want to re-install only the clusterware without affecting your database, you can use the following steps.

Environment
-----------
Nodes = 2
OS Version = RHEL 5.7
Clusterware Version = 10.2.0.5
DB Version = 10.2.0.5

Steps
------

1. On both nodes, clean up the RAC init scripts

Linux:
rm -rf /etc/oracle/*
rm -f /etc/init.d/init.cssd
rm -f /etc/init.d/init.crs
rm -f /etc/init.d/init.crsd
rm -f /etc/init.d/init.evmd
rm -f /etc/rc2.d/K96init.crs
rm -f /etc/rc2.d/S96init.crs
rm -f /etc/rc3.d/K96init.crs
rm -f /etc/rc3.d/S96init.crs
rm -f /etc/rc5.d/K96init.crs
rm -f /etc/rc5.d/S96init.crs
rm -Rf /etc/oracle/scls_scr
rm -f /etc/inittab.crs
cp /etc/inittab.orig /etc/inittab

2. Kill all the crsd, evmd and cssd processes on both nodes using the kill -9 command (a one-pass sketch follows the ps commands below)

ps -ef | grep crs
ps -ef | grep evmd
ps -ef | grep cssd
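
A minimal sketch of killing any leftover daemons in one pass (the .bin process names assume a standard 10.2 CRS installation; confirm with the ps commands above first):

for d in crsd.bin evmd.bin ocssd.bin; do
    pkill -9 -f $d
done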

3. Remove the files in /var/tmp/.oracle/ location
rm -rf /var/tmp/.oracle/

4. Remove the file /etc/oracle/ocr.loc
rm -rf /etc/oracle/ocr.loc

5. De-install the CRS home using the Oracle Universal Installer
     - You can skip this step if the CRS installation does not appear under "Installed Products" in the Universal Installer
      - If you cannot de-install it, just remove the CRS directory:
          rm -rf /u01/crs/oracle/product/10.2.0/crs

6. Clean out OCR and Voting disk from one node

- Check /etc/sysconfig/rawdevices to identify your OCR and voting disk partitions:
 /dev/raw/raw1 /dev/sdb1
 /dev/raw/raw2 /dev/sdb2

 /dev/raw/raw3 /dev/sdb8
 /dev/raw/raw4 /dev/sdb9
 /dev/raw/raw5 /dev/sdb11

        - You can use the fdisk command to delete and re-add the OCR and voting disk partitions
        - The partition numbers may change after deleting and re-adding partitions within the extended (logical) group, so make a note of them and modify your /etc/sysconfig/rawdevices accordingly
        - If the partition numbers have changed, run "oracleasm scandisks" on both nodes to make the ASM disks visible on both nodes
        - Make sure you have the correct permissions on the voting disk and OCR devices

To Delete a partition
--------------------
[root@coltdb01 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 48829.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 51.2 GB, 51200917504 bytes
64 heads, 32 sectors/track, 48829 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         239      244720   83  Linux
/dev/sdb2             240         478      244736   83  Linux
/dev/sdb4             479       48829    49511424    5  Extended
/dev/sdb5           10734       20271     9766896   83  Linux
/dev/sdb6           20272       29809     9766896   83  Linux
/dev/sdb7           29810       39347     9766896   83  Linux
/dev/sdb8             479         717      244720   83  Linux
/dev/sdb9             718         956      244720   83  Linux
/dev/sdb10           1196       10733     9766896   83  Linux
/dev/sdb11            957        1195      244720   83  Linux

Partition table entries are not in disk order

Command (m for help): d
Partition number (1-11): 1

To Add a partition
-------------------------
[root@coltdb01 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 48829.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 51.2 GB, 51200917504 bytes
64 heads, 32 sectors/track, 48829 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         239      244720   83  Linux
/dev/sdb2             240         478      244736   83  Linux
/dev/sdb4             479       48829    49511424    5  Extended
/dev/sdb5           10734       20271     9766896   83  Linux
/dev/sdb6           20272       29809     9766896   83  Linux
/dev/sdb7           29810       39347     9766896   83  Linux
/dev/sdb8             479         717      244720   83  Linux
/dev/sdb9             718         956      244720   83  Linux
/dev/sdb10           1196       10733     9766896   83  Linux
/dev/sdb11            957        1195      244720   83  Linux

Partition table entries are not in disk order

Command (m for help): n
Command action
   l   logical (5 or over)
   p   primary partition (1-4)

7. Check that the virtual IPs are down on both nodes
- If the VIPs remain up, they need to be brought down using: ifconfig <device> down (a sketch follows)
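
A minimal sketch of locating and bringing down a lingering VIP alias (eth0:1 is only an example; use whatever alias appears in your own ifconfig output):

# VIPs show up as interface aliases such as eth0:1
/sbin/ifconfig -a | grep "eth0:"

# Bring the alias down (replace eth0:1 with the alias found above)
/sbin/ifconfig eth0:1 down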

8. Reinstall the CRS from node 1
     - After installing clusterware 10.2.0.1, the services cannot be brought online because the DB and ASM instances are at version 10.2.0.5. You have to apply the 10.2.0.5 patchset to the clusterware to
       bring all the services online

9. Run crs_stat -t to check whether all the services and instances are up

Tuesday, 15 May 2012

OUI-10009 Error while adding Node


OUI-10009: There are no new nodes to add to this installation

Ver - 10.2.0.5
Platform - RHEL

Issue : 

While adding an ASM node, we received the following error:

cd $ASM_HOME/oui/bin

./addNode.sh

OUI-10009: There are no new nodes to add to this installation

Solution :

Add the ASM software in silent mode:

./addNode.sh -silent "CLUSTER_NEW_NODES={coltdb02}" -logLevel trace -debug
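
As a quick sanity check after the silent add completes, the new node should appear in the home's node list in the central inventory (a sketch; the inventory location is read from /etc/oraInst.loc):

grep -i "node name" $(grep inventory_loc /etc/oraInst.loc | cut -d= -f2)/ContentsXML/inventory.xml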