Guenadi N Jilevski's Oracle BLOG

Oracle RAC, DG, EBS, DR and HA DBA BLOG

How To Recover From Corrupted OCR Disk

It is common for a DBA to be left with a corrupted OCR disk and no good backup.
I ran into exactly this situation a few days ago. One node of the RAC database showed the following:

NODE1:

$ORA_CRS_HOME/bin/crs_stat -t

Name           Type           Target    State     Host

------------------------------------------------------------

ora.orcl.db    application    ONLINE    ONLINE    raclinux1

ora….11.inst application    ONLINE    ONLINE    raclinux1

ora….12.inst application    ONLINE    OFFLINE

ora….vice.cs application    OFFLINE   OFFLINE

ora….l11.srv application    ONLINE    OFFLINE

ora….l12.srv application    ONLINE    OFFLINE

ora….SM1.asm application    ONLINE    ONLINE    raclinux1

ora….DC.lsnr application    ONLINE    ONLINE    raclinux1

ora….abc.gsd application    ONLINE    ONLINE    raclinux1

ora….abc.ons application    ONLINE    ONLINE    raclinux1

ora….abc.vip application    ONLINE    ONLINE    raclinux1

ora….SM2.asm application    ONLINE    ONLINE    raclinux2

ora….C2.lsnr application    ONLINE    ONLINE    raclinux2

ora….bc2.gsd application    ONLINE    ONLINE    raclinux2

ora….bc2.ons application    ONLINE    ONLINE    raclinux2

ora….bc2.vip application    ONLINE    ONLINE    raclinux2

The other node shows the following:
NODE2:

/crs_stat -t

HA Resource                                   Target     State

-----------                                   ------     -----

ora.orcl.db                                   OFFLINE    OFFLINE

ora.orcl.racdb1.inst                          OFFLINE    OFFLINE

ora.orcl.racdb2.inst                          OFFLINE    OFFLINE

ora.orcl.test_service.cs                      ONLINE     OFFLINE

ora.orcl.test_service.racdb1.srv              OFFLINE    OFFLINE

ora.orcl.test_service.racdb2.srv              OFFLINE    OFFLINE

ora.raclinux1.ASM1.asm                        OFFLINE    OFFLINE

ora.raclinux1.LISTENER_RAC1.lsnr              OFFLINE    OFFLINE

ora.raclinux1.gsd                             OFFLINE    OFFLINE

ora.raclinux1.ons                             OFFLINE    OFFLINE

ora.raclinux1.vip                             OFFLINE    OFFLINE

ora.raclinux2.ASM2.asm                        OFFLINE    OFFLINE

ora.raclinux2.LISTENER_RAC2.lsnr              ONLINE     OFFLINE

ora.raclinux2.gsd                             ONLINE     OFFLINE

ora.raclinux2.ons                             ONLINE     OFFLINE

ora.raclinux2.vip                             ONLINE     OFFLINE
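The two listings can also be compared mechanically. The following is a minimal sketch, not part of the original troubleshooting session: a hypothetical helper named stalled_resources that reads crs_stat -t style text on standard input and prints every resource whose Target is ONLINE while its State is not. It assumes the resource name is the first whitespace-separated field, which holds for both layouts shown above.

```shell
# List resources whose Target is ONLINE but whose State is not ONLINE.
# Reads "crs_stat -t" style text on stdin. The two listings above use
# different column layouts, so the Target column is located by content:
# it is the first field that is exactly ONLINE or OFFLINE, and State is
# the field right after it.
stalled_resources() {
    awk '
        {
            for (i = 2; i < NF; i++) {
                if ($i == "ONLINE" || $i == "OFFLINE") {
                    if ($i == "ONLINE" && $(i + 1) != "ONLINE") print $1
                    break
                }
            }
        }
    '
}
```

It would be used as `$ORA_CRS_HOME/bin/crs_stat -t | stalled_resources` on each node. Header and separator lines are skipped because they contain neither keyword.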

The two nodes report inconsistent states for the same cluster, and every srvctl and crsctl command hung on node 2.
The usual fix is to restore the OCR from a backup. If no OCR backup is available, the following procedure recreates the OCR from scratch.
(These operations require a complete cluster outage.)
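Before committing to a rebuild, it is worth confirming that there really is no backup. Clusterware normally takes automatic OCR backups on one node, and `ocrconfig -showbackup` reports where they are. If that command hangs like the others, a plain file search is a fallback. The sketch below uses a hypothetical helper, find_ocr_backups, and assumes the usual default backup location of $ORA_CRS_HOME/cdata/<cluster_name>; the real location on a given system may differ.

```shell
# Print any OCR backup files (*.ocr) found under the given directory.
# Returns non-zero when the directory is missing or holds no backups.
# Assumption: automatic backups (backup00.ocr, day.ocr, week.ocr, ...) live
# under $ORA_CRS_HOME/cdata/<cluster_name>, the usual default location.
find_ocr_backups() {
    dir=$1
    [ -d "$dir" ] || return 1
    found=$(find "$dir" -type f -name '*.ocr' 2>/dev/null | sort)
    [ -n "$found" ] || return 1
    printf '%s\n' "$found"
}
```

A call such as `find_ocr_backups "$ORA_CRS_HOME/cdata"` should be run on every node, since the backups are written only on the node that was the master at the time. If a usable file turns up, restoring it with `ocrconfig -restore` is far less disruptive than the procedure that follows.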
1. Check the status of CRS from node 1:

# ps -eaf |grep d.bin
root 12873 1 0 Aug11 ? 00:11:07 /u01/app/crs/bin/crsd.bin reboot
oracle 13105 12846 0 Aug11 ? 00:00:45 /u01/app/crs/bin/evmd.bin
oracle 13226 13200 0 Aug11 ? 00:13:13 /u01/app/crs/bin/ocssd.bin
root 21458 19986 0 20:34 pts/4 00:00:00 grep d.bin

2. Shut down Oracle Clusterware on all nodes:

[root@raclinux1  bin]# ./crsctl stop crs

Stopping resources.

Successfully stopped CRS resources

Stopping CSSD.

Shutting down CSS daemon.

Shutdown request successfully issued.

Check the status again:

[root@raclinux1 bin]# ps -eaf |grep d.bin
root 21927 19986 0 20:34 pts/4 00:00:00 grep d.bin

It shows that the cluster is stopped.
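The same before-and-after check can be scripted so that it gives a yes/no answer on each node. This is a small sketch with two hypothetical helpers; it looks only for the three daemons that appear in the ps output above (crsd.bin, evmd.bin, ocssd.bin) and reads the ps output from standard input.

```shell
# Read "ps -eaf" output on stdin and print the Clusterware daemons found in it.
# The pattern requires the full daemon name, so the "grep d.bin" line that
# shows up in the ps output does not count as a match.
crs_daemons_in() {
    grep -oE '(crsd|evmd|ocssd)\.bin' | sort -u
}

# Succeeds only when none of the three daemons is present in the ps output.
crs_stack_is_down() {
    [ -z "$(crs_daemons_in)" ]
}
```

For example, `ps -eaf | crs_stack_is_down && echo "CRS is down"` should print the message on every node before moving on to step 3.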

3. Run rootdelete.sh on all nodes.

The script is located at $ORA_CRS_HOME/install/rootdelete.sh

NODE1:

[root@raclinux1  install]# ./rootdelete.sh

Shutting down Oracle Cluster Ready Services (CRS):

Stopping resources.

Error while stopping resources. Possible cause: CRSD is down.

Stopping CSSD.

Unable to communicate with the CSS daemon.

Shutdown has begun. The daemons should exit soon.

Checking to see if Oracle CRS stack is down…

Oracle CRS stack is not running.

Oracle CRS stack is down now.

Removing script for Oracle Cluster Ready services

Updating ocr file for downgrade

Cleaning up SCR settings in ‘/etc/oracle/scls_scr’

NODE 2:

./rootdelete.sh

Shutting down Oracle Cluster Ready Services (CRS):

OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]

Shutdown has begun. The daemons should exit soon.

Checking to see if Oracle CRS stack is down…

Oracle CRS stack is not running.

Oracle CRS stack is down now.

Removing script for Oracle Cluster Ready services

Updating ocr file for downgrade

Cleaning up SCR settings in ‘/etc/oracle/scls_scr’

The “OCR initialization failed accessing OCR device” error can occur for the following reasons:
1. ocrconfig_loc is not pointing to the correct OCR device.
2. Wrong permissions or ownership on the OCR devices.
3. A configuration problem in Oracle Cluster Synchronization Services (CSS).

Because the SCR entries have been cleaned up anyway, the PROC-26 error can be ignored here.
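The first two causes can be checked quickly from the shell. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the OCR location file is /etc/oracle/ocr.loc (the usual path on Linux) and that it contains a line of the form ocrconfig_loc=<device>.

```shell
# Print the value of ocrconfig_loc from an ocr.loc style file.
# Assumption: the file holds key=value lines, for example
#   ocrconfig_loc=/dev/raw/raw2    (hypothetical device name)
#   local_only=FALSE
ocr_device_from() {
    sed -n 's/^[[:space:]]*ocrconfig_loc[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Report whether the configured OCR location exists, and show its
# owner and permissions so they can be compared across nodes.
check_ocr_location() {
    loc_file=${1:-/etc/oracle/ocr.loc}
    dev=$(ocr_device_from "$loc_file")
    if [ -z "$dev" ]; then
        echo "no ocrconfig_loc entry in $loc_file"
        return 1
    fi
    if [ ! -e "$dev" ]; then
        echo "missing: $dev"
        return 1
    fi
    ls -l "$dev"
}
```

Running `check_ocr_location` on both nodes and comparing the output shows whether they point at the same device with the same ownership. A "missing" result matches the "No such file or directory" detail in the PROC-26 message above.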

If the RAC has more than two nodes, run rootdelete.sh on all the other nodes as well.

4. Run rootdeinstall.sh on the node where the RAC installation was performed (usually node 1).
It clears the contents of the OCR disk.

./rootdeinstall.sh

Removing contents from OCR device

2560+0 records in

2560+0 records out

5. Run root.sh from the same node:

./root.sh

WARNING: directory ‘/u01’ is not owned by root

Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory

Setting up NS directories

Oracle Cluster Registry configuration upgraded successfully

WARNING: directory ‘/u01’ is not owned by root

assigning default hostname raclinux1  for node 1.

assigning default hostname raclinux2 for node 2.

Successfully accumulated necessary OCR keys.

Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.

node :

node 1: raclinux1  raclinux1-priv raclinux1

node 2: raclinux2  raclinux2-priv raclinux2

Creating OCR keys for user ‘root’, privgrp ‘root’..

Operation successful.

Now formatting voting device: /dev/raw/raw1

Format of 1 voting devices complete.

Startup will be queued to init within 90 seconds.

Adding daemons to inittab

Expecting the CRS daemons to be up within 600 seconds.

CSS is active on these nodes.

raclinux1

CSS is inactive on these nodes.

raclinux2

Local node checking complete.

Run root.sh on remaining nodes to start CRS daemons.

After it completes, run root.sh on all remaining nodes.

./root.sh

Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory

Setting up NS directories

Oracle Cluster Registry configuration upgraded successfully

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

assigning default hostname raclinux1  for node 1.

assigning default hostname raclinux2  for node 2.

Successfully accumulated necessary OCR keys.

Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.

node :

node 1: raclinux1  raclinux1-priv raclinux1

node 2: raclinux2  raclinux2-priv raclinux2

clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.

-force is destructive and will destroy any previous cluster

configuration.

Oracle Cluster Registry for cluster has already been initialized

Startup will be queued to init within 90 seconds.

Adding daemons to inittab

Expecting the CRS daemons to be up within 600 seconds.

CSS is active on these nodes.

raclinux1

raclinux2

CSS is active on all nodes.

Oracle CRS stack installed and running under init(1M)

Running vipca(silent) for configuring nodeapps

The given interface(s), “eth0” is not public. Public interfaces should be used to configure virtual IPs.

The silent-mode VIPCA configuration fails because of bug 4437727 in 10.2.0.1. To work around it, run
VIPCA manually as root on the last node, where this error occurred, and follow the on-screen instructions.
# $ORA_CRS_HOME/bin/vipca
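Because the order of steps 2 through 5 matters, it can help to print the whole sequence as a checklist before starting. The following sketch only prints commands and runs nothing. The function name is hypothetical, and it assumes that ORA_CRS_HOME is the same path on every node, that rootdeinstall.sh sits next to rootdelete.sh in $ORA_CRS_HOME/install, and that root.sh is in $ORA_CRS_HOME.

```shell
# Print the rebuild sequence for the given nodes as a checklist.
# Arguments: the CRS home, then the node names with the installation
# node first. Nothing is executed; each line is meant to be run by
# hand as root on the node named in brackets.
print_ocr_rebuild_plan() {
    crs_home=$1
    shift
    first=$1
    for n in "$@"; do last=$n; done

    for n in "$@"; do echo "[$n] $crs_home/bin/crsctl stop crs"; done
    for n in "$@"; do echo "[$n] $crs_home/install/rootdelete.sh"; done
    echo "[$first] $crs_home/install/rootdeinstall.sh"
    # root.sh runs on the installation node first, then on the others
    for n in "$@"; do echo "[$n] $crs_home/root.sh"; done
    echo "[$last] $crs_home/bin/vipca   # only if silent VIPCA failed"
}
```

With the paths and host names from this post, the call would be `print_ocr_rebuild_plan /u01/app/crs raclinux1 raclinux2`.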

6. The final step is to add the resources back to the OCR with the srvctl command.

Adding DATABASE to OCR:

srvctl add database -d db_unique_name -o oracle_home

[oracle@raclinux1 ~]$ $ORA_CRS_HOME/bin/srvctl add database -d orcl -o /u01/app/oracle/product/10.2.0/db_1

Adding INSTANCE to OCR:

srvctl add instance -d db_unique_name -i inst_name -n node_name

[oracle@raclinux1 ~]$ $ORA_CRS_HOME/bin/srvctl add instance -d orcl -i racdb1 -n raclinux1

[oracle@raclinux1 ~]$ $ORA_CRS_HOME/bin/srvctl add instance -d orcl -i racdb2 -n raclinux2

Adding SERVICES to OCR:

srvctl add service -d db_unique_name -s service_name -r preferred_list

[oracle@raclinux1  ~]$ $ORA_CRS_HOME/bin/srvctl add service -d orcl -s test_service -r racdb1,racdb2

Adding NODEAPPS to OCR:

srvctl add nodeapps -n node_name -o oracle_home -A addr_str
where addr_str is the node-level VIP address.
This command must be run as root; otherwise it fails with the following error:

[oracle@raclinux1  ~]$  $ORA_CRS_HOME/bin/srvctl add nodeapps -n raclinux1  -o /u01/app/oracle/product/10.2.0/db_1 -A 10.167.21.89/255.255.255.0

PRKO-2117 : This command should be executed as the system privilege user.

[oracle@raclinux1  ~]$

[oracle@raclinux1  ~]$ su -

Password:

[root@raclinux1  ~]# cd /u01/app/crs/bin

[root@raclinux1  bin]# ./srvctl add nodeapps -n raclinux1  -o /u01/app/oracle/product/10.2.0/db_1 -A 10.167.21.87/255.255.255.0

[root@raclinux1  bin]# ./srvctl add nodeapps -n raclinux2  -o /u01/app/oracle/product/10.2.0/db_1 -A 10.167.21.89/255.255.255.0

This completes the OCR recreation. You can now verify the cluster status with cluvfy.
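The database and instance registrations in step 6 follow a regular pattern, so on a cluster with many instances they can be generated rather than typed. This is a sketch only: print_srvctl_adds is a hypothetical helper, the inst_name:node_name pair format is an assumption made for this example, and the function prints the commands instead of running them so they can be reviewed first.

```shell
# Print the srvctl commands that re-register one database and its instances.
# Arguments: database name, Oracle home, then inst_name:node_name pairs.
# Nothing is executed; the output is meant to be reviewed and then run
# as the oracle user.
print_srvctl_adds() {
    db=$1
    home=$2
    shift 2
    echo "srvctl add database -d $db -o $home"
    for pair in "$@"; do
        inst=${pair%%:*}   # text before the colon
        node=${pair#*:}    # text after the colon
        echo "srvctl add instance -d $db -i $inst -n $node"
    done
}
```

For the cluster in this post the call would be `print_srvctl_adds orcl /u01/app/oracle/product/10.2.0/db_1 racdb1:raclinux1 racdb2:raclinux2`. Services and nodeapps can be handled the same way, keeping in mind that the nodeapps commands have to be run as root.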

July 7, 2008 - Posted by | oracle
