How To Recover From Corrupted OCR Disk
It is common for a DBA to be left with a corrupted OCR disk and no good backup.
I ran into this situation a few days ago. One node of the RAC database showed the following:
NODE1:
$ORA_CRS_HOME/bin/crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.orcl.db application ONLINE ONLINE raclinux1
ora….11.inst application ONLINE ONLINE raclinux1
ora….12.inst application ONLINE OFFLINE
ora….vice.cs application OFFLINE OFFLINE
ora….l11.srv application ONLINE OFFLINE
ora….l12.srv application ONLINE OFFLINE
ora….SM1.asm application ONLINE ONLINE raclinux1
ora….DC.lsnr application ONLINE ONLINE raclinux1
ora….abc.gsd application ONLINE ONLINE raclinux1
ora….abc.ons application ONLINE ONLINE raclinux1
ora….abc.vip application ONLINE ONLINE raclinux1
ora….SM2.asm application ONLINE ONLINE raclinux2
ora….C2.lsnr application ONLINE ONLINE raclinux2
ora….bc2.gsd application ONLINE ONLINE raclinux2
ora….bc2.ons application ONLINE ONLINE raclinux2
ora….bc2.vip application ONLINE ONLINE raclinux2
The other node shows the following:
NODE2:
/crs_stat -t
HA Resource Target State
----------- ------ -----
ora.orcl.db OFFLINE OFFLINE
ora.orcl.racdb1.inst OFFLINE OFFLINE
ora.orcl.racdb2.inst OFFLINE OFFLINE
ora.orcl.test_service.cs ONLINE OFFLINE
ora.orcl.test_service.racdb1.srv OFFLINE OFFLINE
ora.orcl.test_service.racdb2.srv OFFLINE OFFLINE
ora.raclinux1.ASM1.asm OFFLINE OFFLINE
ora.raclinux1.LISTENER_RAC1.lsnr OFFLINE OFFLINE
ora.raclinux1.gsd OFFLINE OFFLINE
ora.raclinux1.ons OFFLINE OFFLINE
ora.raclinux1.vip OFFLINE OFFLINE
ora.raclinux2.ASM2.asm OFFLINE OFFLINE
ora.raclinux2.LISTENER_RAC2.lsnr ONLINE OFFLINE
ora.raclinux2.gsd ONLINE OFFLINE
ora.raclinux2.ons ONLINE OFFLINE
ora.raclinux2.vip ONLINE OFFLINE
The two nodes of the RAC report inconsistent data, and every srvctl and crsctl command hung on node 2.
The usual fix is to restore an OCR backup. If no OCR backup is available, the following procedure can be used to recover from the corrupted OCR disk.
(These operations require complete downtime.)
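Before committing to the rebuild, it is worth confirming that there really is no usable backup. The following is a minimal sketch, assuming $ORA_CRS_HOME/bin is on the PATH and the commands are run as root; the function name check_ocr_backups is my own and not part of any Oracle tool. It only calls ocrconfig -showbackup and ocrcheck, and on a node where the cluster commands hang (as on node 2 here) these may hang too.

```shell
#!/bin/bash
# Sketch: look for existing OCR backups and check OCR integrity.
# Assumption: $ORA_CRS_HOME/bin is on PATH and the caller is root.
# check_ocr_backups is a hypothetical helper name.
check_ocr_backups() {
    echo "== Automatic OCR backups =="
    # Lists the automatic backups known to the cluster, if any.
    ocrconfig -showbackup || echo "no backup information available"
    echo "== OCR integrity =="
    # Reports the OCR location, space usage and an integrity check.
    ocrcheck || echo "ocrcheck failed; OCR may be unreadable"
}

# Usage (as root, on a node where the commands still respond):
#   check_ocr_backups
```

If a backup is listed and the file still exists, restoring it is normally preferable to the procedure below.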
1. Check the status of CRS from node 1:
# ps -eaf |grep d.bin
root 12873 1 0 Aug11 ? 00:11:07 /u01/app/crs/bin/crsd.bin reboot
oracle 13105 12846 0 Aug11 ? 00:00:45 /u01/app/crs/bin/evmd.bin
oracle 13226 13200 0 Aug11 ? 00:13:13 /u01/app/crs/bin/ocssd.bin
root 21458 19986 0 20:34 pts/4 00:00:00 grep d.bin
2. Shut down Oracle Clusterware on all nodes:
[root@raclinux1 bin]# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Check the status again:
[root@raclinux1 bin]# ps -eaf |grep d.bin
root 21927 19986 0 20:34 pts/4 00:00:00 grep d.bin
It shows that the cluster is stopped.
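The manual ps check can be made a little stricter so that the grep process itself is never counted. This is a sketch under the assumption that the three daemons of interest are crsd.bin, evmd.bin and ocssd.bin, as in the listing from step 1; the function names are hypothetical.

```shell
#!/bin/bash
# Count CRS daemon processes in a ps listing read from stdin.
# Matches only full paths ending in crsd.bin, evmd.bin or ocssd.bin,
# so a line such as "grep d.bin" is not counted.
count_crs_daemons() {
    grep -E -c '/(crsd|evmd|ocssd)\.bin' || true
}

# Succeeds (exit status 0) when no CRS daemon is running on the local node.
crs_is_down() {
    [ "$(ps -eaf | count_crs_daemons)" -eq 0 ]
}

# Usage:
#   if crs_is_down; then echo "CRS stack is stopped"; fi
```

Run the check on every node before moving on to step 3.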
3. Execute rootdelete.sh from all nodes.
The script is located at $ORA_CRS_HOME/install/rootdelete.sh
NODE1:
[root@raclinux1 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down…
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in ‘/etc/oracle/scls_scr’
NODE 2:
./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down…
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in ‘/etc/oracle/scls_scr’
The error "OCR initialization failed accessing OCR device" can occur for the following reasons:
1. ocrconfig_loc is not pointing to the correct OCR.
2. Wrong permissions or ownership on the OCR devices.
3. A configuration problem in Oracle Cluster Synchronization Services.
Because the SCR entries have been cleaned up, there is no need to worry about the PROC-26 error.
If you have more than two nodes in the RAC, run rootdelete.sh on all the other nodes as well.
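The first two causes of PROC-26 listed above can be checked by hand. The sketch below assumes the OCR location file is /etc/oracle/ocr.loc, which is the usual place on Linux (other platforms may differ), and that it contains a line of the form ocrconfig_loc=<device>. The function names are hypothetical.

```shell
#!/bin/bash
# Print the OCR device or file named by ocrconfig_loc.
# Arg 1 (optional): path to ocr.loc; defaults to /etc/oracle/ocr.loc (assumed Linux location).
ocr_location() {
    local loc_file="${1:-/etc/oracle/ocr.loc}"
    sed -n 's/^ocrconfig_loc=//p' "$loc_file"
}

# Check that the configured OCR location exists and show its owner and mode.
check_ocr_device() {
    local dev
    dev="$(ocr_location "$1")"
    if [ -z "$dev" ]; then
        echo "ocrconfig_loc not set"
        return 1
    fi
    if [ -e "$dev" ]; then
        ls -l "$dev"    # compare owner, group and permissions across nodes
    else
        echo "missing: $dev"
        return 1
    fi
}

# Usage:
#   check_ocr_device            # uses /etc/oracle/ocr.loc
#   check_ocr_device /tmp/ocr.loc
```

Comparing the output on each node shows quickly whether one node points at a different or missing device, which matches the "No such file or directory" message seen on node 2.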
4. Run rootdeinstall.sh from the node where the RAC installation was performed (usually node 1).
It clears the contents of the OCR disk.
./rootdeinstall.sh
Removing contents from OCR device
2560+0 records in
2560+0 records out
5. Run root.sh from the same node:
./root.sh
WARNING: directory ‘/u01’ is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory ‘/u01’ is not owned by root
assigning default hostname raclinux1 for node 1.
assigning default hostname raclinux2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: raclinux1 raclinux1-priv raclinux1
node 2: raclinux2 raclinux2-priv raclinux2
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
Now formatting voting device: /dev/raw/raw1
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
raclinux1
CSS is inactive on these nodes.
raclinux2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
After it completes, run root.sh on all remaining nodes.
./root.sh
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname raclinux1 for node 1.
assigning default hostname raclinux2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: raclinux1 raclinux1-priv raclinux1
node 2: raclinux2 raclinux2-priv raclinux2
clscfg: Arguments check out successfully.
NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
raclinux1
raclinux2
CSS is active on all nodes.
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), “eth0” is not public. Public interfaces should be used to configure virtual IPs.
The silent-mode VIPCA configuration fails because of bug 4437727 in 10.2.0.1. To work around it, run VIPCA manually as root
from the last node, where this error occurred, and follow the instructions.
# $ORA_CRS_HOME/bin/vipca
6. The final step is to add the resources back to the OCR with the srvctl command.
Adding DATABASE to OCR:
srvctl add database -d db_unique_name -o oracle_home
[oracle@raclinux1 ~]$ $ORA_CRS_HOME/bin/srvctl add database -d orcl -o /u01/app/oracle/product/10.2.0/db_1
Adding INSTANCE to OCR:
srvctl add instance -d db_unique_name -i inst_name -n node_name
[oracle@raclinux1 ~]$ $ORA_CRS_HOME/bin/srvctl add instance -d orcl -i racdb1 -n raclinux1
[oracle@raclinux1 ~]$ $ORA_CRS_HOME/bin/srvctl add instance -d orcl -i racdb2 -n raclinux2
Adding SERVICES to OCR:
srvctl add service -d db_unique_name -s service_name -r preferred_list
[oracle@raclinux1 ~]$ $ORA_CRS_HOME/bin/srvctl add service -d orcl -s test_service -r racdb1,racdb2
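With more than a couple of instances it is easy to mistype one of these commands, so it can help to generate them first and review the list. The following is a dry-run sketch that only prints the srvctl commands and executes nothing; print_srvctl_adds and the inst:node argument format are my own conventions, and the flags are exactly the ones shown above.

```shell
#!/bin/bash
# Print (dry run) the srvctl commands that re-register a database,
# its instances and one service. Nothing is executed.
# Args: db_name oracle_home service_name inst:node [inst:node ...]
print_srvctl_adds() {
    local db="$1" home="$2" svc="$3"
    shift 3
    local pair inst node preferred=""
    echo "srvctl add database -d $db -o $home"
    for pair in "$@"; do
        inst="${pair%%:*}"    # text before the colon: instance name
        node="${pair##*:}"    # text after the colon: node name
        echo "srvctl add instance -d $db -i $inst -n $node"
        # Build the comma-separated preferred instance list for the service.
        preferred="${preferred:+$preferred,}$inst"
    done
    echo "srvctl add service -d $db -s $svc -r $preferred"
}

# Usage, with the names from this post:
#   print_srvctl_adds orcl /u01/app/oracle/product/10.2.0/db_1 test_service \
#       racdb1:raclinux1 racdb2:raclinux2
```

Once the printed commands look right, they can be run one by one as the oracle user.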
Adding NODEAPPS to OCR:
srvctl add nodeapps -n node_name -o oracle_home -A addr_str
where addr_str is the node-level VIP address.
This command must be run as root; otherwise you get the following error:
[oracle@raclinux1 ~]$ $ORA_CRS_HOME/bin/srvctl add nodeapps -n raclinux1 -o /u01/app/oracle/product/10.2.0/db_1 -A 10.167.21.89/255.255.255.0
PRKO-2117 : This command should be executed as the system privilege user.
[oracle@raclinux1 ~]$
[oracle@raclinux1 ~]$ su -
Password:
[root@raclinux1 ~]# cd /u01/app/crs/bin
[root@raclinux1 bin]# ./srvctl add nodeapps -n raclinux1 -o /u01/app/oracle/product/10.2.0/db_1 -A 10.167.21.87/255.255.255.0
[root@raclinux1 bin]# ./srvctl add nodeapps -n raclinux2 -o /u01/app/oracle/product/10.2.0/db_1 -A 10.167.21.89/255.255.255.0
This completes the OCR recreation. You can now verify the cluster status with cluvfy.
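A possible set of final checks is sketched below. It assumes $ORA_CRS_HOME/bin is on the PATH and that the checks are run as the Oracle software owner; verify_cluster is a hypothetical wrapper name. The three commands it calls are crsctl check crs, crs_stat -t and the cluvfy post-installation stage check for CRS.

```shell
#!/bin/bash
# Sketch: post-rebuild checks for the cluster.
# Assumption: $ORA_CRS_HOME/bin is on PATH.
# Arg 1: comma-separated node list, e.g. raclinux1,raclinux2
verify_cluster() {
    local nodes="$1" rc=0
    crsctl check crs || rc=1                          # health of the CSS, CRS and EVM daemons
    crs_stat -t || rc=1                               # target and state of every resource
    cluvfy stage -post crsinst -n "$nodes" || rc=1    # post-CRS-installation checks
    return "$rc"                                      # non-zero if any check failed
}

# Usage:
#   verify_cluster raclinux1,raclinux2
```

Both nodes should now show the same crs_stat -t output, unlike the inconsistent listings at the start of this post.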