Sunday, August 17, 2008

RAC TOP COMMANDS


RAC TOP COMMANDS:
################

srvctl: For Database and Instances:
####

Start / Stop a RAC database:
----------------------------
srvctl start database -d database_name -o open

srvctl stop database -d database_name -o immediate


Start / Stop a RAC instance:
----------------------------
srvctl start instance -d database_name -i instance_name -o open

srvctl stop instance -d database_name -i instance_name -o immediate

Start/stop nodeapps:
-------------------
srvctl start nodeapps -n node_name

srvctl stop nodeapps -n node_name


To add a database/instance to the RAC cluster:
----------------------------------------------
srvctl add database -d database_name -o <ORACLE_HOME>
srvctl add instance -d database_name -i instance_name -n node_name

Start / Stop the Listener:
-------------------------
srvctl start listener -l listener_name -n node_name

srvctl stop listener -l listener_name -n node_name

To add a node:
-------------
srvctl add nodeapps -n node_name -A VIP_NAME | IP

Start/stop asm:
--------------
srvctl start asm -n node_name

srvctl stop asm -n node_name

To prevent a database from starting at boot time:
------------------------------------------------
srvctl disable database -d database_name 


Check database services status:
-------------------------------
srvctl status service -d database_name 


CRSCTL: Resources and Nodes:
######

To stop all RAC resources on the local node: (By root user)
-----------------------------------------------------------
crsctl stop crs

To start all RAC resources on the local node: (By root user)
------------------------------------------------------------
crsctl start crs

Check RAC status:
----------------
crs_stat -t

CRS health check:
-----------------
crsctl check crs

Clusterware version:
-------------------
crsctl query crs softwareversion
crsctl query crs activeversion



Prevent the CRS from starting at boot time:
=============================
crsctl stop crs      --> (On the failing node) stops the CRS.

crsctl disable crs   --> (On the failing node) disables the CRS from starting at the next reboot.

After you fix the problem, re-enable the CRS on the node so it starts again after an OS reboot:

# crsctl enable crs



Voting disks:
##########
Voting disks hold the disk heartbeat, which is essential for detecting and resolving cluster "split brain"
situations.

Backing up Votedisks:
------------------------
In 10g this can be done while the CRS is running:
=================================================
# dd if=voting_disk_name of=backup_file_name

In 11g you must shut down the CRS first:
========================================
# crsctl stop crs   (On all nodes)

# dd if=voting_disk_name of=backup_file_name

Note: Don't use the copy command "cp"; use the "dd" command only.
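The "dd" backup above can be wrapped in a small POSIX shell sketch that also verifies the copy before trusting it. The paths below are hypothetical stand-ins for the disks reported by "crsctl query css votedisk", not real cluster paths:

```shell
#!/bin/sh
# Minimal sketch: back up one voting disk with dd and verify the copy
# byte-for-byte. Paths here are illustrative, not real cluster paths.
backup_votedisk() {
    src="$1"
    dest="$2"
    # dd copies the raw contents, as recommended above instead of cp.
    dd if="$src" of="$dest" bs=1M 2>/dev/null || return 1
    # Never trust a backup you have not verified.
    cmp -s "$src" "$dest"
}

# Demo against a scratch file standing in for a voting disk:
dd if=/dev/zero of=/tmp/voting_demo.dbf bs=1024 count=16 2>/dev/null
backup_votedisk /tmp/voting_demo.dbf /tmp/voting_demo.bak && echo "backup verified"
```

Run the function once per voting disk path returned by "crsctl query css votedisk".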

When to back up votedisks:
==========================
You do not have to back up the voting disks every day. Back up only in the following cases:

-After RAC installation.
-After adding or deleting a node in the cluster.
-After adding or removing a votedisk using the CRSCTL command.

Note: In 11gR2, voting disk contents are backed up automatically into the OCR; you are not required to back up the voting disks manually.

Check Voting Disk:
------------------
# crsctl query css votedisk

Restore votedisks:  (By root user)
---------------------

Case of losing ALL votedisks:
=============================
1-Shutdown CRS: (On all Nodes)
------------------------------
# crsctl stop crs

2-Locate the current location of the Votedisks:
-----------------------------------------------
# crsctl query css votedisk

3-Restore all votedisks from a previous good backup taken by the "dd" command: (On one node only)
-------------------------------------------------------------------------------------------------
# dd if=Votedisk_backup_file  of=Votedisk_file   <<-- do this for all the votedisks.

4-Start CRS: (On all Nodes)
---------------------------
# crsctl start crs


Case of losing ONE voting disk:
==================

1-Start the clusterware in exclusive mode: (On one node only)
-------------------------------------------------------------
# crsctl start crs -excl

2-Retrieve the list of voting disks currently defined: -if found-
-----------------------------------------------------------------
# crsctl query css votedisk

##  STATE    File Universal Id                   File Name                             Disk group
--  -----    --------------------------------    ---------------------------------    ----------
 1. ON       938e3a4fd41a4f5bbf8d0d5c676aca9a    (/oracle/ORCL/voting/voting_01.dbf)  []
 2. ON       99f4b4f7d65f4f05bf237ad06c52b65a    (/oracle/ORCL/voting/voting_02.dbf)  []
 3. OFF      0578603d0f6b4f21bfb2eb22ae82f00d    (/oracle/ORCL/voting/voting_03.dbf)  []

This list may be empty if all voting disks were corrupted, or the "STATE" column will show "3" or "OFF".

3-Delete the corrupted voting disk:
-----------------------------------
# crsctl delete css votedisk /oracle/ORCL/voting/voting_03.dbf

Note: You can also use the "File Universal Id" instead of the full path.

Note:
=It is not recommended to use the "-force" attribute to add or delete a voting disk while the Clusterware is
running. This is known to corrupt the OCR (no errors will appear, but it will cause node eviction).
=The "-force" attribute can be safely used ONLY if the Clusterware is stopped on all nodes of the cluster.

4-Add the voting disk again:
----------------------------
First: touch the corrupted file:
# touch /oracle/ORCL/voting/voting_03.dbf

Second: Add the touched file to the votedisk list:
# crsctl add css votedisk /oracle/ORCL/voting/voting_03.dbf

Note: You can copy a good votedisk over the corrupted one; you can use links to backup locations to save time.

5-Restart the clusterware:
--------------------------
# crsctl stop crs -f   --> -f because we started it in exclusive mode.
# crsctl start crs     --> On all nodes.


OCR disks:
#########
OCR disks hold the clusterware configuration information (node info and registered resources: databases, instances, listeners, services, etc.). It is somewhat similar to the Registry in Windows.

Checking OCR disks: (By root user)
----------------------------------
# ocrcheck


Backup OCR:
-----------
Oracle automatically backs up the OCR files every 4 hours; you can check the location of these backups with this command:

# ocrconfig -showbackup

Backup OCR to an export file: (Logical backup) (By root user)
-------------------------------------------------------------
# ocrconfig -export Export_File_Name
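Because these logical exports accumulate, it helps to timestamp each file name. A minimal sketch; the /backup directory and the naming scheme are assumptions for illustration, not Oracle defaults:

```shell
#!/bin/sh
# Build a timestamped target name for a logical OCR export.
# The /backup directory and the name pattern are assumptions, not defaults.
stamp=$(date +%Y%m%d_%H%M%S)
export_file="/backup/ocr_export_${stamp}.dmp"

# On a clusterware node you would then run, as root:
#   ocrconfig -export "$export_file"
echo "$export_file"
```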


Restore OCR Disks:
--------------------

Restore OCR from automatic backups being taken every 4 hours:  (By root user)
------------------------------------------------------------
# crsctl stop crs  -> On all RAC nodes.

# ocrconfig -restore /CRS_HOME/cdata/CLUSTER_NAME/xxxx.ocr   -> From one node only.

# crsctl start crs   -> On all RAC nodes.

Restore OCR from an export file taken manually with the "ocrconfig -export" command:  (By root user)
----------------------------------------------------------------------------------------------------
# crsctl stop crs  -> On all RAC nodes.

# ocrconfig -import /backupdisk/xxxx.dmp  -> On one RAC node only.

# crsctl start crs   -> On all RAC nodes.




Miscellaneous:
########

Check If a database is RAC or not:
========================

SQL> show parameter CLUSTER_DATABASE;

OR:
--

SQL> set serveroutput on;
SQL> BEGIN
       IF dbms_utility.is_cluster_database THEN
         dbms_output.put_line('Running in SHARED/RAC mode.');
       ELSE
         dbms_output.put_line('Running in EXCLUSIVE mode.');
       END IF;
     END;
     /

Check the active instances and their hosts:
===========================================

SQL> SELECT * FROM SYS.V_$ACTIVE_INSTANCES;
SQL> SELECT * FROM SYS.V_$THREAD;




Standby Database Switchover | Failover

Switch Over:

Switchover means switching the roles of the Primary DB and the Standby DB: after the switchover, the old Primary acts as the standby and the old Standby acts as the primary.
This approach is usually used during DR drills, where the standby database is opened in READ/WRITE mode so production applications can use it during the drill. Once the drill is done, the DBA can switch the roles back between both sites to return to the original state, without having to recreate either the primary or the standby database.

Before the switchover, do the following checks:
-----------------------------------------------


1= Verify the last sequence# received and the last sequence# applied on the standby database:


SQL> select max(al.sequence#) "Last Received",max(lh.sequence#) "Last Applied" from v$archived_log al, v$log_history lh;

If the two numbers are the same, the standby has applied all the archives sent from the primary site.
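This comparison can be scripted once the two numbers are extracted from the query. A minimal sketch where the values are passed as arguments for illustration (in practice you would spool them from v$archived_log / v$log_history):

```shell
#!/bin/sh
# Sketch: compare the last-received and last-applied sequence numbers.
# The numbers would come from the v$ query above; here they are arguments.
standby_caught_up() {
    last_received="$1"
    last_applied="$2"
    [ "$last_received" -eq "$last_applied" ]
}

# Illustrative sequence numbers only:
standby_caught_up 1042 1042 && echo "standby in sync"
standby_caught_up 1042 1039 || echo "gap detected"
```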

If not, follow these steps:
------------------------------

2= Verify that the standby is in mount state:

SQL> select open_mode from v$database;

3= Determine if there is any archive gap on the physical standby database:

SQL> select * from v$archive_gap;

4= Verify that the managed recovery process is running:

SQL> select process,status from v$managed_standby;

-If you do not see the MRP process, managed recovery is not running; start it with:

SQL> recover managed standby database disconnect;



On the Primary DB:
-----------------------
SQL> select switchover_status from v$database;

STATUS should be 'SESSIONS ACTIVE' or 'TO STANDBY'

On the Standby DB:
-----------------------
SQL> select switchover_status from v$database;

STATUS should be 'SESSIONS ACTIVE' or 'TO PRIMARY'


Switchover Steps:
=================
1- Apply all pending archives on the Standby DB:
------------------------------------------------
> Disconnect all sessions.
> Stop REDO apply:
> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY;
> For RAC:
> Stop all instances except a single instance.
> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE; --Do not wait for the network.


2- On the Primary DB:
---------------------
> Stop all RAC instances except one instance.
> Disconnect all sessions.

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;

3- On the Standby DB:
---------------------
=> Ensure that all archives have been applied.
=> Ensure there are no connected sessions on the Standby DB.

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;

4- On the old Primary DB: (New Standby)
---------------------------------------
Disable the archive destination that ships archives to the old standby:
SQL> ALTER SYSTEM SET log_archive_dest_state_2='DEFER' SCOPE=BOTH;

* Check whether any of the other archive destination parameters is enabled, e.g. log_archive_dest_state_3, 4, 5 ...

5- On the old Standby DB: (New Primary)
---------------------------------------
Enable log_archive_dest_2 to transfer the archives to the old primary (new standby):

SQL> ALTER SYSTEM SET log_archive_dest_2='service="PRIMARY" db_unique_name=PRIMARY compression=enable' SCOPE=BOTH;
SQL> ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;

SQL> Select DATABASE_ROLE,OPEN_MODE,PROTECTION_MODE from v$database;

DATABASE_ROLE    OPEN_MODE  PROTECTION_MODE
---------------- ---------- --------------------
PRIMARY          MOUNTED    MAXIMUM PERFORMANCE

SQL> ALTER DATABASE OPEN;

SQL> Alter system set standby_file_management='AUTO';


6- On the old Primary DB: (New Standby)
---------------------------------------
Restart the database in mount mode so it acts as a standby DB and starts applying the archives shipped from the old standby (new primary).

SQL> SHUTDOWN IMMEDIATE;
SQL> startup NOMOUNT;
SQL> alter database MOUNT standby database;
SQL> recover managed standby database disconnect;


7- On the old Standby DB: (New Primary)
---------------------------------------
- If it's a RAC setup, start up all instances.
- Modify the connection string inside the application config files to point to the old standby DB (new primary).


Switch Back:  From the new Primary back to the new DR
=====================================================
On old DR:
-----------
SQL> ALTER SYSTEM SWITCH LOGFILE;

SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;

On old Primary: 
---------------
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;

On old DR: Disable sending archives to the old Primary
------------------------------------------------------
SQL> ALTER SYSTEM SET log_archive_dest_state_2='DEFER' SCOPE=BOTH;

On old Primary: 
---------------
SQL> ALTER SYSTEM SET log_archive_dest_state_1='ENABLE' SCOPE=BOTH;
SQL> Select DATABASE_ROLE,OPEN_MODE,PROTECTION_MODE from v$database;

DATABASE_ROLE    OPEN_MODE  PROTECTION_MODE
---------------- ---------- --------------------
PRIMARY          MOUNTED    MAXIMUM PERFORMANCE

SQL> ALTER DATABASE OPEN;

On DR:
-------
SQL> SHUTDOWN IMMEDIATE;
SQL> startup NOMOUNT;
SQL> alter database MOUNT standby database;
SQL> recover managed standby database disconnect;
SQL> Select DATABASE_ROLE,OPEN_MODE,PROTECTION_MODE from v$database;


Fail Over:

The FAILOVER approach is used in real disaster scenarios where the primary site no longer exists or is inaccessible because of a flood, earthquake, war, power failure, or whatever catastrophe hit the primary site.

All the failover steps will be done on the DR site.

Normal Fail over:
============
On the Standby DB:
------------------------
Terminate the managed recovery mode:
------------------------------------
SQL> recover managed standby database nodelay;

SQL> alter database recover managed standby database finish;

Open the Standby database in READ/WRITE mode to act as a primary DB:
--------------------------------------------------------------------
SQL> alter database commit to switchover to primary;

SQL> alter database open;


Forced Fail over:
=================
SQL> alter database recover managed standby database finish force;

SQL> alter database activate standby database [skip standby logfile];

SQL> shutdown immediate;

SQL> startup mount;

SQL> alter database open resetlogs;