Oracle® Database 2 Day + Real Application Clusters Guide
11g Release 2 (11.2)

Part Number E10743-02

5 Administering Oracle Clusterware Components

This chapter describes how to administer your Oracle Clusterware environment, including the voting disks and the Oracle Cluster Registry (OCR).

About Oracle Clusterware

Oracle Real Application Clusters (Oracle RAC) uses Oracle Clusterware as the infrastructure that binds together multiple nodes that then operate as a single server. In an Oracle RAC environment, Oracle Clusterware monitors all Oracle components (such as instances and listeners). If a failure occurs, Oracle Clusterware automatically attempts to restart the failed component and also redirects operations to a surviving component.

Oracle Clusterware includes a high availability framework for managing any application that runs on your cluster. Oracle Clusterware manages applications to ensure they start when the system starts. Oracle Clusterware also monitors the applications to make sure that they are always available. For example, if an application process fails, then Oracle Clusterware attempts to restart the process based on scripts that you customize. If a node in the cluster fails, then you can program application processes that typically run on the failed node to restart on another node in the cluster.

Oracle Clusterware includes two important components: the voting disk and the OCR. The voting disk is a file that manages information about node membership. The OCR is a file that contains information about the cluster node list, instance-to-node mapping information, and information about Oracle Clusterware resource profiles for resources that you have customized.

Each node in a cluster also has its own local OCR, called an Oracle Local Registry (OLR), that is created when Oracle Clusterware is installed. Multiple processes on each node have simultaneous read and write access to the OLR of the node on which they reside, whether or not Oracle Clusterware is fully functional. By default, the OLR is located at Grid_home/cdata/$HOSTNAME.olr.

The Oracle Clusterware installation process creates the voting disk and the OCR on shared storage. If you select the option for normal redundant copies during the installation process, then Oracle Clusterware automatically maintains redundant copies of these files to prevent the files from becoming single points of failure. The normal redundancy feature also eliminates the need for third-party storage redundancy solutions. When you use normal redundancy, Oracle Clusterware automatically maintains two copies of the OCR file and three copies of the voting disk file.

Adding and Removing Voting Disks

You can dynamically add and remove voting disks after installing Oracle RAC. Use the following commands, where path is the fully qualified path of the voting disk that you want to add or remove.

To add or remove a voting disk:

  1. Run the following command as the root user to add a voting disk:

    crsctl add css votedisk path
    
  2. Run the following command as the root user to remove a voting disk:

    crsctl delete css votedisk path
    

Note:

If your cluster is down, then you can use the -force option to add a voting disk without interacting with active Oracle Clusterware daemons. However, you may corrupt your cluster configuration if you use the -force option while a cluster node is active.

Backing Up and Recovering Voting Disks

High availability configurations have redundant hardware and software that maintain operations by avoiding single points of failure. When a component is down, Oracle Clusterware redirects its managed resources to one of the redundant components. However, if a disaster strikes, or a massive hardware failure occurs, having redundant components might not be enough. To fully protect your system it is important to have backups of your critical files.

About Backing Up and Recovering Voting Disks

The voting disk records node membership information. A node must be able to access more than half of the voting disks at any time. To avoid simultaneous loss of multiple voting disks, each voting disk should be on a storage device that does not share any components (controller, interconnect, and so on) with the storage devices used for the other voting disks.

For example, if you have five voting disks configured, then a node must be able to access at least three of the voting disks at any time. If a node cannot access the minimum required number of voting disks it is evicted, or removed, from the cluster. After the cause of the failure has been corrected and access to the voting disks has been restored, you can instruct Oracle Clusterware to recover the failed node and restore it to the cluster.
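
The majority rule described above can be expressed as simple integer arithmetic. The following is a minimal sketch, not an Oracle-supplied tool; the function names min_votedisks_required and node_keeps_membership are hypothetical and exist only for illustration.

```shell
# min_votedisks_required TOTAL
# Prints the smallest number of voting disks a node must be able to access
# when TOTAL voting disks are configured (strictly more than half of TOTAL).
min_votedisks_required() {
    local total="$1"
    case "$total" in
        ''|*[!0-9]*)
            echo "usage: min_votedisks_required <number_of_voting_disks>" >&2
            return 2
            ;;
    esac
    echo $(( total / 2 + 1 ))
}

# node_keeps_membership TOTAL ACCESSIBLE
# Returns 0 if a node that can access ACCESSIBLE of TOTAL voting disks
# satisfies the majority rule, and 1 if it does not.
node_keeps_membership() {
    local required
    required=$(min_votedisks_required "$1") || return 2
    [ "$2" -ge "$required" ]
}
```

For five configured voting disks, min_votedisks_required 5 prints 3, which matches the example in the text. Because the threshold is a strict majority, an even number of voting disks tolerates no more failures than the next lower odd number.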

If you lose all copies of the voting disk and do not have a backup, the only safe way to re-create the voting disk is to reinstall Oracle Clusterware. Having a backup of the voting disk can drastically simplify the recovery of your system.

Backing Up Voting Disks

Oracle Clusterware automatically backs up the voting disk files when their contents change in either of the following ways:

  • Configuration parameters, for example misscount, are added or modified

  • Voting disks are added or deleted

Recovering Voting Disks

If a voting disk is damaged, and no longer usable by Oracle Clusterware, you can recreate the voting disk. The voting disk contents are restored from a backup when a new voting file is added; this occurs regardless of whether or not the voting disk file is stored in Oracle Automatic Storage Management (Oracle ASM). If you need to replace a corrupt, damaged, or missing voting disk, then use CRSCTL to first delete the voting disk and then create a new voting disk in the same location.

Restoring a voting disk from a copy created with the operating system dd command is not supported.
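
The delete-then-add sequence described above can be sketched as a small helper that only prints the two CRSCTL commands, so that you can review them before running them as the root user. The function name print_votedisk_replace_cmds is hypothetical; the commands it prints are the ones shown in "Adding and Removing Voting Disks".

```shell
# print_votedisk_replace_cmds PATH
# Prints (does not run) the CRSCTL commands, in order, for replacing the
# voting disk at PATH with a new voting disk in the same location.
print_votedisk_replace_cmds() {
    local path="$1"
    if [ -z "$path" ]; then
        echo "usage: print_votedisk_replace_cmds <fully_qualified_path>" >&2
        return 2
    fi
    # Step 1: delete the damaged voting disk.
    printf 'crsctl delete css votedisk %s\n' "$path"
    # Step 2: add a new voting disk in the same location.
    printf 'crsctl add css votedisk %s\n' "$path"
}
```

Printing the commands instead of executing them is a deliberate choice for this sketch: it keeps the ordering explicit while leaving the decision to run each command with the administrator.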

Backing Up and Recovering the Oracle Cluster Registry

Oracle Clusterware automatically creates OCR backups every 4 hours. At any one time, Oracle Clusterware always retains the latest 3 backup copies of the OCR that are 4 hours old, 1 day old, and 1 week old.

You cannot customize the backup frequencies or the number of files that Oracle Clusterware retains. You can use any backup software to copy the automatically generated backup files at least once daily to a different device from where the primary OCR file resides.
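
As an illustration of the daily copy recommended above, the following sketch copies OCR backup files from the backup directory to a directory on a different device using only standard operating system tools. The function name copy_ocr_backups is hypothetical, and the sketch assumes that the backup files carry the .ocr extension; confirm the actual file names with ocrconfig -showbackup before relying on it.

```shell
# copy_ocr_backups SRC_DIR DEST_DIR
# Copies every *.ocr file in SRC_DIR to DEST_DIR, preserving timestamps,
# and prints the number of files copied.
# Assumption: OCR backup files in SRC_DIR end in ".ocr".
copy_ocr_backups() {
    local src="$1" dest="$2" f copied=0
    if [ ! -d "$src" ]; then
        echo "error: backup directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    for f in "$src"/*.ocr; do
        [ -f "$f" ] || continue      # the glob matched nothing
        cp -p "$f" "$dest"/ || return 1
        copied=$((copied + 1))
    done
    echo "$copied"
}
```

A function like this could be called once a day from a scheduler such as cron, with DEST_DIR on storage that does not hold the primary OCR file.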

Viewing Available OCR Backups

Use the ocrconfig utility to view the backups generated automatically by Oracle Clusterware.

To find the most recent backup of the OCR:

Run the following command on any node in the cluster:

ocrconfig -showbackup

Manually Backing Up the OCR

Use the ocrconfig utility to force Oracle Clusterware to perform a backup of OCR at any time, rather than wait for the automatic backup that occurs at 4-hour intervals. This option is especially useful when you want to obtain a binary backup on demand, such as before you make changes to OCR.

To manually back up the contents of the OCR:

  1. Log in as the root user.

  2. Use the following command to force Oracle Clusterware to perform an immediate backup of the OCR:

    ocrconfig -manualbackup
    

    The date and identifier of the recently generated OCR backup is displayed.

  3. (Optional) If you need to change the location for the OCR backup files, use the following command, where directory_name is the new location for the backups:

    ocrconfig -backuploc directory_name
    

The default location for generating backups on Oracle Enterprise Linux systems is Grid_home/cdata/cluster_name where cluster_name is the name of your cluster and Grid_home is the home directory of your Oracle grid infrastructure software. Because the default backup is on a local file system, Oracle recommends that you include the backup file created with the ocrconfig command as part of your operating system backup using standard operating system or third-party tools.

Tip:

You can use the ocrconfig -backuploc command to change the location where the OCR backups are created.

Recovering the OCR

There are two methods for recovering the OCR. The first method uses automatically generated OCR file copies and the second method uses manually created OCR export files.

Checking the Status of the OCR

In the event of a failure, before you attempt to restore the OCR, confirm that the OCR is actually unavailable.

To check the status of the OCR:

  1. Run the following command:

    ocrcheck 
    
  2. If this command does not display the message 'Device/File integrity check succeeded' for at least one copy of the OCR, then all copies of the OCR have failed. You must restore the OCR from a backup or OCR export.

  3. If there is at least one copy of the OCR available, you can use that copy to restore the other copies of the OCR.
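
The decision in steps 2 and 3 can be sketched as a filter over the OCRCHECK output. The function names count_healthy_ocr_copies and ocr_needs_restore are hypothetical, and the sketch assumes that each usable OCR copy produces one line containing the message 'Device/File integrity check succeeded', as in the sample output shown in "About the OCRCHECK Utility".

```shell
# count_healthy_ocr_copies
# Reads ocrcheck output on standard input and prints how many OCR copies
# report "Device/File integrity check succeeded".
count_healthy_ocr_copies() {
    # grep -c prints 0 but exits non-zero when nothing matches, so the
    # exit status is masked to keep the function usable under "set -e".
    grep -c 'Device/File integrity check succeeded' || true
}

# ocr_needs_restore
# Reads ocrcheck output on standard input. Returns 0 when no copy passed
# the integrity check (restore from a backup or export is required), and
# 1 when at least one copy is still usable.
ocr_needs_restore() {
    local healthy
    healthy=$(count_healthy_ocr_copies)
    [ "$healthy" -eq 0 ]
}
```

For example, ocrcheck | ocr_needs_restore && echo "restore required" prints the message only when no copy passed the check. The overall 'Cluster registry integrity check succeeded' line is deliberately not counted, because it does not identify an individual copy.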

Restoring the OCR from Automatically Generated OCR Backups

When restoring the OCR from automatically generated backups, you first have to determine which backup file you will use for the recovery.

To restore the OCR from an automatically generated backup on an Oracle Enterprise Linux system:

  1. Log in as the root user.

  2. Identify the available OCR backups using the ocrconfig command:

    [root]# ocrconfig -showbackup
    
  3. Review the contents of the backup using the following ocrdump command, where file_name is the name of the OCR backup file whose contents are written to the file ocr_dump_output_file:

    [root]# ocrdump ocr_dump_output_file -backupfile file_name
    

    If you do not specify an output file name, then the OCR contents are written to a file named OCRDUMPFILE in the current directory.

  4. As the root user, stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by executing the following command:

    [root]# crsctl stop cluster -all
    
  5. As the root user, restore the OCR by applying the OCR backup file that you identified in Step 2 using the following command, where file_name is the name of the OCR backup file that you want to restore. Make sure that the OCR devices that you specify in the OCR configuration exist, and that these OCR devices are valid before running this command.

    [root]# ocrconfig -restore file_name
    
  6. As the root user, restart Oracle Clusterware on all the nodes in your cluster by running the following command:

    [root]# crsctl start cluster -all
    
  7. Use the Cluster Verification Utility (CVU) to verify the OCR integrity. Exit the root user account, and as the software owner of the Oracle grid infrastructure run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster:

    cluvfy comp ocr -n all [-verbose]
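
Steps 4 through 7 must run in a fixed order, and the last step runs as a different user. The following sketch prints the sequence for a chosen backup file as a reviewable plan; it does not execute anything. The function name print_ocr_restore_plan is hypothetical, and the optional -verbose argument is included only as an example.

```shell
# print_ocr_restore_plan BACKUP_FILE
# Prints (does not run) the commands from steps 4 through 7 for restoring
# the OCR from BACKUP_FILE. Lines starting with "#" state which user
# should run the commands that follow.
print_ocr_restore_plan() {
    local backup="$1"
    if [ -z "$backup" ]; then
        echo "usage: print_ocr_restore_plan <ocr_backup_file>" >&2
        return 2
    fi
    printf '%s\n' '# Run as root:'
    printf '%s\n' 'crsctl stop cluster -all'
    printf 'ocrconfig -restore %s\n' "$backup"
    printf '%s\n' 'crsctl start cluster -all'
    printf '%s\n' '# Run as the Oracle grid infrastructure software owner:'
    printf '%s\n' 'cluvfy comp ocr -n all -verbose'
}
```

Pass the backup file name reported by ocrconfig -showbackup as the argument, after reviewing its contents with ocrdump as described in step 3.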
    

Maintaining the Oracle Local Registry

You maintain the Oracle Local Registry using the OCRCHECK, OCRDUMP, and OCRCONFIG utilities with the -local option.

To check the status of the OLR:

As the root user, use the OCRCHECK utility, as shown in the following example:

[root]# ocrcheck -local

This command produces output similar to the following:

Status of Oracle Local Registry is as follows :
        Version                  :          3
        Total space (kbytes)     :     262132
        Used space (kbytes)      :       9200
        Available space (kbytes) :     252932
        ID                       :  604793089
        Device/File Name         : /u01/grid/cdata/node01.olr
                                    Device/File integrity check succeeded

        Local registry integrity check succeeded

        Logical corruption check succeeded

To view the contents of the OLR:

Use the OCRDUMP utility to display the contents of the OLR to the terminal window that initiated the program, as follows:

ocrdump -local -stdout

To export the OLR to a file:

As the root user, use the OCRCONFIG utility, as shown in the following example:

[root]# ocrconfig -local -export file_name

To import a specified file to the OLR:

As the root user, use the OCRCONFIG utility, as shown in the following example:

[root]# ocrconfig -local -import file_name

To change the location of the OLR file on the local node:

As the root user, use the OCRCONFIG utility to modify the location where the OLR file is stored on the local host:

[root]# ocrconfig -local -repair -replace current_olr_file_name -replacement new_olr_file_name
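
When you export the OLR regularly, it helps to give each export file a name that identifies the node and the time of the export. The following sketch builds such a name and prints the corresponding export command without running it. The function name olr_export_cmd and the host_olr_timestamp.exp naming scheme are hypothetical conventions chosen for this example, not Oracle requirements.

```shell
# olr_export_cmd DIR [HOST] [STAMP]
# Prints (does not run) an OLR export command whose target file name
# includes the host name and a timestamp.
# HOST defaults to the output of hostname; STAMP defaults to the current
# time in YYYYMMDD_HHMMSS form.
olr_export_cmd() {
    local dir="$1"
    local host="${2:-$(hostname)}"
    local stamp="${3:-$(date +%Y%m%d_%H%M%S)}"
    if [ -z "$dir" ]; then
        echo "usage: olr_export_cmd <directory> [host] [timestamp]" >&2
        return 2
    fi
    printf 'ocrconfig -local -export %s/%s_olr_%s.exp\n' "$dir" "$host" "$stamp"
}
```

Run the printed command as the root user on the node whose OLR you want to export.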

Changing the Oracle Cluster Registry Configuration

This section describes how to administer the OCR. The OCR contains information about the cluster node list, which instances are running on which nodes, and information about Oracle Clusterware resource profiles for applications that have been modified to be managed by Oracle Clusterware.

Note:

The operations in this section affect the OCR for the entire cluster. However, the ocrconfig command cannot modify OCR configuration information for nodes that are shut down or for nodes on which Oracle Clusterware is not running. Avoid shutting down nodes while modifying the OCR using the ocrconfig command.

Adding an OCR Location

Oracle Clusterware supports up to 5 OCR copies. You can add an OCR location after an upgrade or after completing the Oracle RAC installation. Additional OCR copies provide greater fault tolerance.

To add an OCR file:

As the root user, enter the following command to add a new OCR file:

[root]# ocrconfig -add new_ocr_file_name 

This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.
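
Before adding a location, you can check how many OCR locations are already configured against the limit of 5 stated above. The following sketch counts them from the OCRCHECK output. The function names are hypothetical, and the sketch assumes that each configured location appears on its own 'Device/File Name' line, as in the sample output shown in "About the OCRCHECK Utility".

```shell
# Maximum number of OCR copies, as stated in this section.
MAX_OCR_LOCATIONS=5

# count_ocr_locations
# Reads ocrcheck output on standard input and prints the number of
# "Device/File Name" lines, one per configured OCR location (assumption).
count_ocr_locations() {
    # Mask the non-zero exit status that grep -c returns for zero matches.
    grep -c 'Device/File Name *:' || true
}

# can_add_ocr_location
# Reads ocrcheck output on standard input. Returns 0 if fewer than
# MAX_OCR_LOCATIONS locations are configured, 1 otherwise.
can_add_ocr_location() {
    local configured
    configured=$(count_ocr_locations)
    [ "$configured" -lt "$MAX_OCR_LOCATIONS" ]
}
```

For example, ocrcheck | can_add_ocr_location || echo "limit reached" reports when no further location can be added.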

Replacing an OCR

If you need to change the location of an existing OCR, or change the location of a failed OCR to the location of a working one, you can use the following procedure as long as one OCR file remains online.

To change the location of an OCR or replace an OCR file:

  1. Use the OCRCHECK utility to verify that a copy of the OCR other than the one you are going to replace is online, using the following command:

    ocrcheck 
    

    Note:

    The OCR that you are replacing can be either online or offline.
  2. Use the following command to verify that Oracle Clusterware is running on the node on which you are going to perform the replace operation:

    crsctl check cluster -all
    
  3. As the root user, enter the following command to designate a new location for the specified OCR file:

    [root]# ocrconfig -replace source_ocr_file -replacement destination_ocr_file
    

    This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.

  4. Use the OCRCHECK utility to verify that the OCR replacement file is online:

    ocrcheck 
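
The check in step 1, that a copy of the OCR other than the one being replaced is online, can be sketched as a filter over the OCRCHECK output. The function name other_ocr_copy_online is hypothetical, and the sketch assumes that the status line for each location follows its 'Device/File Name' line, as in the sample output shown in "About the OCRCHECK Utility".

```shell
# other_ocr_copy_online TARGET
# Reads ocrcheck output on standard input. Returns 0 if at least one OCR
# location other than TARGET reports a successful Device/File integrity
# check, and 1 otherwise.
other_ocr_copy_online() {
    awk -v target="$1" '
        /Device\/File Name/ {
            name = $0
            sub(/^.*: */, "", name)     # keep the text after the colon
            sub(/[ \t]+$/, "", name)    # trim trailing blanks
            next
        }
        /Device\/File integrity check succeeded/ {
            # The status belongs to the most recently seen location name.
            if (name != "" && name != target) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example, ocrcheck | other_ocr_copy_online source_ocr_file succeeds only if a different location passed its integrity check. The same test applies before removing an OCR location.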
    

Removing an OCR

To remove an OCR file, at least one copy of the OCR must be online. You can remove an OCR location to reduce OCR-related overhead or to stop mirroring your OCR because you moved the OCR to a redundant storage system, such as a redundant array of independent disks (RAID).

To remove an OCR location from your Oracle RAC cluster:

  1. Use the OCRCHECK utility to ensure that at least one OCR other than the OCR that you are removing is online.

    ocrcheck
    

    Note:

    Do not perform this OCR removal procedure unless there is at least one active OCR online.
  2. As the root user, run the following command on any node in the cluster to remove a specific OCR file:

    [root]# ocrconfig -delete ocr_file_name
    

    This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.

Repairing an OCR Configuration on a Local Node

If one of the nodes in your cluster was not available when you modified the OCR configuration, then you might need to repair the OCR configuration on that node before it is restarted.

To repair an OCR configuration:

  1. As the root user, run one or more of the following commands on the node on which Oracle Clusterware is stopped, depending on the number and type of changes that were made to the OCR configuration:

    [root]# ocrconfig -repair -add new_ocr_file_name
    
    [root]# ocrconfig -repair -delete ocr_file_name
    
    [root]# ocrconfig -repair -replace source_ocr_file -replacement dest_ocr_file
    

    These commands update the OCR configuration only on the node from which you run the command.

    Note:

    You cannot perform these operations on a node on which the Oracle Clusterware daemon is running.
  2. Restart Oracle Clusterware on the node you have just repaired.

  3. As the root user, check the OCR configuration integrity of your cluster using the following command:

    [root]# ocrcheck
    

Troubleshooting the Oracle Cluster Registry

This section describes the OCRCHECK utility and lists common problems with the Oracle Cluster Registry (OCR) and their solutions.

About the OCRCHECK Utility

The OCRCHECK utility displays the data block format version used by the OCR, the available space and used space in the OCR, the ID used for the OCR, and the locations you have configured for the OCR. The OCRCHECK utility calculates a checksum for all the data blocks in all the OCRs that you have configured to verify the integrity of each block. It also returns an individual status for each OCR file as well as a result for the overall OCR integrity check. The following is a sample of the OCRCHECK output:

Status of Oracle Cluster Registry is as follows :
   Version                  :          3
   Total space (kbytes)     :     262144
   Used space (kbytes)      :      16256
   Available space (kbytes) :     245888
   ID                       :  570929253
   Device/File Name         : +CRS_DATA
                              Device/File integrity check succeeded
...
                              Device/File not configured

   Cluster registry integrity check succeeded

   Logical corruption check succeeded

The OCRCHECK utility creates a log file in the following directory, where Grid_home is the location of the Oracle grid infrastructure installation, and hostname is the name of the local node:

Grid_home/log/hostname/client

The log files have names of the form ocrcheck_nnnnn.log, where nnnnn is the process ID of the operating session that issued the ocrcheck command.
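
The directory and file name pattern described above can be combined into a full path. The following sketch builds the log path for a given process ID and finds the most recently modified OCRCHECK log on a node. The function names ocrcheck_log_path and latest_ocrcheck_log are hypothetical.

```shell
# ocrcheck_log_path GRID_HOME HOSTNAME PID
# Prints the expected path of the log file written by the ocrcheck
# process with process ID PID.
ocrcheck_log_path() {
    printf '%s/log/%s/client/ocrcheck_%s.log\n' "$1" "$2" "$3"
}

# latest_ocrcheck_log GRID_HOME HOSTNAME
# Prints the most recently modified ocrcheck log file in the client log
# directory, or nothing if no such file exists.
latest_ocrcheck_log() {
    local dir="$1/log/$2/client"
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/ocrcheck_*.log 2>/dev/null | head -n 1
}
```

For example, ocrcheck_log_path /u01/grid node01 12345 prints /u01/grid/log/node01/client/ocrcheck_12345.log.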

Common Oracle Cluster Registry Problems and Solutions

Table 5-1 describes common OCR problems and their corresponding solutions.

Table 5-1 Common OCR Problems and Solutions

Problem: The OCR is not mirrored.
Solution: Run the ocrconfig command with the -add option as described in the section "Adding an OCR Location".

Problem: A copy of the OCR has failed and you must replace it. Error messages are being reported in Enterprise Manager or the OCR log file.
Solution: Run the ocrconfig command with the -replace option as described in the section "Replacing an OCR".

Problem: The OCR configuration was updated incorrectly.
Solution: Run the ocrconfig command with the -repair option as described in the section "Repairing an OCR Configuration on a Local Node".

Problem: You are experiencing a severe performance effect from updating multiple OCR files, or you want to remove an OCR file for other reasons.
Solution: Run the ocrconfig command with the -delete option as described in the section "Removing an OCR".