Patching and Upgrades, RAC One Node, and Clusterware
For Release 2 only: Learn how to enable a single name for the cluster, enable High Availability for a single instance database, place OCR and voting disks on ASM, and explore some miscellaneous HA-related improvements.
In Oracle Database 11g Release 2, there are three major changes in the installation process. First, Oracle Universal Installer's familiar screens have been replaced by a new interface.
Note the list on the left-hand side that shows the steps; the right side shows the details of the setup. This is a major shift from earlier releases, which showed only a progress bar. The left side not only shows you which sections have been completed but also allows you to go back to a specific section at will to correct something or make a modification.
The second important difference you will notice is in the area of database patchsets. Traditionally, Oracle Database patchsets were just for bugfixes; at best, only minor new functionality was introduced. With this release of Oracle, that tradition is broken: The first patchset of Oracle Database 11g Release 2, 11.2.0.2, introduced some significant new functionality.
Finally, the third change also involves patchsets. Traditionally, patchsets (e.g. Oracle 10.2.0.4 or 9.2.0.8) were released as patches; you had to apply them, typically from a file named something like p8202632_10205_LINUX.zip, to an existing installation. With this release, however, patchsets are standalone: for example, the 11.2.0.2 patchset is a completely new installation. It contains seven files, approximately 4GB in all, and they are all you need to create an 11.2 database.
This raises a logical question: what is the process when you upgrade from 11.2.0.1 to 11.2.0.2? You install the 11.2.0.2 software in a completely different home, shut down the database, change the ORACLE_HOME variable to point to the new home, and start the upgrade process there.
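As a minimal sketch, the flow might look like this (the paths here are examples, and DBUA, the Database Upgrade Assistant shipped with the software, is one way to drive the upgrade):
$ export ORACLE_HOME=/opt/oracle/product/11.2.0.2/db1   # the new 11.2.0.2 home
$ export PATH=$ORACLE_HOME/bin:$PATH
$ dbua                                                  # launch the upgrade from the new home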
This approach is very useful when patching Clusterware. You install the new Clusterware in a different home while the old one is still up and running. Then you simply shut down the cluster services on the old home and bring them up on the new home. In a two-node RAC cluster, this allows a zero or near-zero downtime upgrade.
Upgrade Options
Speaking of upgrades, you can upgrade directly to Oracle Database 11g Release 2 from the following releases only. The source database must be at least at the following patchset levels:
- If on Oracle9i Release 2, you should be at patchset 9.2.0.8 or higher
- If on Oracle Database 10g Release 1, you should be at patchset 10.1.0.5 or higher
- If on Oracle Database 10g Release 2, you should be at patchset 10.2.0.2 or higher
- If on Oracle Database 11g Release 1, you should be at patchset 11.1.0.6 or higher
If you are on an older version or patchset, you must first upgrade the database to the required level. For instance, if your database is at 10.1.0.4, you have to apply the 10.1.0.5 patchset before attempting the 11.2 upgrade. If you have an even older version, e.g. Oracle8i, then you have to perform a two-step upgrade: first to Oracle 10.2.0.4 and then to 11.2.
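Before planning the path, you can confirm the exact version and patchset level of the source database with a quick query (the output will vary):
SQL> select version from v$instance;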
Upgrade Tips
There are several issues you should be aware of before upgrading, to make your upgrade experience better. They are described in detail in the Upgrade Guide. Here are some tips. (Please note, this is by no means a comprehensive list; I have just described the ones that stand out.)
Dictionary Statistics
Oracle Database 11.2 requires optimizer statistics on dictionary tables. If the source database doesn't have such stats, which will most likely be the case, the stats will be collected during the upgrade. Needless to say, this may make the upgrade process extremely lengthy; and since the database will be down for that duration, that may be highly undesirable.
To reduce that time you should compute statistics on dictionary objects prior to the upgrade. As SYS, issue the following SQL:
begin
   dbms_stats.gather_dictionary_stats;
end;
/
The above does not require downtime. In Oracle9i, this procedure does not exist, so you should execute dbms_stats.gather_schema_stats instead.
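In Oracle9i, a rough equivalent is to gather statistics on the SYS schema, which covers the dictionary tables. A sketch:
begin
   dbms_stats.gather_schema_stats(ownname => 'SYS');
end;
/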
Invalid Objects
This tip applies to any upgrade, not just to 11.2. During the upgrade, invalid objects are recompiled; depending on the number of such objects, this could take a long time. Therefore, it is much better to compile all invalid objects prior to the upgrade. Objects that can't be compiled due to dependency failures should be dropped; you can take a backup of those objects using Data Pump Export first.
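One common approach is to list the invalid objects from DBA_OBJECTS and then run utlrp.sql, the recompilation script shipped under $ORACLE_HOME/rdbms/admin, as SYSDBA:
SQL> select owner, object_name, object_type
     from dba_objects
     where status = 'INVALID';
SQL> @?/rdbms/admin/utlrp.sql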
CONNECT Role
Remember the role CONNECT? In earlier releases the role had privileges in addition to just connecting, such as CREATE TABLE and CREATE SYNONYM. In Oracle Database 11.2, this role has only one privilege: CREATE SESSION. Therefore, users who have only this role may fail to perform some operations due to the missing privileges; for example, a user who needs to create a table will no longer be able to.
Therefore, you should identify the users with the CONNECT role and grant them the privileges they need. This can be done after the upgrade, but preferably before it.
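A simple way to find those users; the follow-up grant is purely illustrative (the user name and the exact privileges depend on what each user actually needs):
SQL> select grantee from dba_role_privs where granted_role = 'CONNECT';
SQL> grant create table, create synonym to appuser;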
Raw Device Support
This is not an issue per se, but I think I should mention it here to clear up some confusion. Oracle Database 11.2 completely desupports raw devices. Read: desupported, not just deprecated. So does that mean you have to convert the datafiles on raw devices to either a filesystem or ASM prior to upgrading?
No, not at all. The desupport of raw devices in 11.2 means you will not be able to create new databases on raw; you must use ASM or a filesystem. However, when you upgrade an existing database on raw devices, you can continue on raw, and you will even be able to create new datafiles on the raw devices.
In the next major release, Oracle may completely desupport raw for upgrades as well as new databases. So, you should move away from raw as soon as possible.
Grid Infrastructure Home
One of the most important changes in this release of Oracle is the decoupling of the ASM and RDBMS code, as you learned in the installment on ASM. To recap: in earlier releases of Oracle Database, ASM was not separate software but a part of the RDBMS code itself. You didn't have to install ASM separately (although as a best practice, you would have created a separate home for it). In this release, ASM is no longer part of the RDBMS code. It is packaged into a new component called Grid Infrastructure (GI), which includes the Clusterware as well. Even if you don't use RAC (and therefore have no cluster), you have to install GI if you plan to use ASM.
So, you need to perform two installations in two different homes: GI Home and RDBMS Home.
On the downloads page, note the different software downloads available:
- Oracle Database 11g Release 2 Grid Infrastructure (11.2.0.1.0) for Linux x86
- Oracle Database 11g Release 2 (11.2.0.1.0) for Linux x86
You should download both of these items if you plan to use ASM, even if you do not plan to use RAC. First, install the Grid Infrastructure, which configures and starts ASM. Next, install the Database software for the RDBMS, which will use the ASM diskgroups created earlier.
Providing ASM services is not the only function of Grid Infrastructure. The next section shows another important use.
For better security, you should use different userids for the Database and GI. For instance, you can use oragrid for GI and oradb for the Database software. Doing so allows you to give control of the Grid Infrastructure to a group other than the DBAs, such as the System Admins. You don't have to separate the duties that way, but installing under different userids allows you to do so if desired in the future.
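On Linux, the separate owners might be created along these lines (a hedged example; the group and user names are illustrative, not mandated):
# groupadd oinstall
# groupadd asmadmin
# groupadd dba
# useradd -g oinstall -G asmadmin oragrid   # owner of the GI home
# useradd -g oinstall -G dba oradb          # owner of the RDBMS home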
Oracle Restart
What happens when the PMON process of Oracle is killed? The instance dies, of course. The same applies to the death of any other critical process, such as SMON or LGWR. Sometimes the issues are transient in nature and resolve themselves; in such cases, if you restart the instance, it comes up fine. But you must handle the detection of the failure and the restart yourself.
Previously, in the case of Oracle RAC, CRS took care of the detection and restart; if you didn't use RAC, this was not an option for you. In this version of Oracle, however, you have that ability even without RAC. The functionality, known as Oracle Restart, is available in Grid Infrastructure. An agent checks the availability of important components such as the database, listener, and ASM, and brings them up automatically if they are down. The functionality is available out of the box and needs no programming beyond basic configuration. The component that checks availability and restarts failed components is called HAS (High Availability Services).
Here is how you check the availability of HAS itself (from the Grid Infrastructure home):
$ crsctl check has
CRS-4638: Oracle High Availability Services is online
The service has to be online to be effective. You can check the versions with the following commands:
$ crsctl query has releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.1.0]
$ crsctl query has softwareversion
Oracle High Availability Services version on the local node is [11.2.0.1.0]
The service can be set to start automatically. To check if it has been configured that way:
$ crsctl config has
CRS-4622: Oracle High Availability Services autostart is enabled.
If it is not set to autostart, it can be made so by:
$ crsctl enable has
CRS-4622: Oracle High Availability Services autostart is enabled.
Replacing "enable" by "disable" will remove the autostart property of HAS. If not started, it can be started by:
$ crsctl start has
Similarly "stop" clause stops the HAS. The Grid Infrastructure installation places an executable called ohasd in the autostart directory of the server, e.g. in /etc/init.d in case of Linux. This executable provides the HAS service, i.e. it is the HAS daemon. In Linux, you can confirm it by checking for the processes:
$ ps -aef|grep has
root 5185 1 0 Aug10 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
oracle 5297 1 0 Aug10 ? 00:05:24 /opt/oracle/product/11.2.0/grid/bin/ohasd.bin reboot
Once HAS is in place, you can register the components that should be monitored by HAS and brought up when needed. This is done with the srvctl command under the ORACLE_HOME/bin of the Grid Infrastructure. You may be familiar with the command from RAC; in Oracle Database 11.2, it is used for single-instance databases as well.
The first component you may want to add is the Oracle Database itself. Here is how you register the database:
$ srvctl add database -d D112D2 -o /opt/oracle/product/11.2.0/db1 -y AUTOMATIC -r PRIMARY -s open
The above command registers a database named D112D2 running from the Oracle Home /opt/oracle/product/11.2.0/db1. The "-y" option tells HAS to restart the database automatically in case of failure. The "-r" option gives the role, which is Primary in this case (assuming a Data Guard configuration). The "-s" option tells HAS to bring the database all the way to the open state when restarting.
Once configured, you can start the database using the command shown below:
$ srvctl start database -d D112D2
The command starts the components in the proper order, i.e. the ASM instance must start and the diskgroups must be mounted before the database can start. This ordering is enforced only with SRVCTL; if you start the database with SQL*Plus, these dependencies are not checked.
To check the status of the database:
$ srvctl status database -d D112D2
Database is running.
An option "-v" shows the services running in the database:
$ srvctl status database -d D112D2 -v
Database D112D2 is running with online services APP
To get details on the database, use the "config" option:
$ srvctl config database -d D112D2 -a
Database unique name: D112D2
Database name: D112D2
Oracle home: /opt/oracle/product/11.2.0/db1
Oracle user: oracle
Spfile: +DATA/D112D2/spfileD112D2.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: DATA
Services: APP
Database is enabled
Similarly, you can manage a listener through Oracle Restart.
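If the listener has not been registered yet, it can be added with a command along these lines (a hedged sketch; the listener name and port are illustrative):
$ srvctl add listener -l LISTENER1 -p 1522
You can then check its configuration: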
$ srvctl config listener -l LISTENER1
Name: LISTENER1
Home: /opt/oracle/product/11.2.0/grid
End points: TCP:1522
Once configured, check the status of the listener:
$ srvctl status listener -l LISTENER1
Listener LISTENER1 is enabled
Listener LISTENER1 is not running
The listener is not running; it's merely registered, not started. To start it:
$ srvctl start listener -l LISTENER1
You can check the configuration of ASM in the same manner.
$ srvctl config asm -a
ASM home: /opt/oracle/product/11.2.0/grid
ASM listener: LISTENER
Spfile: +DATA/asm/asmparameterfile/registry.253.720043719
ASM diskgroup discovery string: /dev/sdb*
ASM is enabled.
To check the status of ASM:
$ srvctl status asm -a
ASM is running on oradba2
ASM is enabled.
Diskgroups and services can be configured in the same way. Here is how we can create a service called APP (remember, we are doing this in a single-instance database, not RAC, where these commands have long been used).
$ srvctl add service -d D112D2 -s APP
After the creation, you can check the various properties of the service:
$ srvctl config service -d D112D2 -a
Service name: APP
Service is enabled
Cardinality: SINGLETON
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
The service has been created but not yet started. You can check the status of the service and start it:
$ srvctl status service -d D112D2 -s APP
Service APP is not running.
$ srvctl start service -d D112D2 -s APP
$ srvctl status service -d D112D2 -s APP
Service APP is running
You can use the same commands for diskgroups in ASM, which are also considered resources:
$ srvctl start diskgroup -g <dg_name>
$ srvctl stop diskgroup -g <dg_name> [-f]
(The "-f" option is to force stop the diskgroup, even if some database is using that diskgroup and is currently up)
$ srvctl status diskgroup -g <dg_name> [-a]
(The "-a" option is for showing all status)
$ srvctl enable diskgroup -g <dg_name>
$ srvctl disable diskgroup -g <dg_name>
To modify the configuration, you use the "modify" clause:
$ srvctl modify database -d D112D2 -z
In this example, we have removed the database's dependency on diskgroups (the "-z" option does this).
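Other attributes can be changed the same way. For instance, to stop HAS from restarting the database automatically, you might change the management policy (a hedged example; "-y" is the same option used in the add command earlier):
$ srvctl modify database -d D112D2 -y MANUAL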
To remove the database from Oracle Restart, you can use
$ srvctl remove database -d D112D2 -v
Remove the database D112D2? (y/[n]) y
Successfully removed database and its dependent services.
This merely removes the database from the HAS configuration; the database itself is not dropped.
You can get a lot of information on the resources managed by HAS while they are running. Here is the command to do that; the "-v" option stands for "verbose" output.
$ crsctl status resource -v
NAME=ora.DATA.dg
TYPE=ora.diskgroup.type
LAST_SERVER=oradba2
STATE=ONLINE on oradba2
TARGET=ONLINE
CARDINALITY_ID=ONLINE
CREATION_SEED=6
CURRENT_RCOUNT=0
FAILURE_COUNT=0
FAILURE_HISTORY=
ID=ora.DATA.dg oradba2 1
INCARNATION=1
LAST_RESTART=1281465040
STATE_DETAILS=
… output truncated …
NAME=ora.diskmon
TYPE=ora.diskmon.type
LAST_SERVER=oradba2
STATE=ONLINE on oradba2
TARGET=ONLINE
CARDINALITY_ID=1
CREATION_SEED=1
CURRENT_RCOUNT=0
FAILURE_COUNT=0
FAILURE_HISTORY=
ID=ora.diskmon 1 1
INCARNATION=1
LAST_RESTART=1281462735
STATE_DETAILS=
While there is a lot of detailed information here, it's a little difficult to read. You can get a much more succinct report with the following command:
$ crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg ONLINE ONLINE oradba2
ora.DG1.dg ONLINE ONLINE oradba2
ora.DG2.dg OFFLINE OFFLINE oradba2
ora.LISTENER.lsnr ONLINE ONLINE oradba2
ora.asm ONLINE ONLINE oradba2 Started
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd 1 ONLINE ONLINE oradba2
ora.d112d2.db ONLINE ONLINE oradba2 Open
ora.diskmon ONLINE ONLINE oradba2
What if you want detailed output on a specific resource only? You can do that by naming the resource as a parameter. (Please note: you have to give the complete resource name as it appears in the crsctl output. For instance, APP is not the full name; the full name is ora.d112d2.app.svc.)
$ crsctl status resource ora.d112d2.app.svc
NAME=ora.d112d2.app.svc
TYPE=ora.service.type
TARGET=ONLINE
STATE=ONLINE on oradba2
The "-v" option makes the output more verbose.
$ crsctl status resource ora.d112d2.app.svc -v
NAME=ora.d112d2.app.svc
TYPE=ora.service.type
LAST_SERVER=oradba2
STATE=ONLINE on oradba2
TARGET=ONLINE
CARDINALITY_ID=1
CREATION_SEED=17
CURRENT_RCOUNT=0
FAILURE_COUNT=0
FAILURE_HISTORY=
ID=ora.d112d2.app.svc 1 1
INCARNATION=1
LAST_RESTART=1293305825
STATE_DETAILS=
You can start all the resources with the following command:
$ crsctl start resource -all
CRS-5702: Resource 'ora.DG1.dg' is already running on 'oradba2'
CRS-5702: Resource 'ora.LISTENER.lsnr' is already running on 'oradba2'
CRS-5702: Resource 'ora.asm' is already running on 'oradba2'
CRS-5702: Resource 'ora.cssd' is already running on 'oradba2'
CRS-5702: Resource 'ora.d112d2.app.svc' is already running on 'oradba2'
CRS-5702: Resource 'ora.d112d2.db' is already running on 'oradba2'
CRS-5702: Resource 'ora.diskmon' is already running on 'oradba2'
CRS-2672: Attempting to start 'ora.DG2.dg' on 'oradba2'
CRS-2676: Start of 'ora.DG2.dg' on 'oradba2' succeeded
CRS-4000: Command Start failed, or completed with errors.
This is a useful command to start all the resources at once, but if one resource fails to start, the "completed with errors" message appears at the end. If autostart is enabled for all the components, this is not really necessary; but in some cases, particularly when you are trying to diagnose issues in the environment, you may want to disable autostart and start the resources manually.
To stop all resources running from a specific Oracle Home, you can issue a single command:
$ srvctl stop home -o /opt/oracle/product/11.2.0/db1 -s mystate.txt
The file mystate.txt is called a "state file"; it records the state of the HAS components as they were at the time of the shutdown. Here is what the file looks like:
$ cat mystate.txt
svc-APP
db-d112d2
Similarly, to start all the resources that run from a specific Oracle Home, you can issue a single command:
$ srvctl start home -o /opt/oracle/product/11.2.0/db1 -s mystate.txt
The state file is needed to bring the resources back to the state they were in at the time of the shutdown.
RAC One Node
Are you familiar with the concept of Active/Passive clusters, such as Veritas Storage Foundation, HP ServiceGuard, and IBM HACMP? If not, in summary: they watch for failures on the "active" server where the database runs. When that server fails, the cluster mounts the disks on the passive server and starts the database instance there. It also moves the IP address (known as a Virtual IP) to the passive server, so the clients do not have to change anything. The cluster experiences a brief hiccup but continues processing. (Despite the name, these technologies are not "clusters" strictly speaking, at least not as Oracle defines them.)
RAC One Node is Oracle’s answer to this category. In this configuration, the Oracle Clusterware creates a cluster that makes storage visible across multiple nodes via ASM. The concept can be extended to create a special type of RAC database that has only one instance running on one node. If the node fails, another instance can be brought up on a different node in the cluster to continue processing. In some ways, RAC One Node is a full RAC database but since only one instance is up, there is no cross-instance communication and therefore you may not suffer from the resultant performance issues.
But doesn't that mean that the other "node", being passive, is a wasted resource? Sure it is – for that database. But you can create the instance for a different database on that passive server, which makes it the active server for that database. By using a number of databases in that manner, you can create a semblance of load balancing while making all of them more available. The group of servers used for the active/passive nodes for the databases is known as a "server pool".
Oracle Clusterware has specific configurations for creating RAC One Node databases. Here is an example of creating a database known as RAC1.
$ srvctl add database -c RACONENODE -e "dantstpdb01,dantstpdb02" -i RAC12 -d RAC1 -o /Proltst/oradb/db1
The "-c" option specifies the database type, which in this case is a RAC One Node. The server pool comprises two servers - dantstpdb01,dantstpdb02 – specified by "-e" option. The instance name is denoted by the "-I" option and the Oracle Home is specified by "-o".
When you want to move the database to a different node, you would issue:
$ srvctl relocate database -d RAC1 -n dantstpdb01
If you want to convert a regular database to RAC One Node database, you should first convert it to a RAC database on only one node. Then use the convert command:
$ srvctl convert database -d RAC1 -c RACONENODE
If you change your mind, you can convert it back to a regular RAC database on only one node:
$ srvctl convert database -d RAC1 -c RAC
If you are scratching your head about the utility of RAC One Node, wondering why regular Oracle RAC wouldn't provide the same benefits, you should know that there are some differences between them:
- RAC One Node has a lower price point
- RAC One Node does not have the full complexity of RAC, and can be supported by less experienced DBAs
- RAC One Node has its own set of commands, which may insulate DBAs from some of the nuances of RAC administration
- Being on one node alone, it does not incur the performance overhead of high-frequency internode communication
However, RAC One Node is hardly a panacea. It is definitely not a DR solution; Data Guard is a much better alternative for that. And since only one node of the cluster is used, load balancing is not possible for the same database.
SCAN Listeners
Before describing this feature, let me touch upon a potential issue with RAC configuration. Suppose you have two nodes in your RAC cluster, with VIP hostnames node1-vip and node2-vip. The clients need to load balance between the two nodes. To do so, you may have created a TNSNAMES.ORA entry as shown below:
APP=
  (DESCRIPTION=
    (FAILOVER=on)
    (LOAD_BALANCE=on)
    (ADDRESS_LIST=
      (ADDRESS=
        (PROTOCOL=TCP)
        (HOST=node1-vip)
        (PORT=1521)
      )
      (ADDRESS=
        (PROTOCOL=TCP)
        (HOST=node2-vip)
        (PORT=1521)
      )
    )
    (CONNECT_DATA=
      (FAILOVER_MODE=
        (TYPE=select)
        (METHOD=basic)
        (RETRIES=120)
      )
      (SERVER=dedicated)
      (SERVICE_NAME=APP)
    )
  )
Suppose you decide to add a third node to the cluster, node3-vip. Even if you define the service APP on that third node, clients will not be able to connect to it, simply because node3-vip is not mentioned in the TNS entry. What are your options at this stage?
The only option is to update the TNS entry on each client to add the third node's information. With a handful of clients this may not be a huge deal, but consider a scenario where you have thousands of them. A solution might be to introduce Oracle Internet Directory or something similar; some organizations go to the extent of storing TNSNAMES.ORA at a single location on an NFS mountpoint. All these options are difficult, if not impossible, to execute, especially in large organizations.
To address this issue, 11g Release 2 introduces a concept called Single Client Access Name (SCAN). It's just one hostname, say dantstp-scan. The listeners on the cluster nodes listen on this hostname, which is mapped to three IP addresses (it could be just one, but preferably three, to avoid a single point of failure). After you define it in your DNS, the name resolves to the three IP addresses as shown below:
C:\>nslookup dantstp-scan
Server: stcadcpdc04.proligence.com
Address: 10.20.223.108
Name: dantstp-scan.proligence.com
Addresses: 192.168.76.62, 192.168.76.63, 192.168.76.64
When you ping the name, it should resolve to one of the three IPs in a round-robin manner. Each node then runs a special type of listener called a SCAN Listener. Here are the processes:
# ps -aef|grep lsnr
oragrid 9437224 1 0 Dec 23 - 0:04 /Proltst/oragrid/grid1/bin/tnslsnr LISTENER_SCAN1 -inherit
oragrid 32374934 55902452 0 21:15:04 pts/1 0:00 grep lsnr
oragrid 64356462 1 0 Dec 23 - 0:05 /Proltst/oragrid/grid1/bin/tnslsnr LISTENER -inherit
To check the listener status, you can use the well-known lsnrctl utility:
# lsnrctl status LISTENER_SCAN1
LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production on 26-DEC-2010 21:15:42
Copyright (c) 1991, 2010, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias LISTENER_SCAN1
Version TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
Start Date 23-DEC-2010 14:37:10
Uptime 3 days 6 hr. 38 min. 31 sec
Trace Level off
Security ON: Local OS Authentication
SNMP ON
Listener Parameter File /Proltst/oragrid/grid1/network/admin/listener.ora
Listener Log File /Proltst/oragrid/grid1/log/diag/tnslsnr/dantstpdb01/listener_scan1/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.76.62)(PORT=1521)))
Services Summary...
Service "PROLADMIN" has 1 instance(s).
Instance "PROLTST11", status READY, has 1 handler(s) for this service...
Service "PROLTST1" has 1 instance(s).
Instance "PROLTST11", status READY, has 1 handler(s) for this service...
Service "PROLTST1XDB" has 1 instance(s).
Instance "PROLTST11", status READY, has 1 handler(s) for this service...
Service "Proltst_t1_preconnect" has 1 instance(s).
Instance "PROLTST11", status READY, has 1 handler(s) for this service...
Service "Proltst_t2_preconnect" has 1 instance(s).
Instance "PROLTST11", status READY, has 1 handler(s) for this service...
The command completed successfully
With this in place, check the SCAN configuration:
# srvctl config scan
SCAN name: dantstp-scan, Network: 1/192.168.76.0/255.255.255.0/en0
SCAN VIP name: scan1, IP: /dantstp-scan/192.168.76.62
SCAN VIP name: scan2, IP: /dantstp-scan/192.168.76.63
SCAN VIP name: scan3, IP: /dantstp-scan/192.168.76.64
The three SCAN VIPs run across the two nodes: two on one and the third on the other. You can confirm that:
dantstpdb01.oragrid:/home/oragrid # srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node dantstpdb01
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node dantstpdb02
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node dantstpdb02
After that the TNSNAMES.ORA will look like this:
APP=
  (DESCRIPTION=
    (FAILOVER=on)
    (LOAD_BALANCE=on)
    (ADDRESS_LIST=
      (ADDRESS=
        (PROTOCOL=TCP)
        (HOST=dantstp-scan)
        (PORT=1521)
      )
    )
    (CONNECT_DATA=
      (FAILOVER_MODE=
        (TYPE=select)
        (METHOD=basic)
        (RETRIES=120)
      )
      (SERVER=dedicated)
      (SERVICE_NAME=APP)
    )
  )
Note there is only one hostname (dantstp-scan), not two as before; the hostname is the SCAN name. When clients use it, it resolves in a round-robin fashion to one of the three IP addresses defined in the DNS for that entry. The corresponding SCAN listener receives the request and forwards it to one of the database instances. When you add a new node to the cluster, there is no need to change the TNSNAMES.ORA file since the SCAN does not change; one of the three listeners will pick up the connection and direct it to the least loaded node. Similarly, you can remove a node from the cluster without making a single change to TNSNAMES.ORA. SCAN allows you to do that.
The SCAN is normally configured when you install the Grid Infrastructure. If you haven't done that, don't despair; you can add it later using this command:
$ srvctl add scan -n dantstp-scan
To add the SCAN listener, you can use:
$ srvctl add scan_listener -l PROL -s -p 1521
The "-s" option tells it not to restrict any ports. The "-l" option specifies the prefix, which is applied to the SCAN listener name.
OCR and Voting on ASM
In prior releases of Oracle, the Oracle Cluster Registry (OCR) and voting disks were kept on raw devices. Since raw devices are now deprecated, the choice is between a cluster filesystem and an ASM diskgroup. The OCR and voting disks must be on shared storage, so a local filesystem is not going to work. Clustered filesystems may not be an option due to high cost, and other "free" options such as network filesystems are usually slow and unreliable. So ASM remains the best choice. The OCR and voting disks can be on any available ASM diskgroup, not just ones created exclusively for them.
But remember, in a clustered system the cluster must come up before the ASM instance can start in clustered mode on all nodes. For the cluster to come up, it needs the OCR and voting disks. It's a classic chicken-and-egg riddle: which one should come up first?
In Oracle Database 11g Release 2, the solution lies in where the files are placed. When the OCR and voting disks are put in a diskgroup, the Clusterware writes them to a special fixed location on the underlying disks. When the cluster starts, it checks that location to read the cluster registry and voting files; there is no need to bring up the entire ASM stack just to read them. The files are also replicated across the underlying disks of the diskgroup, so the failure of a single disk does not mean the loss of the registry or voting files.
Let’s see how it works. First check the voting devices. The following command shows the voting disks.
# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 50afef59821b4f5ebfd5b7bd17321d8b (/dev/emcdiskpath21) [PROLTSTCRS1]
2. ONLINE 59216f51595d4fa9bf3aa71c502843b9 (/dev/emcdiskpath22) [PROLTSTCRS1]
3. ONLINE 60356b15ab2b4fd7bfad1b5c79f1444c (/dev/emcdiskpath23) [PROLTSTCRS1]
4. ONLINE 8fbd5815460b4fc4bf23fd96e01c3a72 (/dev/emcdiskpath24) [PROLTSTCRS1]
5. ONLINE 95392036391f4fb7bf90c8cfa09b203e (/dev/emcdiskpath25) [PROLTSTCRS1]
Located 5 voting disk(s).
Notice that there are five voting disks, all in the diskgroup PROLTSTCRS1. The reason for five voting disks is that there are five disks in the diskgroup; each disk gets a voting file. In case of a failure, the nodes try to grab the voting disks, and the node holding the highest number of them wins and becomes the master.
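You can cross-check the disks that make up the diskgroup with asmcmd, run as the Grid Infrastructure owner (a hedged example):
$ asmcmd lsdsk -G PROLTSTCRS1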
Checking for the OCR files:
# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3444
Available space (kbytes) : 258676
ID : 984202808
Device/File Name : +PROLTSTCRS1
Device/File integrity check succeeded
Device/File Name : +PROLTSTCRS2
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
Note that the OCR file is on the diskgroup PROLTSTCRS1 and its mirror is in the PROLTSTCRS2 diskgroup. Oracle writes the OCR and voting devices to the underlying disks at pre-designated locations, so it is not difficult to read the contents of these files when the cluster starts up.
Let's see that with an actual example, using the logs from CSS and CRS. They are located at $ORACLE_HOME/log/<Hostname>/cssd and $ORACLE_HOME/log/<Hostname>/crsd respectively. Here is an excerpt from one of the logs. The lines show that a "potential" voting file was found on each of the disks, e.g. /dev/emcdiskpath25.
2010-12-19 17:46:36.517: [ CSSD][1286]clssnmvDiskVerify: discovered a potential voting file
2010-12-19 17:46:36.519: [ SKGFD][1286]Handle 1115de4f0 from lib :UFS:: for disk :/dev/emcdiskpath25:
2010-12-19 17:46:36.519: [ CSSD][1286]clssnmvDiskVerify: Successful discovery for disk /dev/emcdiskpath25,
UID 95392036-391f4fb7-bf90c8cf-a09b203e, Pending CIN 0:0:1, Committed CIN 0:0:1
2010-12-19 17:46:36.520: [ SKGFD][1286]Lib :UFS:: closing handle 1115de4f0 for disk :/dev/emcdiskpath25:
2010-12-19 17:46:36.520: [ CSSD][1286]clssnmvDiskVerify: discovered a potential voting file
2010-12-19 17:46:36.522: [ SKGFD][1286]Handle 1115de4f0 from lib :UFS:: for disk :/dev/emcdiskpath24:
2010-12-19 17:46:36.524: [ CSSD][1286]clssnmvDiskVerify: Successful discovery for disk /dev/emcdiskpath24,
UID 8fbd5815-460b4fc4-bf23fd96-e01c3a72, Pending CIN 0:0:1, Committed CIN 0:0:1
2010-12-19 17:46:36.525: [ SKGFD][1286]Lib :UFS:: closing handle 1115de4f0 for disk :/dev/emcdiskpath24:
2010-12-19 17:46:36.525: [ CSSD][1286]clssnmvDiskVerify: discovered a potential voting file
2010-12-19 17:46:36.526: [ SKGFD][1286]Handle 1115de4f0 from lib :UFS:: for disk :/dev/emcdiskpath23:
2010-12-19 17:46:36.528: [ CSSD][1286]clssnmvDiskVerify: Successful discovery for disk /dev/emcdiskpath23,
UID 60356b15-ab2b4fd7-bfad1b5c-79f1444c, Pending CIN 0:0:1, Committed CIN 0:0:1
2010-12-19 17:46:36.529: [ SKGFD][1286]Lib :UFS:: closing handle 1115de4f0 for disk :/dev/emcdiskpath23:
2010-12-19 17:46:36.529: [ CSSD][1286]clssnmvDiskVerify: discovered a potential voting file
2010-12-19 17:46:36.530: [ SKGFD][1286]Handle 1115de4f0 from lib :UFS:: for disk :/dev/emcdiskpath22:
2010-12-19 17:46:36.532: [ CSSD][1286]clssnmvDiskVerify: Successful discovery for disk /dev/emcdiskpath22,
UID 59216f51-595d4fa9-bf3aa71c-502843b9, Pending CIN 0:0:1, Committed CIN 0:0:1
2010-12-19 17:46:36.533: [ SKGFD][1286]Lib :UFS:: closing handle 1115de4f0 for disk :/dev/emcdiskpath22:
2010-12-19 17:46:36.533: [ CSSD][1286]clssnmvDiskVerify: discovered a potential voting file
2010-12-19 17:46:36.534: [ SKGFD][1286]Handle 1115de4f0 from lib :UFS:: for disk :/dev/emcdiskpath21:
2010-12-19 17:46:36.536: [ CSSD][1286]clssnmvDiskVerify: Successful discovery for disk /dev/emcdiskpath21,
UID 50afef59-821b4f5e-bfd5b7bd-17321d8b, Pending CIN 0:0:1, Committed CIN 0:0:1
2010-12-19 17:46:36.537: [ SKGFD][1286]Lib :UFS:: closing handle 1115de4f0 for disk :/dev/emcdiskpath21:
2010-12-19 17:46:36.537: [ CSSD][1286]clssnmvDiskVerify: Successful discovery of 5 disks
2010-12-19 17:46:36.537: [ CSSD][1286]clssnmCompleteRmtDiscoveryReq: Completing voting file discovery requested by node dantstpdb01, number 1
2010-12-19 17:46:36.537: [ CSSD][1286]clssnmSendDiscoverAck: Discovery complete, notifying requestor node dantstpdb01
Note that the voting files were properly discovered and enumerated with labels, known as File Universal IDs (FUIDs). For instance, the FUID of voting file 5 is 95392036-391f4fb7-bf90c8cf-a09b203e. If you look carefully, that is what the crsctl query css votedisk command also displays.
The placement of the voting disks on an ASM diskgroup is not cast in stone. To move the voting disks from one diskgroup to another, use the following command:
# crsctl replace votedisk +NEWVOTEDISKDG
Another option is to add new voting files and delete the old ones. First get the FUID:
# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 50afef59821b4f5ebfd5b7bd17321d8b (/dev/emcdiskpath21) [PROLTSTCRS1]
2. ONLINE 59216f51595d4fa9bf3aa71c502843b9 (/dev/emcdiskpath22) [PROLTSTCRS1]
3. ONLINE 60356b15ab2b4fd7bfad1b5c79f1444c (/dev/emcdiskpath23) [PROLTSTCRS1]
4. ONLINE 8fbd5815460b4fc4bf23fd96e01c3a72 (/dev/emcdiskpath24) [PROLTSTCRS1]
5. ONLINE 95392036391f4fb7bf90c8cfa09b203e (/dev/emcdiskpath25) [PROLTSTCRS1]
Located 5 voting disk(s).
Delete a voting file using its FUID:
$ crsctl delete css votedisk 95392036391f4fb7bf90c8cfa09b203e
Add a new voting file on a different diskgroup.
$ crsctl add css votedisk +NEWVOTEDISKDG
The same technique can be applied to add a new OCR file:
# ocrconfig -add +PROLTSTDATA1
And you can remove the old one.
# ocrconfig -delete +PROLTSTCRS1
This also works when you upgrade the Clusterware from pre-11.2 to 11.2 with the OCR and voting devices on raw devices: after the upgrade, you can migrate them to ASM using the commands shown above.
Incidentally, did you notice that there are slots for five copies of the OCR file? In this release you can create up to five copies on five different diskgroups to make them redundant. This is a shift from earlier releases, which allowed only two copies (one primary and one mirror).
DST Patch
Daylight Saving Time (DST) patches are mandatory to apply but seldom pleasant. In this release of Oracle, the DST patches are completely online; an instance recycle is not required.
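To see which time zone file version the database currently uses, before and after patching, a quick check (the output varies by database):
SQL> select * from v$timezone_file;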
Multiple Interconnect
This is an exciting reinstatement of a very popular feature. If you worked with Oracle Parallel Server, there used to be a facility (since desupported) to specify more than one interface for the cluster interconnect. In prior versions of the Clusterware, you could not specify more than one interface for the private interconnect. If you actually had multiple physical interfaces, such as eth1, eth2 and so on, your only options were to use just one of them or to bond them into a single logical interface.
The first option is not a good one; it creates a single point of failure and limits the bandwidth. The second is generally preferred, but it requires OS-specific tools to provide the bonding, such as APA in HP-UX, trunking in Solaris, MultiNIC in Linux, and so on.
In Oracle Database 11g Release 2, it is possible to specify multiple NICs. The Clusterware will use at most four of the interfaces at a time and load balance across them. It still makes sense to define more than four: when one of the active interfaces fails, the Clusterware will move to one of the available (and not yet used) interfaces.
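The private interconnect interfaces are registered with the oifcfg utility from the Grid Infrastructure home (a hedged sketch; the interface names and subnet are illustrative):
$ oifcfg setif -global eth1/192.168.10.0:cluster_interconnect
$ oifcfg setif -global eth2/192.168.10.0:cluster_interconnect
$ oifcfg getif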
Cluster Time Service
Timekeeping, it may seem, is the least of the worries in a cluster. It definitely is not. A cluster is composed of multiple machines, each with its own timekeeping mechanism, and it is possible for the clock to run faster or slower on one server (or to be absent altogether). This discrepancy in time may lead to all types of issues, from the relatively benign difficulty of correlating cluster logs during troubleshooting to the very damaging issue of cluster node eviction.
To avoid this, most systems have a time synchronization service configured, typically Network Time Protocol (NTP). If NTP is not configured, the potential issues mentioned above can arise. Therefore, this release of Oracle Clusterware installs its own time syncing mechanism: Cluster Time Synchronization Service, or CTSS. CTSS always runs, but normally in an "Observer" mode. Here is how you check the state of the service:
# crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
If a time difference appears between the nodes, CTSS takes over as the default time synchronizing service, i.e. it turns to "Active" mode. You can check for the transition using the above command; the event is also recorded in the alert log.
Enhanced Cluster Verification Utility
In this release, the Cluster Verification Utility (CVU) has been considerably enhanced with new checks, and it is integrated with the Oracle installer. It is also run automatically and periodically by the Clusterware. To check the configured run interval, use the following command:
# srvctl config cvu
CVU is configured to run once every 360 minutes
If you want to check whether it is running and where:
# srvctl status cvu
CVU is enabled and running on node dantstpdb02
You can modify the interval. To make it run every 10 minutes instead of every 6 hours:
# srvctl modify cvu -t 10
There are several new options in CVU. Here is an example of the ACFS check:
$ cluvfy stage -pre acfscfg -n dantstpdb01,dantstpdb02 -asmdev /dev/rhdiskpwer*
Performing pre-checks for ACFS Configuration
Checking node reachability...
… output truncated …
Task ACFS Drivers check started...
Task ACFS Drivers check passed
Pre-check for ACFS Configuration was successful.
Here is another example, checking the SCAN component setup (described earlier in this installment):
# cluvfy comp scan
Verifying scan
Checking Single Client Access Name (SCAN)...
Checking TCP connectivity to SCAN Listeners...
TCP connectivity to SCAN Listeners exists on all cluster nodes
Checking name resolution setup for "dantstp-scan"...
Verification of SCAN VIP and Listener setup passed
Verification of scan was successful.
Rather than explaining every possible CVU command, which would fill several pages, I will direct you to the relevant documentation. It is worth investing a little time in understanding the various options.