Exadata Storage expansion

I got a chance to explore and be involved in this DBMA task. In this article, I will summarize and walk through the procedure for adding a new cell to an existing Exadata Database Machine.

Most of us know the capabilities that Exadata Database Machine delivers. It’s a known fact that Exadata comes in fixed rack sizes:

    • 1/8 rack (2 db nodes, 3 cells),
    • quarter rack (2 db nodes, 3 cells),
    • half rack (4 db nodes, 7 cells) and
    • full rack (8 db nodes, 14 cells). 

Traditionally, when you wanted to expand the capacity, it had to be in fixed sizes as well: 1/8 to quarter, quarter to
half, and half to full.

 

With the Exadata X5 Elastic configuration, you can also customize the sizing by extending the capacity of the rack
with any number of DB servers, storage servers, or a combination of both, up to the maximum allowed capacity
of the rack.

Preparing to Extend Exadata Database Machine

[0] Validate the environment
Before starting the activity, collect an Exachk report and validate the environment; an example of running Exachk follows the alert check below.
Also, check for any open critical hardware alerts on the cells:

dcli -g /root/cell_group -l root "cellcli -e list alerthistory where endTime=null and alertShortName=Hardware and alertType=stateful and severity=critical"
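For the Exachk collection itself, something along these lines is typical (a sketch only; the staging directory and the -a option are assumptions, so adjust them to wherever Exachk is installed in your environment):

# cd /opt/oracle.SupportTools/exachk
# ./exachk -a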

[1] Ensure the hardware is placed in the rack and all necessary network and cabling requirements are completed
(two IP addresses from the management network are required for the new cell).

[2] Re-image or upgrade the cell image

2.1 Extract the imageinfo from one of the existing cell servers.
2.2 Log in to the new cell through the ILOM, connect to the console as the root user and get the imageinfo.
2.3 If the image version on the new cell does not match the existing image version, either download the exact
image version and re-image the new cell, or upgrade the image on the existing servers. A comparison example is shown below.
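For example, the image versions can be compared with the imageinfo command (the hostname below is illustrative; run the same command on the new cell from its console):

[root@myclustercel01 ~]# imageinfo -ver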

Review “MOS Doc ID 2151671.1” if you want to reimage the new cell.

[3] Add the IP addresses acquired for the new cell to the /etc/oracle/cell/network-config/cellip.ora file on each DB node.

To do this, perform the steps below, starting from the first DB server in the cluster:

cd /etc/oracle/cell/network-config
cp cellip.ora cellip.ora.orig
cp cellip.ora cellip.ora-bak
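For illustration, the new cell is then added as an extra cell="..." line at the end of cellip.ora on every DB node. The IP address below is hypothetical, and the exact format (a single IP, or two IPs separated by a semicolon) should match the existing lines in your file:

[root@myclusterdb01 network-config]# echo 'cell="192.168.10.8"' >> cellip.ora
[root@myclusterdb01 network-config]# cat cellip.ora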

[4] If ASR alerting was set up on the existing storage cells, configure cell ASR alerting for the cell being added

List the cell attributes required for configuring cell ASR alerting.
Run the following command from any existing storage grid cell:

CellCLI> list cell attributes snmpsubscriber

Apply the same SNMP values to the new cell by running the following command as the celladmin user, for example:

CellCLI> alter cell snmpSubscriber=((host='10.20.14.21',port=162,community=public))

[5] Configure cell alerting for the cell being added.

List the cell attributes required for configuring cell alerting.
Run the following command from any existing storage grid cell:

CellCLI> list cell attributes notificationMethod,notificationPolicy,
smtpToAddr,smtpFrom,smtpFromAddr,smtpServer,smtpUseSSL,smtpPort

Apply the same values to the new cell by running the following command as the celladmin user, for example:

CellCLI> alter cell notificationmethod='mail,snmp',notificationpolicy='critical,warning,clear',
smtptoaddr= 'dba@email.com',smtpfrom='Exadata',smtpfromaddr='dba@email.com',smtpserver='10.20.14.21',
smtpusessl=FALSE,smtpport=25

[6] Create cell disks on the cell being added

Log in to the cell as celladmin and run the following command:

CellCLI> create celldisk all

[7] Check that the flash log was created by default:

CellCLI> list flashlog

You should see the name of the flash log. It should look like cellnodename_FLASHLOG, and its status should be “normal”. If the flash log does not exist, create it using:

CellCLI> create flashlog all

[8] Check the current flash cache mode and compare it to the flash cache mode on existing cells:

CellCLI> list cell attributes flashcachemode

To change the flash cache mode to match the flash cache mode of existing cells, do the following:

1. If the flash cache exists and the cell is in WriteBack flash cache mode,
you must first flush the flash cache:

CellCLI> alter flashcache all flush

Wait for the command to return.

2. Drop the flash cache:

CellCLI> "drop flashcache all"

3. Change the flash cache mode:

CellCLI> alter cell flashCacheMode=writeback

The value of the flashCacheMode attribute is either writeback or writethrough.
The value must match the flash cache mode of the other storage cells in the cluster.

4. Create the flash cache:

CellCLI> create flashcache all

[9] Create grid disks on the cell being added.

Query the size and cachingpolicy of the existing grid disks from an existing cell:

CellCLI> list griddisk attributes name,asmDiskGroupName,cachingpolicy,size,offset
  • For each disk group found by the above command, create grid disks on the new cell that is being added to the cluster.
  • Match the size and the cachingpolicy of the existing grid disks for the disk group reported by the command above.
  • Grid disks should be created in the order of increasing offset to ensure similar layout and performance characteristics as the existing cells.
  • For example, the “list griddisk” command could return something like this:
DATAC1 default 5.6953125T 32M
DBFS_DG default 33.796875G 7.1192474365234375T
RECOC1 none 1.42388916015625T 5.6953582763671875T

When creating the grid disks, begin with DATAC1, then RECOC1, and finally DBFS_DG, using the following commands:

CellCLI> create griddisk ALL HARDDISK PREFIX=DATAC1, size=5.6953125T, cachingpolicy='default',
comment="Cluster cluster-clux6 DR diskgroup DATAC1"

CellCLI> create griddisk ALL HARDDISK PREFIX=RECOC1,size=1.42388916015625T, cachingpolicy='none',
comment="Cluster cluster-clux6 DR diskgroup RECOC1"

CellCLI> create griddisk ALL HARDDISK PREFIX=DBFS_DG,size=33.796875G, cachingpolicy='default',
comment="Cluster cluster-clux6 DR diskgroup DBFS_DG"

CAUTION: Be sure to specify the EXACT size shown along with the unit (either T or G).

[10] Verify the newly created grid disks are visible from the Oracle RAC nodes.
Log in to each Oracle RAC node and run the following command:

$GI_HOME/bin/kfod op=disks disks=all | grep cellName_being_added

This should list all the grid disks created as above.

[11] Add the newly created grid disks to the respective existing ASM disk groups.

ALTER DISKGROUP disk_group_name ADD DISK 'comma_separated_disk_names';
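For illustration, with the grid disk prefixes used in this example the commands could look like the following (a sketch; the cell IP address is hypothetical, and the 'o/<cell IP>/<prefix>_*' pattern matches all grid disks with that prefix on the new cell):

SQL> ALTER DISKGROUP DATAC1 ADD DISK 'o/192.168.10.8/DATAC1_*';
SQL> ALTER DISKGROUP RECOC1 ADD DISK 'o/192.168.10.8/RECOC1_*';
SQL> ALTER DISKGROUP DBFS_DG ADD DISK 'o/192.168.10.8/DBFS_DG_*';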

The ALTER DISKGROUP command kicks off an ASM rebalance at the default power level.
Monitor the progress of the rebalance by querying gv$asm_operation:

SQL> select * from gv$asm_operation;

Once the rebalance completes, the addition of the cell to the Oracle RAC is complete.
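Optionally, you can also confirm from ASM that the disks of the new cell are mounted and online. This is just a sketch, filtering on the grid disk paths that contain the name of the cell being added:

SQL> select path, mount_status, mode_status, state
     from v$asm_disk
     where path like '%cellName_being_added%';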

[12] Run the latest Exachk to ensure that the resulting configuration implements the latest best practices for Oracle Exadata.

Thank you Oracle ACE Syed Jaffar Hussain for sharing his experience

Thank you for visiting this blog 🙂

Manually take an ILOM snapshot

The DBMA has to collect an ILOM snapshot when Oracle Support requests it; many of you may have been asked by Oracle Support to provide an ILOM snapshot to troubleshoot Exadata hardware issues.

I had to diagnose a hardware issue recently and was not able to use the web interface because of a firewall issue. Fortunately, you can generate an ILOM snapshot using the following CLI method.

[1] Let’s connect and set the snapshot type to normal

Step 1 : Log in to the ILOM as the root user.

[root@myclusterdb01 ~]# ssh myclustercel05-ilom
Password:
Oracle(R) Integrated Lights Out Manager
Version 3.2.7.30.a r112904
Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.
Warning: HTTPS certificate is set to factory default.
Hostname: myclustercel05-ilom

Step 2 : Set snapshot dataset to normal.

-> set /SP/diag/snapshot dataset=normal
Set 'dataset' to 'normal'

Step 3 : Set snapshot output location.

-> set /SP/diag/snapshot dump_uri=sftp://root:"password!"@10.21.101.22/tmp

Set 'dump_uri' to 'sftp://root:"password!"@10.21.101.22/tmp'

Step 4 : Change directory to snapshot

-> cd /SP/diag/snapshot
/SP/diag/snapshot

Step 5 : Check the status of the snapshot and make sure it is running.

-> show
/SP/diag/snapshot

Targets:
Properties:
dataset = normal
dump_uri = (Cannot show property)
encrypt_output = false
result = Running

Step 6: Keep checking the status until it completes. This may take up to 10 minutes.

-> show
/SP/diag/snapshot
Targets:

Properties:
dataset = normal
dump_uri = (Cannot show property)
encrypt_output = false
result = Collecting data into

sftp://oracle@10.21.101.22/etc/snapshot/exa01dbadm01-ilom_XXXX30AG_2018-09-14T23-04-46.zip

TIMEOUT: /usr/local/bin/spshexec show /SP/bootlist
TIMEOUT: /usr/local/bin/create_ueficfg_xml

Snapshot Complete.
Done.

Step 7: Upload the generated file to Oracle Support.

oracle@10.21.101.22/tmp/exa01dbadm01-ilom_XXXX30AG_2018-09-14T23-04-46.zip

[2] Let’s connect and set the snapshot type to full:

A full ILOM snapshot (which is the one Oracle Support will most likely ask you for) may (yes, “may”) reset the host, as per the documentation:
Note - Using this option might reset the host operating system.
“Reset the host” means rebooting the host.

Fred mentioned in his blog that he has done it a few times on production cells and they were never rebooted, but this is something to keep in mind if you are asked to take a full ILOM snapshot of a database server. Indeed, a cell reboot would be transparent, but it is a different story with a database server.

[root@myclusterdb01 ~]# ssh myclustercel05-ilom
Password:
Oracle(R) Integrated Lights Out Manager
Version 3.2.7.30.a r112904
Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.
Warning: HTTPS certificate is set to factory default.
Hostname: myclustercel05-ilom
-> set /SP/diag/snapshot dataset=full

Set 'dataset' to 'full'

->

Then start the ILOM snapshot using the IP address of the target system the snapshot will be copied to,
along with its root password
(the ILOM snapshot is copied into /tmp in the example below):

-> set /SP/diag/snapshot dump_uri=sftp://root:root_password@10.11.12.13/tmp
Collecting a "full" dataset may reset the host. Are you sure (y/n)? y
Set 'dump_uri' to 'sftp://root@10.11.12.13/tmp'

Now that the ILOM snapshot has been started,
you can monitor it using the command below:

-> show /SP/diag/snapshot

/SP/diag/snapshot
Targets:

Properties:
dataset = full
dump_uri = (Cannot show property)
encrypt_output = false
result = Running

Commands:
cd
set
show

->

After a few minutes you should see the ILOM snapshot as completed:

-> show /SP/diag/snapshot

/SP/diag/snapshot
Targets:
Properties:
dataset = full
dump_uri = (Cannot show property)
encrypt_output = false
result = Collecting data into sftp://root@10.11.12.13/tmp/myclustercel07-ilom_1133FMM02D_2018-02-04T23-18-06.zip
Snapshot Complete.
Done.

Commands:
cd
set
show

->

This is actually quite a small file that is easy to transfer to MOS:

[root@myclusterdb01 ~]# du -sh /tmp/myclustercel07-ilom_1133FMM02D_2018-02-04T23-18-06.zip
2.5M /tmp/myclustercel07-ilom_1133FMM02D_2018-02-04T23-18-06.zip
[root@myclusterdb01 ~]#

Thank you Oracle ACE Fred Denis for sharing his experience

Thank you for visiting this blog 🙂

Manually reboot a database server using its ILOM

Depending on the requirement, this will be a DBMA task.

We will be using its ILOM, which is the administration console that each Exadata component has. Be sure to have:
  • The database server ILOM IP (usually <dbserver-name>-ilom, like <mycluster>db02-ilom)
  • The ILOM root password (in case you need it, the default password is welcome1)
[root@myclusterdb01 ~]# ssh myclusterdb01-ilom
Warning: Permanently added the RSA host key for IP address '10.191.84.24' to the list of known hosts.
Password
Oracle(R) Integrated Lights Out Manager
Version 3.2.8.25 r114493
Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.
Warning: HTTPS certificate is set to factory default.
Hostname: myclusterdb04-ilom

-> reset /SYS 
Are you sure you want to reset /SYS (y/n)? y
Performing hard reset on /SYS
->

This starts a hard reboot of the myclusterdb01 database server.

You can then connect to the console to have a look at what is happening (the server boot logs) :

-> start /sp/console
Are you sure you want to start /SP/console (y/n)? y
Serial console started.  To stop, type ESC (
. . .
[INFO] /usr/sbin/ipmitool user set name 4 iu_ngtmh
[INFO] /usr/sbin/ipmitool user set password 4 ********
[INFO] Executing: /usr/bin/mstflint -y -d /proc/bus/pci/40/00.0 -i /var/log/exadatatmp/firmware/ActualFirmwareFiles/fw-ConnectX3-rel-2_35_5532-15-7046442_7092757.bin  burn

    Current FW version on flash:  2.11.1280
    New FW version:               2.35.5532

[INFO] run /usr/sbin/ipmitool cmd to set /SP/users/iu_ngtmh/role=aucro
Burning FS2 FW image without signatures - 7[INFO] export IPMI_PASSWORD=********
[INFO] /usr/sbin/ipmiflash -v -I lanplus -H 10.191.84.24 -U iu_ngtmh -E write /var/log/exadatatmp/firmware/ActualFirmwareFiles/ILOM-3_2_10_22_a_r121452-Sun_Server_X4-2.pkg force script config delaybios warning=0
Burning FS2 FW image without signatures - OK
Restoring signature                     - OK
[INFO] Waiting for the service processor to finish firmware upgrade for up to 1200 seconds.
. . .

Give the server a few minutes to reboot and you’re done.

Please keep in mind that:
  • Unlike with an Infiniband Switch, you do not have to use the spsh command to jump into the ILOM shell, as you are connecting directly to the dedicated ILOM IP address
  • You have to use this somewhat unusual ILOM syntax to quit the console: ESC and then “(”

Thank you Oracle ACE Fred Denis for sharing his experience

Thank you for visiting this blog 🙂

Manually reboot an Infiniband Switch

Depending on the requirement, this will be a DBMA task.

  • Do NOT reboot both Exadata Switches at the same time — you’ll get into lots of trouble
  • An IB Switch’s ILOM is embedded within the Switch itself; access it with the ILOM shell using the “spsh” command, then use the “reset /SP” command to reboot the Switch, as shown below
# ssh myclustersw-ib3
# spsh
-> reset /SP
Are you sure you want to reset /SP (y/n)? y
[root@myclusterdb01 ~]# ssh myclustersw-ib3

Last login: Wed Dec 20 17:58:46 2017 from myclusterdb01.mydomain.com
You are now logged in to the root shell.
It is recommended to use ILOM shell instead of root shell.
All usage should be restricted to documented commands and documented
config files.
To view the list of documented commands, use "help" at linux prompt.

[root@myclustersw-ib3 ~]# spsh

Oracle(R) Integrated Lights Out Manager
Version ILOM 3.0 r47111
Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved.

->  reset /SP
Are you sure you want to reset /SP (y/n)? y

Performing reset on /SP
Broadcast message from root (Sun Jan 28 20:14:42 2018):
The system is going down for reboot NOW!
-> Connection to myclustersw-ib3 closed by remote host.
Connection to myclustersw-ib3 closed.

[root@myclusterdb01 ~]#

https://docs.oracle.com/cd/E19273-01/html/821-0243/gixyc.html

Thank you Oracle ACE Fred Denis for sharing his experience

Thank you for visiting this blog 🙂

Shut down or reboot an Exadata storage cell without affecting ASM

This article covers some of the DBMA commands that are useful when you want to shut down or reboot a storage cell without affecting ASM.

[1] Verify the existing disk_repair_time attribute for all diskgroups
SQL> select dg.name,a.value from 
v$asm_diskgroup dg, v$asm_attribute a 
where dg.group_number=a.group_number and
a.name='disk_repair_time';

[2] The default disk_repair_time is only 3.6 hours, so it is better to increase it:
 SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'DISK_REPAIR_TIME'='8.5H';
[3] Next, you will need to check whether ASM will be OK if the grid disks go OFFLINE.
The following command should return 'Yes' for all the grid disks listed:

cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

Note: Shutting down the cell services when one or more grid disks do not return
asmdeactivationoutcome='Yes' will cause Oracle ASM to dismount the affected disk group,
causing the databases to shut down abruptly.

[4] Inactivate all grid disks on the cell you wish to power down/reboot:
cellcli -e alter griddisk all inactive

[5] Confirm that the griddisks are now offline by performing the following actions:
cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
cellcli -e list griddisk

Note:
Execute the command below and the output should show either asmmodestatus=OFFLINE or 
asmmodestatus=UNUSED and asmdeactivationoutcome=Yes for all griddisks once the disks are 
offline in ASM. Only then is it safe to proceed with shutting down or restarting the cell

[6] You can now power off or reboot the cell. To power it off:
# shutdown -h now

[7] Once the cell comes back online, you will need to reactivate the grid disks:

cellcli -e alter griddisk all active

[8] Issue the command below and all disks should show 'active':

cellcli -e list griddisk

[9] Verify grid disk status: 

cellcli -e list griddisk attributes name, asmmodestatus
cellcli -e list griddisk attributes name where asmdeactivationoutcome != 'Yes'

For more detailed information on this topic, please refer to MOS Doc ID 1188080.1, "Steps to shut down or reboot an Exadata storage cell without affecting ASM".

Thank you for visiting this blog 🙂

Shutdown and Startup Exadata

This article covers the scenario of a graceful shutdown and startup of Exadata and the Oracle databases running on it.

#################
Shutdown Sequence
#################

[0] First, check all database and cluster resources to see which are online and which are offline (for example, as shown below).
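For example, the following commands give a quick overview (a sketch; run them as the Grid Infrastructure or database owner with the environment set):

$ crsctl status resource -t
$ srvctl status database -d DB_NAME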

[1] Shut down the database(s) safely as follows.

$ srvctl stop database -d DB_NAME
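If several databases run on the cluster, they can be stopped in a loop like this (a sketch; it assumes srvctl is in the PATH and lists the databases registered in the cluster):

$ for db in $(srvctl config database); do
    srvctl stop database -d $db
  done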

[2] Once all databases and their instances are shut down, stop the cluster resources as follows. First set the Grid Infrastructure environment, then stop all resources.

$crsctl stop cluster -all

Or

You can stop CRS on each database server node as follows.

$ crsctl stop crs

[3] Shut down the cell services on all cell servers.

You can shut down the cell services on all cells from a database server as follows.

$dcli -g cell_group -l root "su - celladmin -c \"cellcli -e alter cell shutdown services all \""

Or

you can shut down the cell services on each cell server one by one as follows. Before stopping them, check their status:

#service celld status
#service celld stop

Do this step for all Cell servers one by one.

[4] Power off the cell servers.

You can power them all off with one command from a database server as follows.

$dcli -g cell_group -l root poweroff

or

you can power them off one by one on each cell server.
# shutdown -h -y now

[5] Shut down all database servers as follows.

shutdown -h -y now
or
poweroff

Once all SSH connections are down, check whether the Exadata server lights are on or off.

##################
Startup Sequence
##################

[6] Once Exadata has been powered off, you can start it up by pressing the power button on the front panel of the Exadata storage servers and database servers.

When the storage server and database server lights are on, they will start up within about 10 minutes.

You can also power on the cell servers from a database server using the following command:

for host in `cat cell_group`; do
echo ${host}: `ipmitool -H ${host}-ilom -U root -P welcome1 chassis power on`
done

You can also start up Exadata by using the ILOM.

[7] Check all Cell Servers if they are online or not.

$dcli -g cell_group -l root 'hostname; uptime'
$dcli -g cell_group -l root "su - celladmin -c \"cellcli -e list cell detail \""

[8] If Cell servers are online, then check the Database Servers.

Normally, all CRS services should start up automatically,
but check them as follows on each database server:

crsctl status res -t
ps -ef | grep smon

[9] If the cluster and database services did not start automatically, start them as follows.

crsctl start cluster -all

[10] If the databases are not up, start them as follows.

$ srvctl start database -d DB_NAME

Now that all Exadata and database services are up, you can use them safely.

Refer to MOS Doc ID 1093890.1 for detailed steps.
Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration (Doc ID 1093890.1)

Thank you for visiting this blog 🙂

Upgrading and Patching Exadata to 18c and 19c

Special thanks to Oracle ACE Fred Denis for sharing his experience https://unknowndba.blogspot.com/

I would like to share my experience in this article, which covers the latest scenarios and hands-on steps for upgrading and patching an on-premises Exadata to 18c and 19c.

The Exadata Quarterly Full Stack Download Patch (QFSDP) is the recommended way to upgrade all Exadata components. It is released quarterly.

QFSDP releases contain the latest software for the following components:

Infrastructure
  • Exadata Storage Server
  • InfiniBand Switch
  • Power Distribution Unit
Database
  • Oracle Database and Grid Infrastructure PSU
  • Oracle JavaVM PSU (as of Oct 2014)
  • OPatch
  • OPlan
Systems Management
  • EM Agent
  • EM OMS
  • EM Plug-ins
First, refer to Oracle MOS Document ID 888828.1 and download the QFSDP patch.
Here is a preview of this patching, with the order and the tools we will be using:

0. A word of advice
1. General Information
2. Some prerequisites worth doing before the maintenance
2.1 Download and unzip the Bundle
2.2 Download the latest patchmgr
2.3 SSH keys
2.4 Upgrade opatch
2.5 Run the prechecks
2.5.1 Cell patching prechecks
2.5.2 Check disk_repair_time
2.5.3 DB Nodes prechecks
2.5.4 Dependencies issues
2.5.5 IB Switches prechecks
2.5.6 ROCE Switches prechecks
2.5.7 Grid Infrastructure prechecks

3 The patching procedure
3.1 Patching the cells (aka Storage servers)
3.2 Patching the IB switches
3.2.roce/ Patching the ROCE switches
3.3 Patching the Database servers (aka Compute Nodes)
3.4 Patching the Grid Infrastructure
3.5 Upgrading the Grid Infrastructure:
3.5.1 Upgrade Grid Infrastructure to 12.2
3.5.2 Upgrade Grid Infrastructure to 18c
3.6 Upgrading the Cisco Switch / enabling SSH access to the Cisco Switch
4 The Rollback procedure
4.1 Cell Rollback
4.2 DB nodes Rollback
4.3 IB Switches Rollback
5 Troubleshooting
How to take an ILOM snapshot with the command line
How to reboot a database server using its ILOM (same procedure applies for a storage server)
How to manually reboot an Infiniband Switch
Restart SSH on a storage cell with no SSH access
How to re-image an Exadata database server
How to re-image an Exadata cell storage server
. . . more to come . . .
6/ Timing

0. A word of advice


Exadata Patching Overview and Patch Testing Guidelines (Doc ID 1262380.1)

Do NOT continue to the next step before a failed step is properly resolved.

It is supported to run different Exadata versions between servers. 
For example, some storage servers may run 11.2.2.4.2 while others run 
11.2.3.1.1, or all storage servers may run 11.2.3.1.1 while database servers 
run 11.2.2.4.2. However, it is highly recommended that this be only a temporary 
configuration that exists for the purpose and duration of rolling upgrade.

1. General Information

Please find below some information you need to know before starting to patch your Exadata:

  • There is no difference in the procedure whether you patch a 12.1 Exadata, a 12.2 Exadata or you upgrade an Exadata to 18c (18c is a patchset of 12.2); the examples of this blog are from recent maintenance to upgrade an Exadata to 18c
  • It is better to have a basic understanding of what an Exadata is before jumping into this patch procedure
  • This procedure does not apply to an ODA (Oracle Database Appliance)
  • I will use the /Oct2018_Bundle FS to save the Bundle in the examples of this blog
  • I use the “DB node” term here, it means “database node“, aka “Compute node“; the nodes where the Grid Infrastructure and the database are running, I will also use the db01 term for the database node number 1, usually named “cluster_name” db01
  • I use the “cell” word aka “storage servers“, the servers that manage your storage. I will also use cel01 for the storage server number 1, usually named “cluster_name”cel01
  • It is good to have the screen utility installed; if not, use nohup (see the sketch after this list)
  • Almost all the procedure will be executed as root
  • I will be patching the IB Switches from the DB node 1 server
  • I will be patching the cells from the DB node 1 server
  • I will be patching the DB nodes from the cel01 server
  • I will not cover the databases Homes as there is nothing specific to Exadata here
  • I will be using the rac-status.sh script to easily check the status of the resources of the Exadata as well as easily follow the patch progress
  • I will be using the exa-versions.sh script to easily check the versions of the Exadata components
  • I will be using the cell-status.sh script to easily check the status of the cell and grid disks of the storage servers
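As an example of the screen/nohup point above, a long-running patchmgr session can be protected from SSH disconnections like this (a sketch only; the patchmgr options shown are simply the precheck ones used later in this post):

# Using screen: start a named session, then run patchmgr inside it
screen -S exa_patching
./patchmgr -cells ~/cell_group -patch_check_prereq -rolling

# Or using nohup, sending the output to a logfile
nohup ./patchmgr -cells ~/cell_group -patch_check_prereq -rolling > /tmp/patchmgr_prereq.log 2>&1 &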

2. Some prerequisites worth doing before the maintenance

I highly recommend executing these prerequisites as early as possible. The sooner you discover an issue in these prerequisites, the better.

1. Verify the $TMOUT

echo $TMOUT
14400

This value is 14,400 seconds, i.e. 4 hours.
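If you are worried about the session timing out in the middle of a long operation, the timeout can be disabled for the current shell (a sketch; this will fail if TMOUT is declared readonly by your security policy, in which case keep the session alive another way, for example with screen):

unset TMOUT        # or: export TMOUT=0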

2. Confirm the connectivity to ilom
#ssh root@myclusterdb01-ilom
#ssh root@myclusterdb02-ilom
#ssh root@myclusterdb03-ilom
#ssh root@myclusterdb04-ilom

#ssh root@myclustercel01-ilom
........
#ssh root@myclustercel07-ilom

Note: Provide the password and confirm the connectivity with ilom.

3. Verify cells

#dcli -l root -g cell_group cellcli -e 'list physicaldisk attributes luns where physicalInsertTime = null'

#cat cell_group
myclustercel01
.....
myclustercel07

#dcli -l root -g cell_group cellcli -e list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome

Note: the output should show "active ONLINE Yes" for every grid disk.

4. Verify configuration

#dcli -l root -g cell_group /opt/oracle.cellos/ipconf -verify

5. ofa rpm

dcli -l root -g cell_group "rpm -qa | grep ofa"

6. Check the ASM instances and ensure that the disk groups are mounted.

sqlplus / as sysdba
select inst_id,name,state from gv$asm_diskgroup group by inst_id,name,state;

2.1 Download and unzip the Bundle

Review the Exadata general note (Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)) to find the latest Bundle, download it and unzip it; be sure that every directory is owned by oracle:dba to avoid any issue in the future:

At the time of writing, the latest available patch is Patch 31754150 - Quarterly Full Stack Download For Oracle Exadata (QFSDP) July 2020.

for i in `ls p31754150_190000_Linux-x86-64*f10.zip`; do unzip -q $i; done    # unzip each downloaded part
cat *.tar.* | tar -xvf -    # reassemble and extract the split tar

It will create a single directory named after the patch number, 31754150.

Inside the 31754150 directory, you will find the following directories.

Note: you can use unzip -q to make unzip silent

It contains patches for the following stacks:
Database/Clusterware
Database Server
Storage Server
Infiniband
PDUs

2.2 Download the latest patchmgr

patchmgr is the orchestration tool that will perform most of the job here. Its version can change quite often, so I recommend double-checking that the version shipped with the Bundle is the latest one:

-- patchmgr is in the below directory :
/FileSystem/Bundle_patch_number/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/Version
Example for the July 2020 PSU :
/July2020_Bundle/31754150/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_181800_Linux-x86-64.zip
-- Patchmgr is delivered on Metalink through this patch:
Patch 21634633: DBSERVER.PATCH.ZIP ORCHESTRATOR PLUS DBNU - ARU PLACEHOLDER

-- Download the latest patchmgr version and replace the one shipped in the Bundle directory

2.3 SSH keys

For this step, if you are not familiar with the dbs_group, cell_group, etc. files, here is how to create them, as I have described in this post (look for “dbs_group” in the post).
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep db | awk '{print $6}' | sort > /root/dbs_group
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep cel | awk '{print $6}' | sort > /root/cell_group
[root@myclusterdb01 ~]# cat /root/dbs_group ~/cell_group > /root/all_group
[root@myclusterdb01 ~]# ibswitches | awk '{print $10}' | sort > /root/ib_group

We need a few SSH keys deployed in order to ease the patch application:
root ssh keys deployed from the db01 server to the IB Switches (you will have to enter the root password once for each IB Switch)

[root@myclusterdb01 ~]# cat ~/ib_group
myclustersw-ib2
myclustersw-ib3
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclustersw-ib3's password:
root@myclustersw-ib2's password:
myclustersw-ib2: ssh key added
myclustersw-ib3: ssh key added
[root@myclusterdb01 ~]#

root ssh keys deployed from the cel01 server to all the database nodes (you will have to enter the root password once for each database server)

[root@myclustercel01 ~]# cat ~/dbs_group
myclusterdb01
myclusterdb02
myclusterdb03
myclusterdb04
[root@myclustercel01 ~]# dcli -g ~/dbs_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclusterdb01's password:
root@myclusterdb03's password:
root@myclusterdb04's password:
root@myclusterdb02's password:
myclusterdb01: ssh key added
myclusterdb02: ssh key added
myclusterdb03: ssh key added
myclusterdb04: ssh key added
[root@myclustercel01 ~]#

root ssh keys deployed from the db01 server to all the cells (you will have to enter the root password once for each cell)

[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root hostname
myclustercel01: myclustercel01.mydomain.com
myclustercel02: myclustercel02.mydomain.com
myclustercel03: myclustercel03.mydomain.com
myclustercel04: myclustercel04.mydomain.com
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclustercel04's password:
...
root@myclustercel03's password:
myclustercel01: ssh key added
...
myclustercel06: ssh key added
[root@myclusterdb01 ~]#

2.4 Upgrade opatch

It is highly recommended to upgrade opatch before any patching activity, and this Bundle is no exception. Please find a detailed procedure to quickly upgrade opatch with dcli in this post.
Please note that upgrading opatch will also allow you to be ocm.rsp-free!

[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid /u01/app/12.1.0.2/grid/OPatch/opatch version | grep Version
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid -f /Oct2018_Bundle/28183368/Database/OPatch/12.2/12.2.0.1.*/p6880880_12*_Linux-x86-64.zip -d /tmp
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid "unzip -o /tmp/p6880880_12*_Linux-x86-64.zip -d /u01/app/12.1.0.2/grid; /u01/app/12.1.0.2/grid/OPatch/opatch version; rm /tmp/p6880880_12*_Linux-x86-64.zip" | grep Version

2.5 Run the prechecks

It is very important to run these prechecks and review their output carefully. They have to be 100% successful to ensure a smooth application of the patches.

2.5.1 Cell patching prechecks

First of all, you’ll have to unzip the patch:

[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataStorageServer_InfiniBandSwitch
[root@myclusterdb01 ~]# unzip -q p28633752_*_Linux-x86-64.zip

— This should create a patch_18.1.9.0.0.181006 directory with the cell patch
And start the prerequisites from database node 1:

[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataStorageServer_InfiniBandSwitch/patch_18.1.9.0.0.181006
[root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -patch_check_prereq -rolling

2.5.2 Check disk_repair_time

You have to be aware of and understand this parameter. Indeed, disk_repair_time specifies the amount of time before ASM drops a disk after it is taken offline; the default for this parameter is 3.6h.
Oracle recommends setting this parameter to 8h when patching a cell. But as we will see in the cell patching logs, patchmgr's timeout for this operation is 600 minutes (that is, 10 hours), and as I have had issues in the past with very long cell patching, I now set this parameter to 24h, as Oracle recommended when I faced that very long cell patching. I would therefore recommend setting it to 24h when patching; this is what I will describe in the cell patching procedure. We will just have a look at the value of the parameter for awareness here (an example of raising it follows the query output below).
Please note that this prerequisite is only needed for a rolling patch application.

SQL> select dg.name as diskgroup, a.name as attribute, a.value 
from v$asm_diskgroup dg, v$asm_attribute a 
where dg.group_number=a.group_number 
and (a.name like '%repair_time' or a.name = 'compatible.asm');

DISKGROUP          ATTRIBUTE              VALUE
---------------- ----------------------- ----------------------------------------
DATA             disk_repair_time         3.6h
DATA             compatible.asm           11.2.0.2.0
DBFS_DG          disk_repair_time         3.6h
DBFS_DG          compatible.asm           11.2.0.2.0
RECO             disk_repair_time         3.6h
RECO             compatible.asm           11.2.0.2.0

6 rows selected.
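For reference, when the time comes to raise it before the cell patching, the syntax is the same as the one shown earlier in this post (a sketch; run it for each disk group and set the attribute back to its original value once the patching is over):

SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time' = '24h';
SQL> ALTER DISKGROUP RECO SET ATTRIBUTE 'disk_repair_time' = '24h';
SQL> ALTER DISKGROUP DBFS_DG SET ATTRIBUTE 'disk_repair_time' = '24h';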

2.5.3 DB Nodes prechecks

As we cannot patch a node we are connected to, we will start the patch from a cell server (myclustercel01). To be able to do that, we first need to copy patchmgr and the ISO file to this cell server. Do NOT unzip the ISO file; patchmgr will take care of it.

I create a /tmp/SAVE directory to patch the database servers. Having a SAVE directory in /tmp is a good idea to avoid the automatic maintenance jobs that purge /tmp every day (directories > 5 MB and older than 1 day); otherwise, these maintenance jobs would delete the dbnodeupdate.zip file that is mandatory to apply the patch. Keep in mind that this directory will not survive a reboot though.

[root@myclusterdb01 ~]# ssh root@myclustercel01 rm -fr /tmp/SAVE
[root@myclusterdb01 ~]# ssh root@myclustercel01 mkdir /tmp/SAVE
[root@myclusterdb01 ~]# scp /Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_*_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataDatabaseServer_OL6/p28666206_*_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp ~/dbs_group root@myclustercel01:~/.
[root@myclusterdb01 ~]# ssh root@myclustercel01
[root@myclustercel01 ~]# cd /tmp/SAVE
[root@myclustercel01 ~]# unzip -q p21634633_*_Linux-x86-64.zip

This should create a dbserver_patch_5.180720 directory (the name may be slightly different if you use a different patchmgr than the one shipped with the Bundle).
Then start the prerequisites:

[root@myclustercel01 ~]# cd /tmp/SAVE/dbserver_patch_*
[root@myclustercel01 ~]# ./patchmgr -dbnodes ~/dbs_group -precheck  
-iso_repo /tmp/SAVE/p28666206_*_Linux-x86-64.zip -target_version 18.1.9.0.0.181006 
-allow_active_network_mounts

-- You can safely ignore the warning below (it has been a patchmgr bug for a while) if the GI version is > 11.2.0.2, which is most likely the case:
(*) – Yum rolling update requires fix for 11768055 when Grid Infrastructure is below 11.2.0.2 BP12
Note: if your source version is > 12.1.2.1.1, you can use the -allow_active_network_mounts parameter to patch all the DB nodes without having to take care of the NFS mounts. Otherwise, if you have some NFS mounted, you will see some error messages; you can ignore them at this stage, as we will unmount the NFS manually before patching the DB nodes (see the sketch below).
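If you do end up unmounting the NFS manually before patching the DB nodes, it can be done from the driving node with dcli (a sketch; /my_nfs_share is a hypothetical mount point, so replace it with the mount points shown by df on your DB nodes):

[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root "umount /my_nfs_share"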

2.5.4 Dependencies issues

You may have dependency issues reported by the database server prerequisites. I have documented the two cases you can be in and the two ways you can fix them:

  • When there is no OS upgrade, follow this blog.
  • When there is an OS upgrade (from 12c or 18c to 19c or above), please have a look at this blog

2.5.5 IB Switches prechecks

- To avoid issues with NFS/ZFS when rebooting the IB Switches (I have hit a lot of them in the past; I am not sure whether they came from the client configuration, but it is always unpleasant), I recommend copying the patch outside of any NFS/ZFS.
- This patch is ~2.5 GB, so be careful not to fill / if you copy it into /tmp; if in doubt, choose another local FS.

[root@myclusterdb01 ~]# du -sh /tmp/IB_PATCHING
[root@myclusterdb01 ~]# rm -fr /tmp/IB_PATCHING
[root@myclusterdb01 ~]# mkdir /tmp/IB_PATCHING
[root@myclusterdb01 ~]# unzip -q /Oct2018_Bundle/28689205/Infrastructure/18.1.9.0.0/ExadataStorageServer_InfiniBandSwitch/p28633752_*_Linux-x86-64.zip -d /tmp/IB_PATCHING
[root@myclusterdb01 ~]# cd /tmp/IB_PATCHING/patch_18.1.9.0.0.181006
[root@myclusterdb01 ~]# ./patchmgr -ibswitches ~/ib_group -ibswitch_precheck -upgrade

Note: despite what the patchmgr documentation says, you have to specify an ib_group configuration file containing the list of your IB Switches.
If the prerequisites show some conflicts to be resolved, please have a look at this blog where I explain how to manage the OS dependency issues, but do NOT use the -modify_at_prereq option straight away.

2.5.6 ROCE Switches prechecks

- As explained here, we will use an OS user named admin to be able to run the prerequisites (this is fixed in the July 2020 release, from which they can be run as root):

 [admin@x8m_01]$ pwd
/patches/APR2020/30783929/Infrastructure/19.3.7.0.0/ExadataStorageServer_InfiniBandSwitch/patch_19.3.7.0.0.200428
[admin@x8m_01]$ ./patchmgr --roceswitches ~/roce_group --upgrade --roceswitch-precheck --log_dir /tmp

2.5.7 Grid Infrastructure prechecks

To start with, be sure that the patch has been unzipped (as the GI owner user, to avoid any further permission issues):

[grid@myclusterdb01 ~]$ cd /Oct2018_Bundle/28689205/Database/12.2.0.1.0/12.2.0.1.181016GIRU
[grid@myclusterdb01 ~]$ unzip -q p28714316*_Linux-x86-64.zip

— This should create a 28714316 directory.
And start the prerequisites on each node:

[root@myclusterdb01 ~]# . oraenv <<< `grep "^+ASM" /etc/oratab | awk -F ":" '{print $1}'`
[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Database/12.2.0.1.0/12.2.0.1.181016GIRU/28714316
[root@myclusterdb01 ~]# /u01/app/12.1.0.2/grid/OPatch/opatchauto apply -oh /u01/app/12.1.0.2/grid -analyze

Alternatively, you can start the GI prerequisites on all nodes in parallel with one command:

[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root "cd /Oct2018_Bundle/28689205/Database/12.2.0.1.0/12.2.0.1.181016GIRU/28714316; /u01/app/12.1.0.2/grid/OPatch/opatchauto apply -oh /u01/app/12.1.0.2/grid -analyze"

Note: you will most likely see some warnings here; check the logfiles. The warnings are probably due to patches that will be rolled back because they are no longer needed.

Now that everything is downloaded, unzipped and updated, and every prerequisite is successful, we can safely jump to the patching procedure in part 2!

Thank you for visiting this blog 🙂