Upgrading and Patching Exadata to 18c and 19c

Special thanks to Oracle ACE Fred Denis for sharing his experience https://unknowndba.blogspot.com/

In this article I would like to share my experience, covering up-to-date scenarios and hands-on steps for upgrading and patching an on-premises Exadata to 18c and 19c.

The Exadata Quarterly Full Stack Download Patch (QFSDP) is the recommended way to upgrade all Exadata components. It is released quarterly.

QFSDP releases contain the latest software for the following components:

  • Exadata Storage Server
  • InfiniBand Switch
  • Power Distribution Unit
  • Oracle Database and Grid Infrastructure PSU
  • Oracle JavaVM PSU (as of Oct 2014)
  • OPatch
  • OPlan
  • Systems Management:
      ◦ EM Agent
      ◦ EM OMS
      ◦ EM Plug-ins
First, refer to the Oracle MOS Document ID 888828.1 and download the QFSDP Patch.
Here is a preview of the patching order and the tools we will be using:

0. Some advice
1. General Information
2. Some prerequisites it is worth doing before the maintenance
2.1 Download and unzip the Bundle
2.2 Download the latest patchmgr
2.3 SSH keys
2.4 Upgrade opatch
2.5 Run the prechecks
2.5.1 Cell patching prechecks
2.5.2 Check disk_repair_time
2.5.3 DB Nodes prechecks
2.5.4 Dependencies issues
2.5.5 IB Switches prechecks
2.5.6 ROCE Switches prechecks
2.5.7 Grid Infrastructure prechecks

3 The patching procedure
3.1 Patching the cells (aka Storage servers)
3.2 Patching the IB switches
3.2.roce/ Patching the ROCE switches
3.3 Patching the Database servers (aka Compute Nodes)
3.4 Patching the Grid Infrastructure
3.5 Upgrading the Grid Infrastructure
3.5.1 Upgrade Grid Infrastructure to 12.2
3.5.2 Upgrade Grid Infrastructure to 18c
3.6 Upgrading the Cisco Switch / enabling SSH access to the Cisco Switch
4 The Rollback procedure
4.1 Cell Rollback
4.2 DB nodes Rollback
4.3 IB Switches Rollback
5 Troubleshooting
How to take an ILOM snapshot with the command line
How to reboot a database server using its ILOM (same procedure applies for a storage server)
How to manually reboot an Infiniband Switch
Restart SSH on a storage cell with no SSH access
How to re-image an Exadata database server
How to re-image an Exadata cell storage server
. . . more to come . . .
6 Timing

0. Some advice

Exadata Patching Overview and Patch Testing Guidelines (Doc ID 1262380.1)

Do NOT continue to the next step before a failed step is properly resolved.

It is supported to run different Exadata versions between servers. 
For example, some storage servers may run one version while others run another, or all storage servers may run one version while the database servers 
run a different one. However, it is highly recommended that this be only a temporary 
configuration that exists for the purpose and duration of a rolling upgrade.

1. General Information

Here is some information you need to know before starting to patch your Exadata:

  • There is no difference in the procedure whether you patch a 12.1 Exadata or a 12.2 Exadata, or you upgrade an Exadata to 18c (18c is a patchset of 12.2); the examples in this blog come from a recent maintenance upgrading an Exadata to 18c
  • It is better to have a basic understanding of what an Exadata is before jumping into this patching procedure
  • This procedure does not apply to an ODA (Oracle Database Appliance)
  • I will use the /Oct2018_Bundle FS to save the Bundle in the examples of this blog
  • I use the term “DB node” here; it means “database node”, aka “compute node”: the nodes where the Grid Infrastructure and the databases are running. I will also use db01 for database node number 1, usually named “cluster_name”db01
  • I use the word “cell”, aka “storage server”: the servers that manage your storage. I will also use cel01 for storage server number 1, usually named “cluster_name”cel01
  • It is good to have the screen utility installed; if not, use nohup
  • Almost all of the procedure will be executed as root
  • I will be patching the IB Switches from the DB node 1 server
  • I will be patching the cells from the DB node 1 server
  • I will be patching the DB nodes from the cel01 server
  • I will not cover the database Homes as there is nothing specific to Exadata there
  • I will be using the rac-status.sh script to easily check the status of the resources of the Exadata as well as to easily follow the patch progress
  • I will be using the exa-versions.sh script to easily check the versions of the Exadata components
  • I will be using the cell-status.sh script to easily check the status of the cell and grid disks of the storage servers

2. Some prerequisites it is worth doing before the maintenance

I highly recommend executing these prerequisites as early as possible. The sooner you discover an issue in these prerequisites, the better.

1. Verify the $TMOUT

echo $TMOUT

Here this value is approximately 4 hours; make sure TMOUT is unset or large enough that your session will not be disconnected in the middle of a long-running patching operation.
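A minimal sketch of checking and disabling the idle timeout for the maintenance session (assuming a bash shell; re-enable it afterwards per your security policy):

```shell
# Show the current idle timeout; "unset" or 0 means the shell will not
# disconnect an idle session during a long patching run
echo "TMOUT=${TMOUT:-unset}"

# Disable the idle timeout for the duration of the maintenance
unset TMOUT
echo "TMOUT=${TMOUT:-unset}"
```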

2. Confirm the connectivity to the ILOMs
#ssh root@myclusterdb01-ilom
#ssh root@myclusterdb02-ilom
#ssh root@myclusterdb03-ilom
#ssh root@myclusterdb04-ilom

#ssh root@myclustercel01-ilom
#ssh root@myclustercel07-ilom

Note: Provide the root password and confirm that each ILOM is reachable.
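Rather than typing each ssh command by hand, the list of ILOM targets can be generated with brace expansion; a sketch using the hostnames of this article (replace echo with the actual ssh connectivity test on the Exadata):

```shell
# Build the list of ILOM interfaces to check (4 DB nodes and 7 cells in this
# example); on the Exadata, replace echo with: ssh root@$h hostname
for h in myclusterdb0{1..4}-ilom myclustercel0{1..7}-ilom
do
  echo "$h"
done
```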

3. Verify cells

#dcli -l root -g cell_group cellcli -e 'list physicaldisk attributes luns where physicalInsertTime = null'

#cat cell_group

#dcli -l root -g cell_group cellcli -e list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome

Note: every griddisk should report
active ONLINE Yes
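To spot at a glance any disk that is not ready, you can filter the dcli output for lines that deviate from the expected status. An offline illustration with hypothetical sample lines (on the Exadata, pipe the dcli command above instead):

```shell
# Flag any griddisk that does not report "active ONLINE Yes"
# (the two sample lines below are hypothetical)
sample='cel01: DATA_CD_00_cel01 active ONLINE Yes
cel01: RECO_CD_00_cel01 active OFFLINE Yes'
echo "$sample" | awk '!/active ONLINE Yes/ {print "NOT READY:", $0}'
```

No output means all griddisks are in the expected state.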

4. Verify configuration

#dcli -l root -g cell_group /opt/oracle.cellos/ipconf -verify

5. Check the ofa rpm

dcli -l root -g cell_group "rpm -qa | grep ofa"

6. Check the ASM instances and ensure that the diskgroups are mounted.

sqlplus / as sysdba
select inst_id, name, state from gv$asm_diskgroup order by inst_id, name;

2.1 Download and unzip the Bundle

Review the Exadata general note (Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)) to find the latest Bundle, then download and unzip it; make sure that every directory is owned by oracle:dba to avoid any issues later:

The latest available patch is Patch 31754150 – Quarterly Full Stack Download For Oracle Exadata (QFSDP) July 2020.

for i in `ls p31754150_190000_Linux-x86-64*f10.zip`; do unzip -q $i; done
cat *.tar.* | tar -xvf -

This will create a single directory named after the patch number: 31754150.

Inside the 31754150 directory you will find the following directories.

Note: you can use unzip -q to make unzip silent

It contains patches for the following stacks:
Database Server
Storage Server

2.2 Download the latest patchmgr

patchmgr is the orchestration tool that will perform most of the job here. Its version changes quite often, so I recommend double-checking that the version shipped with the Bundle is the latest one:

-- patchmgr is in the below directory (example for the July 2020 PSU):
-- patchmgr is delivered on Metalink through this patch:
-- Download the latest patchmgr version and replace it in the Bundle directory

2.3 SSH keys

For this step, if you are not familiar with the dbs_group, cell_group, etc. files, here is how to create them, as I have described in this post (look for “dbs_group” in the post).
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep db | awk '{print $6}' | sort > /root/dbs_group
[root@myclusterdb01 ~]# ibhosts | sed s'/"//' | grep cel | awk '{print $6}' | sort > /root/cell_group
[root@myclusterdb01 ~]# cat /root/dbs_group ~/cell_group > /root/all_group
[root@myclusterdb01 ~]# ibswitches | awk '{print $10}' | sort > /root/ib_group
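For clarity, here is an offline illustration of what the ibhosts parsing above does, using two hypothetical sample lines (on a real Exadata, pipe the ibhosts command itself as shown above):

```shell
# Two hypothetical ibhosts output lines: one DB node and one cell
sample='Ca : 0x0010e00001886fb8 ports 2 "myclusterdb01 S 192.168.10.9 HCA-1"
Ca : 0x0010e00001886fc0 ports 2 "myclustercel01 C 192.168.10.11 HCA-1"'
# Same sed/grep/awk pipeline as above: strip the quote, filter, keep field 6
echo "$sample" | sed 's/"//' | grep db  | awk '{print $6}' | sort
echo "$sample" | sed 's/"//' | grep cel | awk '{print $6}' | sort
```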

We would need a few SSH keys deployed in order to ease the patches application :
root ssh keys deployed from the db01 server to the IB Switches (you will have to enter the root password once for each IB Switch)

[root@myclusterdb01 ~]# cat ~/ib_group
[root@myclusterdb01 ~]# dcli -g ~/ib_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclustersw-ib3's password:
root@myclustersw-ib2's password:
myclustersw-ib2: ssh key added
myclustersw-ib3: ssh key added
[root@myclusterdb01 ~]#

root ssh keys deployed from the cel01 server to all the database nodes (you will have to enter the root password once for each database server)

[root@myclustercel01 ~]# cat ~/dbs_group
[root@myclustercel01 ~]# dcli -g ~/dbs_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclusterdb01's password:
root@myclusterdb03's password:
root@myclusterdb04's password:
root@myclusterdb02's password:
myclusterdb01: ssh key added
myclusterdb02: ssh key added
myclusterdb03: ssh key added
myclusterdb04: ssh key added
[root@myclustercel01 ~]#

root ssh keys deployed from the db01 server to all the cells (you will have to enter the root password once for each cell)

[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root hostname
myclustercel01: myclustercel01.mydomain.com
myclustercel02: myclustercel02.mydomain.com
myclustercel03: myclustercel03.mydomain.com
myclustercel04: myclustercel04.mydomain.com
[root@myclusterdb01 ~]# dcli -g ~/cell_group -l root -k -s '-o StrictHostKeyChecking=no'
root@myclustercel04's password:
root@myclustercel03's password:
myclustercel01: ssh key added
myclustercel06: ssh key added
[root@myclusterdb01 ~]#

2.4 Upgrade opatch

It is highly recommended to upgrade opatch before any patching activity, and this Bundle is no exception. Please find a detailed procedure to quickly upgrade opatch with dcli in this post.
Please note that upgrading opatch will also allow you to be ocm.rsp-free!

[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid /u01/app/ version | grep Version
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid -f /Oct2018_Bundle/28183368/Database/OPatch/12.2/*/p6880880_12*_Linux-x86-64.zip -d /tmp
[grid@myclusterdb01 ~]$ dcli -g ~/dbs_group -l grid "unzip -o /tmp/p6880880_12*_Linux-x86-64.zip -d /u01/app/; /u01/app/ version; rm /tmp/p6880880_12*_Linux-x86-64.zip" | grep Version
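After the upgrade, all nodes should report the same OPatch version. An offline sketch of checking consistency from captured dcli output (the version numbers below are hypothetical; on the Exadata, feed the real dcli version output into the same pipeline):

```shell
# Sample dcli output (hypothetical versions): "host: OPatch Version: x.y"
sample='myclusterdb01: OPatch Version: 12.2.0.1.14
myclusterdb02: OPatch Version: 12.2.0.1.14'
# Count distinct version strings; more than one means a mismatch
n=$(echo "$sample" | awk '{print $NF}' | sort -u | wc -l)
[ "$n" -eq 1 ] && echo "OPatch version consistent across nodes" || echo "OPatch version mismatch!"
```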

2.5 Run the prechecks

It is very important to run these prechecks and review their outputs carefully. They have to be 100% successful to ensure a smooth application of the patches.

2.5.1 Cell patching prechecks

First of all, you’ll have to unzip the patch:

[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Infrastructure/
[root@myclusterdb01 ~]# unzip -q p28633752_*_Linux-x86-64.zip

— This should create a patch_18. directory with the cell patch
And start the prerequisites from database node 1:

[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Infrastructure/
[root@myclusterdb01 ~]# ./patchmgr -cells ~/cell_group -patch_check_prereq -rolling

2.5.2 Check disk_repair_time

You have to be aware of and understand this parameter. Indeed, disk_repair_time specifies the amount of time ASM waits before dropping a disk after it is taken offline; the default is 3.6h.
Oracle recommends setting this parameter to 8h when patching a cell. But as we will see in the cell patching logs, patchmgr's timeout for this operation is 600 minutes (10 hours), and as I have had issues in the past with very long cell patching, I now set this parameter to 24h, as Oracle recommended when I faced those very long cell patchings. I would then recommend anyone to set it to 24h when patching; this is what I will describe in the cell patching procedure. Here, we will just have a look at the value of the parameter for awareness.
Please note that this prerequisite is only needed for a rolling patch application.

SQL> select dg.name as diskgroup, a.name as attribute, a.value 
from v$asm_diskgroup dg, v$asm_attribute a 
where dg.group_number=a.group_number 
and (a.name like '%repair_time' or a.name = 'compatible.asm');

DISKGROUP          ATTRIBUTE              VALUE
---------------- ----------------------- ----------------------------------------
DATA             disk_repair_time         3.6h
DATA             compatible.asm 
DBFS_DG          disk_repair_time         3.6h
DBFS_DG          compatible.asm 
RECO             disk_repair_time         3.6h
RECO             compatible.asm 

6 rows selected.
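When we raise the value before a rolling cell patch (and restore it afterwards), it is set per diskgroup; a sketch, using the diskgroup names from the output above:

```sql
-- Raise before the rolling cell patching (run for each diskgroup)
ALTER DISKGROUP DATA    SET ATTRIBUTE 'disk_repair_time' = '24h';
ALTER DISKGROUP RECO    SET ATTRIBUTE 'disk_repair_time' = '24h';
ALTER DISKGROUP DBFS_DG SET ATTRIBUTE 'disk_repair_time' = '24h';

-- Restore the previous value once the patching is over (example for DATA)
ALTER DISKGROUP DATA    SET ATTRIBUTE 'disk_repair_time' = '3.6h';
```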

2.5.3 DB Nodes prechecks

As we cannot patch a node we are connected to, we will start the patch from a cell server (myclustercel01). To be able to do that, we first need to copy patchmgr and the ISO file onto this cell server. Do NOT unzip the ISO file; patchmgr will take care of it.

I create a /tmp/SAVE directory to hold the files needed to patch the database servers. Using a directory named SAVE in /tmp is a good way to avoid the automatic maintenance jobs that purge /tmp every day (directories > 5 MB and older than 1 day); otherwise, these jobs would delete the dbnodeupdate.zip file that is mandatory to apply the patch. Note that this directory won't survive a reboot, though.

[root@myclusterdb01 ~]# ssh root@myclustercel01 rm -fr /tmp/SAVE
[root@myclusterdb01 ~]# ssh root@myclustercel01 mkdir /tmp/SAVE
[root@myclusterdb01 ~]# scp /Oct2018_Bundle/28689205/Infrastructure/SoftwareMaintenanceTools/DBServerPatch/19.181002/p21634633_*_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp /Oct2018_Bundle/28689205/Infrastructure/*_Linux-x86-64.zip root@myclustercel01:/tmp/SAVE/.
[root@myclusterdb01 ~]# scp ~/dbs_group root@myclustercel01:~/.
[root@myclusterdb01 ~]# ssh root@myclustercel01
[root@myclustercel01 ~]# cd /tmp/SAVE
[root@myclustercel01 ~]# unzip -q p21634633_*_Linux-x86-64.zip

This should create a dbserver_patch_5.180720 directory (the name may be slightly different if you use a different patchmgr than the one shipped with the Bundle)
And start the prerequisites:

[root@myclustercel01 ~]# cd /tmp/SAVE/dbserver_patch_*
[root@myclustercel01 ~]# ./patchmgr -dbnodes ~/dbs_group -precheck \
-iso_repo /tmp/SAVE/p28666206_*_Linux-x86-64.zip -target_version 

-- You can safely ignore the below warning (this has been a patchmgr bug for a while) if the GI version is above the one mentioned, which is most likely the case:
(*) – Yum rolling update requires fix for 11768055 when Grid Infrastructure is below BP12
Note: if your source version is recent enough, you can use the -allow_active_network_mounts parameter to patch all the DB nodes without taking care of the NFS mounts. Otherwise, if you have some NFS mounted, you will see some error messages; you can safely ignore them at this stage, as we will unmount the NFS manually before patching the DB nodes.
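To see what NFS is currently mounted before deciding between -allow_active_network_mounts and a manual unmount, a quick check (run it on each DB node, or through dcli with the dbs_group file):

```shell
# List mounted NFS filesystems from /proc/mounts (field 2 is the mount point,
# field 3 the filesystem type); no output means no NFS to take care of
awk '$3 ~ /^nfs4?$/ {print $2, "("$3")"}' /proc/mounts
```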

2.5.4 Dependencies issues

You may encounter dependency issues reported by the database server prerequisites. I have documented the 2 cases you can be in and the 2 ways to fix them:

  • When there is no OS upgrade, follow this blog.
  • When there is an OS upgrade (from 12c or 18c to 19c or above), please have a look at this blog

2.5.5 IB Switches prechecks

– To avoid issues with NFS/ZFS when rebooting the IB Switches (I have had a lot in the past; not sure if they came from the client configuration, but it is always unpleasant), I recommend copying the patch outside of any NFS/ZFS filesystem
– This patch is ~2.5 GB, so be careful not to fill / if you copy it into /tmp; if space is short, choose another local FS

[root@myclusterdb01 ~]# du -sh /tmp/IB_PATCHING
[root@myclusterdb01 ~]# rm -fr /tmp/IB_PATCHING
[root@myclusterdb01 ~]# mkdir /tmp/IB_PATCHING
[root@myclusterdb01 ~]# unzip -q /Oct2018_Bundle/28689205/Infrastructure/*_Linux-x86-64.zip -d /tmp/IB_PATCHING
[root@myclusterdb01 ~]# cd /tmp/IB_PATCHING/patch_18.
[root@myclusterdb01 ~]# ./patchmgr -ibswitches ~/ib_group -ibswitch_precheck -upgrade

Note: despite what the patchmgr documentation says, you have to specify an ib_group configuration file containing the list of your IB Switches.
If the prerequisites show some conflicts to be resolved, please have a look at this blog where I explain how to manage the OS dependency issues, but do NOT use the -modify_at_prereq option straight away.

2.5.6 ROCE Switches prechecks

– As explained here, we will use an OS user named admin to run the prerequisites (this is fixed as of July 2020, from which point they can be run as root):

[admin@x8m_01]$ pwd
[admin@x8m_01]$ ./patchmgr --roceswitches ~/roce_group --upgrade --roceswitch-precheck --log_dir /tmp

2.5.7 Grid Infrastructure prechecks

To start with, make sure that the patch has been unzipped (as the GI owner user, to avoid any permission issues later):

[grid@myclusterdb01 ~]$ cd /Oct2018_Bundle/28689205/Database/
[grid@myclusterdb01 ~]$ unzip -q p28714316*_Linux-x86-64.zip

— This should create a 27968010 directory.
And start the prerequisites on each node:

[root@myclusterdb01 ~]# . oraenv <<< `grep "^+ASM" /etc/oratab | awk -F ":" '{print $1}'`
[root@myclusterdb01 ~]# cd /Oct2018_Bundle/28689205/Database/
[root@myclusterdb01 ~]# /u01/app/ apply -oh /u01/app/ -analyze

Alternatively, you can start the GI prerequisites on all nodes in parallel with a single command:

[root@myclusterdb01 ~]# dcli -g ~/dbs_group -l root "cd /Oct2018_Bundle/28689205/Database/; /u01/app/ apply -oh /u01/app/ -analyze"

Note: You will most likely see some warnings here. Check the logfiles; the warnings are probably due to some patches that will be rolled back because they are no longer needed.

Now that everything is downloaded, unzipped, and updated, and every prerequisite is successful, we can safely jump to the patching procedure in part 2!

Thank you for visiting this blog 🙂