Node Reboot or Node Eviction Troubleshooting

Cluster integrity and cluster membership are governed by ocssd (the Oracle Cluster Synchronization
Services daemon), which monitors the nodes using two communication channels:

- Private interconnect, aka network heartbeat
- Voting disk based communication, aka disk heartbeat

The processes/agents depend on one another in this order: init ==> ohasd ==> cssdagent ==> ora.ocssd.bin

If cssdagent finds that ocssd is down, it reboots the node to protect data integrity.

But why would ocssd go down? The ocssd process depends on two communication channels:

The first is the network heartbeat, which is communicated over the private interconnect.
The second is the disk heartbeat, which is communicated through the voting disk.

Why should nodes be evicted?

Evicting (fencing) nodes is a preventive measure (it’s a good thing)!

Nodes are evicted to prevent consequences of a split brain:
– Shared data must not be written by independently operating nodes
– The easiest way to prevent this is to forcibly remove a node from the cluster

Network heartbeat:

Each node in the cluster is “pinged” every second

Nodes must respond within the css_misscount time (default: 30 seconds)

    bash-3.2$ ./crsctl get css misscount
    CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.

– Reducing the css_misscount time is generally not supported
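
If you want to check the misscount from a script, here is a minimal Python sketch. It assumes the crsctl output looks exactly like the CRS-4678 line shown above (other releases may word it differently), and the function names parse_misscount and heartbeat_expired are made up for this illustration; they are not part of any Oracle tool. The second function only models the rule stated above, that a node must respond within the misscount window.

```python
import re

# Pattern taken only from the sample CRS-4678 line in this post (assumption:
# other Clusterware releases may phrase the message differently).
MISSCOUNT_RE = re.compile(
    r"CRS-4678: Successful get misscount (\d+) for Cluster Synchronization Services"
)


def parse_misscount(output: str) -> int:
    """Return the misscount in seconds from 'crsctl get css misscount' output."""
    match = MISSCOUNT_RE.search(output)
    if match is None:
        raise ValueError("no CRS-4678 misscount line found in output")
    return int(match.group(1))


def heartbeat_expired(seconds_since_last_heartbeat: float, misscount: int) -> bool:
    """Illustrative model: True once a node has been silent for the whole misscount window."""
    return seconds_since_last_heartbeat >= misscount


sample = "CRS-4678: Successful get misscount 30 for Cluster Synchronization Services."
misscount = parse_misscount(sample)
print(misscount)
print(heartbeat_expired(12.0, misscount))
print(heartbeat_expired(31.0, misscount))
```

In practice you would feed parse_misscount the captured output of the crsctl command instead of the hard-coded sample string.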

Network heartbeat failures will lead to node evictions
CSSD-log:
[date / time] [CSSD][1111902528]
clssnmPollingThread: node mynodename (5) at 75% heartbeat fatal, removal in 6.7 sec
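
When troubleshooting, it helps to pull these warnings out of the CSSD log automatically. The following is a rough Python sketch that assumes the message has exactly the shape of the sample line above; the regular expression and the function name parse_heartbeat_warning are my own illustration, not an Oracle-provided format or API, so adjust the pattern to what your log really contains.

```python
import re

# Regex modelled only on the sample clssnmPollingThread line in this post
# (assumption: real logs may carry extra fields or different wording).
WARNING_RE = re.compile(
    r"clssnmPollingThread: node (?P<name>\S+) \((?P<number>\d+)\) "
    r"at (?P<percent>\d+)% heartbeat fatal, "
    r"removal in (?P<seconds>\d+(?:\.\d+)?) sec"
)


def parse_heartbeat_warning(line: str):
    """Return the fields of a heartbeat warning as a dict, or None if the line does not match."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    return {
        "node": match.group("name"),
        "number": int(match.group("number")),
        "percent": int(match.group("percent")),
        "seconds_to_removal": float(match.group("seconds")),
    }


line = "clssnmPollingThread: node mynodename (5) at 75% heartbeat fatal, removal in 6.7 sec"
print(parse_heartbeat_warning(line))
```

Run over a whole log file line by line, this gives a quick list of which nodes were missing heartbeats and how close they came to removal.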

http://www.dbas-oracle.com/2013/06/Top-4-Reasons-Node-Reboot-Node-Eviction-in-Real-Application-Cluster-RAC-Environment.html

http://oracle-info.com/2012/12/27/oracle-rac-node-evictions-11gr2-node-eviction-means-restart-of-cluster-stack-not-reboot-of-node/
