Today one of the instances restarted. The relevant extract from the log is below:
Thu May 07 12:12:51 2009
LMON (ospid: 761896) waits for event 'control file sequential read' for 83 secs.
Thu May 07 12:13:21 2009
LMON (ospid: 761896) waits for event 'control file sequential read' for 113 secs.
ERROR: LMON is not healthy and has no heartbeat.
ERROR: LMD0 (ospid: 405742) is terminating the instance.
LMD0 (ospid: 405742): terminating the instance due to error 482
I don't understand why it restarted, though the log above seems to suggest that the instance was unable to read the control file.
My question is: why didn't the whole node reboot?
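So, what happened is this: the DB checks ("pings") the control files and expects an answer within a certain amount of time. In the log, LMON waits on 'control file sequential read' for 83 and then 113 seconds, after which it is reported as unhealthy with no heartbeat. If the control file cannot be accessed within the allowed time, the cluster assumes a problem in the communication with the storage / control file and evicts the affected instance in order to prevent data corruption. Note that the log shows LMD0 terminating the instance only (error 482), not the node, which is why the node itself stayed up.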
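When reading a longer alert log, it can help to pull out just the wait messages and the termination line. The following is a minimal Python sketch that assumes the messages look exactly like the ones in the extract above; the function name `summarize` and both regular expressions are my own illustration, not part of any Oracle tool, and other versions may word these messages differently.

```python
import re

# Hypothetical patterns, derived only from the log lines quoted in this post.
WAIT_RE = re.compile(
    r"^(?P<proc>\w+) \(ospid: (?P<ospid>\d+)\) waits for event "
    r"'(?P<event>[^']+)' for (?P<secs>\d+) secs\."
)
TERM_RE = re.compile(
    r"^(?P<proc>\w+) \(ospid: (?P<ospid>\d+)\): terminating the instance "
    r"due to error (?P<error>\d+)"
)


def summarize(lines):
    """Return (waits, termination) found in an iterable of log lines.

    waits is a list of (process, event, seconds) tuples in log order;
    termination is (process, error_number), or None if no such line exists.
    """
    waits = []
    termination = None
    for raw in lines:
        line = raw.strip()
        m = WAIT_RE.match(line)
        if m:
            waits.append((m["proc"], m["event"], int(m["secs"])))
            continue
        m = TERM_RE.match(line)
        if m:
            termination = (m["proc"], int(m["error"]))
    return waits, termination


SAMPLE = """\
Thu May 07 12:12:51 2009
LMON (ospid: 761896) waits for event 'control file sequential read' for 83 secs.
Thu May 07 12:13:21 2009
LMON (ospid: 761896) waits for event 'control file sequential read' for 113 secs.
ERROR: LMON is not healthy and has no heartbeat.
ERROR: LMD0 (ospid: 405742) is terminating the instance.
LMD0 (ospid: 405742): terminating the instance due to error 482
"""

waits, termination = summarize(SAMPLE.splitlines())
for proc, event, secs in waits:
    print(f"{proc} waited {secs}s on '{event}'")
print("terminated by:", termination)
```

Run against the extract above, this lists the two growing LMON waits on the control file followed by the LMD0 termination, which makes the sequence of events easier to see in a large log.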