Cache fusion impact on Oracle 11g RAC performance – statistics and wait events
Essentially, we are looking at two things: CR block request time and current block request time.
CR block request time is the time it takes to build the CR block in the instance that owns the appropriate image, plus the time to flush it, which requires a write to disk, plus the time to send it across the interconnect.
Current block request time is the time it takes to pin the image in the instance that owns the block image, plus the time to flush it and send it across. The block cannot be sent while someone is changing it at the same time, which is why we need to pin the block in exclusive mode, then flush it and send it over the interconnect.
The statistics come from V$SYSSTAT. Query V$SYSSTAT, or GV$SYSSTAT to cover all instances, for these statistics.
Other latencies come from the V$GES_STATISTICS or GV$GES_STATISTICS view.
What we are primarily concerned with is the average time to process a CR block and the average time to process a current block. The values shown are typical. If those times start to grow over time, we need to explore why it is taking longer: look at the wait events and the possible causes of the growing latencies, and determine why things are changing and getting worse.
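As a rough illustration of tracking these averages over time, the sketch below computes an average block time from two snapshots of the statistics. It assumes the statistic names "gc cr block receive time" and "gc cr blocks received" (and their "current" counterparts) and assumes the time statistics are reported in centiseconds; the snapshot values are made up for the example and are not measurements.

```python
CENTISECONDS_TO_MS = 10  # assumption: time statistics are in centiseconds


def average_block_time_ms(earlier, later, time_stat, count_stat):
    """Average time per block (ms) between two statistic snapshots.

    `earlier` and `later` are dicts mapping statistic name -> cumulative value,
    for example built from two queries against V$SYSSTAT taken some time apart.
    Returns None when no blocks were received in the interval.
    """
    blocks = later[count_stat] - earlier[count_stat]
    if blocks <= 0:
        return None
    elapsed_cs = later[time_stat] - earlier[time_stat]
    return CENTISECONDS_TO_MS * elapsed_cs / blocks


# Hypothetical snapshot values, for illustration only.
snapshot_1 = {"gc cr block receive time": 1000, "gc cr blocks received": 20000}
snapshot_2 = {"gc cr block receive time": 1500, "gc cr blocks received": 22500}

avg_cr_ms = average_block_time_ms(
    snapshot_1, snapshot_2,
    "gc cr block receive time", "gc cr blocks received",
)
print(avg_cr_ms)
```

Working from the difference between two snapshots rather than the cumulative totals makes a recent increase visible instead of letting it be averaged away over the whole instance lifetime.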
Architecturally, wait events for RAC are very interesting. Like any other wait events, they show the various things a session can wait on and so help you identify what the problem might be, but RAC introduces an area of waits that does not exist in a single instance environment.
Let's recap the V$SESSION_WAIT view. Oracle includes some common columns in the V$SESSION and V$SESSION_WAIT views. The interesting columns are WAIT_TIME and EVENT, which holds the name of the event in both views. If a session is waiting for something, then when you query V$SESSION_WAIT the EVENT column contains the name of the event the session is waiting on. Examples are db file sequential read; log file parallel write, which occurs in the log writer (LGWR) as part of its normal activity of copying records from the redo log buffer to the current online log; and log file sync, which occurs when you commit and is also referred to as commit latency.
If WAIT_TIME is 0, EVENT shows what the session is currently waiting on. If WAIT_TIME is greater than 0, it shows how long the last event waited. If WAIT_TIME is -2, the init parameter TIMED_STATISTICS is not set. If WAIT_TIME is -1, the last wait was shorter than a hundredth of a second and was not captured.
For a single instance the situation is simple: a row in the view represents either something the session is currently waiting on (WAIT_TIME of 0) or something it waited on. RAC introduces complexity. When cache fusion is involved, a server process cannot do I/O as it prefers. In a single instance, a server process does I/O as it wants: if a buffer is not in the buffer cache, it waits on, for example, db file sequential read and continues when the read completes.
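The WAIT_TIME rules described above can be sketched as a small helper. This is only an illustration of how to read the column: the function name is hypothetical, and the unit for positive values is assumed to be hundredths of a second.

```python
def describe_wait(event, wait_time):
    """Translate an (EVENT, WAIT_TIME) pair from V$SESSION_WAIT into words."""
    if wait_time == 0:
        return f"currently waiting on '{event}'"
    if wait_time > 0:
        # Assumption: positive values are in hundredths of a second.
        return f"last waited on '{event}' for {wait_time} hundredths of a second"
    if wait_time == -1:
        return f"last wait on '{event}' was shorter than a hundredth of a second"
    if wait_time == -2:
        return "timing is unavailable: TIMED_STATISTICS is not set"
    raise ValueError(f"unexpected WAIT_TIME value: {wait_time}")


print(describe_wait("db file sequential read", 0))
print(describe_wait("log file sync", 3))
```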
In RAC, the server process makes a request to the LMS background process, which handles cache fusion. Once LMS gets involved there are several possibilities. One is that the instance requesting the I/O has a valid copy of the block image in its own buffer cache and also holds the global resource directory (GRD) metadata for it, so everything can be done locally without a block transfer. Another scenario is when the requesting instance A does not have the metadata and another instance B holds the GRD metadata for, say, block m in file n. Getting the global resource metadata requires a hop to instance B, which identifies the instance that has a valid copy of the block. If the block is in either instance A or B, there are two hops, as only two nodes are involved.
The worst possible scenario, assuming we have more than two instances, is when the instance that makes the request has neither the image copy of the block nor the GRD metadata for it. In this case the local LMS talks to the LMS holding the metadata, which talks to the LMS on a third instance that has the block image, and the third instance sends the block image to the requesting instance A using user-mode IPC. This is a three-hop situation, and it is the worst case regardless of the number of nodes.
To summarize, there are three roles. The requesting instance is where the server process makes the initial request for a block image. The owning, or serving, instance is the one that serves the image. The mastering instance is the one that owns the GRD metadata for the particular block number and file number. The worst case is when the owning, mastering and requesting instances are three separate instances. The best case is when all three roles fall on the same instance.
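The relationship between the three roles and the number of hops can be modeled with a toy function. This is a simplification for illustration, not Oracle's actual protocol: it assumes the hop count equals the number of distinct instances involved, and it extends that rule to the case the text does not spell out, where the requesting instance is also the master but another instance owns the block.

```python
def cache_fusion_hops(requesting, mastering, owning):
    """Toy model: hops needed to satisfy a block request.

    Each argument identifies the instance playing that role. When all three
    roles fall on one instance, the request is served locally (0 hops);
    otherwise the hop count is taken to be the number of distinct instances.
    """
    involved = {requesting, mastering, owning}
    return 0 if len(involved) == 1 else len(involved)


print(cache_fusion_hops("A", "A", "A"))  # best case: everything is local
print(cache_fusion_hops("A", "B", "B"))  # master also owns the block
print(cache_fusion_hops("A", "B", "C"))  # worst case: three separate instances
```

Because the result depends only on how many distinct instances play a role, it never exceeds three, which matches the point that adding more nodes does not make the worst case any worse.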
We will now see how this affects wait events. Wait events for RAC help you analyze what sessions are waiting for, and wait times are attributed to events that reflect the outcome of a request. All wait events related to the global cache are collected in a broader category, the Cluster wait class, which is visible in the V$ views and in EM. These events are used by ADDM and the V$ views to enable cache fusion diagnostics.
Let’s look at the wait event views as a refresher for those who have not used them for a while.
V$SYSTEM_EVENT – total waits for an event
V$SESSION_WAIT_CLASS – waits for a wait event class by a session
V$SESSION_EVENT – waits for an event by a session
V$ACTIVE_SESSION_HISTORY – activity of recent active sessions
V$SESSION_WAIT_HISTORY – last 10 wait events for each active session.
V$SESSION_WAIT – events for which active sessions are waiting
V$SQLSTATS – identify SQL statements impacted by interconnect latencies