Guenadi N Jilevski's Oracle BLOG

Oracle RAC, DG, EBS, DR and HA DBA BLOG

Cache fusion impact on Oracle 11g RAC performance – statistics and wait events

Essentially, we are looking at two things: the CR block request time and the current block request time.

CR block request time is the time it takes to build the CR (consistent read) block in the instance that owns the appropriate image, plus the time to flush it, which requires a write to disk, plus the time to send it across the interconnect.

Current block request time is the time it takes to pin the image in the instance that owns the block image, plus the time to flush it and send it across. The block cannot be sent while someone is changing it, which is why it must be pinned in exclusive mode, then flushed, then sent over the interconnect.
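
The two breakdowns above can be sketched as a small calculation. This is a minimal Python sketch, not a definitive method: it assumes the counters have already been fetched into a dictionary, that the statistic names match the usual gc entries in v$sysstat, and that the time counters are in centiseconds. The helper name and the sample values are hypothetical.

```python
# Assumption: the gc time counters are reported in centiseconds.
CS_TO_MS = 10

# Components of the CR block request time: build, flush, send.
CR_COMPONENTS = (
    "gc cr block build time",
    "gc cr block flush time",
    "gc cr block send time",
)

# Components of the current block request time: pin, flush, send.
CURRENT_COMPONENTS = (
    "gc current block pin time",
    "gc current block flush time",
    "gc current block send time",
)


def avg_service_time_ms(stats, components, served_stat):
    """Average time to process one block in ms, or None if none were served."""
    served = stats.get(served_stat, 0)
    if served == 0:
        return None
    total_cs = sum(stats.get(name, 0) for name in components)
    return total_cs * CS_TO_MS / served


# Hypothetical counter values, shaped like name/value pairs from v$sysstat.
sample_stats = {
    "gc cr block build time": 20,
    "gc cr block flush time": 50,
    "gc cr block send time": 30,
    "gc cr blocks served": 1000,
    "gc current block pin time": 10,
    "gc current block flush time": 40,
    "gc current block send time": 30,
    "gc current blocks served": 2000,
}

cr_ms = avg_service_time_ms(sample_stats, CR_COMPONENTS, "gc cr blocks served")
current_ms = avg_service_time_ms(
    sample_stats, CURRENT_COMPONENTS, "gc current blocks served"
)
print("avg CR block service time (ms):", cr_ms)
print("avg current block service time (ms):", current_ms)
```

Looking at the three components separately, rather than only at their sum, shows whether the time goes into building or pinning, flushing, or sending.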

These statistics come from v$sysstat. Always query v$sysstat, or gv$sysstat for all instances, to obtain them.

Other latencies come from the v$ges_statistics or gv$ges_statistics view.

What we are primarily concerned with is the average time to process a CR block and the average time to process a current block. The values shown are typical. If those times start to grow over time, we need to explore why it is taking longer: look at the wait events and the possible causes of the growing latencies, and determine why things are changing and getting worse.
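
Because the counters are cumulative, spotting growth over time means comparing deltas between snapshots rather than raw totals. The following Python sketch illustrates one way to do that, assuming each snapshot is a (time in centiseconds, blocks) pair taken at regular intervals. The function names, the 1.5 growth threshold and the sample snapshots are all hypothetical.

```python
# Assumption: the time counter is in centiseconds.
CS_TO_MS = 10


def interval_averages_ms(snapshots):
    """Average per-block time in ms for each interval between snapshots.

    Each snapshot is a (cumulative_time_cs, cumulative_blocks) pair.
    Intervals in which no blocks were processed are skipped.
    """
    averages = []
    for (t0, b0), (t1, b1) in zip(snapshots, snapshots[1:]):
        blocks = b1 - b0
        if blocks > 0:
            averages.append((t1 - t0) * CS_TO_MS / blocks)
    return averages


def is_growing(averages, ratio=1.5):
    """True if the latest interval average exceeds the first by `ratio`."""
    if len(averages) < 2:
        return False
    return averages[-1] > averages[0] * ratio


# Hypothetical cumulative snapshots: (time_cs, blocks).
snapshots = [(100, 1000), (220, 2000), (400, 3000), (700, 4000)]
averages = interval_averages_ms(snapshots)
print("interval averages (ms):", averages)
print("latency growing:", is_growing(averages))
```

A fixed ratio against the first interval is a deliberately crude test. In practice the baseline and threshold would be chosen from the system's own history.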

Wait events for RAC are architecturally interesting. Like any other wait events, they show the various things a session can wait on, which helps you identify what the problem might be, but RAC introduces an area that does not exist in a single-instance environment.

Let's revisit the v$session_wait view. Oracle includes some common columns in the v$session and v$session_wait views. The interesting ones, present in both views, are wait_time and event, which contains the name of the event. If a session is waiting for something, the event column of v$session_wait contains the name of the event the session is waiting on. Examples are db file sequential read; log file parallel write, which occurs in the log writer (LGWR) as part of its normal activity of copying records from the redo log buffer to the current online log; and log file sync, which occurs when you commit and is also referred to as commit latency.

The value of wait_time is interpreted as follows:

wait_time = 0 – the session is currently waiting, and event shows what it is waiting on

wait_time > 0 – the session is not waiting; the value is how long the last event waited

wait_time = -1 – the last wait took less than a hundredth of a second and its duration was not captured

wait_time = -2 – the init parameter timed_statistics is not set

For a single instance the situation is simple: a row in the view represents either a session that is currently waiting (0) or something it waited on. RAC introduces complexity, because when cache fusion is involved a server process cannot simply do I/O as it pleases. In a single instance, a server process does I/O whenever it wants: if a buffer is not in the buffer cache, it waits, for example on db file sequential read, and continues when the read completes.
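
The wait_time rules above can be captured in a small helper. This is a Python sketch that assumes the event and wait_time values have already been read from a row of v$session_wait and that positive values are in hundredths of a second. The function name and the sample values are hypothetical.

```python
def describe_wait(event, wait_time):
    """Interpret one (event, wait_time) pair from v$session_wait."""
    if wait_time == 0:
        return f"currently waiting on '{event}'"
    if wait_time > 0:
        # Assumption: the value is in hundredths of a second.
        return f"last waited on '{event}' for {wait_time} hundredths of a second"
    if wait_time == -1:
        return f"last waited on '{event}' for less than a hundredth of a second"
    if wait_time == -2:
        return "timed_statistics is not set; no wait duration available"
    raise ValueError(f"unexpected wait_time value: {wait_time}")


# Hypothetical sample rows.
for event, wait_time in [
    ("db file sequential read", 0),
    ("log file sync", 3),
    ("log file parallel write", -1),
    ("db file sequential read", -2),
]:
    print(describe_wait(event, wait_time))
```
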
In RAC, the server process makes a request to the LMS background process, which handles cache fusion. Once LMS gets involved there are several possibilities.

In the first, the instance requesting the I/O has a valid copy of the block image in its own buffer cache and also holds the Global Resource Directory (GRD) metadata for it, so everything can be done locally without a block transfer.

In the second, the requesting instance A does not have the metadata and another instance B holds the GRD metadata for, say, block m in file n. Obtaining the global resource metadata requires a hop to instance B, which identifies the instance holding a valid copy of the block. If the block is in either instance A or B, there are two hops, since only two nodes are involved.

The worst scenario, assuming more than two instances, is when the requesting instance has neither the image copy of the block nor the GRD metadata for it. In this case the local LMS talks to the LMS on the instance holding the metadata, which talks to the LMS on a third instance that has the block image, and the third instance sends the block image to the requesting instance A using user-mode IPC. This is a three-hop situation, and it is the worst possible case regardless of the number of nodes.

To summarize, there are three roles. The requesting instance is where the server process makes the initial request for a block image. The owning, or serving, instance is the one that serves the image. The mastering instance is the one that owns the GRD metadata for the particular file number and block number. The worst case is when the owning, mastering and requesting instances are three separate instances. The best case is when they are all the same instance.
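
The three roles and the resulting hop counts can be modelled in a few lines. This Python sketch is a deliberate simplification of the scenarios just described: it treats the hop count as the number of distinct instances involved, with zero hops when a single instance plays all three roles. The function name and instance labels are hypothetical.

```python
def cache_fusion_hops(requesting, mastering, owning):
    """Hop count for a block request under the simplified model above.

    requesting: instance whose server process asks for the block
    mastering:  instance holding the GRD metadata for the block
    owning:     instance holding a valid image of the block
    """
    distinct = len({requesting, mastering, owning})
    # One instance in all three roles: handled locally, no transfer.
    return 0 if distinct == 1 else distinct


# Best case: everything is local to instance A.
print(cache_fusion_hops("A", "A", "A"))
# Metadata on B, block image on A or B: two hops.
print(cache_fusion_hops("A", "B", "B"))
# Requesting, mastering and owning instances all differ: three hops.
print(cache_fusion_hops("A", "B", "C"))
```

Because there are only three roles, the result can never exceed three, which is why adding more nodes to the cluster does not make the worst case any worse.
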
We will see how this affects wait events. All wait events related to the global cache are collected in a broader category, the cluster wait class, in the V$ views and in EM. Wait events for RAC help you analyze what sessions are waiting for, and wait times are attributed to events that reflect the outcome of a request. These events are used in ADDM and the V$ views to enable cache fusion diagnostics.
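
As an illustration of working with the cluster wait class, the Python sketch below summarizes rows shaped like the output of a query against v$system_event. It assumes the columns event, wait_class, total_waits and time_waited_micro, and that the class is named 'Cluster'. The query text is shown only as a string and is not executed here; the helper name and the sample rows are hypothetical.

```python
# Assumed query shape; shown for reference only, not executed in this sketch.
CLUSTER_EVENTS_SQL = """
SELECT event, wait_class, total_waits, time_waited_micro
FROM   v$system_event
WHERE  wait_class = 'Cluster'
"""


def summarize_cluster_waits(rows):
    """Return (event, avg_wait_ms) pairs for Cluster-class events.

    rows: iterable of (event, wait_class, total_waits, time_waited_micro).
    Results are ordered by total time waited, largest first.
    """
    cluster = [r for r in rows if r[1] == "Cluster" and r[2] > 0]
    cluster.sort(key=lambda r: r[3], reverse=True)
    return [(event, micro / waits / 1000) for event, _, waits, micro in cluster]


# Hypothetical sample rows.
sample_rows = [
    ("gc current block 3-way", "Cluster", 1000, 1_500_000),
    ("db file sequential read", "User I/O", 9000, 45_000_000),
    ("gc cr block 2-way", "Cluster", 4000, 2_000_000),
]

summary = summarize_cluster_waits(sample_rows)
for event, avg_ms in summary:
    print(f"{event}: {avg_ms} ms average wait")
```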

Let’s look at the wait event views as a refresher for those who have not used them for a while.

V$SYSTEM_EVENT – total waits for an event

V$SESSION_WAIT_CLASS – waits for a wait event class by a session

V$SESSION_EVENT – waits for an event by a session

V$ACTIVE_SESSION_HISTORY – activity of recent active sessions

V$SESSION_WAIT_HISTORY – last 10 wait events for each active session.

V$SESSION_WAIT – events for which active sessions are waiting

V$SQLSTATS – identify SQL statements impacted by interconnect latencies

December 13, 2009 - Posted by | oracle
