Guenadi N Jilevski's Oracle BLOG

Oracle RAC, DG, EBS, DR and HA DBA BLOG

OUI in Oracle 11g in silent mode

Oracle Universal Installer (OUI) is invoked by running the runInstaller utility. In Oracle 11g R1, OUI is used for both Oracle Clusterware and RDBMS software installation and deinstallation. With OUI we can perform both interactive and silent installs of the Oracle Clusterware and RDBMS software. When OUI runs in interactive mode it prompts you to provide information in graphical user interface (GUI) screens. When OUI runs in silent mode it uses a response file to provide this information. If you include responses for all of the prompts in the response file and specify the -silent option when starting OUI, it runs in silent mode. During a silent-mode installation, OUI does not display any screens. Instead, it displays progress information in the terminal from which it was started.

The sample response file for Oracle Clusterware is /oracle_media/clusterware/response/crs.rsp. The file needs to be edited as per the instructions inside it to provide the required information. We will rename the response file to crs1.rsp.

We initiate a silent install with the following command.

/oracle_media/clusterware/runInstaller -silent -responseFile /oracle_media/clusterware/response/crs1.rsp

OUI offers Enterprise Edition, Standard Edition and Custom installation types. We will use an Enterprise Edition installation, although with a Custom installation we have more freedom to choose the components we want installed. In the Select Configuration Option screen of the OUI we can select database creation, ASM configuration, or software only. If we select the configure ASM option, the RDBMS software is installed along with a clustered ASM instance. If we select the database creation option, a database is created, ASM is configured if it is chosen as the storage, and the RDBMS software is installed. Using either the database or the ASM creation option installs the software into a single Oracle home.

We might want to have two Oracle homes: one Oracle home for the ASM instance and a second for the RDBMS database instance. A two-home installation provides the flexibility to patch the ASM Oracle home independently of the RDBMS database home. We can choose the install software only option from the OUI Select Configuration Option screen and create two Oracle homes, one for ASM and one for the RDBMS database. Choosing the software only option provides more flexibility, and we can then use the DBCA utility to create the ASM and RAC RDBMS databases.

The sample response file for the Oracle RDBMS software is /oracle_media/database/install/response/ee.rsp. The file needs to be edited as per the instructions inside it to provide the information for the Oracle RDBMS install and database creation. We will rename the response file to ee1.rsp.

We initiate a silent install by logging into raclinux1 as the oracle user and starting the OUI.

/oracle_media/database/runInstaller -silent -responseFile /oracle_media/database/install/response/ee1.rsp

Hardware solution for Oracle RAC 11g private interconnect aggregating

For the interconnect to be optimal, it must be private, with high bandwidth and low latency. Bandwidth requirements depend on several factors such as the number of CPUs and CPU speed per node, the number of nodes, the type of workload (OLTP or DSS), etc. This is achieved using hardware solutions such as 1Gbps Ethernet, 10Gbps Ethernet or InfiniBand and the UDP or RDS transport protocols. Interconnect redundancy is recommended for high availability, increased bandwidth and decreased latency. There are multiple vendor-specific network hardware solutions providing higher bandwidth through aggregation, load balancing, load spreading and failover of the interconnects. Let’s look at the different solutions and topologies for aggregation, aimed at achieving higher throughput and bandwidth, higher availability and lower latency, as listed below.
·    Sun link aggregation and IP multipathing.
·    HP Auto Port Aggregation.
·    IBM AIX EtherChannel.
·    Cisco EtherChannel.
·    Linux bonding.
·    Windows NIC teaming.

Sun aggregation and multipathing

Link aggregations consist of groups of Network Interface Cards (NICs) that provide increased bandwidth, higher availability and fault tolerance. Network traffic is distributed among the members of an aggregation, and the failure of a single NIC should not affect the availability of the aggregation as long as there are other functional NICs in the same group.
IP Multipathing (IPMP) provides features such as higher availability at the IP layer. Both IPMP and link aggregation are based on the grouping of network interfaces, and some of their features overlap, such as higher availability. These technologies are, however, implemented at different layers of the stack and have different strengths and weaknesses. Link aggregations, once created with dladm, behave like any other physical NIC to the rest of the system. The grouping of interfaces for IPMP is done using ifconfig. Link aggregations currently do not allow you to have separate standby interfaces that are not used until a failure is detected: if a link is part of an aggregation, it will be used to send and receive traffic as long as it is healthy. Link aggregations are implemented at the MAC layer and require all the constituent interfaces of an aggregation to use the same MAC address. Since IPMP is implemented at the network layer, it does not have that limitation. Link aggregations provide finer-grained control of the load balancing used for spreading outbound traffic on aggregated links, e.g. load balancing on transport protocol port numbers vs. MAC addresses. dladm also allows the inbound and outbound distribution of traffic over the constituent NICs to be easily observed. It is also worth pointing out that IPMP can be deployed on top of an aggregation to maximize performance and availability.

HP Auto Port Aggregation

Hewlett-Packard’s Auto Port Aggregation (APA) increases a server’s efficiency by grouping or “aggregating” multiple ports into a single link aggregate or fail-over group having a single IP address. Up to fifty aggregates per computer are permitted on HP-UX versions 11iv1 and 11iv2.
·    Load balancing – The server traffic load is distributed over each member of the link aggregate so that each individual link is used. No links are wasted as they would be under a “hot standby” mode of operation. HP Auto-Port Aggregation’s load balancing also attempts to maximize throughput over the links.
·    High throughput – Four 100Base-T links mean four x 100Mbps (400Mbps) in each direction or 800Mbps in both directions. A single HP Auto-Port Aggregation trunk containing four 1000Base-T links can handle eight Gigabits per second! This high throughput level is especially useful for bandwidth intensive applications.
·    Single IP address capability – HP Auto-Port Aggregation provides high throughput in multiples of 1000/100Mbps using a single IP address. It enables customers to transparently increase overall bandwidth without reconfiguring servers to add additional IP addresses and with no IP routing table modifications or adjustments to other network parameters. Single IP address capability means bandwidth growth without the work of modifying thousands of IP addresses and with no need to reconfigure sensitive parameters inside the network.

AIX etherchannel

EtherChannel and IEEE 802.3ad Link Aggregation are network port aggregation technologies that allow several Ethernet adapters to be aggregated together to form a single pseudo Ethernet device. For example, ent0 and ent1 can be aggregated into an EtherChannel adapter called ent3; interface ent3 would then be configured with an IP address. The system considers these aggregated adapters as one adapter. Therefore, IP is configured over them as over any Ethernet adapter.
All adapters in the EtherChannel or Link Aggregation are given the same hardware (MAC) address, so they are treated by remote systems as if they were one adapter. Both EtherChannel and IEEE 802.3ad Link Aggregation require support in the switch so these two technologies are aware which switch ports should be treated as one.
The main benefit of EtherChannel and IEEE 802.3ad Link Aggregation is that they present the network bandwidth of all of their adapters as a single network presence. If an adapter fails, network traffic is automatically sent on the next available adapter without disruption to existing user connections. The adapter is automatically returned to service on the EtherChannel or Link Aggregation when it recovers. EtherChannel therefore satisfies the large bandwidth requirement of the RAC interconnect, and an EtherChannel with multiple links is extremely beneficial for a RAC cluster. Although multiple gigabit networks can be connected between nodes without using EtherChannel, only one network link can be used by RAC as the private interconnect at any one time; the remaining network links can only be used as standbys for failover purposes. That is, as long as the primary RAC interconnect is still alive, the rest of the backup networks will never be used. The EtherChannel round-robin algorithm aggregates all the link bandwidths and addresses the large bandwidth requirement. RAC will use the EtherChannel if its IP address is specified in the init.ora cluster_interconnects parameter for each instance (see the sketch below). An EtherChannel configured with multiple links has built-in high availability: as long as there is one link available, the Ethernet will continue to function. With a 2-link EtherChannel, when we disconnect one of the gigabit links under cache fusion traffic, the EtherChannel stays up and so do the RAC instances; the link failure is only reported in the AIX error report (errpt). An EtherChannel with more than 2 links may provide better availability. An EtherChannel with multiple links using a round-robin algorithm and aggregated network bandwidth provides better network performance and availability than the scheme of two private interconnects and one public network.
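
For illustration, the EtherChannel address can be supplied to each instance with the cluster_interconnects initialization parameter. This is only a sketch: the IP addresses and instance names below are made up, and since the parameter is static it has to be set in the spfile and the instances restarted.

-- Hypothetical EtherChannel addresses and instance SIDs (RAC1, RAC2)
ALTER SYSTEM SET cluster_interconnects = '192.168.10.1' SCOPE=SPFILE SID='RAC1';
ALTER SYSTEM SET cluster_interconnects = '192.168.10.2' SCOPE=SPFILE SID='RAC2';

-- After the restart, verify which interconnect each instance is using
SELECT inst_id, name, ip_address, is_public FROM gv$cluster_interconnects;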

Cisco EtherChannel Benefits

Cisco EtherChannel technology provides a solution for enterprises requiring higher bandwidth and low latency between servers, routers, and switches than a single-link Ethernet technology can provide. Cisco EtherChannel technology provides incremental scalable bandwidth and the following benefits:
·    Standards based—Cisco EtherChannel technology builds upon IEEE 802.3-compliant Ethernet by grouping multiple, full-duplex point-to-point links together. EtherChannel technology uses IEEE 802.3 mechanisms for full-duplex autonegotiation and autosensing, when applicable.
·    Multiple platforms—Cisco EtherChannel technology is flexible and can be used anywhere in the network that bottlenecks are likely to occur. It can be used in network designs to increase bandwidth between switches and between routers and switches—as well as providing scalable bandwidth for network servers, such as large UNIX servers or PC-based database servers.
·    Flexible incremental bandwidth—Cisco EtherChannel technology provides bandwidth aggregation in multiples of 100 Mbps, 1 Gbps, or 10 Gbps, depending on the speed of the aggregated links. For example, one can deploy EtherChannel technology that consists of pairs of full-duplex Fast Ethernet links to provide more than 400 Mbps bandwidth. Bandwidths of up to 800 Mbps can be provided between servers and the network backbone to provide large amounts of scalable incremental bandwidth.
·    Load balancing—Cisco EtherChannel technology is composed of several Fast Ethernet links and is capable of load balancing traffic across those links. Unicast, broadcast, and multicast traffic is evenly distributed across the links, providing higher performance and redundant parallel paths. When a link fails, traffic is redirected to the remaining links within the channel without user intervention and with minimal packet loss.
·    Resiliency and fast convergence—When a link fails, Cisco EtherChannel technology provides automatic recovery by redistributing the load across the remaining links. When a link fails, Cisco EtherChannel technology redirects traffic from the failed link to the remaining links in less than one second. This convergence is transparent to the end user—no host protocol timers expire, so no sessions are dropped.
·    Transparent to network applications—Cisco EtherChannel technology does not require changes to networked applications. When EtherChannel technology is used within the campus, switches and routers provide load balancing across multiple links transparently to network users. To support EtherChannel technology on enterprise-class servers and network interface cards, smart software drivers can coordinate distribution of loads across multiple network interfaces.
·    100 Megabit, 1 Gigabit, and 10 Gigabit Ethernet-ready—Cisco EtherChannel technology is available in all Ethernet link speeds. EtherChannel technology allows network managers to deploy networks that will scale smoothly with the availability of next-generation, standards-based Ethernet link speeds.

Linux IP Bonding

Linux IP Bonding can be used to create a virtual NIC that runs over multiple physical NICs. Packets sent out over the virtual NIC can then be load balanced across the physical NICs. This should increase performance as we can now theoretically double or triple the available bandwidth.  More information on  IP bonding can be found at:
http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/s1-modules-ethernet.html
The Linux bonding driver provides a method for aggregating multiple network interfaces into a single logical bonded interface. The behavior of the bonded interfaces depends on the mode; generally speaking, modes provide either hot standby or load balancing services.

Windows NIC teaming

Windows allows aggregation using third-party products known as NIC teaming.

Most common Oracle 11g RAC tuning tactics

Application tuning and modifying the schema design is often the most beneficial tactic, although it can also be the most expensive one.

Resizing and tuning the buffer cache.

Reducing full table scans in OLTP systems while doing updates, and separating batch work from OLTP at the same time. In RAC, when we do a full table scan, the instance doing it needs to get every single block and the right version of each block image, and if those blocks are being modified in other instances, many consistent-read versions have to be built using undo. It is recommended, where operationally possible, that OLTP and DSS workloads be scheduled so that they do not coincide, to improve performance.

Using Automatic Segment Space Management (ASSM). This helps insert-intensive RAC applications by avoiding the issues previously encountered with free lists and free list groups, the older way in which Oracle managed space in segments.
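
As a minimal sketch (the tablespace name, disk group and sizes are invented), ASSM is enabled with the SEGMENT SPACE MANAGEMENT AUTO clause of a locally managed tablespace, and existing tablespaces can be checked in DBA_TABLESPACES:

-- Hypothetical ASSM tablespace on an ASM disk group named +DATA
CREATE TABLESPACE app_data
  DATAFILE '+DATA' SIZE 1G AUTOEXTEND ON NEXT 100M
  EXTENT MANAGEMENT LOCAL
  SEGMENT SPACE MANAGEMENT AUTO;

-- Which tablespaces already use ASSM
SELECT tablespace_name, segment_space_management FROM dba_tablespaces;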

Increasing sequence caching in RAC. It is good to cache sequence values, and to cache them with the NOORDER clause.
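
A minimal sketch (the sequence name and cache size are arbitrary) of a RAC-friendly sequence definition; a large CACHE value together with NOORDER reduces SQ enqueue contention between the instances:

-- Each instance caches its own range of values; gaps are possible but contention is low
CREATE SEQUENCE order_id_seq CACHE 1000 NOORDER;

-- An existing sequence can be changed the same way
ALTER SEQUENCE order_id_seq CACHE 1000 NOORDER;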

Using partitioning to reduce inter-instance traffic. Partitioning combined with services allows us to avoid a lot of cache fusion overhead by having some partitions accessed by users connected to certain instances and other partitions accessed by users connected to other instances. It cannot be guaranteed, but it helps under certain circumstances.

Avoiding unnecessary parsing. Parse overhead can be reduced with application design and cursor sharing.

Minimizing locking usage. Contention is exacerbated in RAC because, in addition to the contention present in a single-instance database, RAC adds the interconnect overhead, so minimizing locking overhead is very important in RAC environments. Typically excessive locking occurs with third-party applications that use the same code across many database platforms and therefore do a lot of explicit locking. This is worth investigating further.

Removing unselective indexes in OLTP systems, as they are not used by the CBO as an access method yet still need to be maintained while executing DML statements. That is an overhead in general, but in RAC we might also end up having those index blocks transferred by cache fusion, so removing unselective indexes will improve performance. Index block splits are another important thing we must be aware of and look at.

Configuring the interconnect properly with high bandwidth and low latency.

Common Problems and symptoms – Wait events worth investigation

Let’s look at some of the wait events which are worth further investigation, as they represent a potential performance problem if the wait time is excessive or the event appears in the top 5 list of an AWR report.

global cache blocks lost: This statistic shows block losses during transfers. High values indicate network problems. The use of an unreliable IPC protocol such as UDP may result in the value for global cache blocks lost being non-zero. When this occurs, take the ratio of global cache blocks lost divided by global cache current blocks served plus global cache cr blocks served. This ratio should be as small as possible. Many times, a non-zero value for global cache blocks lost does not indicate a problem because Oracle will retry the block transfer operation until it is successful.
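
A sketch of that ratio using GV$SYSSTAT; the statistic names below are the 10g/11g ones ('gc blocks lost', 'gc cr blocks served', 'gc current blocks served'), so adjust them if your release still reports the older 'global cache ...' names:

-- Lost blocks relative to total blocks served, per instance
SELECT lost.inst_id,
       lost.value AS blocks_lost,
       cur.value + cr.value AS blocks_served,
       ROUND(lost.value / NULLIF(cur.value + cr.value, 0), 6) AS lost_ratio
FROM   gv$sysstat lost, gv$sysstat cur, gv$sysstat cr
WHERE  lost.name = 'gc blocks lost'
AND    cur.name  = 'gc current blocks served'
AND    cr.name   = 'gc cr blocks served'
AND    cur.inst_id = lost.inst_id
AND    cr.inst_id  = lost.inst_id;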

global cache blocks corrupt: This statistic shows if any blocks were corrupted during transfers. If high values are returned for this statistic, there is probably an IPC, network or hardware problem.

global cache open s and global cache open x: The initial access of a particular data block by an instance generates these events. The duration of the wait should be short, and the completion of the wait is most likely followed by a read from disk. This wait is a result of the blocks that are being requested and not being cached in any instance in the cluster database. This necessitates a disk read. When these events are associated with high totals or high per-transaction wait times, it is likely that data blocks are not cached in the local instance and that the blocks cannot be obtained from another instance, which results in a disk read. At the same time, suboptimal buffer cache hit ratios may also be observed. Unfortunately, other than preloading heavily used tables into the buffer caches, there is little that can be done about this type of wait event.

global cache null to s and global cache null to x: These events are generated by inter-instance block ping across the network. Interinstance block ping is when two instances exchange the same block back and forth. Processes waiting for global cache null to s events are waiting for a block to be transferred from the instance that last changed it. When one instance repeatedly requests cached data blocks from the other RAC instances, these events consume a greater proportion of the total wait time. The only method for reducing these events is to reduce the number of rows per block to eliminate the need for block swapping between two instances in the RAC cluster.

global cache cr request: This event is generated when an instance has requested a consistent read data block and the block to be transferred had not arrived at the requesting instance. Other than examining the cluster interconnects for possible problems, there is nothing that can be done about this event other than to modify objects to reduce the possibility of contention.

gc cr block lost – This event almost always represents a severe performance problem and can reveal network congestion involving discarded packets and fragments, packet reassembly problems or timeouts, buffer overflows, or flow control issues. Checksum errors or corrupted headers are also often the reason for this wait event. It is worth investigating the IPC configuration and possible downstream network problems (NIC, switch, etc.). Operating system data needs to be gathered with ifconfig, netstat and sar, to name a few. The ‘cr request retry’ event is likely to be seen when ‘gc cr blocks lost’ shows up.

gc buffer busy: This event can be associated with disk I/O contention, for example slow disk I/O due to a rogue query. Slow concurrent scans can cause buffer cache contention. However, note that there can be multiple symptoms for the same cause. It can be seen together with the ‘db file scattered read’ event. Both global cache access and serialization contribute to this event. Serialization is likely to be due to log flush time on another node or immediate block transfers.

congested: The events that contain ‘congested’ suggest CPU saturation (runaway or spinning processes), long run queues, or network configuration issues, and indicate performance problems. While investigating, we need to maintain a global view and remember that the symptom and the cause can be on different instances. These events can also happen if LMS cannot dequeue messages fast enough. The gcs_server_processes init parameter controls the number of LMS processes, although in most cases the default value is sufficient. Excessive memory consumption leading to swapping can be another reason.

busy: The events that contain ‘busy’ indicate contention. They need investigation by drilling down into either the SQL with the highest cluster wait time or the segment statistics with the highest number of block transfers. Also look at the objects with the highest number of block transfers and global serialization (see the example below).
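
One possible drill-down, sketched with GV$SEGMENT_STATISTICS (the statistic names are the 11g ones and the row limit is arbitrary), to find the segments with the most global cache activity:

-- Segments with the highest gc buffer busy waits and block transfer counts
SELECT *
FROM (SELECT inst_id, owner, object_name, statistic_name, value
      FROM   gv$segment_statistics
      WHERE  statistic_name IN ('gc buffer busy',
                                'gc cr blocks received',
                                'gc current blocks received')
      ORDER BY value DESC)
WHERE ROWNUM <= 20;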

Gc [current/cr] [2/3]-way – In a 2-node cluster we cannot get 3-way events, as only two RAC instances are available and at most two hops are possible, so only 2-way is possible. If we have three or more RAC instances then both 2-way and 3-way are possible. These events mean the block was received immediately after 2 or 3 network hops. They are not subject to any tuning except increasing the private interconnect bandwidth and decreasing the private interconnect latency.

Gc [current/cr] grant 2-way – Event seen when a grant is received immediately. A grant is always local or 2-way. A grant occurs when a request is made for a block image, current or cr, and no instance has the image in its local buffer cache. The requesting instance is then required to do an I/O against the data file to get the block. The grant is simply permission from the LMS for this to happen, that is, for the process to read the block from the data file. A grant can be either cr or current: gc current grant means go read the block from the database files, while gc cr grant means read the block from disk and build a read-consistent version once it is read. The event is not subject to any tuning except increasing the private interconnect bandwidth and decreasing the private interconnect latency.

Gc [current/cr][block/grant] congested – Means that the block or grant was received eventually, but with a delay because of intensive CPU consumption, lack of memory, LMS overload due to too much work in the queues, paging or swapping. This is worth investigating as it provides room for improvement. We will look at it later.

Gc [current/cr] block busy – Received, but not sent immediately due to high concurrency or contention. This means that the block is busy, for example because somebody issued a block recover command from RMAN. There is a variety of reasons for being busy; it just means the block cannot be sent immediately, not because of memory, LMS or system-oriented reasons but because of Oracle-oriented reasons. It is also worth investigating and we will look at it later.

Gc current grant busy – A grant is received but there is a delay due to many shared block images or load. For example, we are extending the high water mark and formatting the block images or blocks with block headers.

Gc [current/cr][failure/retry] – Not received because of a failure, usually a checksum error in the protocol of the private interconnect due to network errors or hardware problems. This is something worth investigating. Failure means that the block image cannot be received, while retry means that the problem recovers and ultimately the block image can be received, but it needs to be retried.

Gc buffer busy – The time between block accesses is less than the buffer pin time. Buffers can be pinned in exclusive or shared mode depending on whether they can be modified or are read only. Obviously, if there is a lot of contention for the same block by different processes, this event can manifest itself in greater magnitude. Buffer busy waits are global cache events when a request is made from one instance, the block is available in another instance, and the block is busy due to contention.

Performing a top-down approach to performance analysis can be helpful. We can start with ADDM analysis, then continue with AWR detailed statistics and historical data, and last but not least ASH will provide finer-grained, session-specific data.

Oracle 11g RAC private interconnect considerations

Full bit rate

Ensure that the network interface card (NIC) is configured at its maximum bandwidth, that is 100 Mbps, 1 Gbps or 10 Gbps.

Full duplex vs. half duplex

Half-Duplex Operation – Technologies that employ half-duplex operation are capable of sending information in both directions between two nodes, but only one direction or the other can be utilized at a time. This is a fairly common mode of operation when there is only a single network medium (cable, radio frequency and so forth) between devices. While this term is often used to describe the behavior of a pair of devices, it can more generally refer to any number of connected devices that take turns transmitting. For example, in conventional Ethernet networks, any device can transmit, but only one may do so at a time. For this reason, regular (unswitched) Ethernet networks are often said to be “half-duplex”, even though it may seem strange to describe a LAN that way.

Full-Duplex Operation – In full-duplex operation, a connection between two devices is capable of sending data in both directions simultaneously. Full-duplex channels can be constructed either as a pair of simplex links (as described above) or using one channel designed to permit bidirectional simultaneous transmissions. A full-duplex link can only connect two devices, so many such links are required if multiple devices are to be connected together. Note that the term “full-duplex” is somewhat redundant; “duplex” would suffice, but everyone still says “full-duplex” (likely, to differentiate this mode from half-duplex).

Flow Control

In computer networking, flow control is the process of managing the rate of data transmission between two nodes to prevent a fast sender from outrunning a slow receiver. It provides a mechanism for the receiver to control the transmission speed, so that the receiving node is not overwhelmed with data from transmitting nodes. Flow control should be distinguished from congestion control, which is used for controlling the flow of data when congestion has actually occurred. Flow control mechanisms can be classified by whether or not the receiving node sends feedback to the sending node. Flow control is important because it is possible for a sending computer to transmit information at a faster rate than the destination computer can receive and process it. This can happen if the receiving computer has a heavy traffic load in comparison to the sending computer, or if the receiving computer has less processing power than the sending computer. Oracle recommends setting flow control to rx=on, tx=off:

/sbin/ethtool -A eth0 autoneg on  tx off rx on

Overview of global enqueue waits

In a RAC database, the GES is responsible for inter-instance resource coordination other than data block (cache fusion) transfers, which are handled by the GCS. It tracks the status of all Oracle enqueue mechanisms for resources that are accessed by more than one instance. Oracle uses the GES to manage concurrency for resources on transactions, tables and other structures within a RAC environment. GES is an integrated RAC component that coordinates global locks between the instances in the cluster. Block access and updates are recorded in the GRD, which is a virtual memory structure spanning all instances. GES controls all library cache locks and dictionary cache locks in the database. These resources are local in a single-instance database and global in a RAC database. Global locks are also used to protect data structures used for transaction management. GES performs most of its activities using the LMD0 and LCK0 background processes. In general, processes communicate with their local LMD0 process to manipulate global resources, and the local LMD0 process communicates with the LMD0 processes on other instances. The LCK0 background process is used to obtain locks that are required by the entire instance; for example, LCK0 is responsible for maintaining dictionary cache locks.

A resource is a memory structure that represents some component of the database to which access must be restricted or serialized; in other words, the resource can only be accessed by one process or one instance at a time. If the resource is currently in use, other processes or instances needing to access it must wait in a queue until the resource becomes available.

An enqueue is a memory structure that serializes access to a particular resource. If the resource is only required by the local instance, then the enqueue can be acquired locally and no coordination is necessary. However, if the resource is required by a remote instance, then the local enqueue must become global.

Let’s make an overview of the global enqueue waits in RAC. Enqueues in RAC are global in order to maintain enqueue coherency across all the instances, and they are synchronous.

The most common enqueues are listed below.

TX – transaction enqueue representing either row lock waits or ITL waits

TM – table manipulation enqueue, taken on any table touched by DML. For example, updating a block in one instance while truncating, dropping or collecting statistics on the same table in another instance.

HW – high water mark enqueue, taken when the high water mark of a segment is being extended.

TA – transaction recovery enqueue

SQ – sequence generation enqueue

US – undo segment enqueue, used for managing undo segment extension.

These waits may constitute serious serialization points: enqueues often cause serialization even in a single-instance database, and in RAC that serialization is exacerbated by the private interconnect latency. If something does not scale well in a single instance, it will not scale well in RAC, due to the contention problems plus the additional overhead contributed by the private interconnect latencies.
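
A rough way to see which enqueue types account for the most wait time is GV$ENQUEUE_STATISTICS; this is only a sketch and the row limit is arbitrary:

-- Enqueue types ordered by cumulative wait time
SELECT *
FROM (SELECT inst_id, eq_type, req_reason, total_wait#, cum_wait_time
      FROM   gv$enqueue_statistics
      ORDER BY cum_wait_time DESC)
WHERE ROWNUM <= 10;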

Overview of the global cache wait events

When a server process makes a request for a block image, it has no way of knowing whether the request is going to be satisfied by 1, 2 or 3 nodes. All it does is contact the local LMS process and request a particular block image. This can be a current or a cr request, and it can be either an ordinary or a multiblock request, so to summarize, the possible placeholder events are as follows:

Gc current request

Gc current multiblock request

Gc cr request

Gc cr multiblock request

The placeholder is one of the events above, shown whilst the process is waiting to get the block image. Once the wait is over we know what has happened: the wait_time column becomes a non-zero value and the event column of v$session_wait shows the actual event the process has been waiting on instead of the placeholder, with the time waited for that actual event in the wait_time column. A quick way to observe this is sketched below; after that, let’s go through the relevant events.
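
As a sketch, GV$SESSION_WAIT can be queried for sessions on global cache events; while a request is in flight the event column shows the placeholder, and once it completes it shows the actual outcome event:

-- Sessions currently waiting on (or having just completed) a global cache request
SELECT inst_id, sid, event, state, wait_time, seconds_in_wait
FROM   gv$session_wait
WHERE  event LIKE 'gc%'
ORDER  BY inst_id, sid;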

Gc [current/cr] [2/3]-way – In a 2-node cluster we cannot get 3-way events, as only two RAC instances are available and at most two hops are possible, so only 2-way is possible. If we have three or more RAC instances then both 2-way and 3-way are possible. These events mean the block was received immediately after 2 or 3 network hops. They are not subject to any tuning except increasing the private interconnect bandwidth and decreasing the private interconnect latency.

Gc [current/cr] grant 2-way – Event seen when a grant is received immediately. A grant is always local or 2-way. A grant occurs when a request is made for a block image, current or cr, and no instance has the image in its local buffer cache. The requesting instance is then required to do an I/O against the data file to get the block. The grant is simply permission from the LMS for this to happen, that is, for the process to read the block from the data file. A grant can be either cr or current: gc current grant means go read the block from the database files, while gc cr grant means read the block from disk and build a read-consistent version once it is read.

Gc [current/cr][block/grant] congested – Means that the block or grant was received eventually, but with a delay because of intensive CPU consumption, lack of memory, LMS overload due to too much work in the queues, paging or swapping. This is worth investigating as it provides room for improvement. We will look at it later.

Gc [current/cr] block busy – Received, but not sent immediately due to high concurrency or contention. This means that the block is busy, for example because somebody issued a block recover command from RMAN. There is a variety of reasons for being busy; it just means the block cannot be sent immediately, not because of memory, LMS or system-oriented reasons but because of Oracle-oriented reasons. It is also worth investigating and we will look at it later.

Gc current grant busy – A grant is received but there is a delay due to many shared block images or load. For example, we are extending the high water mark and formatting the block images or blocks with block headers.

Gc [current/cr][failure/retry] – Not received because of a failure, usually a checksum error in the protocol of the private interconnect due to network errors or hardware problems. This is something worth investigating. Failure means that the block image cannot be received, while retry means that the problem recovers and ultimately the block image can be received, but it needs to be retried.

Gc buffer busy – The time between block accesses is less than the buffer pin time. Buffers can be pinned in exclusive or shared mode depending on whether they can be modified or are read only. Obviously, if there is a lot of contention for the same block by different processes, this event can manifest itself in greater magnitude. Buffer busy waits are global cache events when a request is made from one instance, the block is available in another instance, and the block is busy due to contention.

The key thing to remember is that there are separate wait events for the placeholder and for the outcome: when the wait is over, the placeholder is replaced in v$session_wait with a different event depending on how many hops there were, what kind of request it was, and what happened – whether there was congestion, busy, failure or retry. Looking at the V$ views or AWR reports, we need to see whether we observe congestion, busy, failure or retry events and investigate further.
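
As a starting point, a sketch against GV$SYSTEM_EVENT gives the system-wide picture, from which the congested, busy, lost, failure and retry variants can be picked out:

-- Global cache wait events ordered by total time waited (centiseconds)
SELECT inst_id, event, total_waits, time_waited, average_wait
FROM   gv$system_event
WHERE  event LIKE 'gc%'
ORDER  BY time_waited DESC;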

Cache fusion impact on Oracle 11g RAC performance – statistics and wait events

Basically CR block request time and current block request time are what we are looking at.

The CR block request time is the time it takes to build the CR block in the instance that owns the appropriate image, plus the time to flush it (which requires a redo log write to disk), plus how long it takes to send it across the interconnect.

The current block request time is how long it takes to pin the image in the instance that owns the block, plus the time it takes to flush it and send it across, because we cannot send the block while someone is changing it at the same time. That is why we need to pin the block in exclusive mode, then flush the redo and send the block over the interconnect.

The statistics come from v$sysstat; always query v$sysstat (or gv$sysstat for all instances) for these statistics.

Other latencies come from the v$ges_statistics or gv$ges_statistics view.

What we are primarily concerned with are the average time to process a CR block and the average time to process a current block. If over time those times start to grow, it might mean that we need to explore why it is taking longer: we might need to look at the wait events and the possible causes for those latencies to be growing, and determine why things are changing and getting worse over time.
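
A sketch of how these averages can be computed from GV$SYSSTAT; the statistic names are the 10g/11g ones and the receive times are recorded in centiseconds, hence the multiplication by 10 to get milliseconds:

-- Average global cache CR and current block receive times in milliseconds
SELECT cr_rcv.inst_id,
       ROUND(cr_time.value * 10 / NULLIF(cr_rcv.value, 0), 2) AS avg_cr_receive_ms,
       ROUND(cu_time.value * 10 / NULLIF(cu_rcv.value, 0), 2) AS avg_current_receive_ms
FROM   gv$sysstat cr_rcv, gv$sysstat cr_time, gv$sysstat cu_rcv, gv$sysstat cu_time
WHERE  cr_rcv.name  = 'gc cr blocks received'
AND    cr_time.name = 'gc cr block receive time'
AND    cu_rcv.name  = 'gc current blocks received'
AND    cu_time.name = 'gc current block receive time'
AND    cr_time.inst_id = cr_rcv.inst_id
AND    cu_rcv.inst_id  = cr_rcv.inst_id
AND    cu_time.inst_id = cr_rcv.inst_id;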

Wait events for RAC are interesting architecturally in that, like any other wait events, they show you the various things a session can wait on, helping you identify what the problem can be. RAC introduces an area that we do not need in a single-instance environment. Let’s recount the v$session_wait view. Oracle includes some common columns in the v$session and v$session_wait views; the interesting columns are wait_time and event, the latter containing the name of the event in both views. If a session is waiting for something, then when you query v$session_wait the event column contains the name of the event the session is waiting on, for example db file sequential read, or log file parallel write, which occurs in the log writer (LGWR) as part of the normal activity of copying records from the redo log buffer to the current online log, or log file sync when you commit, also referred to as commit latency. If wait_time is 0, the event column shows what the session is currently waiting on. If wait_time is greater than 0, it shows how long the last event was waited on. If wait_time is -2, the init parameter timed_statistics is not set. If wait_time is -1, the wait was less than a hundredth of a second and the wait time was not captured.

For a single instance the situation is simple: a row in the view represents either something currently being waited on (0) or something that was waited on. RAC introduces complexity. When cache fusion is being done, the server process cannot do I/O as it pleases. A single-instance server process does I/O as it wants: if a buffer is not in the buffer cache it waits, for example on db file sequential read, and when the I/O completes it continues. In RAC the server process makes a request to the LMS background process handling cache fusion, and once LMS gets involved there are several possibilities. One is that the instance requesting the block has a valid copy of the block image in its own buffer cache and has the relevant metadata of the Global Resource Directory (GRD), so everything can be done locally without a block transfer. Another scenario is when the requesting instance A does not have the metadata and another instance B holds the GRD metadata for, say, block m in file n; getting the global resource metadata then requires a hop to instance B in order to identify the instance that has a valid copy of the block, and if the block is in either instance A or B there are two hops, as two nodes are involved. The worst possible scenario, irrespective of how many instances we have (assuming we have more than two), is when the instance that makes the request has neither the image copy of the block nor the GRD metadata for it: in this case its LMS talks to the LMS holding the metadata, which talks to the LMS on a third instance that has the block image, and that third instance, using user-mode IPC, sends the block image to the first instance A that requested it. In the latter scenario we have a three-hop situation, which is the worst possible situation regardless of the number of nodes. To summarize, we have the requesting instance, where the initial request is made for a block image by the server process; the instance that serves the image, called the owning or serving instance; and the instance that owns the metadata in the GRD for the particular block number and file number, referred to as the mastering instance. The worst situation is when the owning, mastering and requesting instances are all separate instances; the best case is when they are the same instance. We will see how this affects the wait events.

All wait events related to the global cache are summarized in a broader category called the Cluster wait class, visible in the V$ views or Enterprise Manager. Wait events for RAC help you analyze what sessions are waiting for; wait times are attributed to events that reflect the outcome of a request. These events are used in ADDM and the V$ views to enable cache fusion diagnostics.

Let’s look at the wait event views as a refresher for those who have not used them for a while.

V$SYSTEM_EVENT – total waits for an event

V$SESSION_WAIT_CLASS – waits for a wait event class by a session

V$SESSION_EVENT – waits for an event by a session

V$ACTIVE_SESSION_HISTORY – activity of recent active sessions

V$SESSION_WAIT_HISTORY – last 10 wait events for each active session.

V$SESSION_WAIT – events for which active sessions are waiting

V$SQLSTATS – identify SQL statements impacted by interconnect latencies
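
For example, as a sketch (the row limit is arbitrary), V$SQLSTATS can be ordered by CLUSTER_WAIT_TIME to find the statements spending the most time in global cache waits:

-- SQL with the highest cluster (global cache) wait time, times in microseconds
SELECT *
FROM (SELECT sql_id, executions, cluster_wait_time, elapsed_time, sql_text
      FROM   v$sqlstats
      ORDER BY cluster_wait_time DESC)
WHERE ROWNUM <= 10;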

Cache fusion impact on Oracle 11g RAC performance

In comparison to OPS, RAC has a new feature called Cache Fusion. Reads from disk are involved only if the block is not available in the buffer cache of any other instance. Obviously, accessing blocks through cache fusion has a cost, despite being faster than the OPS approach. Cache coherency makes a logical global cache out of all the buffer caches belonging to all RAC instances. We can measure cache fusion performance through wait events and statistics: the cost of block access and cache coherency is represented by the Global Cache Service statistics and Global Cache Service wait events. The response time, and respectively the cost, is usually a lot less than an I/O to disk, but of course there is room to improve RAC performance. If you start ramping up the workload you could end up with performance problems, even though performance is better than reading from disk as was the case with OPS. For example, with two instances, you want performance to be not just 10% better but 90% better. The overhead in response time for cache fusion transfers is accounted for by:

Physical private interconnects – More than one interconnect is recommended: the more interconnects, the more redundancy and the higher the bandwidth available for messages and cache fusion block transfers. Achieving low latency is the objective. Private interconnects are required; a public corporate LAN might have high bandwidth but will have high latency due to retransmissions caused by collisions. The interconnect performance depends on the speed that is set and on the redundancy.

IPC protocol – Oracle RAC tries to use user-mode inter-process communication (IPC) for sending data from one node to another, as it does not require a context switch into kernel mode and runs in user application mode. The IPC protocol depends on the hardware vendor.

GCS protocol – The GCS protocol depends on the IPC protocol and the private interconnect, and is not directly affected by disk I/O, except for the log write I/O performed whenever a dirty block in the buffer cache is sent over the interconnect to another instance for cache fusion reasons, in either a write-read or a write-write situation. For example, a block is updated by a transaction in the buffer cache of instance A and the very same block image is transferred via the cache fusion traffic mechanism to the buffer cache of instance C; we have to guarantee that the redo is written to the redo log files first. Other than that, not much disk I/O is performed. Heavy update transactions will incur more disk I/O for the redo than read-mostly workloads. The cache fusion response time is not generally affected by disk I/O factors, except for the occasional log write done when sending a dirty buffer to another instance in a write-read or write-write situation.

Oracle RAC 11g cache coherency

Let’s briefly review the concepts of RAC prior to discussing the tuning issues. Each instance has a buffer cache in its System Global Area (SGA). Using Cache Fusion, Oracle RAC environments logically combine each instance’s buffer cache to enable the instances to process data as if the data resided on a logically combined, single cache. The SGA size requirements for Oracle RAC are greater than the SGA requirements for single-instance Oracle databases due to Cache Fusion. To ensure that each Oracle RAC database instance obtains the block that it requires to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances, which effectively increases the size of the SGA for an Oracle RAC instance. After one instance caches data, any other instance within the same cluster database can acquire a block image from another instance in the same database faster than by reading the block from disk. Therefore, Cache Fusion moves current blocks between instances rather than re-reading the blocks from disk. When a consistent block is needed or a changed block is required on another instance, Cache Fusion transfers the block image directly between the affected instances. Oracle RAC uses the private interconnect for interinstance communication and block transfers. The GES Monitor and the Instance Enqueue Process manage access to Cache Fusion resources and enqueue recovery processing. The GCS and GES processes and the GRD collaborate to enable Cache Fusion.
