Monday 9 December 2013

Solaris Cluster 4.x Basics

Cluster Components:-
Hardware Components:-
1. Hosts
2. Cluster transport adapters
3. Common storage
4. HBA cards
5. Switches
6. Cluster transport cables
7. Local disks
8. Removable media
Software Components:-
1. Operating system
2. Cluster software
3. Data service applications


Multihost Devices:-
LUNs that can be connected to more than one cluster node at a time are multihost devices.
Local Disks:-
Local disks are disks that are connected to only a single node.
Removable media:-
Removable media such as tape drives and CD-ROM drives are supported in a cluster.

Cluster Interconnect:-
The cluster interconnect is the physical configuration of devices that is used to transfer cluster-private communications and data service communications between cluster nodes in the cluster.
Adapters – The network interface cards that are located in each cluster node
Junctions – The switches that are located outside of the cluster nodes.
Cables – The physical connections that you install either between two network adapters or between an adapter and a junction.
Public Network Interfaces:-
Clients connect to the cluster through the public network interfaces.

Logging In to the Cluster Remotely:-
We have console access to all the nodes in the cluster.
We can use the Parallel Console Access (pconsole) utility from the command line to log into the cluster remotely.
The pconsole utility is part of the Oracle Solaris terminal/pconsole package.
Install the package by executing pkg install terminal/pconsole.
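For example, to open one console window per node plus a master window that sends input to all nodes at once, run pconsole with the node names (the host names below are only illustrative):
# pconsole phys-node1 phys-node2 phys-node3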

Cluster Topologies:-
·         Clustered pair
·         Pair+N
·         N+1 (star)
·         N*N (scalable)
·         Oracle VM Server for SPARC Software guest domains: cluster in a box
·         Oracle VM Server for SPARC Software guest domains: single cluster spans two different physical cluster hosts (boxes)
·         Oracle VM Server for SPARC Software guest domains: clusters span two different hosts (boxes)
·         Oracle VM Server for SPARC Software guest domains: each guest domain is hosted by redundant I/O domains


Cluster Time:-
When we install Oracle Solaris Cluster software by using the scinstall command, the software supplies template files (see /etc/inet/ntp.conf and /etc/inet/ntp.conf.sc on an installed cluster node) that establish a peer relationship between all cluster nodes.
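To see how that peering is defined on an installed node, the template can simply be inspected (a quick sketch; the exact contents vary by release and configuration):
# grep peer /etc/inet/ntp.conf.sc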

Standard Oracle Solaris Cluster:-
Standard Oracle Solaris Cluster systems provide high availability and reliability from a single location.
Campus Clusters:-
Campus clusters enable you to locate cluster components, such as cluster nodes and shared storage, in separate rooms that are several kilometers apart.
High-Availability Framework:-
The following table summarizes the levels of failure detection and recovery that the framework provides.
Failed Cluster Component | Software Recovery | Hardware Recovery
Data service | HA API, HA framework | Not applicable
Public network adapter | IP network multipathing | Multiple public network adapter cards
Cluster file system | Primary and secondary replicas | Multihost devices
Mirrored multihost device | Volume management (Solaris Volume Manager) | Hardware RAID-5
Global device | Primary and secondary replicas | Multiple paths to the device, cluster transport junctions
Private network | HA transport software | Multiple private hardware-independent networks
Node | CMM, failfast driver | Multiple nodes
Zone | HA API, HA framework | Not applicable


Global Devices:-
The Oracle Solaris Cluster software uses global devices to provide cluster-wide, highly available access to any device in a cluster from any node.
In general, if a node fails while providing access to a global device, the Oracle Solaris Cluster software automatically uses another path to the device and redirects access to that path.
The only multiported global devices that Oracle Solaris Cluster software supports are disks.
The cluster automatically assigns a unique ID to each device (each disk, for example) in the cluster.
This assignment enables consistent access to each device from any node in the cluster.
The global device namespace is held in the /dev/global directory.
Multiported global devices provide more than one path to a device.
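For example, once a device has a global name, the same raw global device path can be used from any node, as in this sketch (d0 is an illustrative DID instance that matches the mount example later in this post):
# newfs /dev/global/rdsk/d0s0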

DID Pseudo Driver & Device IDs:-
The Oracle Solaris Cluster software manages shared devices through a construct known as the DID pseudo driver.
This driver is used to automatically assign unique IDs to every device in the cluster, including multihost disks, tape drives, and CD-ROMs.
The DID driver probes all nodes of the cluster, builds a list of unique devices, and assigns each device a unique major and minor number that are consistent across all nodes of the cluster.
Access to shared devices is performed by using the normalized DID logical name, instead of the traditional Oracle Solaris logical name, such as c0t0d0 for a disk.
Example:-
Host1 might identify a multihost disk as c1t2d0, and Host2 might identify the same disk completely differently, as c3t2d0.
The DID framework assigns a common (normalized) logical name, such as d10, that the nodes use instead, giving each node a consistent mapping to the multihost disk.
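You can confirm this mapping with the cldevice command. A sketch, with host and device names following the example above (the exact output layout depends on your release):
# cldevice list -v d10
DID Device          Full Device Path
----------          ----------------
d10                 Host1:/dev/rdsk/c1t2d0
d10                 Host2:/dev/rdsk/c3t2d0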
Zone Cluster Membership:-
Oracle Solaris Cluster software also tracks zone cluster membership by detecting when a zone cluster node boots up or goes down.
Cluster Membership Monitor:-
To ensure that data is kept safe from corruption, all nodes must reach a consistent agreement on the cluster membership.
The CMM receives information about connectivity to other nodes from the cluster transport layer.
The CMM uses the cluster interconnect to exchange state information during a reconfiguration.
After detecting a change in cluster membership, the CMM performs a synchronized configuration of the cluster.
In a synchronized configuration, cluster resources might be redistributed, based on the new membership of the cluster.
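The membership that the CMM currently sees can be checked from any node, for example:
# clnode status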

Split Brain:-
Split brain can occur when the cluster interconnect between cluster nodes is lost and the cluster becomes partitioned into subclusters, with each subcluster believing that it is the only partition.
A subcluster that is not aware of the other subclusters could cause a conflict in shared resources, such as duplicate network addresses, and data corruption.
The quorum subsystem manages the situation to ensure that split brain does not occur, and that one partition survives.

Failfast Mechanism:-
The failfast mechanism detects a critical problem on a global-cluster voting node.
When the critical problem is located in a voting node, Oracle Solaris Cluster forcibly shuts down the node.
Oracle Solaris Cluster then removes the node from cluster membership.
If a node loses connectivity with other nodes, the node attempts to form a cluster with the nodes with which communication is possible.
If that set of nodes does not form a quorum, Oracle Solaris Cluster software halts the node and “fences” the node from the shared disks, that is, prevents the node from accessing the shared disks.
Fencing is a mechanism that is used by the cluster to protect the data integrity of a shared disk during split-brain situations. By default, global fencing is enabled.
Example:-
If one or more cluster-specific daemons die, Oracle Solaris Cluster software declares that a critical problem has occurred.
When this occurs, Oracle Solaris Cluster shuts down the node where the problem occurred and removes it from cluster membership.
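Global fencing is controlled by the cluster-wide global_fencing property. As a sketch (verify the supported values for your release before changing anything):
# cluster show -t global | grep -i global_fencing
# cluster set -p global_fencing=prefer3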
Cluster Configuration Repository:-
The Oracle Solaris Cluster software uses a Cluster Configuration Repository (CCR) to store the current cluster configuration information.
 The CCR uses a two-phase commit algorithm for updates: An update must be successfully completed on all cluster members or the update is rolled back.
 The CCR uses the cluster interconnect to apply the distributed updates.
The CCR relies on the CMM to guarantee that a cluster is running only when quorum is established.
The CCR is responsible for verifying data consistency across the cluster, performing recovery as necessary, and facilitating updates to the data.

Device Groups:-
The Oracle Solaris Cluster software automatically creates a raw device group for each disk and tape device in the cluster.
However, these cluster device groups remain in an offline state until you access them as global devices.
Each cluster node that is physically attached to the multihost disks provides a path to the device group.

Device Group Ownership:-
Oracle Solaris Cluster software provides two properties that configure a multiported disk configuration:
1. preferenced - controls the order in which nodes attempt to assume control of the device group if a failover occurs.
2. numsecondaries - sets the desired number of secondary nodes for a device group.
By default, your device group has one primary and one secondary; the default number of secondaries is 1.
The remaining available provider nodes become spares.
If failover occurs, the secondary becomes primary and the node highest in priority on the node list becomes secondary.
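Both properties can be set with the cldevicegroup command, for example (dg-oracle is a hypothetical device group name):
# cldevicegroup set -p preferenced=true -p numsecondaries=2 dg-oracle
# cldevicegroup show dg-oracle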
Global Namespace:-
The Oracle Solaris Cluster software mechanism that enables global devices is the global namespace.
The global namespace reflects both multihost disks and local disks.
Each cluster node that is physically connected to multihost disks provides a path to the storage for any node in the cluster.
Advantages:
·         Each host remains fairly independent, with little change in the device administration model.
·         Third-party generated device trees are still valid and continue to work.

The global namespace is automatically generated on installation and updated with every reconfiguration reboot.
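If devices are added later, the global-devices namespace can also be updated without a reboot by running the following on a cluster node:
# cldevice populate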

Local and Global Namespace Mappings:-
Component or Path | Local Host Namespace | Global Namespace
Oracle Solaris logical name | /dev/dsk/c0t0d0s0 | /global/.devices/node@nodeID/dev/dsk/c0t0d0s0
DID name | /dev/did/dsk/d0s0 | /global/.devices/node@nodeID/dev/did/dsk/d0s0
Solaris Volume Manager | /dev/md/diskset/dsk/d0 | /global/.devices/node@nodeID/dev/md/shared/diskset#/dsk/d0


Cluster File Systems:-
A cluster file system in Oracle Solaris Cluster is based on the Oracle Solaris Cluster Proxy File System (PxFS).
Features:-
1)      File access locations are transparent.
2)      A file can be accessed concurrently from multiple nodes.
3)      Cluster file systems are independent from the underlying file system and volume management software.
4)      A cluster file system is mounted on all cluster members. You cannot mount a cluster file system on a subset of cluster members.
Manually Mounting a Cluster File System:-
# mount -g /dev/global/dsk/d0s0 /global/oracle/data                      
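To mount the same cluster file system automatically at boot on every node, an entry with the global mount option is typically added to /etc/vfstab on each node. A sketch that reuses the paths from the example above:
/dev/global/dsk/d0s0 /dev/global/rdsk/d0s0 /global/oracle/data ufs 2 yes global,logging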

Disk Path Monitoring [DPM]:-
DPM improves the overall reliability of failover and switchover by monitoring secondary disk path availability.
Component | Location
Daemon | /usr/cluster/lib/sc/scdpmd
Command-line interface | /usr/cluster/bin/cldevice
Daemon status file (created at runtime) | /var/run/cluster/scdpm.status

A multi-threaded DPM daemon runs on each node. The DPM daemon (scdpmd) is started by an SMF service, system/cluster/scdpm, when a node boots.

1) The DPM daemon gathers disk path and node name information from the previous status file or from the CCR database.
2) The DPM daemon initializes the communication interface to respond to requests from components that are external to the daemon.
3) The DPM daemon pings each disk path in the monitored list every 10 minutes by using scsi_inquiry commands.
4) The DPM daemon notifies the Oracle Solaris Cluster Event Framework and logs the new status of the path through the UNIX syslogd command.
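Because scdpmd is managed by SMF, its state can be checked and it can be restarted with the standard service commands, for example:
# svcs system/cluster/scdpm
# svcadm restart system/cluster/scdpm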

The cldevice command enables you to perform the following tasks:
1)      Monitor a new disk path
2)      Unmonitor a disk path
3)      Reread the configuration data from the CCR database                                        
4)      Read the disks to monitor or unmonitor from a specified file
5)      Report the status of a disk path or all disk paths in the cluster
6)      Print all the disk paths that are accessible from a node
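A few corresponding command sketches (d10 is an illustrative DID device and /tmp/dpm_targets a hypothetical input file listing disk paths):
# cldevice monitor /dev/did/rdsk/d10
# cldevice unmonitor /dev/did/rdsk/d10
# cldevice monitor -i /tmp/dpm_targets
# cldevice status -s fail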

Quorum & Quorum Devices:-
Two types of problems arise from cluster partitions:
·         Split brain
·         Amnesia

Split brain:-
It occurs when the cluster interconnect between nodes is lost and the cluster becomes partitioned into subclusters.
Each partition “believes” that it is the only partition because the nodes in one partition cannot communicate with the node or nodes in the other partition.
Amnesia:-
It occurs when the cluster restarts after a shutdown with cluster configuration data that is older than the data was at the time of the shutdown.
This problem can occur when you start the cluster on a node that was not in the last functioning cluster partition.

Oracle Solaris Cluster software avoids split brain and amnesia by:
·         Assigning each node one vote
·         Mandating a majority of votes for an operational cluster

A node contributes votes depending on the node's state:
·         A node has a vote count of one when it boots and becomes a cluster member.
·         A node has a vote count of zero when the node is being installed.
·         A node has a vote count of zero when a system administrator places the node into maintenance state.
Oracle Solaris Cluster software assigns the quorum device a vote count of N-1, where N is the number of nodes connected to the quorum device. For example, a quorum device that is connected to two nodes has a vote count of one (2 - 1 = 1).
Quorum in Two-Node Configurations:-
Two quorum votes are required for a two-node cluster to form. These two votes can derive from the two cluster nodes, or from just one node and a quorum device.
Quorum in Greater Than Two-Node Configurations:-
Quorum devices are not required when a cluster includes more than two nodes, as the cluster survives failures of a single node without a quorum device. However, under these conditions, you cannot start the cluster without a majority of nodes in the cluster.
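As a sketch, a shared disk is usually added as a quorum device by its DID name, after which the vote counts can be checked (d4 is a hypothetical DID device):
# clquorum add d4
# clquorum status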
