OraCsdP: ALWAYS ON

1] Administration

1.1 Cluster / Always ON

Windows Server Failover Clustering

select cluster_name , member_name from sys.dm_hadr_cluster_members , sys.dm_hadr_cluster

cluster	member
W-CLUSTER-6	SRV2
W-CLUSTER-6	SRV11
W-CLUSTER-6	WSFC quorum

select * from sys.dm_hadr_availability_replica_cluster_nodes

groupe	replica_server	node
AvailGrpINST	SRV11\INST	SRV11
AvailGrpINST	SRV2\INST	SRV2

1.2 See Configuration

-- state : SYNCHRONYZED / SYNCHRONYZING / NOT SYNCHRONYZING

SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server, DB_NAME(dr_state.database_id) as DB_name,

is_ag_replica_local = CASE

WHEN ar_state.is_local = 1 THEN N'LOCAL'

ELSE 'REMOTE'

END ,

ag_replica_role = CASE

WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'

ELSE ar_state.role_desc

END ,

ar_state.connected_state_desc, ar.availability_mode_desc, dr_state.synchronization_state_desc

FROM (( sys.availability_groups AS ag JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id )

JOIN sys.dm_hadr_availability_replica_states AS ar_state ON ar.replica_id = ar_state.replica_id)

JOIN sys.dm_hadr_database_replica_states dr_state on

ag.group_id = dr_state.group_id and dr_state.replica_id = ar_state.replica_id;

ag_name	ag_replica_server	DB_name	is_ag_replica_local	ag_replica_role
AvailGrpINST1	SRV11\INST1	XALOC1	LOCAL	PRIMARY
AvailGrpINST1	SRV11\INST1	XA2K8LOC1PROD	LOCAL	PRIMARY
AvailGrpINST1	SRV2\INST1	XALOC1	REMOTE	SECONDARY
AvailGrpINST1	SRV2\INST1	XA2K8LOC1PROD	REMOTE	SECONDARY

connected_state_desc	availability_mode_desc	synchronization_state_desc
CONNECTED	SYNCHRONOUS_COMMIT	SYNCHRONIZED
CONNECTED	SYNCHRONOUS_COMMIT	SYNCHRONIZED
CONNECTED	SYNCHRONOUS_COMMIT	SYNCHRONIZED
CONNECTED	SYNCHRONOUS_COMMIT	SYNCHRONIZED

1.3 Modify the Availability Mode

Asynchronous-commit mode is a disaster-recovery solution that works well when the availability replicas are distributed over considerable distances. If every secondary replica is running under asynchronous-commit mode, the primary replica does not wait for any of the secondary replicas to harden the log. Rather, immediately after writing the log record to the local log file, the primary replica sends the transaction confirmation to the client.

Synchronous-commit mode emphasizes high availability over performance, at the cost of increased transaction latency. Under synchronous-commit mode, transactions wait to send the transaction confirmation to the client until the secondary replica has hardened the log to disk. When data synchronization begins on a secondary database, the secondary replica begins applying incoming log records from the corresponding primary database. As soon as every log record has been hardened, the secondary database enters the SYNCHRONIZED state.

To change the availability mode of an availability group

ALTER AVAILABILITY GROUP group_name MODIFY REPLICA ON 'server_name'

WITH ( {

AVAILABILITY_MODE = { SYNCHRONOUS_COMMIT | ASYNCHRONOUS_COMMIT }

| FAILOVER_MODE = { AUTOMATIC | MANUAL }

} )

FAILOVER_MODE = AUTOMATIC is supported only if you also specify AVAILABILITY_MODE = SYNCHRONOUS_COMMIT.

In case of Network Latency, change the SYNC to the AYNC Mode :

Connect to the PRIMARY

ALTER AVAILABILITY GROUP AvailGrpINST1 MODIFY REPLICA ON 'SRV11\INST1'

WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);

ALTER AVAILABILITY GROUP AvailGrpINST1 MODIFY REPLICA ON 'SRV11\INST1'

WITH (FAILOVER_MODE = MANUAL);

Revert to SYNC Mode :

ALTER AVAILABILITY GROUP AvailGrpINST1 MODIFY REPLICA ON 'SRV11\INST1'

WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

ALTER AVAILABILITY GROUP AvailGrpINST1 MODIFY REPLICA ON 'SRV11\INST1'

WITH (FAILOVER_MODE = AUTOMATIC);

1.4 Manual Failover

-- The following example manually fails over the MyAg availability group to the connected secondary replica.

-- connect to secondary you want to fail to

ALTER AVAILABILITY GROUP AvailGrpINST1 FAILOVER;

1.5 Automatic Failover in SYNC mode

1/ LOSS OF SECONDARY REPLICA

Immediate detection by Primary => Synchronization to that replica is deactivated / Zero Impact on Primary

2/ LOSS OF PRIMARY

Immediate detection by Secondary => becomes Primary : Rollback and Listener Failover, takes less than a minute ( depends on the number of transactions to rollback ).

2] Monitoring

Through Nagios

Type	Mode	Unit	GOOD	WARNING	CRITICAL	Comment
Configuration	Failover detection	1\|0	1	0	0	select CASE ( SELECT ag_replica_role = CASE WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED' ELSE ar_state.role_desc END FROM ((( sys.availability_groups AS ag JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id ) JOIN sys.dm_hadr_availability_replica_states AS ar_state ON ar.replica_id = ar_state.replica_id) JOIN sys.dm_hadr_availability_replica_cluster_nodes N ON ar.replica_server_name = N.replica_server_name) where N.node_name like 'PRIMARY-HOST' ) WHEN 'PRIMARY' THEN 1 ELSE 0 END
Performance	Wait Event ( time/req ) overhead of sending a transaction from the Primary DB and hardening onto the Replica DB	ms	< 6	5 < X < 11	> 10	WITH RQ AS ( select waiting_tasks_count A , wait_time_ms/waiting_tasks_count B from sys.dm_os_wait_stats where waiting_tasks_count >0 and wait_type = 'HADR_SYNC_COMMIT' ) , SY AS ( SELECT connected_state_desc ST FROM sys.dm_hadr_availability_replica_states where role_desc = 'SECONDARY' ) select case SY.ST when 'DISCONNECTED' then '100' else case when RQ.A < 10000 then '1' else RQ.B end end from RQ , SY union select '1' where not exists ( select 1 from RQ )

Notes :

Performance of the SYNC Mode :

We measure the latency only if in CONNECTED status

there is a threshold of waiting_tasks_count > 10000 under which the indicator is irrelevant ( too few measures ).
On the other side, having too many measures, when the instance is started from a very long time, will also produce an irrelevant indicator. Therefore, we will have to reset this DMV statistics on a daily basis :

DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);

Another grant needed :

GRANT VIEW ANY DEFINITION to Nagios

3] Performance

Check for Latency in SYNC mode.

Measure the latency caused by the SYNC Mode : in Milliseconds

-- latency in SYNC mode : time_per_wait MilliSeconds

select wait_type, waiting_tasks_count, wait_time_ms, wait_time_ms/waiting_tasks_count as'time_per_wait'

from sys.dm_os_wait_stats where waiting_tasks_count >0

and wait_type = 'HADR_SYNC_COMMIT'

wait_type	waiting_tasks_count	wait_time_ms	time_per_wait
HADR_SYNC_COMMIT	3836407	13201552	3

See Wait Events

-- Wait event in SYNC mode : HADR_SYNC_COMMIT waiting on commit by the Secondary replica

select session_id, status,command,blocking_session_id, wait_type, wait_time, last_wait_type from sys.dm_exec_requests where session_id>=50

session_id	status	command	blocking_session_id	wait_type	wait_time	last_wait_type
56	suspended	EXECUTE	0	SP_SERVER_DIAGNOSTICS_SLEEP	237	SP_SERVER_DIAGNOSTICS_SLEEP
80	running	SELECT	0	NULL	0	MISCELLANEOUS

HADR_SYNC_COMMIT : waiting on COMMIT by the Secondary

SP_SERVER_DIAGNOSTICS_SLEEP ALWAYS-ON technical Event

The wait event that captures the overhead of sending a transaction from the Primary DB and hardening onto the Replica DB is HADR_SYNC_COMMIT.

The typical values that should be seen for HADR_SYNC_COMMIT is 2-4 milliseconds per request (Time/Request)

4] Log File Shrink When Always-ON is ON !

Using the Studio Database / Task / Shrink, it may be not possible to reduce the T-LOG.

You may have to execute BACKUP LOG multiple times, while you have the following output :

select log_reuse_wait_desc from sys.databases where name = 'Search_Service_Application_CrawlStoreDB_c29f61691e374bf6abfd801294b0de1d'

BACKUP_LOG

After shrinking the logfile,

use "Search_Service_Application_CrawlStoreDB_c29f61691e374bf6abfd801294b0de1d"

-- Shrink the log

Declare @LogFileLogicalName sysname

select @LogFileLogicalName=Name from sys.database_files where Type=1

print @LogFileLogicalName

DBCC Shrinkfile(@LogFileLogicalName,1024)

you get

select log_reuse_wait_desc from sys.databases where name = 'Search_Service_Application_CrawlStoreDB_c29f61691e374bf6abfd801294b0de1d'

NOTHING

5] How to remove / add an existing DB from Always ON using T-SQL

5.1 Remove DB from Always-ON configuration

-- primary

dbcc sqlperf('logspace') -- XDIQR 24 GB

SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server, DB_NAME(dr_state.database_id) as DB_name,

is_ag_replica_local = CASE

WHEN ar_state.is_local = 1 THEN N'LOCAL'

ELSE 'REMOTE'

END ,

ag_replica_role = CASE

WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'

ELSE ar_state.role_desc

END ,

ar_state.connected_state_desc, ar.availability_mode_desc, dr_state.synchronization_state_desc

FROM (( sys.availability_groups AS ag JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id )

JOIN sys.dm_hadr_availability_replica_states AS ar_state ON ar.replica_id = ar_state.replica_id)

JOIN sys.dm_hadr_database_replica_states dr_state on

ag.group_id = dr_state.group_id and dr_state.replica_id = ar_state.replica_id;

-- modify Always-On configuration

ALTER AVAILABILITY GROUP [AvailGrpAPPQR] REMOVE DATABASE [XDIQR];

ag_name	ag_replica_server	DB_name	is_ag_replica_local	ag_replica_role
AvailGrpAPPQR	W-ISSQR-11\APPQR	XDIQR	LOCAL	PRIMARY
AvailGrpAPPQR	W-ISSQR-12\APPQR	XDIQR	REMOTE	SECONDARY

connected_state_desc	availability_mode_desc	synchronization_state_desc
CONNECTED	SYNCHRONOUS_COMMIT	SYNCHRONIZED
CONNECTED	SYNCHRONOUS_COMMIT	SYNCHRONIZED

After the ALTER Command, it goes to « No rows returned ».

~~4.2 Shrink LOG~~

~~use~~ ~~XDIQR~~

~~-- Truncate the log by changing the database recovery model to SIMPLE.~~

~~-- = 2005 Mark the inactive part of the log for release : Backup Log YourDatabaseName With Truncate_Only~~

~~ALTER~~ ~~DATABASE XDIQR~~

~~SET~~ ~~RECOVERY SIMPLE;~~

~~-- Shrink the log~~

~~Declare~~ ~~@LogFileLogicalName sysname~~

~~select~~ ~~@LogFileLogicalName=Name from sys.database_files where Type=1~~

~~print~~ ~~@LogFileLogicalName~~

~~DBCC~~ ~~Shrinkfile(@LogFileLogicalName,1024)~~

~~-- Reset the database recovery model.~~

~~ALTER~~ ~~DATABASE XDIQR~~

~~SET~~ ~~RECOVERY FULL;~~

~~dbcc~~ ~~sqlperf('logspace') -- OK 1 GB vs 25 GB~~

5.2 Add DB back into Always-ON configuration

-- go to secondary

-- :Connect W-ISSQR-12\INSTQR

-- XDIQR DB Remove

select name from sys.databases

drop database XDIQR

-- primary

-- add DB to Always-On configuration

SELECT SERVERPROPERTY ('IsHadrEnabled'); -- 1 Si IsHadrEnabled = 1, l'option Groupes de disponibilité AlwaysOn est activée.

USE [master]

BACKUP DATABASE [XDIQR] TO DISK = N'\\w-issqr-12\Backup\APPQR\XDIQR.bak' ;

-- go to secondary

RESTORE DATABASE [XDIQR] FROM DISK = N'\\w-issqr-12\Backup\APPQR\XDIQR.bak' WITH NORECOVERY

-- primary

ALTER AVAILABILITY GROUP [AvailGrpAPPQR] ADD DATABASE [XDIQR];

BACKUP LOG XDIQR TO DISK = '\\w-issqr-12\Backup\APPQR\XDIQR_20151125_3.trn'

-- go to secondary

-- to avoid Msg 1478 :

-- The mirror database has insufficient transaction log data to preserve the log backup chain …

RESTORE LOG XDIQR FROM DISK = '\\w-issqr-12\Backup\APPQR\XDIQR_20151125_3.trn' WITH FILE=1, NORECOVERY

ALTER DATABASE [XDIQR] SET HADR AVAILABILITY GROUP = [AvailGrpAPPQR];

-- Wait for the replica to start communicating

begin try

declare @conn bit

declare @count int

declare @replica_id uniqueidentifier

declare @group_id uniqueidentifier

set @conn = 0

set @count = 30 -- wait for 5 minutes

if (serverproperty('IsHadrEnabled') = 1)

and (isnull((select member_state from master.sys.dm_hadr_cluster_members where upper(member_name COLLATE Latin1_General_CI_AS) = upper(cast(serverproperty('ComputerNamePhysicalNetBIOS') as nvarchar(256)) COLLATE Latin1_General_CI_AS)), 0) <> 0)

and (isnull((select state from master.sys.database_mirroring_endpoints), 1) = 0)

begin

select @group_id = ags.group_id from master.sys.availability_groups as ags where name = N'AvailGrpAPPQR'

select @replica_id = replicas.replica_id from master.sys.availability_replicas as replicas where upper(replicas.replica_server_name COLLATE Latin1_General_CI_AS) = upper(@@SERVERNAME COLLATE Latin1_General_CI_AS) and group_id = @group_id

while @conn <> 1 and @count > 0

begin

set @conn = isnull((select connected_state from master.sys.dm_hadr_availability_replica_states as states where states.replica_id = @replica_id), 1)

if @conn = 1

begin

-- exit loop when the replica is connected, or if the query cannot find the replica status

break

end

waitfor delay '00:00:10'

set @count = @count - 1

end

end try

begin catch

-- If the wait loop fails, do not stop execution of the alter database statement

end catch

5.3 Check

-- check

SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server, DB_NAME(dr_state.database_id) as DB_name,

is_ag_replica_local = CASE

WHEN ar_state.is_local = 1 THEN N'LOCAL'

ELSE 'REMOTE'

END ,

ag_replica_role = CASE

WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'

ELSE ar_state.role_desc

END ,

ar_state.connected_state_desc, ar.availability_mode_desc, dr_state.synchronization_state_desc

FROM (( sys.availability_groups AS ag JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id )

JOIN sys.dm_hadr_availability_replica_states AS ar_state ON ar.replica_id = ar_state.replica_id)

JOIN sys.dm_hadr_database_replica_states dr_state on

ag.group_id = dr_state.group_id and dr_state.replica_id = ar_state.replica_id;

6] Get Database back in Always ON Synchronization using T-SQL

6.1 Availability Level

Failover without Data Loss

ALTER AVAILABILITY GROUP [AG-LSN-IOSQR] FAILOVER;

Failover with Data Loss

ALTER AVAILABILITY GROUP [AG-LSN-W-SRV] FORCE_FAILOVER_ALLOW_DATA_LOSS;

6.2 Database Level

Resume data Movement

ALTER DATABASE StateService SET HADR RESUME ;

OraCsdP

Categories

vendredi 31 juillet 2015

ALWAYS ON