1] Administration
1.1 Cluster / Always ON
Windows Server Failover Clustering
select cluster_name
, member_name from sys.dm_hadr_cluster_members , sys.dm_hadr_cluster
cluster
|
member
|
W-CLUSTER-6
|
SRV2
|
W-CLUSTER-6
|
SRV11
|
W-CLUSTER-6
|
WSFC quorum
|
select *
from sys.dm_hadr_availability_replica_cluster_nodes
groupe
|
replica_server
|
node
|
AvailGrpINST
|
SRV11\INST
|
SRV11
|
AvailGrpINST
|
SRV2\INST
|
SRV2
|
1.2 See Configuration
-- state :
SYNCHRONYZED / SYNCHRONYZING / NOT SYNCHRONYZING
SELECT ag.name AS
ag_name, ar.replica_server_name AS ag_replica_server, DB_NAME(dr_state.database_id) as DB_name,
is_ag_replica_local = CASE
WHEN ar_state.is_local = 1 THEN
N'LOCAL'
ELSE 'REMOTE'
END ,
ag_replica_role = CASE
WHEN ar_state.role_desc IS NULL
THEN N'DISCONNECTED'
ELSE ar_state.role_desc
END ,
ar_state.connected_state_desc, ar.availability_mode_desc, dr_state.synchronization_state_desc
FROM (( sys.availability_groups
AS ag JOIN
sys.availability_replicas
AS ar ON ag.group_id = ar.group_id )
JOIN sys.dm_hadr_availability_replica_states AS ar_state ON ar.replica_id = ar_state.replica_id)
JOIN sys.dm_hadr_database_replica_states dr_state on
ag.group_id =
dr_state.group_id and dr_state.replica_id = ar_state.replica_id;
ag_name
|
ag_replica_server
|
DB_name
|
is_ag_replica_local
|
ag_replica_role
|
AvailGrpINST1
|
SRV11\INST1
|
XALOC1
|
LOCAL
|
PRIMARY
|
AvailGrpINST1
|
SRV11\INST1
|
XA2K8LOC1PROD
|
LOCAL
|
PRIMARY
|
AvailGrpINST1
|
SRV2\INST1
|
XALOC1
|
REMOTE
|
SECONDARY
|
AvailGrpINST1
|
SRV2\INST1
|
XA2K8LOC1PROD
|
REMOTE
|
SECONDARY
|
connected_state_desc
|
availability_mode_desc
|
synchronization_state_desc
|
CONNECTED
|
SYNCHRONOUS_COMMIT
|
SYNCHRONIZED
|
CONNECTED
|
SYNCHRONOUS_COMMIT
|
SYNCHRONIZED
|
CONNECTED
|
SYNCHRONOUS_COMMIT
|
SYNCHRONIZED
|
CONNECTED
|
SYNCHRONOUS_COMMIT
|
SYNCHRONIZED
|
1.3 Modify the Availability Mode
Asynchronous-commit mode is a disaster-recovery solution that works well
when the availability replicas are distributed over considerable distances. If
every secondary replica is running under asynchronous-commit mode, the primary
replica does not wait for any of the secondary replicas to harden the log.
Rather, immediately after writing the log record to the local log file, the primary
replica sends the transaction confirmation to the client.
Synchronous-commit mode emphasizes high availability over performance, at
the cost of increased transaction latency. Under synchronous-commit mode,
transactions wait to send the transaction confirmation to the client until the
secondary replica has hardened the log to disk. When data synchronization
begins on a secondary database, the secondary replica begins applying incoming
log records from the corresponding primary database. As soon as every log
record has been hardened, the secondary database enters the SYNCHRONIZED state.
To change the availability mode
of an availability group
ALTER AVAILABILITY GROUP
group_name MODIFY REPLICA ON 'server_name'
WITH ( {
AVAILABILITY_MODE = {
SYNCHRONOUS_COMMIT | ASYNCHRONOUS_COMMIT }
| FAILOVER_MODE = { AUTOMATIC
| MANUAL }
} )
FAILOVER_MODE = AUTOMATIC is supported only if you also specify
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT.
In case of
Network Latency, change the SYNC to the AYNC Mode :
Connect to
the PRIMARY
ALTER
AVAILABILITY GROUP AvailGrpINST1 MODIFY REPLICA ON 'SRV11\INST1'
WITH (AVAILABILITY_MODE =
ASYNCHRONOUS_COMMIT);
ALTER
AVAILABILITY GROUP AvailGrpINST1 MODIFY REPLICA ON 'SRV11\INST1'
WITH (FAILOVER_MODE = MANUAL);
Revert to SYNC
Mode :
ALTER
AVAILABILITY GROUP AvailGrpINST1 MODIFY REPLICA ON 'SRV11\INST1'
WITH (AVAILABILITY_MODE =
SYNCHRONOUS_COMMIT);
ALTER
AVAILABILITY GROUP AvailGrpINST1 MODIFY REPLICA ON 'SRV11\INST1'
WITH (FAILOVER_MODE = AUTOMATIC);
1.4 Manual Failover
-- The
following example manually fails over the MyAg availability group to the
connected secondary replica.
-- connect
to secondary you want to fail to
ALTER AVAILABILITY
GROUP AvailGrpINST1 FAILOVER;
1.5 Automatic Failover in SYNC mode
1/ LOSS OF SECONDARY REPLICA
Immediate detection by Primary => Synchronization
to that replica is deactivated / Zero Impact on Primary
2/ LOSS OF PRIMARY
Immediate detection by Secondary => becomes
Primary : Rollback and Listener Failover, takes less than a minute (
depends on the number of transactions to rollback ).
2] Monitoring
Through Nagios
Type
|
Mode
|
Unit
|
GOOD
|
WARNING
|
CRITICAL
|
Comment
|
Configuration
|
Failover
detection
|
1|0
|
1
|
0
|
0
|
select CASE ( SELECT
ag_replica_role = CASE WHEN ar_state.role_desc
IS NULL THEN N'DISCONNECTED' ELSE
ar_state.role_desc END
FROM ((( sys.availability_groups AS ag JOIN
sys.availability_replicas
AS ar ON ag.group_id = ar.group_id )
JOIN sys.dm_hadr_availability_replica_states
AS ar_state ON ar.replica_id = ar_state.replica_id)
JOIN sys.dm_hadr_availability_replica_cluster_nodes
N ON ar.replica_server_name = N.replica_server_name)
where N.node_name like 'PRIMARY-HOST' )
WHEN 'PRIMARY' THEN 1 ELSE 0 END
|
Performance
|
Wait Event ( time/req )
overhead of sending a transaction from the Primary
DB and hardening onto the Replica DB
|
ms
|
< 6
|
5 < X < 11
|
> 10
|
WITH RQ AS ( select waiting_tasks_count A , wait_time_ms/waiting_tasks_count B from sys.dm_os_wait_stats where waiting_tasks_count >0 and wait_type = 'HADR_SYNC_COMMIT' ) , SY AS ( SELECT connected_state_desc ST FROM sys.dm_hadr_availability_replica_states where role_desc = 'SECONDARY' ) select case SY.ST when 'DISCONNECTED' then '100' else case when RQ.A <
10000 then '1' else RQ.B end end from RQ , SY union select '1' where not exists ( select 1 from RQ )
|
Notes :
Performance of the SYNC Mode :
- We
measure the latency only if in CONNECTED status
- there
is a threshold of waiting_tasks_count > 10000 under which the indicator
is irrelevant ( too few measures ).
- On
the other side, having too many measures, when the instance is started
from a very long time, will also produce an irrelevant indicator.
Therefore, we will have to reset this DMV statistics on a daily
basis :
DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);
GO
Another
grant needed :
GRANT VIEW ANY DEFINITION to Nagios
3] Performance
Check for Latency in SYNC mode.
Measure the
latency caused by the SYNC Mode : in Milliseconds
-- latency
in SYNC mode : time_per_wait MilliSeconds
select wait_type, waiting_tasks_count, wait_time_ms, wait_time_ms/waiting_tasks_count as'time_per_wait'
from sys.dm_os_wait_stats where waiting_tasks_count >0
and wait_type = 'HADR_SYNC_COMMIT'
wait_type
|
waiting_tasks_count
|
wait_time_ms
|
time_per_wait
|
HADR_SYNC_COMMIT
|
3836407
|
13201552
|
3
|
See Wait
Events
-- Wait
event in SYNC mode : HADR_SYNC_COMMIT waiting on commit by the Secondary
replica
select session_id, status,command,blocking_session_id, wait_type, wait_time, last_wait_type from sys.dm_exec_requests where session_id>=50
session_id
|
status
|
command
|
blocking_session_id
|
wait_type
|
wait_time
|
last_wait_type
|
56
|
suspended
|
EXECUTE
|
0
|
SP_SERVER_DIAGNOSTICS_SLEEP
|
237
|
SP_SERVER_DIAGNOSTICS_SLEEP
|
80
|
running
|
SELECT
|
0
|
NULL
|
0
|
MISCELLANEOUS
|
HADR_SYNC_COMMIT :
waiting on COMMIT by the Secondary
SP_SERVER_DIAGNOSTICS_SLEEP
ALWAYS-ON technical Event
The wait event that captures the overhead of sending a transaction from
the Primary DB and hardening onto the Replica DB is HADR_SYNC_COMMIT.
The typical values that should be seen for HADR_SYNC_COMMIT
is 2-4 milliseconds per request (Time/Request)
4] Log File Shrink When Always-ON is
ON !
Using the Studio Database / Task / Shrink, it may be not
possible to reduce the T-LOG.
You may have to execute BACKUP LOG multiple times, while you
have the following output :
select log_reuse_wait_desc from sys.databases where name = 'Search_Service_Application_CrawlStoreDB_c29f61691e374bf6abfd801294b0de1d'
BACKUP_LOG
After shrinking the logfile,
use
"Search_Service_Application_CrawlStoreDB_c29f61691e374bf6abfd801294b0de1d"
go
-- Shrink the log
Declare
@LogFileLogicalName sysname
select
@LogFileLogicalName=Name
from sys.database_files where Type=1
print
@LogFileLogicalName
DBCC
Shrinkfile(@LogFileLogicalName,1024)
you get
select log_reuse_wait_desc from sys.databases where name = 'Search_Service_Application_CrawlStoreDB_c29f61691e374bf6abfd801294b0de1d'
NOTHING
5] How to remove / add an existing DB from
Always ON using T-SQL
5.1 Remove DB from Always-ON
configuration
-- primary
dbcc sqlperf('logspace') --
XDIQR 24 GB
SELECT ag.name AS
ag_name, ar.replica_server_name AS ag_replica_server, DB_NAME(dr_state.database_id) as DB_name,
is_ag_replica_local = CASE
WHEN ar_state.is_local = 1 THEN
N'LOCAL'
ELSE 'REMOTE'
END ,
ag_replica_role = CASE
WHEN ar_state.role_desc IS NULL
THEN N'DISCONNECTED'
ELSE ar_state.role_desc
END ,
ar_state.connected_state_desc, ar.availability_mode_desc, dr_state.synchronization_state_desc
FROM (( sys.availability_groups
AS ag JOIN
sys.availability_replicas
AS ar ON ag.group_id = ar.group_id )
JOIN sys.dm_hadr_availability_replica_states AS ar_state ON ar.replica_id = ar_state.replica_id)
JOIN sys.dm_hadr_database_replica_states dr_state on
ag.group_id =
dr_state.group_id and dr_state.replica_id = ar_state.replica_id;
-- modify Always-On configuration
ALTER AVAILABILITY
GROUP [AvailGrpAPPQR] REMOVE DATABASE [XDIQR];
ag_name
|
ag_replica_server
|
DB_name
|
is_ag_replica_local
|
ag_replica_role
|
AvailGrpAPPQR
|
W-ISSQR-11\APPQR
|
XDIQR
|
LOCAL
|
PRIMARY
|
AvailGrpAPPQR
|
W-ISSQR-12\APPQR
|
XDIQR
|
REMOTE
|
SECONDARY
|
connected_state_desc
|
availability_mode_desc
|
synchronization_state_desc
|
CONNECTED
|
SYNCHRONOUS_COMMIT
|
SYNCHRONIZED
|
CONNECTED
|
SYNCHRONOUS_COMMIT
|
SYNCHRONIZED
|
After the ALTER Command, it goes to « No rows returned ».
5.2 Add DB back into Always-ON
configuration
-- go to
secondary
-- :Connect
W-ISSQR-12\INSTQR
-- XDIQR DB
Remove
select name from
sys.databases
drop database XDIQR
-- primary
-- add DB to
Always-On configuration
SELECT SERVERPROPERTY ('IsHadrEnabled'); -- 1 Si IsHadrEnabled = 1, l'option Groupes de disponibilité
AlwaysOn est activée.
USE [master]
BACKUP DATABASE [XDIQR] TO DISK = N'\\w-issqr-12\Backup\APPQR\XDIQR.bak'
;
-- go to
secondary
RESTORE DATABASE [XDIQR] FROM DISK = N'\\w-issqr-12\Backup\APPQR\XDIQR.bak'
WITH NORECOVERY
-- primary
ALTER AVAILABILITY GROUP [AvailGrpAPPQR] ADD
DATABASE [XDIQR];
BACKUP LOG XDIQR TO DISK =
'\\w-issqr-12\Backup\APPQR\XDIQR_20151125_3.trn'
GO
-- go to
secondary
-- to avoid
Msg 1478 :
-- The
mirror database has insufficient transaction log data to preserve the log
backup chain …
RESTORE LOG
XDIQR FROM
DISK = '\\w-issqr-12\Backup\APPQR\XDIQR_20151125_3.trn' WITH
FILE=1,
NORECOVERY
GO
ALTER DATABASE
[XDIQR] SET HADR AVAILABILITY
GROUP = [AvailGrpAPPQR];
-- Wait for the replica to start communicating
begin try
declare @conn bit
declare @count int
declare @replica_id uniqueidentifier
declare @group_id uniqueidentifier
set @conn = 0
set @count = 30 -- wait for 5 minutes
if (serverproperty('IsHadrEnabled') =
1)
and (isnull((select
member_state from master.sys.dm_hadr_cluster_members where upper(member_name
COLLATE Latin1_General_CI_AS) = upper(cast(serverproperty('ComputerNamePhysicalNetBIOS') as
nvarchar(256)) COLLATE Latin1_General_CI_AS)), 0) <>
0)
and (isnull((select
state from master.sys.database_mirroring_endpoints), 1) =
0)
begin
select
@group_id = ags.group_id from master.sys.availability_groups as ags where
name = N'AvailGrpAPPQR'
select @replica_id = replicas.replica_id from master.sys.availability_replicas as replicas where upper(replicas.replica_server_name COLLATE Latin1_General_CI_AS) =
upper(@@SERVERNAME COLLATE Latin1_General_CI_AS) and
group_id = @group_id
while @conn <>
1 and @count > 0
begin
set @conn
= isnull((select
connected_state from master.sys.dm_hadr_availability_replica_states as states where states.replica_id = @replica_id), 1)
if @conn = 1
begin
-- exit loop when the replica is
connected, or if the query cannot find the replica status
break
end
waitfor delay '00:00:10'
set @count = @count
- 1
end
end
end try
begin catch
-- If the wait loop fails, do not stop execution of the
alter database statement
end catch
5.3 Check
-- check
SELECT ag.name AS
ag_name, ar.replica_server_name AS ag_replica_server, DB_NAME(dr_state.database_id) as DB_name,
is_ag_replica_local = CASE
WHEN ar_state.is_local = 1 THEN
N'LOCAL'
ELSE 'REMOTE'
END ,
ag_replica_role = CASE
WHEN ar_state.role_desc IS NULL
THEN N'DISCONNECTED'
ELSE ar_state.role_desc
END ,
ar_state.connected_state_desc, ar.availability_mode_desc, dr_state.synchronization_state_desc
FROM (( sys.availability_groups
AS ag JOIN
sys.availability_replicas
AS ar ON ag.group_id = ar.group_id )
JOIN sys.dm_hadr_availability_replica_states AS ar_state ON ar.replica_id = ar_state.replica_id)
JOIN sys.dm_hadr_database_replica_states dr_state on
ag.group_id =
dr_state.group_id and dr_state.replica_id = ar_state.replica_id;
6] Get Database back in Always ON Synchronization
using T-SQL
6.1 Availability Level
Failover without Data Loss
ALTER AVAILABILITY GROUP [AG-LSN-IOSQR] FAILOVER;
Failover with Data Loss
ALTER AVAILABILITY GROUP [AG-LSN-W-SRV]
FORCE_FAILOVER_ALLOW_DATA_LOSS;
6.2 Database Level
Resume data Movement
ALTER DATABASE StateService SET HADR RESUME ;