
Recommendations for Index Maintenance with AlwaysOn Availability Groups


 

SYMPTOMS

Consider the following scenario:

  • The database is part of AlwaysOn Availability Groups
  • You run long and log-intensive transactions like Index maintenance/rebuilds

You observe one or more of the following symptoms:

  • Poor-performing DML operations in availability databases on the primary replica when synchronous secondary replicas are present.
  • A large log send queue on the primary replica.
  • Considerable log growth in the availability databases in which the index rebuild occurs.
  • A redo backlog on the secondary replicas.

 

CAUSE

Large maintenance operations like ALTER INDEX or CREATE INDEX can generate huge amounts of logged changes by the nature of the operation. These transactions can utilize parallelism to use multiple threads generating logged changes to accomplish the large transaction. This is in addition to the log generated by the regular day to day operations of your application.

For synchronous-commit environments, these large transactions can create contention with other production changes, reducing the overall performance of the primary replica. Synchronous-commit mode emphasizes high availability over performance, at the cost of increased transaction latency. Under synchronous-commit mode, transactions wait to send the transaction confirmation to the client until the secondary replica has hardened the log to disk. All of this logged activity is captured by a single-threaded, per-database log capture thread. In addition, the encryption and compression routines are also single-threaded. When these single-threaded routines are processing very large amounts of logged activity, your application transactions may suffer performance degradation.

Log-intensive transactions like rebuilding indexes can cause the log file to grow significantly, because the log cannot be truncated until redo has completed applying the changes on all secondary replicas.
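
While maintenance is running, the log send queue and redo queue can be watched per database; here is a minimal monitoring sketch using the standard availability group DMVs (the join shape and column selection are illustrative):

	-- Watch the log send queue and redo queue for each database replica
	-- while index maintenance is running.
	SELECT ar.replica_server_name,
	       DB_NAME(drs.database_id)  AS database_name,
	       drs.log_send_queue_size,  -- KB of log not yet sent to the secondary
	       drs.redo_queue_size       -- KB of log not yet redone on the secondary
	FROM sys.dm_hadr_database_replica_states AS drs
	JOIN sys.availability_replicas AS ar
	    ON drs.replica_id = ar.replica_id;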

 

MITIGATION

Transactions like ALTER INDEX or CREATE INDEX are log-intensive by nature. We cannot eliminate the log generation, but we can perform intelligent maintenance to reduce the impact of the index rebuild on production activities. Here are some steps to minimize the impact:

Mitigation Steps From Index Maintenance Perspective

The following strategies may reduce the contention index rebuilding has on your production environment:

  • Run index maintenance during an off-peak period, if there is one.
  • Optimize the frequency of maintenance with the goal of minimizing the impact of log generation. You can do maintenance in phases, with each phase covering a subset of indexes at a particular time, instead of processing all indexes in a single run.
  • Rebuild indexes based on true need and impact to your production environment. The article Script to appropriate rebuild/reorganize database indexes - SQL Server 2005 shows how to implement the following steps to optimize the index maintenance process (a fragmentation-check query sketch also appears after this list):
    • Ignore heaps and small tables
    • Check the fragmentation level of each index
    • If the fragmentation level is low (say less than 10%) - do nothing
    • If the fragmentation level is medium (say in between 10 to 30%) - reorganize the index
    • If the fragmentation level is high (say more than 30%) - rebuild the index
  • Use the MAXDOP option in the ALTER INDEX command to reduce concurrent index alteration activity. Operations such as creating, rebuilding, or dropping indexes can be resource intensive and can leave insufficient resources for other applications and database operations for the duration of the index operation. When this problem occurs, you can manually configure the maximum number of processors that are used to run the index statement by limiting the number of processors to use for the index operation. A lower MAXDOP setting can also reduce fragmentation with an online index rebuild operation. Below is an example of ALTER INDEX REBUILD with MAXDOP = 1:
	USE [AdventureWorks2014]
	GO
	ALTER INDEX [PK_Employee_BusinessEntityID] 
	ON [HumanResources].[Employee] 
	REBUILD WITH (MAXDOP=1, ONLINE= ON);
	GO

 

  • Consider using table partitioning. This way you can rebuild portions of the index piece by piece by using ALTER INDEX…REBUILD PARTITION. SQL Server 2014 supports ONLINE rebuild of partitioned indexes. Below is a sample script to rebuild only partition 1:
	USE [AdventureWorks2014]
	GO
	ALTER INDEX [IX_TransactionHistory_ProductID] 
	ON [Production].[TransactionHistory] 
	REBUILD PARTITION = 1 
	WITH (ONLINE = ON);
	GO
  • For SQL Server 2014, additional ALTER INDEX options such as WAIT_AT_LOW_PRIORITY, MAX_DURATION, and ABORT_AFTER_WAIT can also be used. The following example rebuilds the indexes with the ONLINE option, including the low-priority lock options:
	USE [AdventureWorks2014]
	GO
	ALTER INDEX ALL ON Production.Product
	REBUILD WITH
	(ONLINE = ON (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 4 MINUTES, ABORT_AFTER_WAIT = BLOCKERS)));
	GO
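
The fragmentation check referenced in the list above can be scripted against sys.dm_db_index_physical_stats. Here is a minimal sketch, assuming the AdventureWorks2014 sample database used in the other examples; the 100-page floor and the 10/30 percent thresholds are the illustrative values from the list:

	USE [AdventureWorks2014]
	GO
	-- List indexes by fragmentation; ignore heaps (index_id = 0) and small tables.
	SELECT OBJECT_NAME(ips.object_id) AS table_name,
	       i.name                     AS index_name,
	       ips.avg_fragmentation_in_percent,
	       ips.page_count
	FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
	JOIN sys.indexes AS i
	    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
	WHERE ips.index_id > 0
	  AND ips.page_count > 100
	ORDER BY ips.avg_fragmentation_in_percent DESC;
	GO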

Mitigation Steps From Availability Group Perspective

For synchronous-commit environments, before issuing long, log-intensive transactions like ALTER INDEX or CREATE INDEX, you may additionally consider switching all synchronous replicas to asynchronous commit to help reduce transactional latency. Once the index rebuild transactions are complete, the commit mode should be switched back to synchronous.
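
A minimal sketch of the switch, assuming an availability group named ag with a synchronous replica hosted on instance SQLNODE2 (both names are illustrative). An asynchronous-commit replica must use manual failover, so the failover mode is changed first:

	-- Before the maintenance window: drop the replica to asynchronous commit.
	ALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON 'SQLNODE2' WITH (FAILOVER_MODE = MANUAL);
	ALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON 'SQLNODE2' WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
	GO

	-- After the maintenance completes: switch back to synchronous commit.
	-- Restore FAILOVER_MODE = AUTOMATIC only if that was the original setting.
	ALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON 'SQLNODE2' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
	ALTER AVAILABILITY GROUP [ag] MODIFY REPLICA ON 'SQLNODE2' WITH (FAILOVER_MODE = AUTOMATIC);
	GO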

For both synchronous and asynchronous environments, in general any step that helps redo performance will positively impact long, log-intensive transactions in availability group environments. Here are some key points to keep in mind:

  • A busy secondary, such as one with a resource bottleneck or a large reporting workload, can slow down because of resource contention, and the redo thread can fall behind. The redo thread on the secondary replica can also be blocked from applying data definition language (DDL) changes by a long-running read-only query. The diagnosis and mitigation steps for these issues are discussed in the article Troubleshoot: Availability Group Exceeded RTO.
  • Periodically check availability group databases to make sure they do not have too many Virtual Log Files (VLFs), which can severely impact the redo process on the secondary. Diagnosis and corrective steps are discussed in the article Too Many Virtual Log Files (VLFs) Can Cause Slow Database Recovery.
  • We recommend that you always switch to the High Performance power plan on all replica machines, for all operating systems. In some circumstances this can improve performance for the single-threaded redo process.

  


AlwaysOn Availability Group Returns Failover Partner for Legacy Mirror Application Connectivity


Availability Groups Simulate Database Mirroring Connection Behavior

Given the following scenario, SQL Server will return the failover partner server name to a connection request:

  1. The availability group has a single secondary replica.
  2. The availability group replicas have ALLOW_CONNECTIONS set to NO or READ_ONLY.
  3. The client application makes a successful initial connection to the primary replica.
  4. The application is not specifying the availability group listener when connecting.

SQL Server will return the failover partner server name (the SQL Server name hosting the secondary replica). The data access provider will cache the failover partner server name.

Following a failover, if an application is designed to reconnect on connection failure, it will first fail to connect to the server instance that now hosts the secondary replica, and then attempt to connect to the cached failover partner, the new primary replica.

This behavior may be unexpected. For example, suppose an identical SQL Agent job is created on the primary and secondary replicas of an availability group, and the job does not check that the local replica is primary (for example, by calling sys.fn_hadr_is_primary_replica) before executing its commands. The expectation is that the job succeeds when accessing the availability database on the primary replica and fails when attempting to access the availability database on the secondary, because the replica is configured for ALLOW_CONNECTIONS=NO.

Instead, following a failover, the job that is now executing at the secondary replica will not behave as expected. The next time the job executes, it will fail to connect locally and then attempt to connect using the cached failover partner name and will successfully connect to the now primary replica.  This can cause unexpected behavior and results because the job at the secondary is connecting and executing successfully against the primary replica.

This behavior is by design. AlwaysOn availability groups are designed to be backward compatible with applications that expect legacy database mirroring connection behavior.

The reconnect behavior will only occur given a certain configuration, when 1) there is a single secondary replica and 2) the availability replica’s ALLOW_CONNECTIONS is set to NO or READ_ONLY. The following table reports the expected behavior depending on these variables.

[Table: expected reconnect behavior based on the number of secondary replicas and the ALLOW_CONNECTIONS setting]

 

Workaround Database Mirroring Connection Behavior

If the legacy connection behavior is not desired, consider using one of the following workarounds to ensure your SQL Agent job only executes successfully when the local replica is in the primary role.

  • Set the availability group Backup Preferences to Primary and then, in the job at each replica, use sys.fn_hadr_is_primary_replica to ensure the job executes only at the primary replica (see the sketch after this list).
  • Qualify execution by querying sys.dm_hadr_availability_replica_states, where is_local and role should both return 1 at the primary.
  • Add a third replica to the availability group.
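
As referenced above, a minimal Transact-SQL sketch of guarding a job step so it only does work at the primary replica (the database name AGDB1 is illustrative):

-- Exit quietly unless the local replica is primary for the availability database.
IF sys.fn_hadr_is_primary_replica(N'AGDB1') = 1
BEGIN
    PRINT 'Local replica is primary - executing job commands.';
    -- job logic goes here
END
ELSE
BEGIN
    PRINT 'Local replica is not primary - nothing to do.';
END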

Database Mirroring Connection Behavior

For more information on database mirroring connection behavior see:

Making the Initial Connection to a Database Mirroring Session

AlwaysOn Availability Group Listener Cannot be Created or Failed Over on Node if 'Primary DNS suffix' is not set


AlwaysOn availability groups will not function properly on Windows servers where the system's 'Primary DNS Suffix of this Computer' is not set. Usually, the Primary DNS Suffix is populated when a Windows server joins a domain.

Create Listener on Windows server without ‘Primary DNS suffix’ fails

Attempting to create the availability group listener will fail when the node hosting the primary replica has ‘Primary DNS Suffix of this Computer’ cleared. SQL Server will fail to create the listener and report:


When issuing ALTER AVAILABILITY GROUP…ADD LISTENER, the message reads:

Msg 19471, Level 16, State 0, Line 1
The WSFC cluster could not bring the Network Name resource with DNS name 'aglisten' online. The DNS name may have been taken or have a conflict with existing name services, or the WSFC cluster service may not be running or may be inaccessible. Use a different DNS name to resolve name conflicts, or check the WSFC cluster log for more information.
Msg 19476, Level 16, State 4, Line 1
The attempt to create the network name and IP address for the listener failed. The WSFC service may not be running or may be inaccessible in its current state, or the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and validate the network name and IP address with the network administrator.

Reviewing the cluster log on the server where the listener failed to create, errors report that the fully qualified name could not be acquired:

00000280.00000060::2015/03/16-15:47:41.463 INFO  [RCM] rcm::RcmGum::CreateResource(ag_aglisten,d2881a8e-7c36-4c8f-a17c-d4cef149e355,ag)
...
00000d70.00000a88::2015/03/16-15:47:46.081 INFO  [RES] Network Name <ag_aglisten>: AccountAD: OU name for VCO is CN=Computers,DC=AGDC,DC=COM
...
00000d70.00000a88::2015/03/16-15:47:46.266 ERR   [RES] Network Name: [NNLIB] Getting FQDN failed with error 87
00000d70.00000a88::2015/03/16-15:47:46.266 INFO  [RES] Network Name:  [NN] IdentityLocal End Impersonating
00000d70.00000a88::2015/03/16-15:47:46.266 INFO  [RES] Network Name <ag_aglisten>: AccountAD: OnInitializeEnd: 87
00000d70.00000a88::2015/03/16-15:47:46.266 INFO  [RES] Network Name <ag_aglisten>: AccountAD: Slow Operation, FinishWithReply: 87
...
00000d70.00000a88::2015/03/16-15:47:46.282 INFO  [RES] Network Name: Agent: OnInitializeReply, Failure on (d2881a8e-7c36-4c8f-a17c-d4cef149e355,Configuration): 87
00000d70.00000a88::2015/03/16-15:47:46.282 INFO  [RES] Network Name <ag_aglisten>: SyncReplyHandler Configuration, result: 87
00000d70.00000464::2015/03/16-15:47:46.282 INFO  [RES] Network Name <ag_aglisten>: PerformOnline - Initialization of Configuration module finished with result: 87
00000d70.00000464::2015/03/16-15:47:46.282 ERR   [RES] Network Name <ag_aglisten>: Online thread Failed: ERROR_SUCCESS(0)' because of 'Initializing netname configuration for ag_aglisten failed with error 87.'
00000d70.00000464::2015/03/16-15:47:46.282 INFO  [RES] Network Name <ag_aglisten>: All resources offline. Cleaning up.
00000d70.00000464::2015/03/16-15:47:46.282 ERR   [RHS] Online for resource ag_aglisten failed.

Availability Group failover to Server without ‘Primary DNS suffix’ fails

Attempting to fail over an availability group that has a listener defined, to a Windows server where ‘Primary DNS suffix of this computer’ is not set, fails. When attempting to issue the Transact-SQL ALTER AVAILABILITY GROUP…FAILOVER or ALTER AVAILABILITY GROUP…FORCE_FAILOVER_ALLOW_DATA_LOSS command, it fails with the following messages:

Msg 41066, Level 16, State 0, Line 1
Cannot bring the Windows Server Failover Clustering (WSFC) resource (ID 'a8aa7c13-4b9c-4423-bc68-6e90a81d0c21') online (Error code 5942).  The WSFC service may not be running or may not be accessible in its current state, or the WSFC resource may not be in a state that could accept the request.  For information about this error code, see "System Error Codes" in the Windows Development documentation.

Msg 41160, Level 16, State 0, Line 1
Failed to designate the local availability replica of availability group 'ag' as the primary replica.  The operation encountered SQL Server error 41066 and has been terminated.  Check the preceding error and the SQL Server error log for more details about the error and corrective actions.


Checking the cluster log on the Windows server which could not be failed over to, reports the following errors:

00000280.00000b08::2015/03/16-16:28:33.624 INFO  [RCM] rcm::RcmApi::MoveGroup: (ag, 1, 0, MoveType::Manual )
00000280.00000db4::2015/03/16-16:28:33.670 INFO  [NM] Received request from client address SQLNODE2.
00000280.00000190::2015/03/16-16:28:33.670 INFO  [RCM] rcm::RcmApi::OnlineResource: (ag, 0)
00000280.00000190::2015/03/16-16:28:33.670 INFO  [RCM-rbtr] giving default token to group ag
00000280.00000190::2015/03/16-16:28:33.670 INFO  [RCM] rcm::RcmResource::Online: bringing ag's provider resource 'ag_aglisten' online.
...
00000d70.00000c94::2015/03/16-16:28:33.686 WARN  [RES] Network Name: [NNLIB] AddServerName - Getting Computer Domain failed, error 203
...
00000d70.00000f54::2015/03/16-16:28:33.827 INFO  [RES] Network Name <ag_aglisten>: PerformOnline - Initialization of Configuration module finished with result: 203
00000d70.00000f54::2015/03/16-16:28:33.827 ERR   [RES] Network Name <ag_aglisten>: Online thread Failed: ERROR_SUCCESS(0)' because of 'Initializing netname configuration for ag_aglisten failed with error 203.'
00000d70.00000f54::2015/03/16-16:28:33.827 INFO  [RES] Network Name <ag_aglisten>: All resources offline. Cleaning up.
00000d70.00000f54::2015/03/16-16:28:33.827 ERR   [RHS] Online for resource ag_aglisten failed.

Check the ‘Primary DNS suffix of this computer’ setting

You can find this setting under Control Panel’s System and Security: click System, and then click the Change settings link. In the System Properties dialog, click the Change button, and then click the More button in the Computer Name/Domain Changes dialog to view the ‘Primary DNS suffix of this computer’ setting.

NOTE Setting the DNS suffix will require a reboot of the server.


Enter the correct suffix for the server:

[Screenshot: entering the primary DNS suffix for the server]

This problem may exist on other nodes in the cluster. Check all of the cluster nodes that host replicas in the availability group and ensure that the primary DNS suffix has been properly set on every node that might be a failover target; otherwise, you will be unable to fail over your availability group and listener to that node.

For more information on diagnosing failed availability group listener creation, see:

Create Listener Fails with Message 'The WSFC cluster could not bring the Network Name resource online'

Failing back from DR site after primary site is back online


Assume steps similar to those from Manual Failover of Availability Group were used to move to the DR site.

When the availability group is brought online at the DR site with ALTER AVAILABILITY GROUP…FORCE_FAILOVER_ALLOW_DATA_LOSS, data movement from the new primary at the DR site to the secondary(s) at the primary site will be suspended and can only be resumed manually. Once the primary site is back up and the connection between the two sites is stable, bring up the servers hosting the SQL Server service and start the SQL Server services. Check that the cluster service is running on all nodes at the primary site and that there are no error messages in the system event log for the cluster service.

After verifying the cluster is stable, change the voting back to the original setup, if it was changed while bringing the cluster up on the DR site.

Add voting back to the nodes at the primary site.

When you failed the primary replica over to the DR site, you may have also adjusted the node weights of the nodes hosting availability group replicas. It is time to set the node weights back, removing the node weight from the DR site and adding it back to the nodes at the primary data center.

(Get-ClusterNode -Name "NodeName").NodeWeight = 1

Remove voting from the node at the DR site.

(Get-ClusterNode -Name "NodeName").NodeWeight = 0

Verify the current voting in the cluster.

Get-ClusterNode | fl Name,NodeWeight
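
The vote assignment can also be confirmed from within SQL Server; a minimal sketch using the cluster DMV:

-- Shows each cluster member and its current quorum vote as seen by SQL Server.
SELECT member_name, member_type_desc, member_state_desc, number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;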

Synchronize Original Secondary at primary site

Change the synchronization mode to asynchronous and start synchronization with the SQL Server instance that was the secondary at the primary site for each database in the availability group.

ALTER AVAILABILITY GROUP <agname> MODIFY REPLICA ON '<SQLInstance_at_DR_Site>' WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT)

Alter database <dbname> SET HADR RESUME

After resuming database synchronization, the status can be monitored using the availability group dashboard by adding the Log Send Queue Size (KB) and Log Send Rate (KB/sec) columns. If the secondary at the primary site was ahead of the secondary at the DR site, the database will first have to go through the reverting process.

[Screenshots: availability group dashboard showing Log Send Queue Size (KB) and Log Send Rate (KB/sec)]

Once the databases for the availability group are synchronized, then you can change the synchronization to synchronous, if it is not already set, and failover from the DR site to the primary site. 

ALTER AVAILABILITY GROUP <agname> MODIFY REPLICA ON '<SQLInstance_at_DR_Site>' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT)

Alter Availability Group <agname> failover

After the failover to the secondary on the primary site, the synchronization can be changed to asynchronous.

ALTER AVAILABILITY GROUP <agname> MODIFY REPLICA ON '<SQLInstance_at_DR_Site>' WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT)

At this point, the original primary is still at the point in time of the failover to the DR site, and the secondary at the primary site is the new primary. If data loss occurred during the failover to the DR site, a utility like tablediff or a third-party utility can be used to identify data from the original primary that is not in the current primary.

Steps to put original primary in read only mode, if data recovery is needed

For SQL Server 2014:

The database should be in a read only state while it is in a suspended state. 

After recovering any data from the original primary, then it can begin synchronization.

Alter database <dbname> SET HADR RESUME

For SQL Server 2012:

The database will need to be removed from the availability group on the original primary and restored with recovery.

ALTER DATABASE <dbname> SET HADR OFF

restore database <dbname> with recovery

Synchronize Original Primary

Use the same steps that were used for the original secondary at the primary site.

Replication Agents fail to connect to listener in a multisite cluster


A listener (network name) for a multisite cluster is dependent on more than one IP address. When connecting to a listener for a multisite cluster, it is recommended to add the MultiSubnetFailover=True parameter to the application's connection string.

Legacy applications may not be able to use the latest version of the Microsoft SQL Server Native Client (SNAC 11 or higher), or their connection strings may not be exposed so the parameter cannot be added. The replication agents (distrib.exe, logread.exe, snapshot.exe, and replmerg.exe) are all considered legacy applications, because they do not implement the MultiSubnetFailover parameter. There are two recommended solutions for using a multi-subnet listener with replication.

1) Increase the login timeout

If the replication agents are the only legacy applications connecting to the listener, then option 1, increasing the login timeout, is the recommended solution. This can be done by creating a new profile for the agent and changing the LoginTimeout value from the default of 15 to 60.

[Screenshots: creating a new replication agent profile and setting the LoginTimeout value to 60]

The agent will need to be changed to use the new profile, and then stopped and started for the change to take effect.

For more information on updating the timeout parameter of the replication agent, see the following link on working with replication agent profiles.

https://technet.microsoft.com/en-us/library/ms152515(v=sql.110).aspx

This option is recommended because most agents connect once and run continuously, so the delay would only be experienced on the initial connection. If the agent is set to run on an interval, increasing the login timeout may cause longer latency between the publisher and subscriber.

2) Change the RegisterAllProvidersIP for the listener from 1 to 0

https://msdn.microsoft.com/en-us/library/hh213080.aspx (RegisterAllProvidersIP Setting)

Configure an ILB Listener for Alwayson Availability Group in Azure V2 or Virtual Machines created using Azure Resource Manager (ARM)


 

This topic shows you how to configure a listener for an AlwaysOn Availability Group by using an Internal Load Balancer (ILB) with Azure virtual machines provisioned using the new Azure Resource Manager (ARM).

 

Summary Steps

 

  1. Create the ARM load balancer using a PowerShell script
  2. Configure Load Balancer's Backend pool
  3. Configure Load Balancer's Probe
  4. Configure Load Balancer's Load Balancing Rules
  5. Create Availability Group Listener from FCI
  6. Configure AlwaysOn Listener IP Resource
  7. Configure AlwaysOn AG role dependency
  8. Test the Availability Group Listener

     

Steps to create the Azure load balancer for AlwaysOn with ARM

 

Create an Internal Load Balancer (ILB) with minimum parameters using the script below. The rest of the configuration is done through the Azure portal.

 

  1. Open an Azure PowerShell ISE window with administrative permissions
  2. Copy the script below

    # Switch to Resource Manager mode
    Switch-AzureMode -Name AzureResourceManager

    # Select the required subscription
    Select-AzureSubscription -SubscriptionName "Microsoft Azure"    # Change to your subscription name

    # Replace the values for the variables listed below
    $ResourceGroupName = 'SQLARMTEST'               # Resource group in which the SQL nodes are deployed
    $FrontEndConfigurationName = 'fe_TENNIS_SOCCER' # You can provide any name for this parameter
    $BackendConfiguratioName = 'be_TENNIS_SOCCER'   # You can provide any name for this parameter
    $LoadBalancerName = 'ILB_TENNIS_SOCCER'         # Name for the internal load balancer object
    $Location = 'westeurope'                        # Data center location of the SQL deployments
    $subname = 'sqlsubnet'                          # Subnet in which the SQL nodes are placed
    $ILBIP = '10.10.0.14'                           # IP address for the listener / load balancer

    $subnet = Get-AzureVirtualNetwork -ResourceGroupName $ResourceGroupName | Get-AzureVirtualNetworkSubnetConfig -Name $subname
    $FEConfig = New-AzureLoadBalancerFrontendIpConfig -Name $FrontEndConfigurationName -PrivateIpAddress $ILBIP -SubnetId $subnet.Id
    $BackendConfig = New-AzureLoadBalancerBackendAddressPoolConfig -Name $BackendConfiguratioName

    New-AzureLoadBalancer -Name $LoadBalancerName -ResourceGroupName $ResourceGroupName -Location $Location -FrontendIpConfiguration $FEConfig -BackendAddressPool $BackendConfig

 

 

Configure other Load Balancer Properties

 

 

  1. Browse to the new Azure portal (http://Portal.azure.com).
  2. Browse all objects
  3. Select Resource groups
  4. Select the resource group used to deploy the environment from the list of all resources
  5. The list will include the load balancer (ILB) created using the script
  6. Click All settings for the ILB

 

Configure Load Balancer's Backend Pool

 

  1. Click the Backend pools settings tab
  2. It will list the backend pool created using the script, i.e. 'be_TENNIS_SOCCER'
  3. Select the backend pool to update the settings below
  4. Availability set - choose the availability set of the SQL nodes
  5. Virtual machines - add the two SQL nodes
  6. Save the settings

Configure Load Balancer's Probe

 

  1. Click Probe Tab
  2. Click Add to add a new probe
  3. Name the probe
  4. Protocol - TCP
  5. Port - 59999
  6. Save

 

Configure Load Balancer's Load Balancing Rules

 

  1. Click the Load balancing rules tab
  2. Click Add to add a new rule
  3. Name the rule
  4. Protocol - TCP
  5. Port - 1433
  6. Backend port - 1433
  7. Backend pool - mapped to the backend pool 'be_TENNIS_SOCCER' (created earlier)
  8. Probe - mapped to the probe
  9. Session persistence - none
  10. Floating IP (Direct Server Return) - Enabled

     

Create the Availability Group Listener from FCI (refer to Create the availability group listener)
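
If you prefer Transact-SQL for this step, here is a minimal sketch of adding the listener with the ILB IP used in the script above; the availability group name ag, the listener name aglisten, and the subnet mask are illustrative:

    ALTER AVAILABILITY GROUP [ag]
    ADD LISTENER N'aglisten' (
        WITH IP ((N'10.10.0.14', N'255.255.255.0')),
        PORT = 1433);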

 

Configure Alwayson Listener IP Resource

 

  1. Run the below script on both the SQL Nodes (Regular steps)

     

    $ClusterNetworkName = "Cluster Network 1" # the cluster network name (use Get-ClusterNetwork on Windows Server 2012 or higher to find the name)

    $IPResourceName = "IP Address 10.10.0.0" # the IP Address resource name

    $ILBIP = "10.10.0.14" # the IP Address of the Internal Load Balancer (ILB)

     

    Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{"Address"="$ILBIP";"ProbePort"="59999";"SubnetMask"="255.255.255.255";"Network"="$ClusterNetworkName";"OverrideAddressMatch"=1;"EnableDhcp"=0}

Configure the Dependency for AG Role (Refer Bring the listener online)

Test the availability group listener (within the same VNet)
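
From a VM in the same VNet, a minimal verification sketch is to connect through the listener name (or the ILB IP) and confirm which replica answered and how the listener is registered:

    -- Run while connected through the listener.
    SELECT @@SERVERNAME AS connected_replica;

    SELECT agl.dns_name, agl.port, lip.ip_address, lip.state_desc
    FROM sys.availability_group_listeners AS agl
    JOIN sys.availability_group_listener_ip_addresses AS lip
        ON agl.listener_id = lip.listener_id;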

 

Configure an External Listener for Alwayson Availability Groups in Azure V2 or Virtual Machines created using Azure Resource Manager (ARM)


 

 



 

 

 

This topic shows you how to configure a listener for an AlwaysOn Availability Group by using an External Load Balancer (ELB) with Azure virtual machines provisioned using the new Azure Resource Manager (ARM).

 

Summary Steps

 

  1. Create the ARM load balancer using a PowerShell script
  2. Configure Load Balancer's Backend pool
  3. Configure Load Balancer's Probe
  4. Configure Load Balancer's Load Balancing Rules
  5. Configure Network Security Group to Allow SQL for connection
  6. Create Availability Group Listener from FCI
  7. Configure AlwaysOn Listener IP Resource
  8. Configure AlwaysOn AG role dependency
  9. Configure Firewall ports on AG Nodes
  10. Test Availability Group Listener Connection externally

     

Steps to create the Azure load balancer for AlwaysOn with ARM

 

Create an External Load Balancer (ELB) with minimum parameters using the script below. The rest of the configuration is done through the Azure portal.

 

  1. Open an Azure PowerShell ISE window with administrative permissions
  2. Copy the script below

     

    # Switch to Resource Manager mode
    Switch-AzureMode -Name AzureResourceManager

    Select-AzureSubscription -SubscriptionName "Microsoft Azure Internal Consumption"

    ## Get-AzureResourceGroup

    $ResourceGroupName = 'SQLARMTEST'
    $StaticPIPName = 'sqltcelbip'
    $FrontEndConfigurationName = 'fe_TENNIS_SOCCER'
    $BackendConfiguratioName = 'be_TENNIS_SOCCER'
    $LoadBalancerName = 'ILB_TENNIS_SOCCER'
    $Location = 'westeurope'

    # Public-facing IP
    New-AzurePublicIpAddress -Name $StaticPIPName -ResourceGroupName $ResourceGroupName -Location $Location -AllocationMethod Static -DomainNameLabel 'sqlaglistener'

    $publicIP = Get-AzurePublicIpAddress -Name $StaticPIPName -ResourceGroupName $ResourceGroupName
    $FEConfig = New-AzureLoadBalancerFrontendIpConfig -Name $FrontEndConfigurationName -PublicIpAddress $publicIP
    $BackendConfig = New-AzureLoadBalancerBackendAddressPoolConfig -Name $BackendConfiguratioName

    New-AzureLoadBalancer -Name $LoadBalancerName -ResourceGroupName $ResourceGroupName -Location $Location -FrontendIpConfiguration $FEConfig -BackendAddressPool $BackendConfig

 

 

 

Configure other Load Balancer Properties

 

 

  1. Browse to the new Azure portal (http://Portal.azure.com).
  2. Browse all objects
  3. Select Resource groups
  4. Select the resource group used to deploy the environment from the list of all resources
  5. The list will include the load balancer created using the script
  6. Click All settings for the load balancer

 

Configure Load Balancer's Backend Pool

 

  1. Click the Backend pools settings tab
  2. It will list the backend pool created using the script, i.e. 'be_TENNIS_SOCCER'
  3. Select the backend pool to update the settings below
  4. Availability set - choose the availability set of the SQL nodes
  5. Virtual machines - add the two SQL nodes
  6. Save the settings

Configure Load Balancer's Probe

 

  1. Click Probe Tab
  2. Click Add to add a new probe
  3. Name the probe
  4. Protocol - TCP
  5. Port - 59999
  6. Save

 

Configure Load Balancer's Load Balancing Rules

 

  1. Click the Load balancing rules tab
  2. Click Add to add a new rule
  3. Name the rule
  4. Protocol - TCP
  5. Port - 1433
  6. Backend port - 1433
  7. Backend pool - mapped to the backend pool 'be_TENNIS_SOCCER' (created earlier)
  8. Probe - mapped to the probe
  9. Session persistence - none
  10. Floating IP (Direct Server Return) - Enabled

 

Configure the Network Security Group to Allow SQL Connections Externally

 

  1. Select a SQL virtual machine (Azure portal)
  2. Click All settings
  3. Click Network interfaces
  4. Select the interface from the grid
  5. Click the Network security group in the Essentials dashboard window
  6. The new window shows the properties of the network security group
  7. Click Inbound security rules
  8. Click Add to add a new rule
  9. Name the rule
  10. Source - choose CIDR block and provide the value to allow a specific IP range
  11. Protocol - TCP
  12. Destination port - 1433, i.e. the SQL Server listening port
  13. Action - Allow
  14. Click OK
  15. Click Add again to add another rule
  16. Name the rule
  17. Protocol - TCP
  18. Source - Any
  19. Destination port - 59999
  20. Action - Allow
  21. Repeat the above steps for all SQL nodes that are part of the AG.

     

Create the Availability Group Listener from FCI (refer to Create the availability group listener)

 

Configure Alwayson Listener IP Resource

 

  1. Run the below script on both the SQL Nodes (Regular steps)

     

    $ClusterNetworkName = "Cluster Network 1" # the cluster network name (use Get-ClusterNetwork on Windows Server 2012 or higher to find the name)

    $IPResourceName = "IP Address 10.10.0.0" # the IP Address resource name

    $ILBIP = "104.40.180.157" # the IP address of the External Load Balancer (ELB), i.e. the value returned by Get-AzurePublicIpAddress

    Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{"Address"="$ILBIP";"ProbePort"="59999";"SubnetMask"="255.255.255.255";"Network"="$ClusterNetworkName";"OverrideAddressMatch"=1;"EnableDhcp"=0}

Configure the Dependency for AG Role (Refer Bring the listener online)

 

Open Firewall ports on AG nodes

 

  1. Create a firewall exception rule for the SQL Server process or its listening port on all the nodes
  2. Create a firewall exception rule for probe port 59999 on all the nodes.

 

Test the Availability Group Listener Connection Externally

 

  1. The application can use the Load Balancer's IP address for connecting to SQL

 

or

use the -DomainNameLabel value as shown below.

sqlaglistener.westeurope.cloudapp.azure.com

i.e. DomainNameLabel.DatacenterLocationName.cloudapp.azure.com

 

  1. Follow the procedure in Test the availability group listener

 

 

 

Reducing Loss of Connectivity when Recreating an Availability Group


When you drop an availability group, the listener resource is also dropped and interrupts application connectivity to the availability databases.

To minimize application downtime, use one of the following methods to sustain application connectivity through the listener, and drop the availability group.

Method 1 Associate listener with new availability group (role) in Failover Cluster Manager

This method enables you to maintain the listener while dropping and recreating the availability group.

1. On the SQL Server instance that the existing availability group listener is directing connections to, create a new, empty availability group. To simplify, use the following Transact-SQL command to create an availability group with no secondary replica or database:


use master
go
create availability group ag
for replica on 'sqlnode1' with
(endpoint_url = 'tcp://sqlnode1:5022', availability_mode=asynchronous_commit, failover_mode=manual) 
 

2. Launch Failover Cluster Manager and click Roles in the left pane. In the pane listing the Roles select the original availability group.

3. In the bottom-middle pane, under the Resources tab right-click the availability group resource and choose Properties. Click the Dependencies tab and delete the dependency to the listener. Click OK.

 

 

4. In the bottom-middle pane, under the Resources tab, right click the listener and choose More Actions and then Assign to Another Role. In the dialog box, choose the new availability group and click OK.

 

 

5. In the Roles pane select the new availability group. In the bottom-middle pane, under the Resources tab, you should now see the new availability group and the listener resource. Right-click the new availability group resource and choose Properties. Click the Dependencies tab and select the listener resource from the drop-down box. Click OK.

 


 
     

6. In SQL Server Management Studio, use Object Explorer to connect to the SQL Server instance hosting the primary replica of the new availability group. Drill into AlwaysOn High Availability, then drill into the new availability group and then Availability Group Listeners. You should find the listener.

7. Right-click the listener and choose Properties. Enter the appropriate port number for the listener and click OK.
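
If you prefer Transact-SQL over the Properties dialog for this step, here is a minimal sketch, assuming the empty availability group ag created above and a listener named aglisten on port 1433 (the listener name and port are illustrative):

ALTER AVAILABILITY GROUP ag
MODIFY LISTENER 'aglisten' (PORT = 1433);
GO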

 


  

This will ensure that applications using the listener can still use it to connect to SQL Server hosting the production databases without interruption. The original availability group can now be completely removed and recreated or the databases and replicas can be added to the new availability group.

IMPORTANT: If you recreate the original availability group, use the following steps to re-assign the listener back to the recreated availability group role, set the dependency between the recreated availability group resource and the listener, and re-assign the port to the listener:

1. Launch Failover Cluster Manager and click Roles in the left pane. In the pane listing the Roles select the new availability group hosting the listener.

2.  In the bottom-middle pane, under the Resources tab right-click the new availability group resource, which currently has a dependency on the listener and choose Properties. Click the Dependencies tab and delete the dependency to the listener. Click OK.

3. In the bottom middle pane, under the Resources tab, right click the listener and choose More Actions and then Assign to Another Role. In the dialog box, choose the recreated availability group and click OK.

4. In the Roles pane select the recreated availability group. In the bottom middle pane, under the Resources tab, you should now see the recreated availability group and the listener resource. Right-click the recreated availability group resource and choose Properties. Click the Dependencies tab and select the listener resource from the drop-down box. Click OK.

5. In SQL Server Management Studio, use Object Explorer to connect to the SQL Server instance hosting the primary replica of the recreated availability group. Drill into AlwaysOn High Availability, then drill into the recreated availability group and then Availability Group Listeners. You should find the listener.

6. Right-click the listener and choose Properties. Enter the appropriate port number for the listener and click OK.

Method 2 Associate listener with existing SQL Failover Clustered Instance (SQLFCI)

If you are hosting your availability group on a SQL Server Failover Clustered Instance (SQLFCI) you can associate the listener clustered resource with the SQLFCI clustered resource group, while dropping and recreating the availability group.

1. Launch Failover Cluster Manager and click Roles in the left pane. In the pane listing the Roles select the original availability group.

2. In the bottom middle pane, under the Resources tab, right-click the availability group resource and choose Properties. Click the Dependencies tab and delete the dependency to the listener. Click OK.

3. In the bottom middle pane, under the Resources tab, right click the listener and choose More Actions and then Assign to Another Role. In the dialog box, choose the SQL Server FCI instance and click OK.


  
  
  

4. In the Roles pane select the SQL Server Failover Clustered Instance (SQLFCI) group. In the bottom middle pane, under the Resources tab, you should now see the new listener resource.

This will ensure that applications using the listener can still use it to connect to SQL Server hosting the production databases without interruption. The original availability group can now be completely removed and recreated.

IMPORTANT: Once the availability group is recreated, re-assign the listener back to the availability group role, set up the dependency between the new availability group resource and the listener and re-assign the port to the listener:

1. Launch Failover Cluster Manager and click Roles in the left pane. In the pane listing the Roles select the original SQL Failover Clustered Instance role.

2. In the bottom middle pane, under the Resources tab, right click the listener and choose More Actions and then Assign to Another Role. In the dialog box, choose the recreated availability group and click OK.

3. In the Roles pane select the recreated availability group. Under the Resources tab, you should now see the recreated availability group and the listener resource. Right-click the recreated availability group resource and choose Properties. Click the Dependencies tab and select the listener resource from the drop-down box. Click OK.

4. In SQL Server Management Studio, use Object Explorer to connect to the SQL Server instance hosting the primary replica of the recreated availability group. Drill into AlwaysOn High Availability, then drill into the recreated availability group and then Availability Group Listeners. You should find the listener. 

5. Right-click the listener and choose Properties. Enter the appropriate port number for the listener and click OK.

Method 3 Drop the availability group and recreate with the same listener name

  This method will result in a small outage for currently connected applications, because the availability group and listener are dropped and then recreated.

1. Drop the problematic availability group. This will also drop the listener.

2. Immediately create a new, empty availability group on the same server hosting the production databases, defined with the original listener name. Applications should now successfully reconnect by using the new listener. For example, assume your availability group listener is 'aglisten'. The following Transact-SQL creates an availability group with no database or secondary replica, but creates a listener, 'aglisten', which applications can resume connecting through.

use master
go
create availability group ag
for replica on 'sqlnode1' with (endpoint_url = 'tcp://sqlnode1:5022', availability_mode=asynchronous_commit, failover_mode=manual)
listener 'aglisten' (with ip ((n'11.0.0.25', n'255.0.0.0')), port=1433)
go

3. Recover the damaged database and add it and the secondary replica back to the availability group.

 

 

 


How to enable TDE Encryption on a database in an Availability Group


By default, the Add Database Wizard and New Availability Group Wizard for AlwaysOn Availability Groups do not support databases that are already encrypted:  see Encrypted Databases with AlwaysOn Availability Groups (SQL Server).

If you have a database that is already encrypted, it can be added to an existing Availability Group – just not through the wizard.   You’ll need to follow the procedures outlined in Manually Prepare a Secondary Database for an Availability Group.

This article discusses how TDE encryption can be enabled for a database that already belongs to an Availability Group.   After a database is already a member of an Availability Group, the database can be configured for TDE encryption but there are some key steps to do in order to avoid errors.

To follow the procedures outlined in this article you need:

  1. An AlwaysOn Availability Group with at least one Primary and one Secondary replica defined.
  2. At least one database in the Availability Group.
  3. A Database Master Key on all replica servers (primary and secondary servers)
  4. A Server Certificate installed on all replica instances (primary and all secondary replicas).

 

For this configuration, there are two servers:  

SQL1 – the primary replica instance,  and

SQL2 – the secondary replica instance.

Step One:  Verify each replica instance has a Database Master Key (DMK) in Master – if not, create one.

To determine if an instance has a DMK, issue the following query:

USE master
GO
SELECT * FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##'

If a record is returned, then a DMK exists and you do not need to create one, but if not, then one will need to be created. To create a DMK, issue the following TSQL on each replica instance that does not have a DMK already:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Mhl(9Iy^4jn8hYx#e9%ThXWo*9k6o@';

Notes:

  • If you query the sys.symmetric_keys without a filter, you will notice there may also exist a “Service Master Key” named:   ##MS_ServiceMasterKey##.   The Service Master Key is the root of the SQL Server encryption hierarchy. It is generated automatically the first time it is needed to encrypt another key. By default, the Service Master Key is encrypted using the Windows data protection API and using the local machine key. The Service Master Key can only be opened by the Windows service account under which it was created or by a principal with access to both the service account name and its password.  For more information regarding the Service Master Key (SMK), please refer to the following article:  Service Master Key.  We will not need to concern ourselves with the SMK in this article.
  • If the DMK already exists and you do not know the password, that is okay as long as the service account that runs SQL Server has SA permissions and can open the key when it needs it (default behavior).   For more information refer to the reference articles at the end of this post.
  • You do not need to have the exact same database master key on each SQL instance.   In other words, you do not need to back up the DMK from the primary and restore it onto the secondary.   As long as each secondary has a DMK then that instance is prepared for the server certificate(s).
  • If your instances do not have DMKs and you are creating them, you do not need to have the same password on each instance.   The TSQL command, CREATE MASTER KEY, can be used on each instance independently with a separate password.   The same password can be used, but the key itself will still be different due to how our key generation is done.
  • The DMK itself is not used to encrypt databases – it is used simply to encrypt certificates and other keys in order to keep them protected.  Having different DMKs on each instance will not cause any encryption / decryption problems as a result of being different keys.

 

Step Two:  Create a Server Certificate on the primary replica instance.

To have a Database Encryption Key (DEK) that will be used to enable TDE on a given database, it must be protected by a Server Certificate.  To create a Server Certificate issue the following TSQL command on the primary replica instance (SQL1):

USE master
GO
CREATE CERTIFICATE TDE_DB_EncryptionCert
WITH SUBJECT = 'TDE Certificate for the TDE_DB database'

To validate that the certificate was created, you can issue the following query:

SELECT name, pvt_key_encryption_type_desc, thumbprint FROM sys.certificates

which should return a result set similar to:

[Result set showing the certificate name, private key encryption type, and thumbprint]
 

The thumbprint will be useful because when a database is encrypted, it will indicate the thumbprint of the certificate used to encrypt the Database Encryption Key.   A single certificate can be used to encrypt more than one Database Encryption Key, but there can also be many certificates on a server, so the thumbprint will identify which server certificate is needed. 

 

Step Three:  Back up the Server Certificate on the primary replica instance.

Once the server certificate has been created, it should be backed up using the BACKUP CERTIFICATE TSQL command (on SQL1):

USE master
GO
BACKUP CERTIFICATE TDE_DB_EncryptionCert
TO FILE = 'TDE_DB_EncryptionCert'
WITH PRIVATE KEY (FILE = 'TDE_DB_PrivateFile',
    ENCRYPTION BY PASSWORD = 't2OU4M01&iO0748q*m$4qpZi184WV487')

The BACKUP CERTIFICATE command will create two files.   The first file is the server certificate itself.   The second file is a “private key” file, protected by a password.  Both files and the password will be used to restore the certificate onto other instances.

When specifying the filenames for both the server certificate and the private key file, a path can be specified along with the filename.  If a path is not specified with the files, the file location where Microsoft SQL Server will save the two files is the default “data” location for databases defined for the instance.   For example, on the instance used in this example, the default data path for databases is “C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA”.
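
To confirm the default data path on your own instance (SQL Server 2012 and later), a minimal sketch:

-- Returns the instance default data directory, which is where BACKUP CERTIFICATE
-- writes its files when no path is specified.
SELECT SERVERPROPERTY('InstanceDefaultDataPath') AS default_data_path;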

 

Note:

If the server certificate has been previously backed up and the password for the private key file is not known, there is no need to panic.   Simply create a new backup by issuing the BACKUP CERTIFICATE command and specify a new password.   The new password will work with the newly created files (the server certificate file and the private key file).

 

Step Four:  Create the Server Certificate on each secondary replica instance using the files created in Step 3.

The previous TSQL command created two files: the server certificate (in this example, “TDE_DB_EncryptionCert”) and the private key file (in this example, “TDE_DB_PrivateFile”). The second file is protected by a password.

These two files along with the password should then be used to create the same server certificate on the other secondary replica instances.

After copying the files to SQL2, connect to a query window on SQL2 and issue the following TSQL command:

CREATE CERTIFICATE TDE_DB_EncryptionCert
FROM FILE = '<path_where_copied>\TDE_DB_EncryptionCert'
WITH PRIVATE KEY
(   FILE = '<path_where_copied>\TDE_DB_PrivateFile',
    DECRYPTION BY PASSWORD = 't2OU4M01&iO0748q*m$4qpZi184WV487')

This installs the server certificate on SQL2.  Once the server certificate is installed on all secondary replica instances, then we are ready to proceed with encrypting the database on the primary replica instance (SQL1).

 

Step Five:  Create the Database Encryption Key on the Primary Replica Instance.

On the primary replica instance (SQL1) issue the following TSQL command to create the Database Encryption Key.

USE TDE_DB2
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TDE_DB_EncryptionCert

The DEK is the actual key that does the encryption and decryption of the database.  When this key is not in use, it is protected by the server certificate (above).  That is why the server certificate must be installed on each of the instances.    Because this is done inside the database itself, it will be replicated to all of the secondary replicas and the TSQL does not need to be executed again on each of the secondary replicas.

At this point the database is NOT YET encrypted – but the thumbprint identifying the server certificate used to create the DEK has been associated with this database.  If you run the following query on the primary or any of the secondary replicas, you will see a similar result as shown below:

SELECT db_name(database_id), encryption_state,
    encryptor_thumbprint, encryptor_type
FROM sys.dm_database_encryption_keys
 

[Result set showing tempdb with encryption_state 3 and TDE_DB2 with encryption_state 1, both protected by the same encryptor thumbprint]

 

Notice that TempDB is encrypted and that the same thumbprint (i.e. Server Certificate) was used to protect the DEK for two different databases.   The encryption state of TDE_DB2 is 1, meaning that it is not encrypted yet.

 

Step Six:  Turn on Database Encryption on the Primary Replica Instance (SQL1)

We are now ready to turn on encryption. The database itself has a database encryption key (DEK) that is protected by the server certificate. The server certificate has been installed on all replica instances. The server certificate itself is protected by the Database Master Key (DMK), which has been created on all of the replica instances. At this point each of the secondary instances is capable of decrypting (or encrypting) the database, so as soon as we turn on encryption on the primary, the secondary replica copies will begin encrypting too.

To turn on TDE database encryption, issue the following TSQL command on the primary replica instance (SQL1):

ALTER DATABASE TDE_DB2 SET ENCRYPTION ON

To determine the status of the encryption process, again query sys.dm_database_encryption_keys :

SELECT db_name(database_id), encryption_state,
    encryptor_thumbprint, encryptor_type, percent_complete
FROM sys.dm_database_encryption_keys

When the encryption_state = 3, then the database is encrypted. It will show a status of 2 while the encryption is still taking place, and the percent_complete will show the progress while it is still encrypting. If the encryption is already completed, the percent_complete will be 0.

[Result set showing encryption_state 3 for TDE_DB2]

 

At this point, you should be able to fail over the Availability Group to any secondary replica and be able to access the database without issue.

 

What happens if I turn on encryption on the primary replica but the server certificate is not on the secondary replica instance?

The database will stop synchronizing and may be reported as “suspect” on the secondary. This is because when the SQL engine opens the database files and begins to read them, the pages inside are still encrypted and the engine does not have the decryption key to decrypt them. The SQL engine will think the pages are corrupted and report the database as suspect. You can confirm this is the case by looking in the error log on the secondary. You will see error messages similar to the following:

 

2014-01-28 16:09:51.42 spid39s Error: 33111, Severity: 16, State: 3.

2014-01-28 16:09:51.42 spid39s Cannot find server certificate with thumbprint '0x48CE37CDA7C99E7A13A9B0ED86BB12AED0448209'.

2014-01-28 16:09:51.45 spid39s AlwaysOn Availability Groups data movement for database 'TDE_DB2' has been suspended for the following reason: "system" (Source ID 2; Source string: 'SUSPEND_FROM_REDO'). To resume data movement on the database, you will need to resume the database manually. For information about how to resume an availability database, see SQL Server Books Online.

2014-01-28 16:09:51.56 spid39s Error: 3313, Severity: 21, State: 2.

2014-01-28 16:09:51.56 spid39s During redoing of a logged operation in database 'TDE_DB2', an error occurred at log record ID (31:291:1). Typically, the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup, or repair the database.

 

The error messages are quite clear that the SQL engine is missing a certificate – and it’s looking for a specific certificate – as identified by the thumbprint.  If there is more than one server certificate on the primary, then the one that needs to be installed on the secondary is the one whose thumbprint matches the thumbprint in the error message.

The way to resolve this situation is to go back to step three above and back up the certificate from SQL1 (whose thumbprint matches), and then create the server certificate on SQL2 as outlined in step four. Once the server certificate exists on the secondary replica instance (SQL2), you can issue the following TSQL command on the secondary (SQL2) to resume synchronization:

 

ALTER DATABASE TDE_DB2 SET HADR RESUME
  


Improved MultiSubnet Listener Behavior With Newly Released SQL Client Provider in .NET 4.6.1


SQL Client Provider Behavior With MultiSubnet Listener Results in Connection Timeouts

The first experience trying to connect to an availability group listener defined with multiple IP addresses, may be intermittent connection timeouts.

The default behavior of the SQL client libraries is to try all IP addresses returned by the DNS lookup, one after another (serially), until all of the IP addresses have been exhausted and either a connection is made or a connection timeout threshold has been reached. This can be problematic, because depending on DNS configuration, the “correct” or “online” IP address may not be the first IP address returned. The default timeout for a TCP connection attempt is 21 seconds, and if the first IP address attempted is not online, the client waits 21 seconds before attempting the next IP address. For each subsequent IP address, it again has to wait 21 seconds before moving to the next one, until the connection attempt times out or a connection is established to an IP address that responds.

For more information on the symptoms associated with the SQL Client’s legacy default behavior (MultiSubnetFailover=FALSE) see our other AlwaysOnPro article:

Connection Timeouts in Multi-subnet Availability Group

 Improved SQL Client Provider Defaults to MultiSubnetFailover=TRUE

That changes with the updated SQL Client provider shipping in .NET 4.6.1. The SQL Client provider's default behavior is now to retrieve all IP addresses up front and attempt to connect to them all in parallel. This should result in a successful connection to the online IP address and is the optimal way to reconnect to an availability group in the event of a failover.

Multi-Subnet Architecture with MultiSubnetFailover

 

The following article describes the improvement in more detail and a download link to the updated .NET package:

.NET Framework 4.6.1 is now available!

Improve MultisubnetFailover connection behavior for AlwaysOn

The SqlClient now automatically provides faster connections to an AlwaysOn Availability Group, which was introduced in SQL Server 2012. It transparently detects whether your application is connecting to an AlwaysOn Availability Group (AG) on a different subnet, quickly discovers the current active server, and provides a connection to the server.

Prior to this release, an application had to set its connection string to include "MultisubnetFailover=true" to indicate that it was connecting to an AlwaysOn Availability Group. Without this connection keyword, an application might experience a timeout while connecting to an AlwaysOn Availability Group.

With this feature, an application no longer needs to set MultisubnetFailover to true. For more information about SqlClient support for AlwaysOn Availability Groups, see SqlClient Support for High Availability, Disaster Recovery.

For more information on SQL Client Provider support for High Availability and specific support for the MultiSubnetFailover connection string parameter see:

SqlClient Support for High Availability, Disaster Recovery

Enhance AlwaysOn Failover Policy to Test SQL Server Data and Log Drives

$
0
0

In SQL Server 2012 and 2014, AlwaysOn health diagnostics detect the health of the SQL Server process in several ways. However, no health detection is performed on the accessibility or viability of the databases defined in AlwaysOn availability groups. If the disk hosting the availability group database or log files is lost, AlwaysOn health does not detect this event, and application runtime errors accessing the database will ensue. Loss of, or errors accessing, the drives that host your availability database data and log files will affect access to your production data.

As per the Flexible Failover Policy for Automatic Failover of an Availability Group (SQL Server)

Damaged databases and suspect databases are not detected by any failure-condition level. Therefore, a database that is damaged or suspect (whether due to a hardware failure, data corruption, or other issue) never triggers an automatic failover.

NOTE SQL Server 2016 enhances the AlwaysOn health diagnostics with database health detection. If AlwaysOn availability group database health detection has been selected for your availability group and an availability database transitions out of the ONLINE state (sys.databases.state_desc), the entire availability group will fail over automatically if configured to do so. For more information, see the MSDN topic 'CREATE AVAILABILITY GROUP' and the section on 'DB_FAILOVER':

CREATE AVAILABILITY GROUP (Transact-SQL)
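
On SQL Server 2016 and later, this database-level health detection can be turned on for an existing availability group with a single statement; a minimal sketch, assuming an availability group named [ag]:

    -- SQL Server 2016+: fail the availability group over automatically if one of its
    -- databases leaves the ONLINE state (automatic failover must also be configured)
    ALTER AVAILABILITY GROUP [ag] SET (DB_FAILOVER = ON);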

You can enhance the availability of your databases by detecting disk health. Add a generic script resource to your availability group resource group that does basic read or write tests against the drives hosting your availability group database data and log files. The following describes how to add a generic script resource as a dependency to your availability group resource to enhance AlwaysOn health detection with a basic disk health check.

Use a generic script resource to do basic health check on SQL Server data and log drives

Here is the high level description of the implementation of the generic script resource to detect availability group database drive health.

Add a generic script clustered resource to the availability group resource group. Make the availability group resource dependent on the generic script resource. That way, if the script resource reports an IsAlive failure, Windows Cluster will attempt to restart or fail over the availability group resource if configured to do so.

The generic script clustered resource IsAlive test creates a text file in the specified Data drive location and a text file in the specified Log drive location. If the files already exist, the script overwrites them.

Attached to this blog is a zipped file, GenericScript_SQLIsAlive.zip, containing

sqlisalive.vbs <- The generic script, written in VBScript, which implements the Windows Cluster IsAlive entry point

Add_SQLIsAliveScript.ps1 <- The PowerShell script that adds the generic script resource to your availability group resource group and sets the availability group dependency on the generic script resource

readme.txt <- Step-by-step instructions for implementing the generic script resource and additional instructions on how to test the script

Implement the generic script resource

I Configure the generic script, sqlisalive.vbs

Data and Log Drive Paths: Currently, the generic script is configured to test the following drives and paths: c:\temp\data and c:\temp\log. For testing purposes, create these paths on the local drive of each replica (the primary and its automatic failover partner secondary). Later, you can change them to the appropriate drives and paths where your database data and log files live.

    DataDriveFile="c:\temp\data\ScriptFileData.txt"
    LogDriveFile="c:\temp\log\ScriptFileLog.txt"

II Configure and execute the Powershell script to deploy the generic script to your availability group

NOTE This generic script only implements IsAlive which runs every 60 seconds.

1 Ensure your availability group has two replicas configured for automatic failover.

2 Copy the generic script file to an identical local storage location like 'C:\temp\sqlisalive.vbs' on both servers whose replicas are configured for automatic failover.

3 Create the paths for the health check, c:\temp\data and c:\temp\log.

4 The Add_SQLIsAliveScript.ps1 PowerShell script adds the generic script resource to your availability group resource group and adds a dependency on the generic script resource to your availability group resource. In Add_SQLIsAliveScript.ps1, change the following variables:

     Set $ag to your availability group name
     Set $listener to your availability group listener name. If your availability group does not have a listener, set $listener to ""
     Set $scriptfilepath to the path and file name of your sqlisalive.vbs script

5 On the server hosting the primary replica, run the PowerShell script Add_SQLIsAliveScript.ps1 to add the generic script resource to your availability group resource group.

6 Launch Failover Cluster Manager and review the availability group resource group to confirm addition of the generic script resource to the availability group resource group. The generic script should appear and come online in the availability group resource group under the Resources tab.

7 Confirm that the dependency on the generic script resource has been created for the availability group resource.

NOTES

The attached readme.txt file has instructions on how to test the script resource to ensure that it can failover your availability group resource.

Diagnose failure detection by generic script resource

Generate the cluster log for the node hosting the primary replica and search for 'Data Drive Create File' or 'Log Drive Create File' - to locate success or failure report of the generic script resource IsAlive:

00001b04.00002924::2015/12/07-17:16:41.798 INFO  [RES] Generic Script <sqlisalive>: Entering IsAlive
00001b04.00002924::2015/12/07-17:16:41.801 INFO  [RES] Generic Script <sqlisalive>: Data Drive Create File Succeeded
00001b04.00002924::2015/12/07-17:16:41.801 INFO  [RES] Generic Script <sqlisalive>: Log Drive Create File Succeeded

Or, for example, when the \Data folder has been set to read-only:

00001b04.00002924::2015/12/07-17:17:41.801 INFO  [RES] Generic Script <sqlisalive>: Entering IsAlive
00001b04.00002924::2015/12/07-17:17:41.804 INFO  [RES] Generic Script <sqlisalive>: Data Drive Create File Succeeded
00001b04.00002924::2015/12/07-17:17:41.804 INFO  [RES] Generic Script <sqlisalive>: Data Drive Create File Failed
00001b04.00002924::2015/12/07-17:17:41.804 INFO  [RES] Generic Script <sqlisalive>: Permission denied
00001b04.00002924::2015/12/07-17:17:41.804 ERR   [RES] Generic Script <sqlisalive>: 'IsAlive' script entry point returned FALSE.'
00001b04.00002924::2015/12/07-17:17:41.804 INFO  [RES] Generic Script <sqlisalive>: Return value of 'IsAlive' script entry point caused HRESULT to be set to 0x00000001.
00001b04.00002298::2015/12/07-17:17:41.804 WARN  [RHS] Resource sqlisalive IsAlive has indicated failure.


Troubleshooting Availability Group Listener in Azure

$
0
0

Configuring an availability group listener in Azure is much more complex than doing it on-premises, due to some limitations of Azure networking. This topic helps you troubleshoot your availability group listener, whether your AlwaysOn Availability Groups deployment is in Azure only or in a hybrid IT environment using a site-to-site VPN.

Some steps in the listener configuration involve configuration of Azure itself, such as the load-balanced virtual machine (VM) endpoint and direct server return. However, Azure currently does not provide any tools to help you verify your configuration is working as expected. Therefore, you need a network analyzer to help you verify your configuration as well as troubleshoot any problem. This topic shows you how to use Microsoft Network Monitor to troubleshoot your availability group listener.

Availability Group Listener Configuration Summary

This section provides a list of configuration options to check while troubleshooting your availability group listener.

Load-balanced Endpoint (Configured in Azure)
  • Configured on all VMs that are availability replicas.
  • Public port and local port should be the same.
  • The probe port is used by the Azure load balancer to determine which server is the primary replica.
  • Direct Server Return (DSR) is set to true on the VM load-balanced endpoint.

Configuration inside VMs
  • Configured on all cluster nodes that host availability replicas.
  • Configured on all VMs that are availability replicas:
    • Open probe port in the firewall
    • Open listener port in the firewall
  • Configured on the computer or VM with the primary replica (in hybrid IT, the primary replica should be on-premises):
    • Create client access point for the availability group cluster service
    • Configure IP address resource with cloud service IP, cluster network name, and probe port
    • Configure dependency from availability group resource to listener resource
    • Specify listener port in SQL Server Management Studio

Configuration of Client Connectivity
  • If the client is on an Azure VM, place the VM in a different cloud service.
  • For clients in the same Active Directory domain, connect to the configured listener name and port number.
  • For clients outside of Azure, configure the login timeout to accommodate network latencies.

Verify Probe Pings in Availability Group Listener Configuration

Note: Azure Load Balancers don’t support the ping (ICMP) protocol.

To determine whether the probe port on the load-balanced endpoint is working properly on the Azure VMs, use Network Monitor to filter your packet capture on the probe port.

When the load-balanced endpoint is configured properly, the Azure load balancer continuously pings each of the VMs to determine whether it hosts the primary replica so that it can route client connections to the correct VM. If the VM is the primary replica, the cluster service is configured with the probe port and responds to the probe pings. This traffic can be seen in Network Monitor by performing a packet capture with the following filter applied in the Display Filter pane:

TCP.DstPort == 59999 OR TCP.SrcPort == 59999

The first clause captures the incoming pings from the Azure load balancer and the second clause captures the reply by the primary replica. The screenshot below shows what it looks like when you do a packet capture on the primary replica in Azure.

The Con Id column shows you packets related to the same probe ping (shown in yellow). The response message from your VM acknowledges each ping by echoing the ping's Seq value, incremented by one, as the Ack value. You can also see the load balancer's source IP address (shown in orange).

If you do not see the response messages from the VM, but instead see SynReTransmit messages from the load balancer, the VM is not responding to the load balancer, which can mean that it is not the primary replica or that the probe pings are not working as expected.

Verify Listener Connectivity in Availability Group Listener Configuration

To determine whether the availability group listener is working properly on the Azure VMs, you use Network Monitor to filter your packet capture on the listener port.

When the client tries to connect to the availability group listener using the cloud service’s IP address and the listener port, Azure verifies that the connection port is the same as the one configured in the load-balanced endpoint and lets the TCP connection through to the primary replica, which has been responding to the probe pings. If the VM’s firewall has a corresponding rule, the client access point is configured properly, and the listener port is configured in your availability group, the availability group listener accepts the connection and the client can perform updates and queries. This traffic can be seen in Network Monitor by performing packet capture with the following filter applied in the Display Filter pane (assuming that the listener port is 10000):

TCP.DstPort == 10000 OR TCP.SrcPort == 10000

The screenshot below shows what it looks like when you successfully connect to the availability group listener in Azure and perform a simple query.

The Con Id column shows you packets related to the same client connection (shown in yellow). In the packets sent from the VM, you can see information regarding the client’s hostname, domain, and username (shown in red). You can also see the client’s IP address on the internet (shown in orange). In this case, it is a client VM in a different cloud service, so the IP address shown is the client VM’s cloud service IP address.
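
Independent of the network trace, you can also confirm the port the listener is configured to use from within SQL Server; for example, on the primary replica:

    -- Listener name, port, and the IP configuration reported by the cluster
    SELECT dns_name, port, ip_configuration_string_from_cluster
    FROM sys.availability_group_listeners;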

Troubleshoot Availability Group Listener Configuration in Azure

The table below lists some of the common symptoms when troubleshooting availability group listeners in Azure, and possible causes for each symptom.

Tip: The Windows Ping.exe command does not work on the availability group listener in Azure. The load-balanced endpoint only accepts TCP connections, while Ping.exe uses ICMP.

Symptom: No traffic on probe port (59999)
Possible causes:
  • Load-balanced endpoint is not configured
  • Probe port is not configured for load-balanced endpoint
  • Firewall on VM is not opened for probe port

Symptom: Probe port receives pings and SynReTransmit packets, but no replies
Possible causes:
  • Current VM is not primary replica
  • IP address resource in client access point is not configured with probe port, or a different probe port is specified in IP address resource than in load-balanced endpoint
  • IP address resource in client access point is offline
Comment: This symptom indicates that the probe port is configured properly in the load-balanced endpoint and that the VM firewall has allowed the incoming packet. To test whether the clustered service is listening on the intended port, run netstat -ab in a command prompt on the primary replica and search for rhs.exe in the list.

Symptom: No traffic on listener port
Possible causes:
  • DSR in load-balanced endpoint not set to true
  • Public and local ports on load-balanced endpoint are different (not supported)
  • A network access control list (ACL) is configured on the load-balanced endpoint, but the client's public IP address is not allowed or not part of an allowed range
  • Client is not using port number in connection string or is using a different port number
  • Firewall on VM is not opened for listener port
Comment: The listener port should match the public/local port specified on the load-balanced endpoint.

Symptom: Listener port receives incoming traffic and SynReTransmit packets, but no replies
Possible causes:
  • Cluster's client access point is configured with incorrect dependencies
  • IP address resource in client access point contains incorrect cluster network name ("Cluster Network <#>" by default)
  • Listener is not configured with a port number or is configured with an incorrect port number in SQL Server Management Studio
Comment: This symptom indicates that the load-balanced endpoint is configured properly and that the Azure load balancer has successfully routed the client's connection request to the primary replica, but no listener is actively listening on that port. To test whether the listener is listening on the intended port, run netstat -ab in a command prompt on the primary replica and search for sqlservr.exe.

The common mistake in configuring dependencies for the client access point is to set the availability group resource to depend on the IP address resource(s). Instead, you should configure the listener name to depend on the IP address(es) and configure the availability group resource to depend on the listener name.

Symptom: Listener only accessible from the primary replica node itself
Possible causes:
  • Client connection recognized local server as availability group resource owner and bypassed load-balanced endpoint entirely
  • Client does not reside in a separate cloud service (not supported)

Symptom: Client lost connectivity to listener after failover
Possible causes:
  • Firewall on new primary replica is not open for probe port (Azure load balancer cannot find new primary replica) or for listener port (client connection refused by firewall)
  • Client does not reside in a separate cloud service (not supported)
  • An availability group resource is offline

Symptom: All IP address resources in client access point are offline, but listener name is online
Possible cause:
  • Listener name is not configured to depend on IP address resources

Symptom: Listener name is offline, but availability group resource is online
Possible cause:
  • Availability group resource is not configured to depend on listener name

Symptom: At least one IP address is online, but listener name is offline
Possible cause:
  • Listener name is not set to OR for all IP addresses

 

How to add a TDE encrypted database to an Availability Group

$
0
0

By default, the Add Database Wizard and New Availability Group Wizard for AlwaysOn Availability Groups do not support databases that are already encrypted:  see Encrypted Databases with AlwaysOn Availability Groups (SQL Server).

If you have a database that is already encrypted, it can be added to an existing Availability Group – just not through the wizard.   This article provides the steps necessary to successfully add a TDE encrypted database to an AlwaysOn Availability Group.

This scenario has two instances – SQL1 (the AG primary replica instance)  and SQL2 (secondary replica instance)

The following prerequisites are needed:

  1. An existing AlwaysOn Availability Group with at least one Primary and one Secondary replica instance.
  2. A TDE encrypted database on the same instance as the primary replica, online and accessible.
  3. A Database Master Key on all replica servers hosting the availability group (the primary will already have one since it has a TDE encrypted database).

 

The following actions will be done while adding the TDE encrypted database to the availability group.

  1. Verify each secondary replica instance has a Database Master Key (DMK) in the master DB (create a new one if missing)
  2. On the primary replica instance, create a backup of the certificate used to TDE encrypt the database.
  3. On each secondary replica instance, create the TDE Certificate from the certificate backed up on the primary.
  4. On the primary replica instance, create a full database backup of the TDE encrypted database.
  5. On the primary replica instance, create a transaction log backup of the TDE encrypted database.
  6. On the primary replica instance, add the TDE encrypted database to the Availability Group.
  7. On each secondary replica instance, restore the full backup (with no recovery).
  8. On each secondary replica instance, restore the transaction log backup (with no recovery).
  9. On each secondary replica instance, join the database to the availability group.

 

Step One:  Verify each secondary replica instance has a Database Master Key (DMK) in the master database – if not, create one.

To determine if an instance has a DMK, issue the following query:

USE master
GO
SELECT * FROM sys.symmetric_keys
WHERE name = '##MS_DatabaseMasterKey##'

If a record is returned, then a DMK exists and you do not need to create one, but if not, then one will need to be created. To create a DMK, issue the following TSQL on each replica instance that does not have a DMK already:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Mhl(9Iy^4jn8hYx#e9%ThXWo*9k6o@';

Notes – PLEASE READ:

  • If you query the sys.symmetric_keys without a filter, you will notice there may also exist a “Service Master Key” named:   ##MS_ServiceMasterKey##.   The Service Master Key is the root of the SQL Server encryption hierarchy. It is generated automatically the first time it is needed to encrypt another key. By default, the Service Master Key is encrypted using the Windows data protection API and using the local machine key. The Service Master Key can only be opened by the Windows service account under which it was created or by a principal with access to both the service account name and its password.  For more information regarding the Service Master Key (SMK), please refer to the following article:  Service Master Key.  We will not need to concern ourselves with the SMK in this article.
  • If the DMK already exists and you do not know the password, that is okay as long as the service account that runs SQL Server has SA permissions and can open the key when it needs it (default behavior).   For more information refer to the reference articles at the end of this blog post.
  • You do not need to have the exact same database master key on each SQL instance.   In other words, you do not need to back up the DMK from the primary and restore it onto the secondary.   As long as each secondary has a DMK then that instance is prepared for the server certificate(s).
  • If your instances do not have DMKs and you are creating them, you do not need to have the same password on each instance.   The TSQL command, CREATE MASTER KEY, can be used on each instance independently with a separate password.   The same password can be used, but the key itself will still be different due to how our key generation is done.
  • The DMK itself is not used to encrypt databases – it is used simply to encrypt certificates and other keys in order to keep them protected.  Having different DMKs on each instance will not cause any encryption / decryption problems as a result of being different keys.
  • For more information regarding Transparent Data Encryption (TDE) & Database Master Keys (DMK) see:  Transparent Data Encryption (TDE)

 

Step Two:  On the primary replica instance, create a backup of the certificate used to TDE encrypt the database

To decrypt the TDE encrypted database on a secondary replica instance, the secondary must have a copy of the certificate that is used on the primary to encrypt the database.  It is possible there is more than one certificate installed on the primary replica instance.  To know which certificate to back up, run the following query (on SQL1) and find the certificate name next to the database you wish to add to the availability group:

USE master
GO
SELECT db_name(database_id) [TDE Encrypted DB Name], c.name as CertName, encryptor_thumbprint
    FROM sys.dm_database_encryption_keys dek
    INNER JOIN sys.certificates c on dek.encryptor_thumbprint = c.thumbprint

 

It should give a result set similar to the following:

(screenshot: the result set, showing the TDE encrypted database name, its certificate name, and the certificate thumbprint)

 

Now backup the certificate using the TSQL command BACKUP CERTIFICATE (on SQL1):

USE master
GO
BACKUP CERTIFICATE [TDE_DB_EncryptionCert]
TO FILE = 'TDE_DB_EncryptionCert'
WITH PRIVATE KEY (FILE = 'TDE_DB_PrivateFile',
ENCRYPTION BY PASSWORD = 't2OU4M01&iO0748q*m$4qpZi184WV487');

The BACKUP CERTIFICATE command will create two files.   The first file is the server certificate itself.   The second file is a “private key” file, protected by a password.  Both files and the password will be used to restore the certificate onto other instances.

When backing up the certificate, if no path is provided the certificate and private key files are saved to the default ‘data’ SQL Server database location defined for the instance.    For example, on the instance used in this example, the default data path for databases is “C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA”.

 

Note:

If the server certificate has been previously backed up and the password for the private key file is not known, there is no need to panic.   Simply create a new backup by issuing the BACKUP CERTIFICATE command and specifying a new password.   The new password will work with the newly created files (the server certificate file and the private key file).

 

Step Three:  On each secondary replica instance, create the TDE Certificate from the certificate backed up on the primary

Step Two created a backup of the TDE certificate.   This step will use that backup to "re-create" or "restore" the certificate on each of the secondary replica instances.  The "backup" consists of two files: the server certificate (in this example, "TDE_DB_EncryptionCert") and the private key file (in this example, "TDE_DB_PrivateFile"), the second of which is protected by a password.

These two files along with the password should then be used with the TSQL command  CREATE CERTIFICATE to re-create the same server certificate on the other secondary replica instances.

After copying the files to SQL2, connect to a query window on SQL2 and issue the following TSQL command:

CREATE CERTIFICATE [TDE_DB_EncryptionCert]
FROM FILE = '<path_where_copied>\TDE_DB_EncryptionCert'
WITH PRIVATE KEY (FILE = '<path_where_copied>\TDE_DB_PrivateFile',
DECRYPTION BY PASSWORD = 't2OU4M01&iO0748q*m$4qpZi184WV487');

This installs the server certificate on SQL2.

Repeat as necessary if there are more secondary replica instances.  This certificate must exist on all replicas (primary and all secondary replica instances).
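
To confirm the certificate was created correctly on each secondary, you can compare thumbprints; a quick check, for example:

    -- Run on each secondary replica instance; the thumbprint should match the value
    -- returned by the query in Step Two on the primary (SQL1)
    SELECT name, thumbprint, expiry_date
    FROM sys.certificates
    WHERE name = 'TDE_DB_EncryptionCert';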

 

Step Four:  On the primary replica instance (SQL1), create a full database backup of the TDE encrypted database

TSQL or the SSMS GUI can both be used to create a full backup.   Example:

USE master
GO
BACKUP DATABASE TDE_DB TO DISK = 'SOME path\TDEDB_full.bak';

For more information, please review:   How to:  Create a Full Database Backup (Transact-SQL) 

 

Step Five:  On the primary replica instance (SQL1), create a transaction log backup of the TDE encrypted database

TSQL or the SSMS GUI can both be used to create a transaction log backup.   Example:

USE master
GO
BACKUP LOG TDE_DB TO DISK = 'SOME path\TDEDB_log.trn';

For more information, please review:   How to:  Create a Transaction Log Backup (Transact-SQL) 

 

Step Six:  On the primary replica instance (SQL1), add the TDE encrypted database to the Availability Group

 

On the primary (SQL1 in this example) for the availability group (AG_Name), issue an ALTER AVAILABILITY GROUP command:

USE master
GO
ALTER AVAILABILITY GROUP [AG_Name] ADD DATABASE [TDE_DB]

This will add the encrypted database to the primary replica for the availability group called:  AG_Name.

 

Step Seven:  On each secondary replica instance, restore the full backup (from Step Four) with no recovery

TSQL or the SSMS GUI can both be used to restore a full backup.   Please be sure to specify “NO RECOVERY” so that the transaction log backup can also be restored:    Example:

USE master
GO
RESTORE DATABASE TDE_DB FROM DISK = 'SOME path\TDEDB_full.bak' WITH NORECOVERY;

For more information refer to the TSQL RESTORE command.

Repeat step seven for all secondary replica instances if there is more than one.

 

Step Eight:  On each secondary replica instance, restore the transaction log backup (from Step Five) with no recovery

TSQL or the SSMS GUI can both be used to restore a log backup.   Please be sure to specify “NO RECOVERY” so that the database remains in a “restoring” state and can be joined to the availability group.  Example:

USE master
GO
RESTORE LOG TDE_DB FROM DISK = 'SOME path\TDEDB_log.trn' WITH NORECOVERY;

For more information refer to the TSQL RESTORE command.

Repeat step eight for all secondary replica instances if there are more than one.

 

Step Nine:  On each secondary replica instance, join the database to the availability group 

On the secondary (SQL2), join the database to the availability group to begin synchronization by issuing the following TSQL statement: 

USE master
GO
ALTER DATABASE TDE_DB SET HADR AVAILABILITY GROUP = [AG_Name];

Repeat step nine on all secondary replica instances
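
Once the database is joined on every secondary, you can verify that synchronization has started; one way, run on the primary replica (SQL1), for example:

    -- Synchronization state of TDE_DB on every replica (run on the primary)
    SELECT ar.replica_server_name,
           DB_NAME(drs.database_id) AS database_name,
           drs.synchronization_state_desc
    FROM sys.dm_hadr_database_replica_states AS drs
    JOIN sys.availability_replicas AS ar ON drs.replica_id = ar.replica_id
    WHERE DB_NAME(drs.database_id) = 'TDE_DB';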

 

  

References

Recommendations for Index Maintenance with AlwaysOn Availability Groups

$
0
0

 

SYMPTOMS

Consider the following scenario

  • The database is part of AlwaysOn Availability Groups
  • You run long and log-intensive transactions like Index maintenance/rebuilds

You observe one or more of the following symptoms:

  • Poor performing DML operations in availability databases on the primary replica if synchronous secondary replicas are present.
  • Huge log send queue in the primary.
  • Considerable log growth in those availability databases in which the index rebuild occurs
  • Redo backlog in secondary replicas.

 

CAUSE

Large maintenance operations like ALTER INDEX or CREATE INDEX can generate huge amounts of logged changes by the nature of the operation. These transactions can utilize parallelism to use multiple threads generating logged changes to accomplish the large transaction. This is in addition to the log generated by the regular day to day operations of your application.

For synchronous commit environments, these large transactions can create contention with other production changes, reducing the overall performance of the primary replica. Synchronous-commit mode emphasizes high availability over performance, at the cost of increased transaction latency. Under synchronous commit mode, transactions wait to send the transaction confirmation to the client until the secondary replica has hardened the log to disk.  All this logged activity is being captured by a per-database single threaded log capture thread. In addition encryption and compression routines are also single threaded. When these single-threaded routines are processing very large amounts of logged activity, your application transactions may suffer performance degradation.

Log-intensive transactions like rebuilding indexes can cause the log file to grow significantly as the log cannot be truncated until redo has completed the changes in all secondary replicas.

 

MITIGATION

Transactions like ALTER INDEX or CREATE INDEX are log intensive by nature. We cannot eliminate the log generation but we can do intelligent maintenance to reduce the impact of the index rebuild on production activities.  Here are some steps to minimize the impact:

Mitigation Steps From Index Maintenance Perspective

The following strategies may reduce the contention index rebuilding has on your production environment:

  • Run index maintenance during off peak period if there is any.
  • Frequency of the maintenance should be optimized with the goal of minimizing impact of log generation. You can do maintenance in phases, like each phase doing a subset of indexes at a particular time, instead of considering all indexes at a single go.
  • Rebuild indexes based on true need / impact to your production environment. Article Script to appropriate rebuild/reorganize database indexes – SQL Server 2005 shows how to achieve following steps to optimize the index maintenance process -
    • Ignore heaps and small tables
    • Check the fragmentation level of each index
    • If the fragmentation level is low (say less than 10%) – do nothing
    • If the fragmentation level is medium (say in between 10 to 30%) – reorganize the index
    • If the fragmentation level is high (say more than 30%) – rebuild the index
  • Use MAXDOP setting in the ALTER INDEX command to reduce the concurrent index alteration activity. Operations such as creating, rebuilding, or dropping indexes can be resource intensive and can cause insufficient resources for other applications and database operations for the duration of the index operation. When this problem occurs, you can manually configure the maximum number of processors that are used to run the index statement by limiting the number of processors to use for the index operation. Lower MADOP setting can also reduce fragmentation with online Index rebuild operation. Below is an example of ALTER INDEX  REBUILD with MAXDOP =1:
	USE [AdventureWorks2014]
	GO
	ALTER INDEX [PK_Employee_BusinessEntityID]
	ON [HumanResources].[Employee]
	REBUILD WITH (MAXDOP=1, ONLINE= ON);
	GO

 

  • Consider using table partitioning. This way you can rebuild portions of the index piece by piece by using ALTER INDEX…REBUILD PARTITION. SQL Server 2014 supports the ONLINE rebuild of partitioned indexes. Below is a sample script to rebuild only on partition 1:
	USE [AdventureWorks2014]
	GO
	ALTER INDEX [IX_TransactionHistory_ProductID]
	ON [Production].[TransactionHistory]
	REBUILD PARTITION = 1
	WITH (ONLINE = ON);
	GO
  • For SQL Server 2014, additional ALTER INDEX parameters like WAIT_AT_LOW_PRIORITY, MAX_DURATION, and ABORT_AFTER_WAIT can also be used. The following example rebuilds the index with the ONLINE option, including the low-priority lock options:
	USE [AdventureWorks2014]
	GO
	ALTER INDEX ALL ON Production.Product
	REBUILD WITH
	(ONLINE = ON (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 4 MINUTES, ABORT_AFTER_WAIT = BLOCKERS)));
	GO

Mitigation Steps From Availability Group Perspective

For synchronous commit environments, before issuing long and log-intensive transactions like ALTER INDEX or CREATE INDEX, you may additionally consider making all synchronous replicas asynchronous to help reduce transactional latency. Once the index rebuild transactions are complete, the commit mode should be switched back to synchronous.
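
A minimal sketch of switching a synchronous secondary to asynchronous commit for the maintenance window and back again afterwards; [AG_Name] and 'SQLNODE2' are placeholders for your availability group and the server hosting the secondary replica, and the statements are run on the primary replica. Note that a replica configured for automatic failover must be switched to manual failover before it can be made asynchronous:

    -- Before index maintenance: make the secondary asynchronous
    ALTER AVAILABILITY GROUP [AG_Name]
    MODIFY REPLICA ON 'SQLNODE2' WITH (FAILOVER_MODE = MANUAL);
    ALTER AVAILABILITY GROUP [AG_Name]
    MODIFY REPLICA ON 'SQLNODE2' WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);

    -- After index maintenance completes: return to synchronous commit with automatic failover
    ALTER AVAILABILITY GROUP [AG_Name]
    MODIFY REPLICA ON 'SQLNODE2' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
    ALTER AVAILABILITY GROUP [AG_Name]
    MODIFY REPLICA ON 'SQLNODE2' WITH (FAILOVER_MODE = AUTOMATIC);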

For both synchronous and asynchronous environments, in general any step that helps redo performance will positively impact long and log-intensive transactions in availability group environments. Here are some key points to keep in mind:

  • A busy secondary, such as one with a resource bottleneck or a large reporting workload, can slow down the performance of the secondary replica because of resource contention, and the redo thread can fall behind. The redo thread on the secondary replica can also be blocked from making data definition language (DDL) changes by a long-running read-only query. The diagnosis and mitigation steps for these issues are discussed in the article Troubleshoot: Availability Group Exceeded RTO.
  • You need to periodically check AG databases to make sure they do not have too many Virtual Log Files (VLFs), which can severely impact the redo process on the secondary. Diagnosis and corrective steps are discussed in the article Too Many Virtual Log Files (VLFs) Can Cause Slow Database Recovery; a quick check is shown after this list.
  • We recommend that you always switch to the High Performance power plan on all replica machines, regardless of operating system. In some circumstances this can improve performance for the single-threaded redo process.
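
As a rough check of the VLF count for an availability database, the following sketch uses the undocumented but widely used DBCC LOGINFO command; 'YourAGDatabase' is a placeholder:

    -- Returns one row per virtual log file; a row count in the thousands usually
    -- means the log has too many VLFs and should be addressed
    DBCC LOGINFO ('YourAGDatabase');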

  

REFERENCES


AlwaysOn Availability Group Returns Failover Partner for Legacy Mirror Application Connectivity

$
0
0

Availability Groups Simulate Database Mirroring Connection Behavior

Given the following scenario, SQL Server will return the failover partner server name to a connection request:

  1. The availability group has a single secondary replica.
  2. The availability group replicas have ALLOW_CONNECTIONS set to NO or READ_ONLY.
  3. The client application makes a successful initial connection to the primary replica.
  4. The application is not specifying the availability group listener when connecting.

SQL Server will return the failover partner server name (the SQL Server name hosting the secondary replica). The data access provider will cache the failover partner server name.

Following a failover, if an application is designed to reconnect on connection failure, it will first fail to connect to the server now hosting the secondary replica, and will then attempt to connect to the cached failover partner, which is now the primary replica.

This behavior may be unexpected. For example, suppose an identical SQL Agent job is created on the primary and secondary replicas of an availability group, and the job does not check that the local replica is primary (for example, with sys.fn_hadr_is_primary_replica) before executing its commands. The expectation is that the job succeeds when accessing the availability database on the primary replica and fails when attempting to access the availability database on the secondary, because the replica is configured for ALLOW_CONNECTIONS=NO.

Instead, following a failover, the job now executing at the secondary replica will not behave as expected. The next time the job runs, it will fail to connect locally, then attempt to connect using the cached failover partner name, and will successfully connect to the new primary replica.  This can cause unexpected behavior and results, because the job at the secondary is connecting and executing successfully against the primary replica.

This behavior is by design. AlwaysOn availability groups are designed to be backward compatible with applications that expect legacy database mirroring connection behavior.

The reconnect behavior will only occur given a certain configuration, when 1) there is a single secondary replica and 2) the availability replica’s ALLOW_CONNECTIONS is set to NO or READ_ONLY. The following table reports the expected behavior depending on these variables.

(table image: expected reconnect behavior, depending on the number of secondary replicas and the ALLOW_CONNECTIONS setting)
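
To see how each replica in the availability group is configured for secondary connection access, you can check the replica settings, for example:

    -- Secondary-role connection access per replica (NO, READ_ONLY, or ALL)
    SELECT replica_server_name,
           availability_mode_desc,
           secondary_role_allow_connections_desc
    FROM sys.availability_replicas;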

 

Workaround Database Mirroring Connection Behavior

If the legacy connection behavior is not desired, consider using one of the following workarounds to ensure your SQL Agent job only executes successfully when the local replica is in the primary role.

  • Set the availability group Backup Preferences to Primary and then, in the job at each replica, use sys.fn_hadr_is_primary_replica to ensure the job executes only locally at the primary replica (see the example after this list).
  • Qualify execution by querying sys.dm_hadr_availability_replica_states where is_local and role both return 1.
  • Add a third replica to the availability group.
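
For example, a guard at the top of each job step might look like the following sketch, where 'YourAGDatabase' is a placeholder for one of the availability databases:

    -- Only run the job logic if the local replica currently holds the primary role
    IF sys.fn_hadr_is_primary_replica('YourAGDatabase') = 1
    BEGIN
        PRINT 'Local replica is primary; running job logic.';
        -- job commands against the availability database go here
    END
    ELSE
    BEGIN
        PRINT 'Local replica is not primary; exiting without doing any work.';
    END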

Database Mirroring Connection Behavior

For more information on database mirroring connection behavior see:

Making the Initial Connection to a Database Mirroring Session

AlwaysOn Availability Group Listener Cannot be Created or Failed Over on Node if ‘Primary DNS suffix’ is not set

$
0
0

AlwaysOn availability groups will not function properly on Windows servers where the system’s ‘Primary DNS Suffix of this Computer’ is not set. Usually, the Primary DNS Suffix is populated when a Windows server joins a domain.

Create Listener on Windows server without ‘Primary DNS suffix’ fails

Attempting to create the availability group listener fails when the node hosting the primary replica has 'Primary DNS Suffix of this Computer' cleared. When issuing ALTER AVAILABILITY GROUP…ADD LISTENER, SQL Server fails to create the listener and reports:

Msg 19471, Level 16, State 0, Line 1
The WSFC cluster could not bring the Network Name resource with DNS name ‘aglisten’ online. The DNS name may have been taken or have a conflict with existing name services, or the WSFC cluster service may not be running or may be inaccessible. Use a different DNS name to resolve name conflicts, or check the WSFC cluster log for more information.
Msg 19476, Level 16, State 4, Line 1
The attempt to create the network name and IP address for the listener failed. The WSFC service may not be running or may be inaccessible in its current state, or the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and validate the network name and IP address with the network administrator.

Reviewing the cluster log on the server where the listener failed to be created, errors report that the fully qualified domain name (FQDN) could not be acquired:

00000280.00000060::2015/03/16-15:47:41.463 INFO  [RCM] rcm::RcmGum::CreateResource(ag_aglisten,d2881a8e-7c36-4c8f-a17c-d4cef149e355,ag)

00000d70.00000a88::2015/03/16-15:47:46.081 INFO  [RES] Network Name <ag_aglisten>: AccountAD: OU name for VCO is CN=Computers,DC=AGDC,DC=COM

00000d70.00000a88::2015/03/16-15:47:46.266 ERR   [RES] Network Name: [NNLIB] Getting FQDN failed with error 87
00000d70.00000a88::2015/03/16-15:47:46.266 INFO  [RES] Network Name:  [NN] IdentityLocal End Impersonating
00000d70.00000a88::2015/03/16-15:47:46.266 INFO  [RES] Network Name <ag_aglisten>: AccountAD: OnInitializeEnd: 87
00000d70.00000a88::2015/03/16-15:47:46.266 INFO  [RES] Network Name <ag_aglisten>: AccountAD: Slow Operation, FinishWithReply: 87

00000d70.00000a88::2015/03/16-15:47:46.282 INFO  [RES] Network Name: Agent: OnInitializeReply, Failure on (d2881a8e-7c36-4c8f-a17c-d4cef149e355,Configuration): 87
00000d70.00000a88::2015/03/16-15:47:46.282 INFO  [RES] Network Name <ag_aglisten>: SyncReplyHandler Configuration, result: 87
00000d70.00000464::2015/03/16-15:47:46.282 INFO  [RES] Network Name <ag_aglisten>: PerformOnline – Initialization of Configuration module finished with result: 87
00000d70.00000464::2015/03/16-15:47:46.282 ERR   [RES] Network Name <ag_aglisten>: Online thread Failed: ERROR_SUCCESS(0)’ because of ‘Initializing netname configuration for ag_aglisten failed with error 87.’
00000d70.00000464::2015/03/16-15:47:46.282 INFO  [RES] Network Name <ag_aglisten>: All resources offline. Cleaning up.
00000d70.00000464::2015/03/16-15:47:46.282 ERR   [RHS] Online for resource ag_aglisten failed.

Availability Group failover to Server without ‘Primary DNS suffix’ fails

Attempting to fail over an availability group that has a listener defined, to a Windows server where 'Primary DNS suffix of this computer' is not set, fails. When issuing the Transact-SQL command ALTER AVAILABILITY GROUP…FAILOVER or ALTER AVAILABILITY GROUP…FORCE_FAILOVER_ALLOW_DATA_LOSS, the command fails with the following messages:

Msg 41066, Level 16, State 0, Line 1
Cannot bring the Windows Server Failover Clustering (WSFC) resource (ID ‘a8aa7c13-4b9c-4423-bc68-6e90a81d0c21′) online (Error code 5942).  The WSFC service may not be running or may not be accessible in its current state, or the WSFC resource may not be in a state that could accept the request.  For information about this error code, see “System Error Codes” in the Windows Development documentation.

Msg 41160, Level 16, State 0, Line 1
Failed to designate the local availability replica of availability group ‘ag’ as the primary replica.  The operation encountered SQL Server error 41066 and has been terminated.  Check the preceding error and the SQL Server error log for more details about the error and corrective actions.


Checking the cluster log on the Windows server that could not be failed over to reports the following errors:

00000280.00000b08::2015/03/16-16:28:33.624 INFO  [RCM] rcm::RcmApi::MoveGroup: (ag, 1, 0, MoveType::Manual )
00000280.00000db4::2015/03/16-16:28:33.670 INFO  [NM] Received request from client address SQLNODE2.
00000280.00000190::2015/03/16-16:28:33.670 INFO  [RCM] rcm::RcmApi::OnlineResource: (ag, 0)
00000280.00000190::2015/03/16-16:28:33.670 INFO  [RCM-rbtr] giving default token to group ag
00000280.00000190::2015/03/16-16:28:33.670 INFO  [RCM] rcm::RcmResource::Online: bringing ag’s provider resource ‘ag_aglisten’ online.

00000d70.00000c94::2015/03/16-16:28:33.686 WARN  [RES] Network Name: [NNLIB] AddServerName – Getting Computer Domain failed, error 203

00000d70.00000f54::2015/03/16-16:28:33.827 INFO  [RES] Network Name <ag_aglisten>: PerformOnline – Initialization of Configuration module finished with result: 203
00000d70.00000f54::2015/03/16-16:28:33.827 ERR   [RES] Network Name <ag_aglisten>: Online thread Failed: ERROR_SUCCESS(0)’ because of ‘Initializing netname configuration for ag_aglisten failed with error 203.’
00000d70.00000f54::2015/03/16-16:28:33.827 INFO  [RES] Network Name <ag_aglisten>: All resources offline. Cleaning up.
00000d70.00000f54::2015/03/16-16:28:33.827 ERR   [RHS] Online for resource ag_aglisten failed.

Check the 'Primary DNS suffix of this computer' setting

You can find this setting under Control Panel's System and Security, by clicking System and then the Change settings link. Then click the Change button in System Properties and the More button in the Computer Name/Domain Changes dialog to view the 'Primary DNS suffix of this computer' setting.

NOTE Setting the DNS suffix will require a reboot of the server.

Enter the correct suffix for the server in the 'Primary DNS suffix of this computer' field.

This problem may exist on other nodes in the cluster. Be sure to check all the nodes hosting replicas in the availability group; otherwise, you will be unable to fail over your availability group and listener to those nodes. Ensure that the DNS suffix has been properly set on all the Windows cluster nodes that might be a failover target for your availability group.

For more information on diagnosing failed availability group listener creation

Create Listener Fails with Message ‘The WSFC cluster could not bring the Network Name resource online’

Failing back from DR site after primary site is back online

$
0
0

Assume steps similar to those from Manual Failover of Availability Group were used to move to the DR site.

When the availability group is brought online at the DR site with ALTER AVAILABILITY GROUP…FORCE_FAILOVER_ALLOW_DATA_LOSS, data movement from the new primary at the DR site to the secondary(s) at the primary site will be suspended and can only be resumed manually.  Once the primary site is back up and the connection between the two sites is stable, bring up the servers hosting SQL Server and start the SQL Server services.  Check that the cluster service is running on all nodes at the primary site and that there are no error messages for the cluster service in the system event log.

After verifying the cluster is stable, change the quorum voting back to the original setup if it was changed while bringing the cluster up on the DR site.

Add voting back to the nodes at primary site.

When you failed the primary replica over to the DR site, you may have also adjusted the node weights of the nodes hosting availability group replicas.  It is time to set the node weights back, removing the node weight from the DR site and adding it back to the nodes at the primary data center.

(Get-ClusterNode -Name "NodeName").NodeWeight=1

Remove voting from node at DR site.

(Get-ClusterNode -Name "NodeName").NodeWeight=0

Verify the current voting in the cluster.

Get-ClusterNode | fl Name,NodeWeight

Synchronize Original Secondary at primary site

Change the synchronization mode to asynchronous and start synchronization with the SQL Server instance that was the secondary at the primary site for each database in the availability group.

Alter Availability Group <agname> Modify Replica On '<SQLInstance_at_DR_Site>' with (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT)

Alter database <dbname> SET HADR RESUME

After resuming database synchronization, the status can be monitored using the availability group dashboard by adding the columns Log Send Queue Size (KB) and Log Send Rate (KB/sec) to the dashboard.  If the secondary at the primary site was ahead of the secondary at the DR site, the database will first have to go through the reverting process.
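
The same information is available from the DMVs on the primary replica; for example:

    -- How far each secondary database is behind, as seen from the primary
    SELECT ar.replica_server_name,
           DB_NAME(drs.database_id) AS database_name,
           drs.log_send_queue_size,   -- KB of log not yet sent to the secondary
           drs.log_send_rate,         -- KB per second
           drs.redo_queue_size,       -- KB of log not yet redone on the secondary
           drs.redo_rate              -- KB per second
    FROM sys.dm_hadr_database_replica_states AS drs
    JOIN sys.availability_replicas AS ar ON drs.replica_id = ar.replica_id;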

(screenshots: availability group dashboard showing the log send queue size and log send rate columns)

Once the databases for the availability group are synchronized, you can change the synchronization to synchronous, if it is not already set, and fail over from the DR site to the primary site.

Alter Availability Group <agname> Modify Replica On '<SQLInstance_at_DR_Site>' with (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT)

Alter Availability Group <agname> failover

After the failover to the secondary on the primary site, the synchronization can be changed to asynchronous.

Alter Availability Group <agname> Modify Replica On '<SQLInstance_at_DR_Site>' with (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT)

At this point, the original primary is still at the point in time of the failover to the DR site, and the secondary at the primary site is the new primary.  If data loss occurred during the failover to the DR site, a utility like tablediff or a third-party utility can be used to identify data from the original primary that is not in the current primary.

Steps to put original primary in read only mode, if data recovery is needed

For SQL Server 2014:

The database should be in a read only state while it is in a suspended state. 

After recovering any data from the original primary, then it can begin synchronization.

Alter database <dbname> SET HADR RESUME

For SQL Server 2012:

The database will need to be removed from the availability group on the original primary and restored with recovery.

Alter database <dbname> SET HADR OFF

restore database <dbname> with recovery

Synchronize Original Primary

Use the same steps as for the original secondary at the primary site.

Replication Agents fail to connect to listener in a multisite cluster

$
0
0

A listener (network name) for a multisite cluster will be dependent on more than one IP address.  When connecting to a listener for a multisite cluster, it is recommended to add the MultiSubnetFailover parameter to the application's connection string.

Legacy applications may not be able to use the latest version of the Microsoft SQL Server Native Client (SNAC 11 or higher), or their connection strings may not be exposed in a way that allows the parameter to be used.  The replication agents (distrib.exe, logread.exe, snapshot.exe, and replmerg.exe) are all considered legacy applications because they do not implement the MultiSubnetFailover parameter.  There are two recommended solutions for using a multisubnet listener with replication.

1) Increase the login timeout

If the replication agents are the only legacy application connecting to the listener, then option 1, increasing the login timeout value, is the recommended solution. This can be done by creating a new profile for the agent and changing the login timeout value from the default of 15 to 60.

[Screenshots: creating a new replication agent profile and setting the login timeout value to 60]

The agent will need to be changed to use the new profile, and then stopped and started for the change to take effect.

For more information on updating the timeout parameter of the replication agent, see the following link on working with replication agent profiles.

https://technet.microsoft.com/en-us/library/ms152515(v=sql.110).aspx

This option is recommended because most agents connect once and run continuously, so the delay would only be experienced on the initial connection.  If the agent is set to run on an interval, then increasing the login timeout may cause longer latency between the publisher and subscriber.

2) Change the RegisterAllProvidersIP for the listener from 1 to 0

https://msdn.microsoft.com/en-us/library/hh213080.aspx (RegisterAllProvidersIP Setting)
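
As a sketch, with hypothetical resource names, the cluster parameter can be changed from PowerShell on one of the cluster nodes. The listener's network name resource must be taken offline and brought back online for the change to take effect, and clients may also need to wait for DNS replication and for their local DNS cache to expire before they resolve only the online IP address.

# "AG1_Listener" and "AG1" are hypothetical resource names; list the real ones with Get-ClusterResource.
Import-Module FailoverClusters

Get-ClusterResource "AG1_Listener" | Set-ClusterParameter RegisterAllProvidersIP 0

# Recycle the network name so that only the online IP address is registered in DNS.
Stop-ClusterResource "AG1_Listener"
Start-ClusterResource "AG1_Listener"

# If the availability group resource went offline when the listener was stopped, bring it back online.
Start-ClusterResource "AG1"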

Improved MultiSubnet Listener Behavior With Newly Released SQL Client Provider in .NET 4.6.1


SQL Client Provider Behavior With MultiSubnet Listener Results in Connection Timeouts

The first experience when trying to connect to an availability group listener defined with multiple IP addresses may be intermittent connection timeouts.

The default behavior of the SQL client libraries is to try all IP addresses returned by the DNS lookup, one after another (serially), until all of the IP addresses have been exhausted and either a connection is made or the connection timeout threshold is reached. This can be problematic because, depending on the DNS configuration, the "correct" or "online" IP address may not be the first IP address returned. The default timeout for a TCP connection attempt is 21 seconds, so if the first IP address attempted is not online, the client waits 21 seconds before attempting the next IP address. For each subsequent IP address, it again waits 21 seconds before moving on, until the connection attempt times out or it establishes a connection to an IP address that responds. For example, with three registered IP addresses and the online address returned last, roughly 42 seconds can elapse before the correct address is even attempted, well beyond the default 15-second login timeout.

For more information on the symptoms associated with the SQL Client’s legacy default behavior (MultiSubnetFailover=FALSE) see our other AlwaysOnPro article:

Connection Timeouts in Multi-subnet Availability Group

 Improved SQL Client Provider Defaults to MultiSubnetFailover=TRUE

That changes with the updated SQL Client provider shipping in .NET 4.6.1. The SQL Client provider's default behavior is now to retrieve all IP addresses up front and attempt to connect to them all in parallel. This should result in a successful connection to the online IP address and is the optimal way to reconnect to an availability group in the event of a failover.

[Diagram: Multi-Subnet Architecture with MultiSubnetFailover]

The following article describes the improvement in more detail and provides a download link to the updated .NET package:

.NET Framework 4.6.1 is now available!

Improve MultisubnetFailover connection behavior for AlwaysOn

The SqlClient now automatically provides faster connections to the AlwaysOn Availability Groups feature that was introduced in SQL Server 2012. It transparently detects whether your application is connecting to an AlwaysOn availability group (AG) on a different subnet, quickly discovers the current active server, and provides a connection to that server.

Prior to this release, an application had to set its connection string to include "MultiSubnetFailover=true" to indicate that it was connecting to an AlwaysOn Availability Group. Without this connection keyword, an application might experience a timeout while connecting to an AlwaysOn Availability Group.

With this feature, an application no longer needs to set MultiSubnetFailover to true. For more information about SqlClient support for AlwaysOn Availability Groups, see SqlClient Support for High Availability, Disaster Recovery.

For more information on SQL Client Provider support for High Availability and specific support for the MultiSubnetFailover connection string parameter see:

SqlClient Support for High Availability, Disaster Recovery
