Problem
Recently we had a project to configure SQL Server AlwaysOn Availability Groups between three nodes which were hosted on a multi-subnet environment. A multi-subnet environment requires the Windows cluster to be used as the backbone for AlwaysOn and each of the server nodes are located in multiple/different subnets. If you want to learn how to configure SQL Server AlwaysOn between a multi-subnet cluster then I would recommend you read this tip.
In this tip, I will explain how to fix an issue which I faced when configuring SQL Server AlwaysOn. This tip will help you fix this issue and help you reconfigure SQL Server AlwaysOn in a multi-subnet network.
Solution
As per Books Online "The AlwaysOn Availability Groups feature is a high-availability and disaster-recovery solution that provides an enterprise-level alternative to database mirroring. Introduced in SQL Server 2012, AlwaysOn Availability Groups maximizes the availability of a set of user databases for an enterprise. An availability group supports a failover environment for a discrete set of user databases, known as availability databases that fail over together. An availability group supports a set of read-write primary databases and one to eight sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations."
Configuration Scenario and Setup
We have three machines named PRI-DB1 ( IP: 10.X.3.XXX ), PRI-DB2 ( IP: 10.X.4.XXX ) and SEC-DB2 ( IP: 172.X.15.XXX ). The IP addresses of the machines reflects their subnet indicating they belong to a different series from each other. PRI-DB1 and PRI-DB2 are hosted in a corporate data center whereas SEC-DB2 is hosted by the Amazon cloud platform. All three machines are running Windows Server 2012 R2 Enterprise Edition and SQL Server 2014 Enterprise Edition. PRI-DB1 will be the primary replica and the remaining two nodes will be the secondary replicas. Data replication between PRI-DB1 and PRI-DB2 will use synchronous-commit mode and the failover mode will be Automatic with no data loss which can be used for HA in case the primary replica goes down. Data replication between PRI-DB1 and SEC-DB2 uses asynchronous mode and the failover mode is Manual which can cause some data loss in case of a disaster.
Before going ahead, I would like to let you know that I have followed the step by step process to configure SQL Server AlwaysOn between multi-subnet cluster, but ended with an error 35250 at the end of the SQL Server AlwaysOn configuration window. I am unable to configure SQL Server AlwaysOn because of the error Failed to join the database 'DRTest' to the availability group 'DBAG' on the availability replica 'PRI-DB2' (Error: 35250). Although you can see the details like secondary replicas, databases, etc. created under the AlwaysOn High Availability folder despite throwing this error, but the databases will not be joined to the Availability Group which we will do later in this tip to fix the issue.
I have not covered SQL Server and Windows Server Failover Cluster installation and the initial SQL Server AlwaysOn configuration steps in this tip. You can use my last tips to configure an AlwaysOn Availability Group. I am assuming that these steps will be taken care of by you prior to fixing this issue with the help of this tip.
NOTE : MAKE SURE TO IMPLEMENT THIS SOLUTION IN A LOWER LIFE CYCLE FIRST. DO NOT MAKE ANY CHANGES IN PRODUCTION WITHOUT PROPER TESTING IN LOWER-LIFE CYCLE ENVIRNOMENTS.
Troubleshooting Error 35250
Step 1
As I said above, I will not show you the steps to configure SQL Server AlwaysOn, because I assume that you have already gone through the steps mentioned in my last tip. Here I will go directly to the last window of the SQL Server AlwaysOn configuration where we get the errors (if any) post execution. You can click on the "Error" link to get the details of the error as shown in the below screenshot.
Step 2
I did some research on social media and on the web to rectify this issue. I found this MSDN link where many solutions were suggested. I decided to check the endpoint status on all replicas first by running the below command.
--Run below command to check endpoint state select r.replica_server_name, r.endpoint_url, rs.connected_state_desc, rs.last_connect_error_description, rs.last_connect_error_number, rs.last_connect_error_timestamp from sys.dm_hadr_availability_replica_states rs join sys.availability_replicas r on rs.replica_id=r.replica_id where rs.is_local=1
First I executed the above command on the primary replica and the output is noted as "Connected".
Next I executed the same command on both secondary replicas where I learned that the endpoint state is "Disconnected" with an error which is shown below.
Connection attempt failed with error: '10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)'.
The endpoints are not connected on the secondary replicas which caused the connection failure. The target primary and secondary replicas are unable to communicate because of a firewall rule like inbound traffic for port number 5022 on which the endpoint needs to be checked and unblocked because by default inbound traffic is blocked by the Windows firewall.
Step 3
By default, AlwaysOn configures endpoints to use port 5022 which we kept in our configuration. You can see the endpoint port number 5022 is used in the above image. You can query sys.tcp_endpoints on each replica to confirm which port is being used in your environment. As per MSDN "Port 5022 is used by the primary and secondary replicas for synchronization and communication purposes. Traffic must be allowed inbound on this port. Testing has shown that if port 5022 inbound traffic is blocked at the primary, at the secondary or at both, you will be unable to create the availability group and message 35250 will be reported."
So here we will create a new Inbound Port Rule for port 5022 on the Primary and both Secondary servers. Launch "Windows Firewall with Advanced Security" console to create a new Inbound Port Rule. You can type "Windows Firewall with Advanced Security" in the Windows search option to get this console as shown in the below image.
Click on the "Windows Firewall with Advanced Security" as shown above.
The "Windows Firewall with Advanced Security" console will appear. You can see the "Windows Firewall with Advanced Security" console in the below screenshot.
Step 4
Since we have to create an inbound traffic rule, right click on "Inbound Rules" from the left pane and click on "New Rule..." as shown in the below image.
Step 5
A window named "New Inbound Rule Wizard" will appear on your screen. You can see this window in the below screenshot.
We are creating this rule for port number 5022, so the next step is to select the "Port" option and then click on the "Next" button to proceed.
Step 6
On the next window choose the appropriate protocol and port number for which we are creating this rule. Choose TCP as the protocol and select the "Specific local ports" option and enter port number 5022 as shown in the below image. Click on the Next button to proceed.
Step 7
The next window named "Action" will be used to configure what action will be taken when using this rule. Choose appropriately as per your requirement. Here I have chosen the first option: "Allow the connection" as shown in the below image.
You can see there are three options on this page. I have chosen the first option which is the default. The second option is for filtering incoming connections based on the suggested parameters and third option is to block the connection. Now click on "Next" to proceed.
Step 8
The next option is to configure the profile for this rule. Here you will see three options. I will move forward with the default option which suggests all three options, but you can choose according to your requirements.
Step 9
The next and final window is for naming the rule and providing a description. Enter the name and description of this rule as shown below.
Now click on the "Finish" button to apply this change. Once you click on the "Finish" button, this window will disappear and the new rule can be seen in the right side pane of the Inbound Rules.
Repeat the same steps on all replicas. I repeated these steps on both replicas PRI-DB2 and SEC-DB2.
Re-joining Availability Databases to the Availability Group
As shown at the start of this tip, once the SQL Server AlwaysOn setup is finished, it will throw error 35250. However, the setup will create all required details like secondary replicas, databases, etc. under the AlwaysOn High Availability folder. The only thing which will not happen is the communication/data-transfer between these replicas because availability databases will not be joined to the Availability Group. This has now been fixed with the steps from the above section. Once the databases are successfully added to the Availability Group then AlwaysOn will run successfully.
Step 1
Now connect to the secondary replicas with SQL Server Management Studio where you are facing the issue. Expand the "AlwaysOn High Availability" folder, then "Availability Groups" then expand your Availability Group name "DBAG" and finally Expand "Availability Databases". You can see both databases are there, but with a warning icon. The status of both databases should be green if there is no issue. Let's start with the "Test" database. Right click on database "Test" and choose "Join to Availability Group..." as shown below.
Step 2
Once you click on the "Join to Availability Group..." option, a wizard "Join Database to Availability Group 'DBAG'" will appear as shown below.
We can see the name of the database which we are joining to the Availability Group DBAG. Verify the details on the window and click on the "OK" button to proceed with this change. Once you click on the "OK" button it will start executing for a few seconds and then disappear from the screen. You can see the warning sign on database "Test" has now changed in SSMS and now shows a green status which means the database has joined the Availability Group. If an error is generated then the above screen will not disappear, but will remain showing the error details in this window. You can see the processing in the below image.
Step 3
Do the same exercise for the other databases which are facing the same issue on the secondary replicas. We have one more database named "DRTest", so we did the same for this database. Both databases have been successfully joined to the Availability Group DBAG on this secondary replica.
Step 4
If you have multiple secondary replicas and you are facing this issue on them as well you should repeat the same exercise.
Connect to the secondary replica. Expand the "AlwaysOn High Availability" folder, then "Availability Groups" then expand your Availability Group name "DBAG" and finally Expand "Availability Databases". You can see your availability databases that are facing an issue with the warning icon. The status of both databases should be green if there is no issue. Make sure to join all impacted databases to your respective Availability Group. I did this on both secondary replicas for both databases and finally once done, your availability database will look like the below screenshot.
Validation
Now we will validate this configuration to make sure SQL Server AlwaysOn has been configured and is working correctly. We will run a dashboard report to see the AlwaysOn status and then we will do failover test to make sure the failover is working. Connect to the primary replica, expand your "AlwaysOn High Availability" folder to your Availability Group name. Here the AG name is DBAG. Right click on the AG name DBAG and click on the "Show Dashboard" option to display the dashboard report in the right pane.
We can see that PRI-DB1 is the primary replica and PRI-DB2 is its secondary replica. Automatic failover is possible between PRI-DB1 and PRI-DB2, because of its configuration of automatic failover mode, so we will run the below command to failover to PRI-DB2. Connect to the server instance that hosts the target secondary replica, its PRI-DB2 in our case, and run the below command to failover.
--Run on target secondary replica. Here our AG group name is DBAG so we used DBAG. ALTER AVAILABILITY GROUP DBAG FAILOVER
Now check the dashboard report by right-clicking on the Availability Group named DBAG and choose the show dashboard after running the above command.
We can now see the primary replica is PRI-DB2. PRI-DB1 and SEC-DB2 are acting as secondary replicas meaning that the SQL Server AlwaysOn configuration is working correctly and data is also replicating to the secondary replicas.
Now we can check the endpoint state on secondary replica the way we did in Step 2 in the first section where the endpoint was showing "DISCONNECTED" on the secondary replica. Now run the SQL code which we ran in Step 2 on the secondary replica and see the change.
Now it is showing as CONNECTED, whereas it was disconnected before creating the inbound rule.