Attack on Bigrock DNS servers
Posted on 09 June 2015 02:13 PM
Update: 15-06-2015 4:00 PM IST
The attack on the DNS server IP addresses started on 9 June 2015 at 06:43:20 AM GMT. Services started to come back at around 03:02:28 PM GMT the same day, when the majority of our IP addresses were released from the null route. A few IP addresses remained under mitigation, and we were able to take all of them out of mitigation at 09:51:20 PM GMT.
The traffic patterns were completely unpredictable and inconsistent, and the traffic itself was evidently malicious, or at least obviously invalid. The initial attack exceeded 40 Gbps, and the Data Center had to null route all the IP addresses because its mitigation systems were overwhelmed.
When an IP address is null routed, the Data Center loses visibility into the traffic, because the IP address is blocked on its upstream routers. So, to identify the attack vector, we removed each IP address from the null route in turn and started Arbor TMS mitigation on it to figure out the attack pattern, the domains involved, the malicious traffic sources, etc. Over the next 4-5 hours, we had several rounds of conversation with the Data Center Network Operations Center (NOC) team to mitigate the attack. This process was time-consuming and took at least 30 minutes per IP address.
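As a quick check on the timeline, the three timestamps above can be subtracted to see how long the disruption lasted. This is only a minimal sketch of the arithmetic; the variable names are ours and the timestamps are copied from the paragraph above.

```python
from datetime import datetime, timezone

FMT = "%m/%d/%Y %I:%M:%S%p"  # e.g. 06/09/2015 03:02:28PM

def parse_gmt(stamp):
    """Parse a timestamp in the format above and mark it as GMT/UTC."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)

attack_start = parse_gmt("06/09/2015 06:43:20AM")
mostly_restored = parse_gmt("06/09/2015 03:02:28PM")
fully_restored = parse_gmt("06/09/2015 09:51:20PM")

time_to_partial = mostly_restored - attack_start
time_to_full = fully_restored - attack_start

print(time_to_partial)  # 8:19:08
print(time_to_full)     # 15:08:00
```

In other words, most IP addresses were back after roughly 8 hours and 19 minutes, and the last ones about 15 hours and 8 minutes after the attack began.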
Current Managed DNS Infrastructure
Over the last few days we have spent time analyzing our DNS server architecture, DDoS mitigation process and capacity at our Data Center. The following covers the details of our current managed DNS Infrastructure:
Managed DNS Architecture
Our DNS servers are spread out across 4 Data Centers in the US. They are isolated both physically and at the network level with their own bandwidth capacity, network gear, etc. They are all hosted with Softlayer, who have always provided us with the best service under all circumstances.
Each domain registered with us gets Managed DNS service for free with 4 Name Servers configured. To illustrate, here’s an example:
DDoS Mitigation Capacity
Softlayer’s network has been battle-tested many times during similar DDoS attacks. Each Softlayer Data Center is equipped with multiple 10 Gbps or 40 Gbps transit links to the internet and uses high-end networking gear. Softlayer also uses Arbor Peakflow for DDoS detection and Arbor TMS for DDoS mitigation. Each of the Arbor TMS systems is capable of mitigating 10+ Gbps of attack traffic.
You can read more about Softlayer’s network and architecture at http://www.softlayer.com/network.
What went wrong?
Typically we see one or a few DNS server IP addresses getting attacked, and they are either null routed or mitigated on the TMS system. This activity is pretty common and we see two or three such incidents every week. We have always maintained our service levels during all such incidents.
During the recent attack, we received 40+ Gbps of traffic spread out across all our DNS server IP addresses. The attack traffic was moving from one IP address to another in rapid succession. Softlayer, to prevent instability on its network, null routed our IP addresses. The null route is a rule to drop all traffic destined for our IP address at Softlayer’s upstream ISP’s network. What this means is that once the null route is in place, even Softlayer has no visibility into what the attack traffic is.
After this, we started removing each null route in turn, identifying and mitigating the attack on every IP address.
What’s wrong with our setup?
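To make the null route behaviour concrete, here is a minimal sketch of a routing table with longest-prefix matching. It is an illustration only, not how Softlayer or its upstream ISP implements routing; the prefix 198.51.100.0/24 is a documentation range and every name in the code is hypothetical.

```python
import ipaddress

DISCARD = "discard"  # marker for a null route: matching packets are dropped

# Hypothetical upstream routing table: prefix -> next hop.
routes = {ipaddress.ip_network("198.51.100.0/24"): "data-center"}

def add_null_route(address):
    """Drop all traffic to one address by installing a /32 discard route."""
    routes[ipaddress.ip_network(address + "/32")] = DISCARD

def remove_null_route(address):
    """Remove the /32 discard route so traffic reaches the data center again."""
    routes.pop(ipaddress.ip_network(address + "/32"), None)

def next_hop(destination):
    """Longest-prefix match: the most specific matching route wins."""
    dest = ipaddress.ip_address(destination)
    matches = [net for net in routes if dest in net]
    if not matches:
        return "no-route"
    return routes[max(matches, key=lambda net: net.prefixlen)]

add_null_route("198.51.100.10")
print(next_hop("198.51.100.10"))  # discard
print(next_hop("198.51.100.11"))  # data-center
remove_null_route("198.51.100.10")
print(next_hop("198.51.100.10"))  # data-center
```

While the discard route is in place, both attack traffic and legitimate queries for that address are dropped upstream, which is why nothing behind that point can observe the attack until the null route is lifted.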
Problem 1: Relying solely on Softlayer Data Centers and their DDoS mitigation capabilities.
Problem 2: We are bound to /32 static IP addresses provided by the Data Centers. We are not utilizing our own /24 subnets to host the DNS servers. By using our own /24 subnets, we could have swung the traffic to our third-party DDoS mitigation partner, Neustar.
Problem 3: All Customer Name Servers were pointing to the same IP addresses. So, when the attack happened and caused disruption, all customers were affected.
To solve these problems, we have already planned a new DNS architecture and have made significant progress in deploying the same.
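The difference between a /32 and a /24 in Problem 2 can be sketched as follows. The check assumes the widely used convention that IPv4 prefixes longer than /24 are generally filtered from the global routing table, so a single provider-assigned /32 cannot be announced elsewhere while a /24 we own can. The addresses are from a documentation range and the function name is hypothetical.

```python
import ipaddress

# Assumption: most networks do not accept IPv4 announcements longer than /24.
MAX_ANNOUNCEABLE_PREFIXLEN = 24

def can_reroute_via_bgp(prefix):
    """Return True if the prefix is short enough to be announced on its own."""
    network = ipaddress.ip_network(prefix)
    return network.version == 4 and network.prefixlen <= MAX_ANNOUNCEABLE_PREFIXLEN

static_host = "203.0.113.53/32"  # a single address assigned by the data center
own_subnet = "203.0.113.0/24"    # a subnet we would control ourselves

print(can_reroute_via_bgp(static_host))                 # False
print(can_reroute_via_bgp(own_subnet))                  # True
print(ipaddress.ip_network(own_subnet).num_addresses)   # 256
```

With our own /24, the whole block of 256 addresses can be announced from a mitigation provider's network during an attack, which is not possible for an individual /32 that belongs to the data center's address space.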
Our new Managed DNS Architecture
The new Managed DNS infrastructure architecture (initial) is as below.
In phase 1, we will move the current DNS server IP addresses to our own IP subnets. This ensures that we have the ability to use Neustar for DDoS mitigation when needed. All our Data Centers are already protected by Neustar. Learn more about the Neustar DDoS mitigation service at https://www.neustar.biz/services/ddos-protection.
In phase 2, we will start bucketing customers across different IP addresses such that an attack on a domain in set 1 will not disrupt DNS service to customers in other sets.
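One possible way to bucket customers, shown purely as a sketch, is to hash each domain name into one of a fixed number of name server sets. The bucket count and function names are hypothetical; the actual assignment scheme may differ.

```python
import hashlib

BUCKET_COUNT = 4  # hypothetical number of name server sets

def bucket_for(domain, bucket_count=BUCKET_COUNT):
    """Map a domain to a stable bucket index using a hash of its name."""
    normalized = domain.strip().lower().rstrip(".")
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count

def domains_affected(attacked_domain, all_domains):
    """Domains that share a bucket (and so name server IPs) with the target."""
    attacked_bucket = bucket_for(attacked_domain)
    return [d for d in all_domains if bucket_for(d) == attacked_bucket]

customers = ["example.com", "example.net", "example.org", "example.in"]
print({d: bucket_for(d) for d in customers})
print(domains_affected("example.com", customers))
```

Only the domains that share a bucket with the attacked domain are exposed; customers in the other sets keep resolving normally because their name servers sit on different IP addresses.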
In phase 3, we will start introducing DNS servers in other geographical regions where we have a Data Center presence and use Anycast. With Anycast, an attack originating from a particular region will only affect that region and other regions will continue to work normally. The affected region will use Neustar DDoS mitigation to mitigate the attack.
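The regional isolation that Anycast provides can be illustrated with a very simplified model in which each region's traffic lands on its nearest site. Real Anycast routing is decided by BGP and is not strictly geographic; the regions, site names and the 10 Gbps capacity below are all hypothetical.

```python
SITE_CAPACITY_GBPS = 10.0  # hypothetical per-site capacity

# Hypothetical mapping of client region -> nearest Anycast site.
NEAREST_SITE = {
    "north-america": "us-site",
    "europe": "eu-site",
    "asia": "asia-site",
}

def load_per_site(traffic_gbps_by_region):
    """Sum the traffic that each site receives from the regions it serves."""
    load = {site: 0.0 for site in NEAREST_SITE.values()}
    for region, gbps in traffic_gbps_by_region.items():
        load[NEAREST_SITE[region]] += gbps
    return load

def overloaded_sites(traffic_gbps_by_region):
    """Sites whose incoming traffic exceeds the assumed capacity."""
    load = load_per_site(traffic_gbps_by_region)
    return sorted(s for s, gbps in load.items() if gbps > SITE_CAPACITY_GBPS)

# Normal query load everywhere, plus a 40 Gbps attack originating in one region.
traffic = {"north-america": 2.0, "europe": 1.5, "asia": 1.0 + 40.0}
print(overloaded_sites(traffic))  # ['asia-site']
```

In this model only the site serving the attacking region is overwhelmed, so mitigation can be focused there while the other sites continue to answer queries.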
Our immediate goal is to complete phase 1 in the next 2 weeks. This will ensure that we can withstand any attack and also eliminate the Single Point of Failure (SPOF) with Softlayer. It will also enable us to use Neustar’s Anycast DDoS mitigation system to withstand any traffic volume.
We will communicate the changes required by you as and when needed to ensure that the new setup is utilized.
Once again, we sincerely apologize to you. We understand that you count on us for impeccable service. The BigRock team is extremely passionate about providing the best experience and building phenomenal products for you. We continue to dedicate our services to helping you build and grow your business to its full potential.
All our IPs have been removed from the null route. The DNS services should be completely functional now.
We will update this thread once we have a complete root cause analysis.
Thank you for your patience and cooperation.
Please feel free to contact us, should you have any further doubts or issues.
We are currently facing an attack on the Bigrock DNS servers, because of which DNS records (A, MX, PTR, etc.) cannot be fetched.
Below are the IP addresses which have been null routed due to the attack:
Name Server Details:
We are working in co-ordination with the data center to mitigate this attack as soon as possible. Please follow this thread for more information.
Feel free to contact the support team, in case of any further queries or concerns.