Mission-critical cloud applications require sharp elbows.
There are three types of applications: applications that manage the business, applications that run the business, and miscellaneous apps.
A security breach or performance-related issue in an application that runs the business will undoubtedly impact top-line revenue. For example, an issue in a hotel booking system directly affects top-line revenue, whereas an outage in Office 365 does not.
It is commonly assumed that cloud deployments will suffer business-impacting performance issues because of the network. The objective is to keep applications within 25ms (one-way) of the users who use them. However, too many network architectures backhaul traffic from the private network to the public internetwork.
Transitioning mission-critical applications to the cloud requires a rethink of the existing cloud interconnects. This persuaded me to jump on a call with Sorell Slaymaker to get his valued opinion on the future of cloud interconnects and the new emerging model known as the internetworking design.
As the market matures, we will witness a transition to the internetworking design, which rides on top of traditional cloud interconnects and is driven by multi-cloud and hybrid-cloud architectures.
Introducing the original interconnects
There are multiple traditional ways to connect to the cloud. Each way has its pros and cons in terms of speed, cloud ecosystem, price, security, and performance.
The first and also the most common way to connect is via a secure network, running over an Internet connection, such as an IPsec tunnel.
The second way to connect to the cloud is via a cloud interconnect. The enterprise acquires a private, direct, high-speed connection to a cloud interconnect, such as Equinix Cloud Exchange, and buys Ethernet cross-connects to various cloud service providers (CSPs).
The third way is with a direct wide area network (WAN). The enterprise can use its existing WAN MPLS/VPLS supplier to simply add CSPs as needed.
In reality, many enterprises sit in the middle and end up using a combination of connectivity models. It all depends on where the users are located and the types of applications they use. For example, a big data application pulling data from many disparate sources would be an ideal fit for the cloud interconnect model.
On the other hand, if users are in the office, one would opt for the direct WAN; for remote workers, Internet transport would be used.
A complex architecture
The traditional cloud data center interconnect design consists of excess baggage. This results in a complex architecture of point-to-point connections of IPsec tunnels or Ethernet connections.
Mesh architectures do not scale well and surface the N-squared problem: every time you add a data center, you must connect it to every other data/cloud center, so the number of connections grows with the square of the number of sites.
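The quadratic growth is easy to see with a quick calculation. The sketch below counts point-to-point links in a full mesh; the site counts are purely illustrative.

```python
# Full-mesh growth: every new site must peer with all existing sites.
def full_mesh_links(n: int) -> int:
    """Point-to-point links needed for a full mesh of n sites."""
    return n * (n - 1) // 2

for sites in (4, 8, 16, 32):
    print(sites, full_mesh_links(sites))
# Adding one site to an n-site mesh adds n new links, so the total
# grows quadratically -- the "N-squared" problem.
```

At 4 sites you manage 6 links; at 32 sites, 496 links — which is why some form of overlay becomes unavoidable.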
As a result, some type of overlay is used to manage the complexity. These overlays come in the form of an IPsec tunnel with some type of overhead for segmentation. Most of the time, Virtual Extensible LAN (VXLAN) would be used.
The architecture consists of many single-function services such as routers, firewalls, load balancers, WAN optimizers, and IDS/IPS. Single-function services create appliance sprawl, which adds to the complexity and costs. The idle backup equipment may result not only in complex configurations but also in additional costs.
There are certain challenges resulting from bolt-on security. IPsec encrypts everything within a tunnel with the same encryption key. In plain words, if you have different segments of varying security levels, each of these logical segments shares the same encryption key.
It's all-or-nothing encryption, as you are encrypting every segment in the same way. And since routes and peers are not authenticated, connections can be added to the mesh network without prior authentication.
Hence, you need to add firewalls. Essentially, we are bolting on security that is not integrated into routing and your environment. There are no ‘follow-the-network security rules’, where security rules could change dynamically with the network.
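To contrast with the shared-key limitation above, here is a minimal sketch of deriving a distinct key per logical segment from a master secret. The segment names and master key are illustrative, not from any product.

```python
import hashlib
import hmac

# Assumed master secret for illustration only.
MASTER_KEY = b"site-to-site-master-secret"

def segment_key(segment: str) -> bytes:
    """Derive a per-segment key so compromising one segment's key
    does not expose traffic in the other segments."""
    return hmac.new(MASTER_KEY, segment.encode(), hashlib.sha256).digest()

keys = {seg: segment_key(seg) for seg in ("pci", "guest-wifi", "corp")}
assert len(set(keys.values())) == 3  # every segment gets a unique key
```

With per-segment keys, segments of varying security levels no longer share one cryptographic fate, which is one piece of making security follow the network.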
No application performance guarantees
The existing model offers no application performance guarantees. Path selection is based on the routed underlay, which is not performance-based. The IPsec tunnel will take you from point A to point B even though the chosen path may be heavily utilized, or longer than a better alternative.
Besides, you also need to have separate tools for measuring application performance such as NetFlow.
The existing system lacks agility: it consists of customized configurations for all the point-to-point links, and that configuration is usually not automated. A manually driven architecture will always be error-prone.
If you are going to use private links, for example, multiprotocol label switching (MPLS), there will be long deployment times, especially if you want to include redundant links. Keep in mind that most of these are still operating on the command line (CLI).
Evidently, the costs involved are quite high. If you have one MPLS link, you have to use another MPLS link for redundancy. If left to its defaults, you can't, for example, use the Internet as the backup. Keep in mind that private links typically cost 10 times more than Internet links.
To bridge the gap, we are still buying specialized hardware and software. To provision, you either have to buy or rent expensive equipment. You are not, for example, running routing on an agile Amazon EC2 software instance.
The internetworking design
The goal of an internetworking design is to take the data center, regardless of whether it's private or public, and make it into one logical data center. Even though you have a number of physical locations, they act and look like one logical data center from the network's perspective.
Another key aspect of internetworking is routing all the way to the endpoint where the computing is done. One reason this is important is a multi-cloud strategy where you want to take a VMware solution and integrate it into AWS and Azure.
The internetworking design offers a simple architecture that can scale up to thousands of sites. Internetworking provides an end-to-end routing environment, as opposed to point-to-point. The protocols used to create this logical mesh vary, depending on the vendor.
Different vendors have different objectives. Some are more focused on Zero-Trust Security for internetworking, while others are using internetworking to resolve application performance related issues. Vendors that are not session-aware will need to use an overlay.
Single stack security
Ideally, a single software stack can be used for all the network functions. We are now beginning to see the convergence and blending of routing and security. Today, network and security teams, along with their products, are separate. We are at a point where we want routing and security combined.
Some SD-WAN vendors have collaborated with security vendors so that routing and security play well together. An example of this would be to use network function virtualization (NFV).
Here we take software stacks, run them on the same hardware instance, and service-chain them together. Others operate with a thin edge and simply push everything to the cloud: all security and self-healing operate in the cloud, thereby abstracting the complexity. Whichever method serves you best, in the future security and networking will come closer together.
Support for terabit networks
If you want terabit speeds across the Internet, you either buy a single box that scales vertically or take a number of boxes and scale horizontally. The internetworking architecture offers high performance and supports terabit networks.
IP address independence
The support for IP address independence and overlapping IPs is crucial. Many organizations have teams that operate without restriction, resulting in thousands of AWS accounts. Eventually, there is a high potential for IP address clashes when the organization wants to move to shared services, logging, or IAM.
Here you have two options: you can either readdress everything or go with a vendor product that abstracts the IP addresses and carries out routing based on other variables, as in named data networking.
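Before consolidating accounts, the overlaps have to be found. Here is a small sketch using Python's `ipaddress` module to flag clashing address space; the account names and CIDRs are hypothetical.

```python
import ipaddress

# Hypothetical per-account address allocations.
accounts = {
    "team-a": "10.0.0.0/16",
    "team-b": "10.0.128.0/17",   # falls inside team-a's range
    "team-c": "172.16.0.0/16",
}

def find_overlaps(accounts):
    """Return every pair of accounts whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in accounts.items()}
    names = sorted(nets)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if nets[a].overlaps(nets[b])]

print(find_overlaps(accounts))  # → [('team-a', 'team-b')]
```

Any pair this reports has to be either readdressed or hidden behind an abstraction layer before shared services can be stood up.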
It also offers Zero-Trust security. The fundamental definition of Zero-Trust security is that no Transmission Control Protocol (TCP) or User Datagram Protocol (UDP) session is set up without prior authentication and authorization.
Adaptive encryption and per-session authentication
Adaptive encryption is encrypting at the application layer, using transport layer security (TLS) with the option to re-encrypt at the network level. Well, this has its own pros and cons.
The advantage is that double encryption is more secure than single encryption. Even with TLS at the application layer, an observer can still glean some metadata about the connection during the TLS session setup.
Encrypting at the network layer with a different key lets you hide that TLS metadata. However, network-layer encryption takes a toll on network performance, so you may need additional resources to facilitate the encryption.
The internetworking design offers 1:1 segmentation: a mapping of one application or service to another application or service. To be precise, within a virtual server you are saying that this application can communicate on this port only to that application on that port, as opposed to a general server-to-server mapping.
Application performance and service assurance
Application performance and service assurance ensure that applications are performing efficiently and taking the best path, as opposed to the shortest, which may suffer from high utilization or congestion.
Deterministic and dynamic routing is necessary when going through security stacks as you don’t want asymmetrical routing.
Some SD-WAN vendors monitor links by sending pings, using bidirectional forwarding detection (BFD), or by some other proprietary keep-alive link measurement. Border Gateway Protocol (BGP) controls the routing, but it's static: it can be configured based on rules or AS hops, yet none of those metrics reflect link utilization.
If more than 10% of packets are dropped within a period of time, you should be able to reroute to a better path. A lot of SD-WAN vendors brag about this because, in the past, when you set up a Cisco Intelligent WAN (IWAN), a flow would use the same link until it ended, regardless of jitter or packet loss.
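The loss-threshold behavior can be sketched as a simple selection function. This is not any vendor's algorithm, just an illustration of the idea, with made-up loss measurements.

```python
# Reroute when measured packet loss on a link exceeds 10% over the window.
LOSS_THRESHOLD = 0.10

def pick_path(links):
    """links: {name: measured loss fraction}. Choose the lowest-loss link
    that is under the threshold; if none qualify, take the least-bad link."""
    healthy = {name: loss for name, loss in links.items()
               if loss <= LOSS_THRESHOLD}
    if not healthy:
        return min(links, key=links.get)  # least-bad path
    return min(healthy, key=healthy.get)

print(pick_path({"mpls": 0.12, "internet": 0.01}))  # → internet
```

A static BGP policy would keep preferring the MPLS link here; a loss-aware selector moves the session to the healthier Internet link instead.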
Link load balancing for large sessions
Internetworking provides link load balancing for large sessions. Let's say you are executing a backup. Instead of using one single link, you can balance the session across multiple links to a given Amazon or Azure instance, using them concurrently to facilitate a large file transfer.
From the TCP perspective, it still looks the same: TCP keeps the sequence numbers in order even though segments arrive on different paths. This is because the load balancing is performed at the network layer of the Open Systems Interconnection (OSI) model.
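To illustrate why the flow still looks intact to TCP, here is a toy reassembly of one flow whose segments arrived over two links. The sequence numbers and payloads are made up; real reassembly happens in the receiver's TCP stack.

```python
# Segments of one flow, split across two links: (sequence_number, payload).
link_a = [(0, b"AAAA"), (8, b"CCCC")]
link_b = [(4, b"BBBB"), (12, b"DDDD")]

def reassemble(*links):
    """Merge segments from all links and order them by sequence number,
    as the receiving TCP stack would."""
    segments = sorted(seg for link in links for seg in link)
    return b"".join(payload for _, payload in segments)

print(reassemble(link_a, link_b))  # → b'AAAABBBBCCCCDDDD'
```

Because ordering is restored by sequence number at the receiver, the sender and receiver never need to know that the path was split underneath them.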
Maintaining session state
Within a network, sessions go through a firewall. What happens when a topology change alters the return path?
Asymmetrical routing will cause the firewall to drop the session. As a result, we need to maintain session state through firewall boundaries and network topology changes. Hence, if one of your links is underperforming, you need to make sure that route changes happen in both directions, not just one, so that you fail over correctly. In this case, from the TCP perspective, you are still maintaining the TCP state.