Archive

Posts Tagged ‘Contrail’

Dynamic Network Architectures

August 25, 2014 Leave a comment

What are Dynamic Network Architectures ?

A Dynamic Network is one that adapts itself to the requirements of the applications running on top of it. Take one specific application. When this application starts, it needs connectivity. This connectivity is typically provided through a connectivity context in the network (a VLAN, QoS parameters, … ). As this application moves from one server to another and from one switch port to another, the connectivity context on the physical switch port needs to move with it.

Application Virtualization

I deliberately started my talk from an application perspective and not from a virtual machine. The virtual machine is a facilitator and in the end it all comes down to applications and services. The granularity of virtualization is gradually moving up the application stack. Containerization technologies like Docker are growing in popularity and the recent acquisition of CloudVolumes by VMWare enforces this observation. In containerization technologies applications are being abstracted from the underlying OS and wrapped in application management containers which can be delivered to diverse environments in real-time. All this progress in virtualization is driven mostly by the business requirement for agility or the speed at which new applications and services need to be created.

Orchestration with Junos Space Network Director

Dynamic Network Architectures can be achieved using existing networking infrastructure. Depending on the scale and level of dynamics, different implementations are possible are we will cover some of them. Other approaches are possible, OpenFlow for example, but these will be covered in a separate note.

The easiest way of achieving a dynamic network is through the use of traditional VLANs. For this to work a central or decentralized controller needs to reconfigure ports whenever an application moves. Consider a virtualized server infrastructure operated through VMWare. vCenter is typically the central management software that provides visibility and management for all virtual machines running on all the physical servers. vCenter keeps track of every machine that is started, stopped or moved between physical servers. On the other end of the data center infrastructure there is the network management software like Junos Space. Junos Space Network Director manages and monitors all switches in the data center network. Bridging both worlds is possible through an open and documented REST API that is exposed from VMWare vCenter. Through this REST API, Junos Space Network Director can receive notifications whenever there is a change in the state of a virtual machine. LLDP is used to map the physical servers to their corresponding switch ports. By consolidating this information Junos Space Network Director is able to know exactly which physical server is connected to which physical port. The network administrator only needs to provide a mapping table from virtual networks to physical networks (VLANs) and with this information Space ND is able to provision a physical switch port with only the VLANs corresponding to the virtual machines active on a specific physical server. Whenever a virtual machine is moved to another target physical server and its virtual interface removed from the virtual switch in the originating physical server, the VLAN mapped to the virtual network will be removed (pruned) from the original switch port and added to the trunk on the switch port connected to the target physical server. As such, the VLANs on the switch ports follow the virtual machines as they move across the physical servers.

The reconfiguration of switch ports is provided by Junos Space Network Director and performed through traditional configuration change and commit. Needless to say that this type of orchestration only provides a certain degree of dynamic in the network. This solution is mainly for an environment that is mostly static in nature. A good fit for data centers where there is a limited number of logical network contexts (VLANs) and where most virtual servers are supposed to be up 24/7 and virtual machine motion is limited to occasional movements for maintenance or rebalancing of the load. Most enterprise private clouds fall into this category and can take advantage of this solution to orchestrate their virtual infrastructure without introducing more complexity or overhead than required.

dna_img1

One important gain for network operators in this architecture is the elimination of manual configuration of VLANs. Forgetting to provision a VLAN to a trunk when a new virtual server is spun up is one of the most common sources of errors in the data center network. Also consider what needs to be done on the network side when you want to move all virtual machines from one physical server temporarily to another server for server hardware maintenance or upgrades…

Suppose you have 100 virtual machines spread across 10 VLANs and you need to provision VLANs for the switch ports. One could provision trunks carrying all VLANs to all switch ports. However there are limitations imposed by the switch hardware (maximum number of vmembers) and impacts on the efficiency of the use of CPU cycles by the virtual switches in the servers because of the nature of layer 2 broadcast networks. Without going to much in detail, the best solution is to map VLANs to switch port trunks only for the virtual networks that are running on the server connected to that port.

A full overlay solution (see later) is overhead, but using Space Network Director we have a lean solution that can help the customer optimize his private cloud network without much effort and without impact.

The solution even provides orchestration to some extend as such that whenever a new machine is created by the server people, the network automatically creates the corresponding VLAN on the correct port. No more manual intervention needed by the network operations team. All the network administrator needs to provide upfront is the mapping between virtual and physical networks, Space Network Director takes care of the rest.

Space Network Director is not the only solution in the market to provide this kind of orchestration, Arista has a similar solution called ‘VM Tracer’ which runs on the switch control plane. It needs to run on every switch participating in the virtualization while Juniper solves it through a central server by incorporating the functionality in the management server.

Overlays and SDN Controllers

Consider a multitenant public cloud. Virtual machines are spun up, killed and moved frequently and the 4095 VLAN hard limit provides an upper bound for the number of tenants a datacenter can host. To break through the barrier of 4095 logical tenant networks a new technology is required that offers a larger address space, compare this to what IPv6 is to IPv4 but then in a layer 2 data center context. The new solution should also provide a high degree of flexibility to adapt to dynamic changes in the location of the virtual machine. A new set of solutions have emerged from this through the use of MAC in IP encapsulation. All L2 traffic between virtual machines and different physical servers is encapsulated in an outer IP packet. The encapsulation of the virtual machine traffic should happen as close as possible to the virtual machine by a device that known the state of the machines and hence is typically performed by the virtual switch. All VM traffic is merely tunneled across the network infrastructure between physical servers. The virtual switch provides the Virtual Tunnel End Points (VTEPs) that encapsulate and decapsulate the traffic from and to virtual machines running on its physical server.

Examples of encapsulation formats are STT (Nicira), NVGRE (Microsoft), MPLSoGRE (Contrail), MPLSoUDP (Contrail) and VXLAN (VMWare). The most prevailing encapsulation format today is VXLAN, not surprisingly since VMWare is the virtualization of choice for a lot of enterprises and providers. VXLAN provides a logical context addressing space up to 16 million VNIDs (Virtual Network Identifiers), large enough to accommodate the biggest multitenant public clouds. Encapsulation of layer 2 traffic is done in UDP/IP packets.

The dynamic part of the solution is provided through re-anchoring tunnel endpoints. Whenever a virtual machine moves from one physical server to another, the tunnel moves with it. It is like creating an overlay layer 2 dynamic network on top of a ‘static’ physical network though the use of MAC in IP encapsulation, hence the name ‘overlays’.

Since all traffic between virtual machines is encapsulated in outer IP packets and the endpoints of the tunnels (VTEPs) are inside the physical servers, the underlying network does not see the MAC and IP addresses of the virtual machines and only sees the IP addresses of the hypervisors (physical hosts). This makes overlay network agnostic to the underlay. Pretty much any underlying infrastructure that can provide IP connectivity between servers can be used as transport for overlay networks.

dna_img2

That said, for a well performing overlay it is of the utmost importance that the underlay is performing well, consistent in performance and resilient. If the underlay is not consistent in performance (different latencies depending on which path is taken), the placement of virtual machines and workloads is not arbitrary anymore. This is why fabric architectures like QFabric and VCF are very good candidates for underlay networks as they provide consistent latency and predictable performance between any two ports in the fabric, creating one big pool of network resources with consistent performance and providing one big virtualization resource pool to arbitrary place computing resources. If not consistent in performance, placement of resources within the infrastructure must be done with precision as to prevent high latency paths between closely related workloads (eg between application, middleware and database tier of a web application).

I deliberately avoided to mention earlier how a virtual machine that wants to talk to another virtual machine can find the physical host and hence the tunnel endpoint IP to talk to it. There are several ways of solving this problem. The VXLAN RFC standard specifies the use of multicast. All tunnel endpoints (VTEPs) part of the same VNID subscribe to the same multicast group and report any changes in virtual interfaces on its virtual switch by publishing the MAC address of the new virtual machines to the multicast group. All VTEPs listening to the multicast group will record the MAC address and the IP address of the VTEP hosting this MAC.

Another approach to solve this problem is the use of a central controller that tracks all activity in the virtual world and distributes the required information to all VTEPs in the network. This is effectively the task of the SDN Controller in the data center. Examples of such controllers are NSX from VMWare, Contrail from Juniper and the OpenDayLight open source project.

The Universal SDN Gateway

At this point it should be clear that all virtual machines can talk to each other using an overlay. There aren’t much applications that are confined within an isolated network though, and at some point an application will need to break out of the overlay and talk to the physical world, eg the internet or what we call a Bare Metal Server (BMS). A Bare Metal Server is a server that is not running any virtualization software, eg a SUN Solaris server running an Oracle database of a SRX providing security services. So there is a need to be able to talk to machines that do not have a tunnel endpoint (VTEP). The VTEP function could for example be placed in the switch, at the port connecting the bare metal server. Compare this to the placement of the VTEP in the vswitch where the virtual interface of the virtual machine is attached. Pretty much the same architecture but this time the VTEP functionality is provided by the hardware switch. The VTEP in this switch will have to be able to play in the MAC learning process of the overlay network. In the case of VXLAN per RFC standard this would be multicast. If a VTEP wants to work in an overlay context where VMWare NSX is the controller, the switch must be able to talk with the controller and support for the protocol must be implemented in the switch. In the case of VMWare NSX for multi-hypervisors, this is the OVSDB protocol (OpenVSwitch Database protocol).

The Broadcom Trident II chip provides VXLAN encapsulation support in hardware. From the above it should be apparent that encapsulation is not the only thing required for a VTEP, the MAC learning (control plane) must also be provided either as multicast or as a protocol implementation for a specific solution like VMWare NSX.

Juniper’s QFX5100, which is based on the Trident II chip, provides support for standard multicast VTEP and VMWare NSX for Multi-Hypervisors and as such can be used as a Top of Rack Layer2 gateway, connecting BMS into the overlay with the virtual machines.

dna_img3

Breaking out of the data center to the internet or to another data center can be as easy as a layer 2 gateway function or as complex as stitching the VXLAN traffic directly to a VPLS. The latter use case is not implemented in any merchant silicon solution today. The EX9200 and MX systems, which are based on Juniper custom ASICs, will provide this functionality soon, making them the only platforms that can proudly call themselves Universal SDN Gateways. The EX9200 and MX will provide L2, L3 and VPLS VTEP functionality in hardware allowing them to be the data center edge and stitching overlays from one data center to the other across a VPLS or provide the gateway between different overlays in different PODs within the same data center (eg connecting a VMWare VXLAN POD to a Contrail MPLSoUDP POD).

dna_img4

Orchestration tools

Now that we have the mechanisms in place to create a dynamic network architecture, we need the tools to provide the end-user or the server infrastructure manager a way to create new virtual services and virtual networks. This is where orchestration tools like OpenStack, CloudStack, IBM SmartCloud, VMWare vCloud Director, … come into play. OpenStack, to take an example, is composed of different modules.

It provides a web interface to allow the user to create new virtual machines and networks and to manage and monitor the virtualized infrastructure. This dashboard is called ‘Horizon’. OpenStack also has a module to connect to the compute part of the virtualized infrastructure which is called ‘Nova’. There a two modules for interfacing with storage, one for block storage called ‘Cinder’ and one for Object storage called ‘Swift’. Another module provides the interface to the network infrastructure and this one is called ‘Neutron’.

The OpenStack modules are a plugin containers and they can host different plugins depending on which infrastructure they need to manage. For example, for the Nova compute module there is a plugin for VMWare, for Microsoft Hyper-V, for KVM, … For neutron there is a Juniper plugin which can directly talk with EX and QFX switches using Netconf/DMI or which can talk to the Space ND-API. This allows OpenStack to manage any Junos based infrastructure directly, without the need for overlays and SDN controllers. This is the Private Cloud traditional VLAN based model mentioned at the beginning of this note.

dna_img6

For larger multitenant public clouds, OpenStack Neutron also has plugins to connect to Contrail or the NSX Controller, providing orchestration of the network through the use of overlays. The latter provides the most agile, dynamic and scalable cloud infrastructure for virtualized data centers.

dna_img7

IP Fabrics

Because of the nature of overlay networks, the underlying physical network only needs to provide L3 IP connectivity between the physical servers (hypervisors). There is no need for multiple or stretched VLANs, only IP connectivity. This allows a different network topology typically used in massively scalable data centers which is called the IP Fabric. An IP Fabric uses L3 dynamic routing protocols to connect individual switches together through routing, typically organized in a spine and leaf topology. Load balancing is provided through use of ECMP. All switches are managed individually and because there are no stretched broadcast domains this architecture is highly scalable. This will be a topic for a future discussion, but I wanted to mention it in this context because the combination of IP Fabrics and overlays provide the design blocks for a massively scalable multitenant public cloud architectures.

Advertisements
%d bloggers like this: