Mice and Elephants in my Data Center

September 8, 2014

Elephant flows

A long-lived flow transferring a large volume of data is referred to as an elephant flow, in contrast to the small, short-lived flows, referred to as ‘mice’, of which there are many in the data center. Although elephant flows are not that numerous, they can monopolize network links and consume all of the buffer allocated to a port. This can temporarily starve mice flows and disrupt the overall performance of the data center fabric. Juniper will soon introduce Dynamic Load Balancing (DLB) based on Adaptive Flowlet Splicing as a configuration option in its Virtual Chassis Fabric (VCF). DLB provides much more effective load balancing than traditional hash-based load distribution. Together with the existing end-to-end path-weight based load balancing mechanism, VCF has a strong load distribution capability that will help network architects drive their networks harder than ever before.

Multi-path forwarding

Multi-path forwarding refers to balancing packets across multiple active links between two network devices or two network stages. Consider a spine and leaf network with 4 spines: all traffic from a single leaf is spread across all of its uplinks in order to use as much of the available aggregate bandwidth as possible and to provide redundancy in case of link failure.

Multi-path forwarding is typically based on a hash function. A hash function maps a virtually infinite collection of data into a finite or limited set of buckets or hashes, as illustrated below.

[Figure: a hash function mapping keys onto a limited set of buckets. Image source: Wikipedia (http://en.wikipedia.org/wiki/Hash_function)]

In networking terms: the hash function takes multiple fields of the Ethernet, IP and TCP/UDP headers and uses them to map all sessions onto a limited collection of 2 or more links. Because the fields used for hashing are static, a single flow is mapped onto exactly 1 link and stays there for the lifetime of the flow. A good hashing function should be balanced, meaning that it should fill all hashing buckets equally and thereby distribute the flows equally across the available links.
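A minimal sketch of hash-based link selection, assuming the classic 5-tuple as hash input and CRC32 as a stand-in for the switch's hardware hash (real ASICs use their own, vendor-specific field sets and hash functions). It shows the property described above: the same flow always lands on the same link.

```python
import zlib
from typing import NamedTuple

class FiveTuple(NamedTuple):
    """Header fields used as hash input (an assumed field set)."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int

def select_link(flow: FiveTuple, num_links: int) -> int:
    """Map a flow onto one of `num_links` member links.

    CRC32 stands in for the hardware hash. The inputs never change during
    the life of a flow, so every packet of the flow picks the same link.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links

flow = FiveTuple("10.0.0.1", "10.0.1.7", 6, 49152, 443)
print([select_link(flow, 4) for _ in range(3)])  # three identical link indices
```

Different flows (here differing only in source port) spread over all links, but an individual flow never moves.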

One of the reasons to use static fields for the hashing function is to avoid reordering of packets as they travel through a network whose paths might not all be of equal distance or latency. Even if all paths are equal by design, different buffer fill patterns on different paths will cause differences in latency. Out-of-order packets can be put back in order by the end-point or in the network, but this always comes at a cost, which is why a network that ensures in-order delivery is preferred.

Because of that static nature, the distribution of traffic will be poorly balanced when a few flows are disproportionately larger than the others. A long-lived, high-volume flow will be mapped to a single link for its whole lifetime and can exhaust the buffer of that link, with packet drops as a result.
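The imbalance is easy to quantify. The sketch below uses hypothetical numbers: 400 mice of 10 kB each, spread perfectly evenly over 4 links, plus a single 100 MB elephant that the hash happened to place on link 2. Flow counts are balanced, yet one link carries about 3.9 times the average load.

```python
def link_loads(flows, num_links):
    """Sum the bytes carried per link, given (link_index, bytes) assignments."""
    loads = [0] * num_links
    for link, size in flows:
        loads[link] += size
    return loads

NUM_LINKS = 4
# 400 hypothetical mice of 10 kB each, assigned round-robin (perfectly balanced counts).
mice = [(i % NUM_LINKS, 10_000) for i in range(400)]
# One hypothetical 100 MB elephant hashed onto link 2.
elephant = [(2, 100_000_000)]

loads = link_loads(mice + elephant, NUM_LINKS)
print(loads)  # [1000000, 1000000, 101000000, 1000000]
print(max(loads) / (sum(loads) / NUM_LINKS))  # peak-to-average ratio
```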

TCP as in keeping the data flowing

To understand the mechanism of Adaptive Flowlet Splicing, we need to understand some of the dynamics of how data is transmitted through the network. TCP has been architected to avoid network congestion and keep a steady flow of data over a wide range of networks and links. One provision that enables this is the TCP window. The window specifies how much data can be in flight in the network before the sender must wait for an acknowledgement from the receiver. In effect, the sender may send a window's worth of packets without waiting, and for each acknowledgement it receives it can slide the window forward and send more data. The size of the window is not fixed but dynamic and self-tuning. TCP uses what is called AIMD (Additive Increase, Multiplicative Decrease) congestion control: while acknowledgements keep arriving, the congestion window is increased additively (by roughly one segment per round-trip time), and it is cut in half whenever packet loss is detected, for example through duplicate acknowledgements. The resulting traffic pattern is the typical saw-tooth shown in the figure below.
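A toy model of AIMD reproduces the saw-tooth. It is a deliberate simplification: the window grows by one segment per round trip, and a loss is assumed to occur whenever the window exceeds a hypothetical path capacity; slow start, timeouts and the receiver window are ignored.

```python
def aimd_trace(rtts: int, capacity: int, start: int = 1) -> list[int]:
    """Congestion window (in segments) at each round trip under a toy AIMD model.

    Assumption: a loss happens whenever the window exceeds `capacity`,
    a hypothetical bottleneck size chosen for illustration.
    """
    cwnd = start
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:
            cwnd = max(1, cwnd // 2)  # loss detected: multiplicative decrease
        else:
            cwnd += 1                 # window acknowledged: additive increase
    return trace

print(aimd_trace(rtts=30, capacity=10, start=6))
```

The printed trace climbs from 5 to 11 segments, drops back to 5 and repeats: the saw-tooth.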

[Figure: the saw-tooth pattern of the TCP congestion window over time]

Adaptive Flowlet Splicing (AFS)

From the above it should be apparent that elephant flows will result in repeating patterns of short bursts followed by quiet periods. This characteristic pattern divides a single long-lived flow over time into smaller pieces, which we refer to as ‘flowlets’. The picture below, courtesy of the blog article by Yafan An [*1], visually represents what flowlets look like within elephant flows when we look at them through a time microscope:

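[Figure: flowlets within elephant flows, courtesy of Yafan An [*1]]

Now suppose that the quiet times between flowlets are larger than the biggest difference in latency between the paths in the network fabric. In that case each flowlet can be sent over a different path without any risk of overtaking the previous one, so load balancing based on flowlets will always ensure in-order arrival.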
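Splitting a flow into flowlets then comes down to looking at the idle gaps between consecutive packets. A minimal sketch, assuming we have the times at which a flow's packets reach the load-balancing switch and a gap threshold equal to the largest latency difference between paths. The reasoning: if the last packet of a flowlet leaves at time t0 and takes at most Lmax to arrive, and the next flowlet starts after t0 + (Lmax - Lmin) on a path taking at least Lmin, it arrives after t0 + Lmax and cannot overtake its predecessor.

```python
def split_into_flowlets(timestamps_us, gap_threshold_us):
    """Group a flow's packet times (in µs) into flowlets.

    A new flowlet starts whenever the idle gap since the previous packet
    exceeds gap_threshold_us (the assumed maximum latency skew).
    """
    flowlets = []
    for ts in sorted(timestamps_us):
        if flowlets and ts - flowlets[-1][-1] <= gap_threshold_us:
            flowlets[-1].append(ts)   # still inside the current burst
        else:
            flowlets.append([ts])     # quiet period exceeded: new flowlet
    return flowlets

# Hypothetical packet times: two bursts separated by a 20 µs pause.
arrivals = [0.0, 1.2, 2.4, 3.6, 23.6, 24.8, 26.0]
print(split_into_flowlets(arrivals, gap_threshold_us=3.0))
# [[0.0, 1.2, 2.4, 3.6], [23.6, 24.8, 26.0]]
```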

To distribute flowlets more evenly across the member links of a multi-path, it helps to keep a relative quality measure for each link based on its history. This measure is implemented using the moving average of the link’s load and its queue depth. Using this metric, the least utilized and least congested member link is selected when assigning new flowlets.
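The exact formula VCF uses to combine load and queue depth is not described here, so the following is only a sketch under assumptions: each member link keeps an exponentially weighted moving average (EWMA) of its utilisation and of its queue depth, both normalised to 0..1; the two are simply added into a cost; and the link with the lowest cost receives the next flowlet. The weight `alpha` and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MemberLink:
    name: str
    avg_load: float = 0.0    # moving average of utilisation, 0..1
    avg_queue: float = 0.0   # moving average of queue depth, 0..1 of the buffer

    def update(self, load: float, queue: float, alpha: float = 0.25) -> None:
        """Fold a new sample into the EWMAs (alpha is an assumed weight)."""
        self.avg_load = alpha * load + (1 - alpha) * self.avg_load
        self.avg_queue = alpha * queue + (1 - alpha) * self.avg_queue

    def cost(self) -> float:
        """Hypothetical quality measure: load and congestion weighted equally."""
        return self.avg_load + self.avg_queue

def pick_link_for_new_flowlet(links):
    """The least utilised, least congested member link gets the next flowlet."""
    return min(links, key=lambda link: link.cost())

links = [MemberLink("uplink-0"), MemberLink("uplink-1"), MemberLink("uplink-2")]
# Hypothetical (load, queue depth) samples per link, repeated over a few intervals.
samples = {"uplink-0": (0.9, 0.6), "uplink-1": (0.3, 0.1), "uplink-2": (0.5, 0.0)}
for _ in range(4):
    for link in links:
        link.update(*samples[link.name])
print(pick_link_for_new_flowlet(links).name)  # uplink-1
```

The moving average smooths out momentary spikes, so a single burst on an otherwise idle link does not immediately disqualify it.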

Is this elephant flow handling unique to VCF ?

Not at all. But the controlled environment and the imposed topology of the VCF solution allow Juniper to get the timings right without having to resort to heuristics and complex analytics. In a VCF using only QFX5100 switches in a spine and leaf topology, each path is always 3 hops. The latency between two ports across the fabric is between 2µs and 5µs, resulting in a maximum latency skew of 3µs. Consequently, any inter-arrival gap between flowlets larger than the 3µs latency skew allows a flowlet to be reassigned to another member link without impacting the order in which packets arrive.

In an arbitrary network topology using a mix of switch technologies and vendors, every variable introduced makes it exponentially more complex to get the timings right and to find the exact point at which to split an elephant flow into flowlets that can be distributed across multiple links.

Another problem we did not have to address in the case of VCF is how to detect or differentiate an elephant from a mouse. AFS records the timestamp of the last received packet of each flow. Combined with the known latency skew of 3µs, this timestamp is enough to decide when the flow can be reassigned to another member link. It is less important for AFS to be aware of the flow’s actual nature.
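Putting the pieces together for VCF: with a port-to-port latency between 2µs and 5µs, the skew is 5µs - 2µs = 3µs, and a per-flow timestamp is all the state that is needed. The class below is a hypothetical illustration of that logic, not Juniper's implementation. It remembers, per flow, the time of the last packet and the member link in use, and only picks a new link when the gap exceeds the skew. `choose_link` is an assumed callback; in the usage example it simply alternates between two links, whereas AFS would pick the least loaded, least congested member.

```python
import itertools

MIN_LATENCY_US = 2.0   # fastest port-to-port latency across the VCF (from the text)
MAX_LATENCY_US = 5.0   # slowest port-to-port latency across the VCF (from the text)
SKEW_US = MAX_LATENCY_US - MIN_LATENCY_US   # 3 µs

class FlowletTable:
    """Hypothetical AFS-style state: per flow, last packet time and current link."""

    def __init__(self, choose_link, skew_us=SKEW_US):
        self.choose_link = choose_link   # callback returning the best member link
        self.skew_us = skew_us
        self.entries = {}                # flow key -> (last_seen_us, link)

    def forward(self, flow, now_us):
        """Return the member link for a packet of `flow` arriving at `now_us`."""
        entry = self.entries.get(flow)
        if entry is None or now_us - entry[0] > self.skew_us:
            link = self.choose_link()    # gap exceeds the skew: new flowlet, may move
        else:
            link = entry[1]              # same flowlet: stay on the current link
        self.entries[flow] = (now_us, link)
        return link

# Usage: alternate between two links so that a reassignment is visible.
candidates = itertools.cycle(["uplink-0", "uplink-1"])
table = FlowletTable(lambda: next(candidates))
path = [table.forward("elephant-1", t) for t in (0.0, 1.0, 2.0, 10.0, 11.0)]
print(path)  # ['uplink-0', 'uplink-0', 'uplink-0', 'uplink-1', 'uplink-1']
```

Note that the table never has to decide whether a flow is an elephant or a mouse: any flow that pauses longer than the skew simply becomes eligible to move.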

In arbitrary network architectures, however, as Martin Casado and Justin Pettit describe in their blog post ‘Of Mice and Elephants’ [*2], it might be helpful to differentiate the elephants from the mice and treat them differently. Whether this should be done through distinct queues or different routes for mice and elephants, by turning the elephants into mice, or through some other clever mechanism is a topic of debate and network design. Another point to consider is where the differentiation should happen. The vSwitch is certainly a good candidate, but in the end the underlay is the one that handles the flows according to their nature, and hence a standardized signaling interface between overlay and underlay must be considered.


By introducing AFS in VCF, data center workloads that run on top of the data center fabric will be distributed more evenly, and congestion on single paths caused by elephant flows will be avoided. If a customer has no need for a specific topology or for a massively scalable solution, a practical and effective solution like VCF brings a lot of value to their data center.


[*1] Yafan An, Flowlet Splicing – VCF’s Fine-Grained Dynamic Load Balancing Without Packet Re-ordering – http://forums.juniper.net/t5/Data-Center-Technologists/Adaptive-Flowlet-Splicing-VCF-s-Fine-Grained-Dynamic-Load/ba-p/251674

[*2] Martin Casado and Justin Pettit, Of Mice and Elephants – http://networkheresy.com/2013/11/01/of-mice-and-elephants/


Dynamic Network Architectures

August 25, 2014

What are Dynamic Network Architectures ?

A Dynamic Network is one that adapts itself to the requirements of the applications running on top of it. Take one specific application. When this application starts, it needs connectivity. This connectivity is typically provided through a connectivity context in the network (a VLAN, QoS parameters, … ). As this application moves from one server to another and from one switch port to another, the connectivity context on the physical switch port needs to move with it.

Application Virtualization

I deliberately started from an application perspective and not from the virtual machine. The virtual machine is a facilitator; in the end it all comes down to applications and services. The granularity of virtualization is gradually moving up the application stack. Containerization technologies like Docker are growing in popularity, and the recent acquisition of CloudVolumes by VMWare reinforces this observation. In containerization technologies, applications are abstracted from the underlying OS and wrapped in application management containers, which can be delivered to diverse environments in real time. All this progress in virtualization is driven mostly by the business requirement for agility: the speed at which new applications and services need to be created.

Orchestration with Junos Space Network Director

Dynamic Network Architectures can be achieved using existing networking infrastructure. Depending on the scale and level of dynamics, different implementations are possible, and we will cover some of them. Other approaches, OpenFlow for example, are also possible but will be covered in a separate note.

The easiest way of achieving a dynamic network is through the use of traditional VLANs. For this to work, a central or decentralized controller needs to reconfigure switch ports whenever an application moves. Consider a virtualized server infrastructure operated through VMWare. vCenter is typically the central management software that provides visibility and management for all virtual machines running on all the physical servers. vCenter keeps track of every machine that is started, stopped or moved between physical servers. At the other end of the data center infrastructure sits network management software like Junos Space. Junos Space Network Director manages and monitors all switches in the data center network.

Bridging both worlds is possible through an open and documented REST API exposed by VMWare vCenter. Through this API, Junos Space Network Director receives notifications whenever the state of a virtual machine changes. LLDP is used to map the physical servers to their corresponding switch ports. By consolidating this information, Junos Space Network Director knows exactly which physical server is connected to which physical port. The network administrator only needs to provide a mapping table from virtual networks to physical networks (VLANs), and with this information Space ND can provision each physical switch port with only the VLANs corresponding to the virtual machines active on the attached physical server.

Whenever a virtual machine is moved to another physical server and its virtual interface is removed from the virtual switch on the originating server, the VLAN mapped to its virtual network is removed (pruned) from the original switch port, provided no other virtual machine on that server still needs it, and added to the trunk on the switch port connected to the target server. As such, the VLANs on the switch ports follow the virtual machines as they move across the physical servers.
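The bookkeeping Network Director performs can be sketched as a small join over three tables. All names below (virtual machines, hosts, port-groups, Junos-style port names and VLAN IDs) are hypothetical, and the dictionaries stand in for vCenter notifications, LLDP neighbour data and the administrator's mapping table. The function computes, per switch port, only the VLANs of the virtual machines behind that port; the diff tells which VLANs to add and which to prune after a VM move.

```python
from collections import defaultdict

# Hypothetical inputs standing in for vCenter events, LLDP data and the admin's mapping.
vm_to_host = {"web01": "esx-a", "web02": "esx-b", "db01": "esx-a"}
vm_to_vnet = {"web01": "web-pg", "web02": "web-pg", "db01": "db-pg"}
host_to_port = {"esx-a": "xe-0/0/1", "esx-b": "xe-0/0/2"}   # learned via LLDP
vnet_to_vlan = {"web-pg": 110, "db-pg": 120}                 # provided by the admin

def vlans_per_port(vm_to_host, vm_to_vnet, host_to_port, vnet_to_vlan):
    """VLANs each switch port must trunk: only those of the VMs behind that port."""
    ports = defaultdict(set)
    for vm, host in vm_to_host.items():
        ports[host_to_port[host]].add(vnet_to_vlan[vm_to_vnet[vm]])
    return dict(ports)

def trunk_changes(before, after):
    """Per port, the VLANs to add and the VLANs to prune."""
    changes = {}
    for port in before.keys() | after.keys():
        old, new = before.get(port, set()), after.get(port, set())
        if old != new:
            changes[port] = {"add": new - old, "prune": old - new}
    return changes

before = vlans_per_port(vm_to_host, vm_to_vnet, host_to_port, vnet_to_vlan)
vm_to_host["db01"] = "esx-b"   # db01 is moved to esx-b
after = vlans_per_port(vm_to_host, vm_to_vnet, host_to_port, vnet_to_vlan)
print(trunk_changes(before, after))
```

VLAN 120 is pruned from xe-0/0/1 and added to xe-0/0/2, while VLAN 110 stays on xe-0/0/1 because web01 still runs on esx-a.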

The reconfiguration of switch ports is performed by Junos Space Network Director through a traditional configuration change and commit. Needless to say, this type of orchestration provides only a limited degree of dynamism in the network. This solution is mainly intended for environments that are mostly static in nature: data centers with a limited number of logical network contexts (VLANs), where most virtual servers are supposed to be up 24/7 and virtual machine motion is limited to occasional moves for maintenance or load rebalancing. Most enterprise private clouds fall into this category and can take advantage of this solution to orchestrate their virtual infrastructure without introducing more complexity or overhead than required.


One important gain for network operators in this architecture is the elimination of manual configuration of VLANs. Forgetting to provision a VLAN to a trunk when a new virtual server is spun up is one of the most common sources of errors in the data center network. Also consider what needs to be done on the network side when you want to move all virtual machines from one physical server temporarily to another server for server hardware maintenance or upgrades…

Suppose you have 100 virtual machines spread across 10 VLANs and you need to provision VLANs on the switch ports. One could provision trunks carrying all VLANs to all switch ports. However, the switch hardware imposes limits (the maximum number of vmembers), and because broadcast traffic is flooded to every VLAN on a trunk, the virtual switches in the servers waste CPU cycles on frames for networks they do not host. Without going into too much detail, the best solution is to add VLANs to a switch port's trunk only for the virtual networks running on the server connected to that port.
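A quick count shows why pruning matters for the vmember limit. The sketch assumes, roughly, that each VLAN configured on each port consumes one vmember, and uses a hypothetical rack of 48 server ports, 10 VLANs and servers that each host virtual machines in 3 VLANs. The exact vmember accounting differs per platform, so treat the numbers as illustrative.

```python
def vmember_count(port_vlans):
    """Total VLAN-to-port memberships across a switch (simplified vmember count)."""
    return sum(len(vlans) for vlans in port_vlans.values())

ALL_VLANS = set(range(100, 110))   # 10 hypothetical VLAN IDs
PORTS = [f"xe-0/0/{n}" for n in range(48)]

# Option 1: every trunk carries every VLAN.
trunk_everything = {port: ALL_VLANS for port in PORTS}

# Option 2: each port carries only the 3 VLANs its server actually hosts
# (the assignment pattern is arbitrary, for illustration only).
pruned = {port: {100 + (n + k) % 10 for k in range(3)} for n, port in enumerate(PORTS)}

print(vmember_count(trunk_everything))  # 480
print(vmember_count(pruned))            # 144
```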

A full overlay solution (covered later) would be overkill here; with Space Network Director we have a lean solution that can help customers optimize their private cloud network with little effort and without disruption.

The solution even provides orchestration to some extent, in the sense that whenever the server team creates a new machine, the network automatically provisions the corresponding VLAN on the correct port. No more manual intervention is needed from the network operations team. All the network administrator needs to provide upfront is the mapping between virtual and physical networks; Space Network Director takes care of the rest.

Space Network Director is not the only solution in the market that provides this kind of orchestration. Arista has a similar solution called ‘VM Tracer’, which runs on the switch control plane. It needs to run on every switch participating in the virtualization, whereas Juniper solves it centrally by incorporating the functionality into the management server.

Overlays and SDN Controllers

Consider a multitenant public cloud. Virtual machines are spun up, killed and moved frequently, and the hard limit of 4094 usable VLANs (a 12-bit VLAN ID with two reserved values) puts an upper bound on the number of tenants a data center can host. To break through the barrier of 4094 logical tenant networks, a new technology is required that offers a larger address space; compare this to what IPv6 is to IPv4, but in a layer 2 data center context. The new solution should also provide a high degree of flexibility to adapt to dynamic changes in the location of virtual machines. A new set of solutions has emerged based on MAC-in-IP encapsulation: all L2 traffic between virtual machines on different physical servers is encapsulated in an outer IP packet. The encapsulation of virtual machine traffic should happen as close as possible to the virtual machine, by a device that knows the state of the machines, and hence is typically performed by the virtual switch. All VM traffic is merely tunneled across the network infrastructure between physical servers. The virtual switch provides the Virtual Tunnel End Points (VTEPs) that encapsulate and decapsulate the traffic from and to the virtual machines running on its physical server.

Examples of encapsulation formats are STT (Nicira), NVGRE (Microsoft), MPLSoGRE (Contrail), MPLSoUDP (Contrail) and VXLAN (VMWare). The most prevalent encapsulation format today is VXLAN, which is not surprising since VMWare is the virtualization platform of choice for a lot of enterprises and providers. VXLAN provides a logical context address space of up to 16 million VNIDs (Virtual Network Identifiers), large enough to accommodate the biggest multitenant public clouds. Layer 2 traffic is encapsulated in UDP/IP packets.
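The 16 million figure follows directly from the header layout. VXLAN (RFC 7348) carries a 24-bit VNI, giving 2^24 = 16,777,216 identifiers, versus the 12-bit VLAN ID. Below is a minimal sketch of building and parsing the 8-byte VXLAN header with Python's `struct` module. The outer Ethernet, IP and UDP headers are left out; only the IANA-assigned UDP destination port is noted as a constant.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned UDP destination port for VXLAN
MAX_VNI = 2**24 - 1     # 24-bit VNI: 16,777,216 identifiers (0 .. 16,777,215)

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # 'I' flag set: the VNI field is valid
    # Word 1: 8 flag bits followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", flags << 24, vni << 8)

def vxlan_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header."""
    _, word = struct.unpack("!II", header[:8])
    return word >> 8

hdr = vxlan_header(5000)
print(hdr.hex())       # 0800000000138800
print(vxlan_vni(hdr))  # 5000
```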

The dynamic part of the solution is provided by re-anchoring tunnel endpoints. Whenever a virtual machine moves from one physical server to another, the tunnel moves with it. It is like creating a dynamic layer 2 network on top of a ‘static’ physical network through MAC-in-IP encapsulation, hence the name ‘overlays’.

Since all traffic between virtual machines is encapsulated in outer IP packets and the endpoints of the tunnels (VTEPs) are inside the physical servers, the underlying network does not see the MAC and IP addresses of the virtual machines; it only sees the IP addresses of the hypervisors (physical hosts). This makes the overlay network agnostic to the underlay. Pretty much any infrastructure that can provide IP connectivity between servers can be used as transport for overlay networks.


That said, for a well-performing overlay it is of the utmost importance that the underlay performs well and is resilient and consistent in performance. If the underlay is not consistent in performance (different latencies depending on which path is taken), the placement of virtual machines and workloads can no longer be arbitrary. This is why fabric architectures like QFabric and VCF are very good candidates for underlay networks: they provide consistent latency and predictable performance between any two ports in the fabric, creating one big pool of network resources with consistent performance and hence one big virtualization resource pool in which computing resources can be placed arbitrarily. Without consistent performance, resources must be placed within the infrastructure with precision to prevent high-latency paths between closely related workloads (e.g. between the application, middleware and database tiers of a web application).

I deliberately avoided mentioning earlier how a virtual machine that wants to talk to another virtual machine can find the physical host, and hence the tunnel endpoint IP, to reach it. There are several ways of solving this problem. The VXLAN RFC specifies the use of multicast: all tunnel endpoints (VTEPs) serving the same VNID join the same multicast group. Broadcast, unknown unicast and multicast traffic is sent to that group, and every VTEP receiving it records the source MAC address of the virtual machine together with the IP address of the VTEP hosting that MAC (data-plane learning).
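The multicast (flood-and-learn) behaviour can be modelled in a few lines. This is a simplified sketch, not a real VTEP: it ignores MAC aging, broadcast handling and the encapsulation itself, and the VNI, multicast group, MAC and IP addresses are hypothetical. A frame for an unknown MAC is flooded to the VNI's multicast group; once a packet from that MAC has been received, the VTEP knows which remote VTEP IP to unicast to.

```python
class FloodAndLearnVtep:
    """Simplified data-plane MAC learning for one VNI of a multicast-based VTEP."""

    def __init__(self, vni: int, multicast_group: str):
        self.vni = vni
        self.multicast_group = multicast_group
        self.mac_table = {}   # inner (VM) MAC -> IP of the remote VTEP hosting it

    def receive(self, outer_src_ip: str, inner_src_mac: str) -> None:
        """Learn from a decapsulated packet: this MAC lives behind that VTEP."""
        self.mac_table[inner_src_mac] = outer_src_ip

    def destination_for(self, inner_dst_mac: str) -> str:
        """Unicast to the learned VTEP, or flood to the multicast group if unknown."""
        return self.mac_table.get(inner_dst_mac, self.multicast_group)

vtep = FloodAndLearnVtep(vni=5000, multicast_group="239.1.1.1")
print(vtep.destination_for("00:50:56:aa:bb:01"))  # 239.1.1.1 (unknown MAC: flood)
vtep.receive(outer_src_ip="192.0.2.11", inner_src_mac="00:50:56:aa:bb:01")
print(vtep.destination_for("00:50:56:aa:bb:01"))  # 192.0.2.11 (learned)
```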

Another approach to solve this problem is the use of a central controller that tracks all activity in the virtual world and distributes the required information to all VTEPs in the network. This is effectively the task of the SDN Controller in the data center. Examples of such controllers are NSX from VMWare, Contrail from Juniper and the OpenDayLight open source project.

The Universal SDN Gateway

At this point it should be clear that all virtual machines can talk to each other using an overlay. There aren’t many applications that are confined to an isolated network, though, and at some point an application will need to break out of the overlay and talk to the physical world, e.g. the internet or what we call a Bare Metal Server (BMS). A Bare Metal Server is a server that is not running any virtualization software, e.g. a SUN Solaris server running an Oracle database, or an SRX providing security services. So there is a need to talk to machines that do not have a tunnel endpoint (VTEP). The VTEP function could, for example, be placed in the switch, at the port connecting the bare metal server. Compare this to the placement of the VTEP in the vSwitch where the virtual interface of the virtual machine is attached: pretty much the same architecture, but this time the VTEP functionality is provided by the hardware switch. The VTEP in this switch has to participate in the MAC learning process of the overlay network. In the case of VXLAN per the RFC, this would be multicast. If a VTEP needs to work in an overlay context where VMWare NSX is the controller, the switch must be able to talk to the controller, which means the controller's protocol must be implemented in the switch. In the case of VMWare NSX for Multi-Hypervisors, this is the OVSDB protocol (Open vSwitch Database protocol).

The Broadcom Trident II chip provides VXLAN encapsulation support in hardware. From the above it should be apparent that encapsulation is not the only thing a VTEP requires: MAC learning (the control plane) must also be provided, either through multicast or through a protocol implementation for a specific solution like VMWare NSX.

Juniper’s QFX5100, which is based on the Trident II chip, supports both the standard multicast-based VTEP and VMWare NSX for Multi-Hypervisors, and can therefore be used as a Top of Rack Layer 2 gateway, connecting BMS into the overlay with the virtual machines.


Breaking out of the data center to the internet or to another data center can be as easy as a layer 2 gateway function or as complex as stitching the VXLAN traffic directly to a VPLS. The latter use case is not implemented in any merchant silicon solution today. The EX9200 and MX systems, which are based on Juniper custom ASICs, will provide this functionality soon, making them the only platforms that can proudly call themselves Universal SDN Gateways. The EX9200 and MX will provide L2, L3 and VPLS VTEP functionality in hardware, allowing them to act as the data center edge, stitching overlays from one data center to another across a VPLS, or to provide the gateway between different overlays in different PODs within the same data center (e.g. connecting a VMWare VXLAN POD to a Contrail MPLSoUDP POD).


Orchestration tools

Now that we have the mechanisms in place to create a dynamic network architecture, we need the tools to provide the end-user or the server infrastructure manager a way to create new virtual services and virtual networks. This is where orchestration tools like OpenStack, CloudStack, IBM SmartCloud, VMWare vCloud Director, … come into play. OpenStack, to take an example, is composed of different modules.

It provides a web interface that allows the user to create new virtual machines and networks and to manage and monitor the virtualized infrastructure. This dashboard is called ‘Horizon’. OpenStack also has a module that connects to the compute part of the virtualized infrastructure, called ‘Nova’. There are two modules for interfacing with storage: one for block storage called ‘Cinder’ and one for object storage called ‘Swift’. Another module provides the interface to the network infrastructure; this one is called ‘Neutron’.

The OpenStack modules are plugin containers, and they can host different plugins depending on which infrastructure they need to manage. For example, for the Nova compute module there are plugins for VMWare, for Microsoft Hyper-V, for KVM, … For Neutron there is a Juniper plugin which can talk directly to EX and QFX switches using Netconf/DMI, or to the Space ND-API. This allows OpenStack to manage any Junos-based infrastructure directly, without the need for overlays and SDN controllers. This is the traditional VLAN-based private cloud model mentioned at the beginning of this note.


For larger multitenant public clouds, OpenStack Neutron also has plugins to connect to Contrail or the NSX Controller, providing orchestration of the network through overlays. This overlay-based model provides the most agile, dynamic and scalable cloud infrastructure for virtualized data centers.


IP Fabrics

Because of the nature of overlay networks, the underlying physical network only needs to provide L3 IP connectivity between the physical servers (hypervisors). There is no need for multiple or stretched VLANs, only IP connectivity. This allows a different network topology, typically used in massively scalable data centers, called the IP Fabric. An IP Fabric uses L3 dynamic routing protocols to interconnect individual switches, typically organized in a spine and leaf topology. Load balancing is provided through ECMP. All switches are managed individually, and because there are no stretched broadcast domains, this architecture is highly scalable. This will be a topic for a future discussion, but I wanted to mention it in this context because the combination of IP Fabrics and overlays provides the building blocks for massively scalable multitenant public cloud architectures.

Network Automation

August 25, 2014

What is automation ?

Deploying and configuring boxes is a tedious, boring process which is prone to error when performed manually. Imagine facing the task of changing a small piece of the configuration in all devices of a rather large deployment, whether a campus or a data center, connecting to each device and manually changing its configuration. Some changes might need specific values depending on the device’s location or type. This process quickly becomes error-prone and could lead to significant downtime. We are network engineers, great at building things, not at copying out a manuscript by hand. Making our lives more interesting by eliminating repetitive tasks is our mission. From this, automation was born. Automation can take many forms: low-level custom scripting in Python, Perl or Ruby, open automation tools, up to off-the-shelf commercial products like Junos Space or Infoblox NetMRI.
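Even a low-level script removes most of the error-prone copying. A minimal sketch using Python's `string.Template`, assuming a hypothetical inventory in which each device has a hostname and a site, with site-specific syslog and NTP servers. The output is a set of Junos `set` commands per device, to be reviewed and loaded with whatever mechanism you use.

```python
from string import Template

# Hypothetical inventory; in practice this might come from a CMDB or a YAML file.
INVENTORY = [
    {"hostname": "leaf-01", "site": "dc1"},
    {"hostname": "leaf-02", "site": "dc1"},
    {"hostname": "campus-sw-07", "site": "hq"},
]

# Site-specific values that are easy to get wrong when editing by hand.
SITE_SERVERS = {
    "dc1": {"syslog": "192.0.2.250", "ntp": "192.0.2.1"},
    "hq": {"syslog": "198.51.100.250", "ntp": "198.51.100.1"},
}

SNIPPET = Template(
    "set system host-name $hostname\n"
    "set system syslog host $syslog any warning\n"
    "set system ntp server $ntp\n"
)

def render(device: dict) -> str:
    """Fill the template with the device's own values and its site's servers."""
    return SNIPPET.substitute(device, **SITE_SERVERS[device["site"]])

for device in INVENTORY:
    print(f"# {device['hostname']}")
    print(render(device))
```

The same template is applied to every device, so a change is made once and reviewed once instead of being retyped on each box.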


Programming languages are structured and mature in the way they abstract complexity, and they are especially good at repeating the same tasks over and over again, each time adjusting the inputs and producing different but consistent outputs. Compilers have matured over the course of many years and have been the subject of a lot of research to provide deep and proactive syntax and logic checking, allowing developers to find potential issues early in the process.

Applying those development tools to the automation of repeatable infrastructure management tasks led to the term DevOps, short for Development and Operations. DevOps gives engineers the tools to describe the expected state of their infrastructure in a well-defined language, allowing the use of classes, inheritance, polymorphism and macros to create a structured definition of the state at any point in time. Software development also produced a range of tools for change management, process flow, auditing and accountability. In DevOps, pretty much the same tools can be leveraged without change, like using version control systems (e.g. Git, SVN, Mercurial) to track changes, or using them to create a process flow in which junior engineers submit changes that senior engineers must approve before they are actually applied to the infrastructure.

Automation vs Orchestration ?

Automation and orchestration are often mentioned in the same sentence. Although they operate close to each other, there is a clear distinction between them. Automation is about automating repeatable operational tasks, while orchestration is about managing the dynamic nature of an infrastructure. While the result of automation is a persistent change in the configuration, orchestration works with ephemeral (volatile) state. Orchestration will, for example, allow a dynamic network infrastructure to adapt to movements of virtual machines from one physical host to another (e.g. by changing the state in VTEPs for VXLAN), while automation can help to ensure that whenever a new VM is created, the corresponding VLANs are configured on the trunk connected to the physical host.

As is often the case, there is a grey area. Take the Junos Space Network Director virtual/physical mapping feature. Space ND not only automates the provisioning of VLANs on switch ports as described previously, but also, up to a point, “dynamically” moves the VLAN definition between physical ports whenever virtual machines are moved between physical hosts. Space ND makes this happen through configuration changes and commits: this is a form of orchestration (the dynamics), but it could also be labeled automation because its results are persistent in nature.

Who needs automation ?

Anyone who needs to manage several devices individually will gain from automation. Which tools fit best depends on the number of devices, the knowledge of the team and the features that commercial or open off-the-shelf tools can offer. Consider a smaller data center with 10 top of rack switches: VCF would typically be a good fit, with Junos Space Network Director providing configuration management, monitoring and analytics, and the Space Platform’s templates and configlets available for those situations where something needs to be customized or automated.

At the other end of the spectrum, consider a mega-scale data center (MSDC) with 1000s of ToRs deployed in an L3 IP spine and leaf topology, where each device is managed individually and where the compute infrastructure teams have experience with open automation tools like Ansible, Puppet or Chef. There, the obvious choice is to follow the infrastructure team’s choice.

Who drives the choice for automation ?

DevOps originates from the server infrastructure world. Originally it was used to keep usernames, passwords … (files, basically) and software packages in sync across a large number of Unix-based machines (physical and/or virtual). The use of these tools for other infrastructure types, like networking, came later; as a consequence, it is natural that the choice of a particular automation tool is typically led by the server infrastructure people.

Which are the most popular automation tools ?

Ansible is an open-source software platform for configuring and managing computers. It combines multi-node software deployment, ad hoc task execution and configuration management. Ansible manages nodes over SSH and does not require additional remote software to be installed on the devices. It is written in Python and uses Playbooks written in YAML to express reusable descriptions of systems. Some examples of users include Spotify, Twitter and Atlassian (the company behind Bitbucket and Confluence).

Puppet is another open-source configuration management utility, written in Ruby. The user describes system resources and their state in “Puppet Manifests”. The Puppet agent on each device discovers system information (“facts”) through a utility called “Facter” and sends it to the Puppet Master, which compiles the Puppet manifests into a system-specific catalog containing resources and resource dependencies. The agent then applies any changes required to bring the device into the state described by the catalog, and any actions it takes are reported back to the Puppet Master. Examples of Puppet users are PayPal, Yandex and Rackspace.

Chef is yet another open-source configuration management tool, written in Ruby and Erlang. It uses a pure-Ruby domain-specific language for writing system configuration “Recipes”. These recipes describe a series of resources that should be in a particular state: e.g. packages that should be installed, services that should be running, or files that should be written. Chef makes sure each resource is properly configured and corrects any resources that are not in the desired state. Users of Chef include Yahoo!, Facebook and Klarna.
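The desired-state idea shared by Puppet and Chef can be illustrated independently of either tool. The sketch below is not Chef's or Puppet's API: resources are modelled as hypothetical name/state pairs, the function applies only the differences, and running it a second time produces no actions, which is the idempotency these tools rely on.

```python
def converge(desired: dict, actual: dict) -> list:
    """Bring `actual` in line with `desired`, returning the actions taken.

    Resources already in the desired state are left untouched, so calling
    converge() again immediately afterwards returns an empty list.
    """
    actions = []
    for resource, state in desired.items():
        if actual.get(resource) != state:
            actual[resource] = state            # "apply" the change
            actions.append(f"{resource} -> {state}")
    return actions

desired = {"package:ntp": "installed", "service:ntp": "running",
           "file:/etc/motd": "present"}
actual = {"package:ntp": "installed", "service:ntp": "stopped"}

print(converge(desired, actual))  # ['service:ntp -> running', 'file:/etc/motd -> present']
print(converge(desired, actual))  # []
```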

Which Juniper devices support Puppet, Chef and Ansible ?

Puppet for Junos is available as an add-on package for a select number of devices and software releases. In its most recent release, version 1.0 dated March 2014, the package is supported on:

  • EX4200, EX4500, EX4550 – Junos 12.3R2
  • MX5-80 – Junos 12.3R2
  • MX240-960 – Junos 12.3R2
  • QFX3500, QFX3600 – Junos 12.3X50-D20.1
  • QFX5100 – Junos 13.2X51-D15 with enhanced automation (later OS releases also supported)

More information and downloads are available from the Puppet for Junos landing page: http://www.juniper.net/techpubs/en_US/release-independent/junos-puppet/information-products/pathway-pages/index.html

Chef for Junos is currently only available on the QFX5100 running Junos 13.2X51-D15 with automation enhancements. More information is available here: http://www.juniper.net/techpubs/en_US/release-independent/junos-chef/information-products/pathway-pages/index.html

Ansible for Junos 1.0 was recently released. Documentation is available from the website: http://www.juniper.net/techpubs/en_US/release-independent/junos-ansible/information-products/pathway-pages/index.html

It has been tested on the following devices running Junos OS:

  • EX4200 standalone and as members of a VC
  • QFX5100
  • MX80
  • MX240, MX480 standalone and as members of a VC
  • M120
  • SRX5000 standalone and as nodes in a chassis cluster

Note that the modules use Netconf and should therefore run similarly on all devices running Junos; there is no need to install any client software or run Python on the devices.


The following modules are included in Ansible for Junos OS Release 1.0:

  • junos_get_facts – Retrieve device-specific information from the host.
  • junos_install_config – Modify the configuration of a device running Junos OS.
  • junos_install_os – Install a Junos OS software package.
  • junos_shutdown – Shut down or reboot a device running Junos OS.
  • junos_zeroize – Remove all configuration information on the Routing Engines and reset all key values on a device.

QFX5100 Switch Automation Enhancements
To maintain the highest quality standards and ensure the stability of the Junos software platform, only software packages and executables that are digitally signed by Juniper can be installed and run. Any automation tool that requires an agent on the device therefore needs its base software delivered through a package signed by Juniper engineering. For Puppet this would be the Ruby interpreter and the Puppet Agent. Once the Puppet Agent is available on the platform, Puppet will automatically download and install the required modules.
For the QFX5100 there is a special software bundle ‘jinstall-qfx-5-flex-x.tgz’ which is identical to the standard QFX5100 Junos software but with Veriexec disabled. This Junos version allows unsigned programs to be executed. It keeps some safeguards to ensure that essential Junos OS files cannot be overwritten, and reserves a 1GB user partition to store unsigned binaries and software packages. Python, Chef and Puppet are included in the image. The switch automation enhancements also change some default behavior: for Zero Touch Provisioning (ZTP), the factory-default configuration is L3 instead of the L2 default of the standard image.
More information about the QFX5100 Switch Automation Enhancements is available here: http://www.juniper.net/documentation/en_US/junos13.2/topics/concept/junos-flex-overview.html

Junos scripting: SLAX and JUICE
The configuration and all operational command output in the Junos operating system are based on XML. Changes to this configuration or output can be performed using XML transformations; the CLI is an example of a client that applies transformations to the configuration XML. XML transformations are performed through XSLT (eXtensible Stylesheet Language Transformations), based on the instructions provided in an XSL (eXtensible Stylesheet Language) definition. Scripting on Junos is done by creating XSL files (scripts) that take the original configuration as input and commit, as the new configuration, the XML that results from applying the XSL to the original: New XML = XSL( Old XML ).

XSL is a cumbersome language to write in, which is why Juniper engineers worked on an alternative syntax based on constructs from Perl and C. This alternative is called SLAX, or Stylesheet Language Alternative syntaX. SLAX allows you to write scripts that transform the configuration or output information, taking input along the way from various sources such as the command line, operational commands, external sources and local files. SLAX scripting makes extensive use of XML document parsing and formatting and of the XPath query language.
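SLAX and XSLT cannot be shown in Python, but the underlying model, a function that maps the old XML tree to a new one, can. The following sketch uses the standard library's `xml.etree.ElementTree` as a stand-in for an XSLT processor. The configuration fragment and the rule it applies (give every interface without a description a default one) are hypothetical, and real Junos configuration XML is considerably richer.

```python
import copy
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment of a Junos-style XML configuration.
OLD_CONFIG = (
    "<configuration><interfaces>"
    "<interface><name>ge-0/0/0</name><description>uplink</description></interface>"
    "<interface><name>ge-0/0/1</name></interface>"
    "</interfaces></configuration>"
)


def transform(old: ET.Element) -> ET.Element:
    """Return a new tree (New XML = f(Old XML)); the input is left unmodified."""
    new = copy.deepcopy(old)
    # XPath-style selection, similar in spirit to a template match in SLAX/XSLT.
    for intf in new.findall("./interfaces/interface"):
        if intf.find("description") is None:
            desc = ET.SubElement(intf, "description")
            desc.text = "unused port " + intf.findtext("name", default="?")
    return new


if __name__ == "__main__":
    old = ET.fromstring(OLD_CONFIG)
    print(ET.tostring(transform(old), encoding="unicode"))
```

Working on a deep copy mirrors the XSLT model, where the stylesheet produces a new document rather than editing the input in place.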

Junos provides 3 types of scripts:
• Commit scripts are executed during commit time and can provide additional processing and influence the outcome of the commit.
• Event scripts are triggered by system events such as a link going down or an SNMP trap.
• Operational scripts are executed at the operational prompt using the ‘op <scriptname> args…’ command.
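A commit script can reject a commit with an error or let it proceed with warnings. The Python sketch below mimics that decision on a candidate configuration. It illustrates the idea only and is not the Junos commit script interface (which uses SLAX or XSLT); both policy rules and the simplified XML are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommitResult:
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def commit_allowed(self) -> bool:
        # Any error aborts the commit; warnings are only reported.
        return not self.errors


def commit_check(candidate: ET.Element) -> CommitResult:
    """Apply two hypothetical site policies to a candidate configuration."""
    result = CommitResult()
    # Policy 1 (warning): every interface should carry a description.
    for intf in candidate.findall("./interfaces/interface"):
        if intf.find("description") is None:
            result.warnings.append(f"{intf.findtext('name')}: no description")
    # Policy 2 (error): root login over SSH must not be allowed.
    if candidate.findtext("./system/services/ssh/root-login") == "allow":
        result.errors.append("system services ssh root-login allow is not permitted")
    return result


if __name__ == "__main__":
    candidate = ET.fromstring(
        "<configuration>"
        "<system><services><ssh><root-login>allow</root-login></ssh></services></system>"
        "<interfaces><interface><name>ge-0/0/1</name></interface></interfaces>"
        "</configuration>"
    )
    outcome = commit_check(candidate)
    print(outcome.commit_allowed, outcome.errors, outcome.warnings)
```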

The Junos SLAX script interpreter library has been released as an open-source project and used to create a SLAX development environment for Unix-based systems called JUICE. JUICE allows you to run on-box Junos SLAX scripts unaltered on an off-box server. JUICE can be used to offload CPU-intensive scripts from the Junos control plane, to develop, debug and test scripts before staging them on a Junos device, or to use the power of SLAX to create new tools like ‘jsnap’. Jsnap is a community SLAX script for JUICE that takes snapshots of device configuration and runtime state, and validates configuration changes by inspecting the state before and after the change. It provides a language to write checks and detect problems in large configurations based on the output of operational commands.
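jsnap's own check language is not reproduced here. As a hedged illustration of the snapshot-and-compare idea, the Python sketch below compares two snapshots of interface state taken before and after a change and reports only the checks that fail. The snapshot format, check names and interface data are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical snapshot format: interface name -> operational status, as
# might be parsed from "show interfaces terse" output.
Snapshot = Dict[str, str]
Check = Callable[[Snapshot, Snapshot], List[str]]


def still_up(pre: Snapshot, post: Snapshot) -> List[str]:
    """Interfaces that were up before the change but are not up after it."""
    return [name for name, status in pre.items()
            if status == "up" and post.get(name) != "up"]


def still_present(pre: Snapshot, post: Snapshot) -> List[str]:
    """Interfaces that disappeared from the post-change snapshot."""
    return [name for name in pre if name not in post]


CHECKS: Dict[str, Check] = {
    "interfaces-still-up": still_up,
    "interfaces-still-present": still_present,
}


def run_checks(pre: Snapshot, post: Snapshot) -> Dict[str, List[str]]:
    """Run every check and report only those that found problems."""
    report: Dict[str, List[str]] = {}
    for name, check in CHECKS.items():
        failures = check(pre, post)
        if failures:
            report[name] = failures
    return report


if __name__ == "__main__":
    pre = {"ge-0/0/0": "up", "ge-0/0/1": "up", "ge-0/0/2": "down"}
    post = {"ge-0/0/0": "up", "ge-0/0/1": "down"}
    print(run_checks(pre, post))
```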


The Network Configuration Protocol (NETCONF) is a network management protocol developed and standardized by the IETF. Netconf provides mechanisms to install, manipulate, and delete the configuration of network devices. Its operations are realized on top of a simple remote procedure call (RPC) layer. The NETCONF protocol uses XML-based data encoding for the configuration data as well as for the protocol messages. The protocol messages are exchanged on top of a secure transport protocol such as SSH or TLS.
Netconf is the protocol used by Junos Space to remotely manage and monitor Junos devices. Netconf can also be used from any programming language to create custom automation tools. The Juniper GitHub organization (github.com/Juniper) hosts libraries for several programming and scripting languages, including Python, Perl, Golang, Ruby, Java and PHP. Using the libraries is optional: any programming language that can process XML and connect over SSH or sockets can be used to automate configuration changes and monitor Junos devices.
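Because NETCONF messages are plain XML, building one needs nothing beyond an XML library. The Python sketch below constructs a `<get-config>` RPC in the NETCONF base namespace defined in RFC 6241 and appends the `]]>]]>` end-of-message delimiter used by NETCONF 1.0 framing over SSH (RFC 6242). It deliberately stops short of opening the SSH session (NETCONF over SSH conventionally uses TCP port 830) and of the `<hello>` capability exchange, both of which a real client, or one of the libraries mentioned above, has to handle.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"  # RFC 6241 base namespace
EOM = "]]>]]>"  # end-of-message delimiter for NETCONF 1.0 framing over SSH


def build_get_config(message_id: int, datastore: str = "running") -> str:
    """Return a <get-config> RPC asking for the given datastore."""
    ET.register_namespace("", NETCONF_NS)  # serialize NETCONF_NS as the default namespace
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": str(message_id)})
    get_config = ET.SubElement(rpc, f"{{{NETCONF_NS}}}get-config")
    source = ET.SubElement(get_config, f"{{{NETCONF_NS}}}source")
    ET.SubElement(source, f"{{{NETCONF_NS}}}{datastore}")
    return ET.tostring(rpc, encoding="unicode")


def frame(message: str) -> str:
    """Append the NETCONF 1.0 end-of-message delimiter."""
    return message + EOM


if __name__ == "__main__":
    print(frame(build_get_config(101)))
```

The `message-id` attribute lets the client match each `<rpc-reply>` to the request that produced it.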

Junos Space Templates and Configlets
The Junos Space platform has some built-in tools that can help you automate deployments and configurations to a certain extent. Junos Space provides XML-based configuration templates which can be instantiated and whose placeholders (variables) can be populated during deployment. It allows one to keep pieces of Junos configuration in a central place and deploy them to multiple and different devices. A typical example would be to separate the configuration of DNS and NTP servers in a template which is common for all devices managed through Space. Whenever DNS or NTP changes occur, updating a single template and redeploying it to all devices is all that is required to implement the change across the infrastructure.
Configlets are configuration tools provided by Space that enable the user to quickly apply configurations to devices. A configlet is a configuration template written in the Velocity Template Language (VTL). The configlet will be transformed to a CLI configuration string by the Apache Velocity Template engine before being applied to a device. The dynamic elements (strings) in configuration templates are defined using variables. These variables act as an input to the process of transformation, to construct the CLI configuration string. These variables can contain anything: the interface name, device name, description text, or any similar dynamic values. The values of these variables are either defined by the user or the system, or determined by the context at the time of execution.
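Velocity marks variables with `$`, much like Python's `string.Template`, which makes the latter a convenient stand-in to illustrate variable substitution; it is not the Velocity engine and does not support VTL directives such as `#foreach`. In this hypothetical configlet, the device name and interface come from the context in which the configlet is applied, while the remaining values are supplied by the user.

```python
from string import Template

# Hypothetical configlet. string.Template stands in for the Velocity engine:
# both mark variables with `$`, but VTL directives are not supported here.
CONFIGLET = Template(
    "set system host-name $device_name\n"
    "set system name-server $dns_server\n"
    "set system ntp server $ntp_server\n"
    'set interfaces $interface description "$description"'
)


def render(device_name: str, interface: str, user_values: dict) -> str:
    """Fill the template: device and interface come from the execution
    context, everything else from the user. A missing value raises KeyError."""
    return CONFIGLET.substitute(user_values, device_name=device_name, interface=interface)


if __name__ == "__main__":
    values = {"dns_server": "192.0.2.53", "ntp_server": "192.0.2.123",
              "description": "server rack 4"}
    for device in ("leaf-01", "leaf-02"):
        print(render(device, "ge-0/0/10", values))
```

Rendering the same template for every managed device, with only the context values changing, is what makes a single update propagate everywhere on redeployment.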

Junos Space REST API
The Junos Space platform provides a REST API that allows automation of tasks that typically can be performed through the Space GUI. Using HTTP requests with XML- or JSON-encoded payloads, it is possible to create scripts in any language that interact with Junos Space. For example, it is possible to write a script that automatically provisions a device in Space after it is deployed using ZTP (meaning stand-alone ZTP; Space Network Director also provides a framework for facilitating a ZTP setup and, through SNMP traps, can automatically import newly deployed devices into Space).
Another example is using the REST API to leverage the Junos Space platform for automating configuration changes across all devices. Consider a script that connects directly to each device individually: it would need to handle authentication and manage certificates or passwords for every connection. Using the Space REST API, it is possible to perform remote procedure calls (RPCs) directly on the managed devices and piggy-back onto the Space DMI connection. Space provides the security and credential management, and your script takes advantage of features like job management, auditing, configuration change management and version control.
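The benefit of going through Space is that the script only needs Space credentials. The Python sketch below uses only the standard library to prepare such a request without sending it. The Space host name, the endpoint path and the media type are assumptions for illustration, not a verified description of the Space API; check the Junos Space REST API reference for your release.

```python
import base64
import urllib.request

# Hypothetical Junos Space server.
SPACE_HOST = "https://space.example.net"
# ASSUMPTION: endpoint layout and media type are illustrative only.
EXEC_RPC_PATH = "/api/space/device-management/devices/{device_id}/exec-rpc"
RPC_MEDIA_TYPE = "application/xml"


def build_exec_rpc_request(device_id: int, rpc_xml: str,
                           user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST asking Space to run an RPC on a device.

    Only Space credentials are used; Space holds the per-device credentials
    and the DMI connection. The body is just the RPC; the exact envelope
    Space expects is not reproduced here.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        SPACE_HOST + EXEC_RPC_PATH.format(device_id=device_id),
        data=rpc_xml.encode("utf-8"),
        headers={"Authorization": "Basic " + token,
                 "Content-Type": RPC_MEDIA_TYPE},
        method="POST",
    )


if __name__ == "__main__":
    req = build_exec_rpc_request(42, "<get-software-information/>", "admin", "secret")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; omitted here on purpose.
```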

The Network Director API provides a set of software orchestration services that are exposed through REST APIs. The REST APIs enable network management functions, including:
• Virtualization of cloud and datacenter operations
• Provisioning of secure multitenant networks in a shared network infrastructure
• Support for Layer 2, Layer 3, security, and Internet services
• Support for a single point of integration with external cloud and datacenter orchestration tools
ND-API provides a higher abstraction level than the low-level device interface REST API discussed above. This higher abstraction makes it easier and faster to create tools that need to provision multitenant networks in a shared infrastructure. Examples of Juniper tools that leverage ND-API are the OpenStack Neutron plugin and the CloudStack Network Guru plugin.
Note that ND-API is not included with the Network Director Application but provided as a separate Space app that needs to be installed in addition to ND.

Junos Fusion
Junos Fusion grew out of the Junos Node Unifier (JNU), which coupled routers and switches in “port extender” and “feature rich” modes. Fusion is the evolution of JNU and brings together a cluster of elements into a single logical entity managed from a supervisor node. Provisioned, managed and configured through the Netconf protocol, it simplifies the network by effectively grouping elements into one node that is easier to manage.
Junos Node Unifier was built entirely in SLAX scripting; at one point it consisted of over 100,000 lines of code, proving that SLAX is not just a nice-to-have but an actual tool for automating complex management tasks. Fusion is the integration of JNU as native code in the Junos operating system.
The initial phase of Fusion between Juniper MX and EX products should be visible via the CLI in Junos 14.1. In 15.1, router port extender will become a native Junos feature, and support for third-party systems via YANG is planned post 15.1.
Fusion ships with Junos and is an integral part of the operating system; no additional licenses are required.

JNU 1.3 documentation is available here: http://www.juniper.net/techpubs/en_US/jnu1.3/information-products/topic-collections/design-guide-jnu/junos-node-unifier1.3J1.pdf

OpenFlow in the Data Center

October 2, 2012 Leave a comment

A QFabric perspective on the emerging network virtualization technologies

What is OpenFlow?

Quoting the openflow.org website literally: “OpenFlow is an open standard that enables researchers to run experimental protocols in the campus networks we use every day. OpenFlow is added as a feature to commercial Ethernet switches, routers and wireless access points – and provides a standardized hook to allow researchers to run experiments, without requiring vendors to expose the internal workings of their network devices. OpenFlow is currently being implemented by major vendors, with OpenFlow-enabled switches now commercially available.”

In a router or switch, the fast packet forwarding (data plane) and the high level routing decisions (control plane) occur on the same device. In an OpenFlow Switch the data plane portion resides on the switch, while the high-level routing decisions are moved to a separate controller. The communication between the OpenFlow Switch and the OpenFlow Controller uses the OpenFlow protocol.

An OpenFlow Switch presents a clean flow table abstraction; each flow table entry contains a set of packet fields to match, and an action (such as send-out-port, modify-field, or drop). When an OpenFlow Switch receives a packet that does not match its flow table entries, it sends this packet to the controller. The controller makes the decision on how to handle this packet and adds a flow entry to the switch’s flow table directing the switch on how to forward similar packets in the future.
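To make the flow table abstraction concrete, here is a toy Python model of a switch that matches packets against prioritized entries and, on a table miss, hands the packet to a controller which installs a new entry. Field names are loosely modeled on OpenFlow match fields; the OpenFlow wire protocol itself is not modeled, and the MAC-to-port mapping is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Packet = Dict[str, object]


@dataclass
class FlowEntry:
    match: Dict[str, object]   # fields to match; absent fields act as wildcards
    action: str                # e.g. "output:2" or "drop"
    priority: int = 0

    def matches(self, pkt: Packet) -> bool:
        return all(pkt.get(k) == v for k, v in self.match.items())


class Controller:
    """Toy controller that decides using a static MAC-to-port map."""

    def __init__(self, mac_to_port: Dict[str, int]) -> None:
        self.mac_to_port = mac_to_port
        self.packet_ins = 0  # number of table misses that reached the controller

    def packet_in(self, switch: "Switch", pkt: Packet) -> str:
        self.packet_ins += 1
        port = self.mac_to_port.get(str(pkt.get("eth_dst")))
        action = f"output:{port}" if port is not None else "drop"
        # Install a flow so similar packets are handled by the switch itself.
        switch.table.append(FlowEntry({"eth_dst": pkt.get("eth_dst")}, action, priority=10))
        return action


@dataclass
class Switch:
    controller: Controller
    table: List[FlowEntry] = field(default_factory=list)

    def receive(self, pkt: Packet) -> str:
        # The highest-priority matching entry wins.
        for entry in sorted(self.table, key=lambda e: e.priority, reverse=True):
            if entry.matches(pkt):
                return entry.action
        # Table miss: send the packet to the controller.
        return self.controller.packet_in(self, pkt)
```

Only the first packet towards a given destination reaches the controller; later packets hit the installed entry. That per-new-flow round trip is also why controller capacity matters at datacenter scale.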

What is QFabric?

QFabric is a distributed device that creates a single switch abstraction, using a central control plane with smart edge devices representing the data plane. Multiple edge devices are interconnected through a common backplane implemented by 2 or more dedicated interconnect devices. All high-level layer 2 and layer 3 decisions are controlled by a central director (control plane), which supplies the edge devices with information on how to forward packets. Edge devices are smart in the sense that they make their own forwarding decisions for local forwarding, while informing the control plane of their local topology and state and taking input from the central director for forwarding decisions between edge devices. The communication between the control plane and the distributed data planes is implemented using the mature and standardized MBGP protocol (IETF RFC 4760).

The smart edge devices allow for better scalability. The backplane of the distributed device is implemented by very fast interconnects providing the edge devices with a consistent latency between any 2 ports across the whole fabric. Management is abstracted in a central control plane and leaves the administrator with a single switch view.

QFabric is a distributed device, performing and acting like a single switch, implemented using top of rack edge nodes for deployment flexibility. All components of the QFabric are fully redundant and the central control plane provides the single switch abstraction and management view. Using QFabric, it is possible to create one flat network with consistent latency and performance scaling up to 6144 ports.

OpenFlow in the datacenter

OpenFlow is an SDN protocol that I would position as an L2 network virtualization solution in the datacenter world, much like NVGRE, VXLAN, EVB and others. It provides a way to scale beyond the infamous limit of 4094 usable VLANs (the 12-bit 802.1Q VLAN ID) imposed by most of the datacenter network hardware in use today.

I see OpenFlow as one of the potential solutions for multitenant cloud datacenters. In public cloud datacenters the primary concern is scaling isolated environments as far as possible, with the option to go well beyond the 4094-VLAN limit. A second concern in multitenant clouds is the dynamic provisioning of services. In an IaaS public cloud, for example, it is imperative to have dynamic provisioning of the network layer as new virtual machines are created and deployed: software orchestration platforms such as OpenStack, CloudStack and vCloud need to integrate with the network, and OpenFlow is probably one of the most dynamic solutions available today to achieve this integration easily and in a vendor-agnostic way.

Though dynamic in nature, an OpenFlow network may run into scale issues. The central controller is the single brain and decision point for all the devices in the network. Apart from elephant flows, which are more typical of big data synchronization and backup applications, lots of short-lived connections are made across the datacenter network. Any new session or flow has to be synchronized through the central OpenFlow controller, making this controller the choke point and effectively limiting the scale and performance of the datacenter.

For flexible deployment of cloud services, when scaling beyond one rack, it is imperative to have a flat network architecture. This flat network architecture can only be implemented by a fabric that provides consistent latency, not favoring flows between two devices located in the same bubble. Traditional network design requires multiple layers to scale, resulting in bubbles at the hardware layer, and location awareness becomes the main restriction for flexible management. See also the IBM Redbook ‘Build a Smarter Data Center with Juniper Networks QFabric’ for a more elaborate description of the bubble issue.

The only solution available today that provides a flat and scalable architecture with consistent low latency across all ports and racks is QFabric. This requirement becomes even more apparent when thinking about OpenFlow. OpenFlow delivers dynamic provisioning for a virtual network across physical servers. To avoid a management nightmare, and having to confine virtual machine motion to the subset of the network where optimal performance and latency exist between servers (bubbles), one requires a flat network architecture. This makes QFabric the best architecture to run OpenFlow, not requiring location awareness and providing true dynamic scalability across the datacenter.

L2 Virtualization in the Datacenter

L2 virtualization answers the question of how to create virtualized networks on top of a physical network infrastructure that match the dynamic nature of server virtualization in today's datacenters. Think about a couple of virtual servers, all residing in the same L2 VLAN, but able to move from one physical host to another. How do we handle VLANs moving from one physical host to another on the physical network ports? Look at this as a mapping problem between virtual and physical VLANs: the VLANs known and living in the virtual switch, and those known and residing on the physical network ports.

The most obvious approach to this mapping question is to define all used VLANs on every port connected to a physical server. By far the easiest solution, but, as so often, the easiest is not always the best. The most important issue faced when defining all VLANs as a trunk on every server access port is MAC flooding. Since all VLANs are defined on all physical ports, the switch floods broadcast frames such as ARP requests, as well as frames for unknown destination MAC addresses, to all ports carrying that VLAN. This means that all servers will receive flooded traffic for all VLANs, even if a physical server is currently not hosting any virtual machines that participate in a given VLAN. In itself this is not an issue, as the server will drop the unwanted packets; however, each packet needs to come up through the network interface device driver in software before it is discarded, wasting CPU cycles that could otherwise be allocated to virtual machines.
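A small Python model makes the cost of all-VLAN trunks visible: it computes which ports receive a flooded frame and how many of them host no VM in that VLAN. Port names, VLAN numbers and VM placement are hypothetical.

```python
from typing import Dict, Set


def flood_ports(vlan: int, ingress: str, port_vlans: Dict[str, Set[int]]) -> Set[str]:
    """Ports that receive a flooded frame (a broadcast such as an ARP request,
    or unknown unicast): every port carrying the VLAN except the ingress port."""
    return {p for p, vlans in port_vlans.items() if vlan in vlans and p != ingress}


# Hypothetical setup: four server-facing ports, each trunking all VLANs 10-13.
static_trunks: Dict[str, Set[int]] = {
    port: set(range(10, 14)) for port in ("srv1", "srv2", "srv3", "srv4")
}
# Hypothetical VM placement: only srv1 and srv3 host VMs in VLAN 12.
hosts_in_vlan12 = {"srv1", "srv3"}

flooded = flood_ports(12, "srv1", static_trunks)
wasted = flooded - hosts_in_vlan12

print(sorted(flooded))  # ['srv2', 'srv3', 'srv4']
print(sorted(wasted))   # ['srv2', 'srv4']: frames these servers must discard
```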

The problem described above is more apparent when you have a lot of L2 isolated networks (VLANs) and lots of VMs joining and leaving the L2 network (e.g. starting, resuming or stopping machines), which is typical for VDI environments. If you have a limited set of VLANs and servers running 24/7, this problem is much less apparent as flooding will be limited.

Another problem is the limitation on the number of VLANs. If you are running a multitenant environment with many customers and allocate one or even more VLANs per customer, your scalability will be limited by the maximum number of VLANs on traditional networking equipment (4094 usable VLAN IDs).
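With one VLAN per tenant, the 802.1Q VLAN ID field caps a traditional network at a few thousand tenants. The arithmetic, next to the 24-bit segment identifiers used by overlay technologies such as VXLAN (RFC 7348) and NVGRE, is short enough to show directly:

```python
VLAN_ID_BITS = 12  # IEEE 802.1Q VLAN identifier (VID) field
VNI_BITS = 24      # VXLAN Network Identifier (RFC 7348); NVGRE's VSID is also 24 bits

# VID 0 and VID 4095 (0xFFF) are reserved, so 2**12 - 2 VLANs are usable.
usable_vlans = 2 ** VLAN_ID_BITS - 2
overlay_segments = 2 ** VNI_BITS

print(usable_vlans)      # 4094
print(overlay_segments)  # 16777216
```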

To overcome the above limitations, a number of solutions have emerged and have been taking shape and undergoing standardization over the last few years. They can be classified into a few different approaches:

  • Dynamic VLAN assignment solutions
  • Layer 2 encapsulation over L3 networks (overlay networks)
  • OpenFlow (which is a solution that might also fit in the overlay networks class, but treated independently)

Dynamic VLAN assignment solutions

Moving VLANs on physical port trunks depending on where virtual machines are active in the physical servers is an easy way to overcome the flooding issue. Dynamic motion of VLANs can be achieved using software controllers that integrate with virtualization platforms (e.g. the RESTful/SOAP API provided by VMware) in order to gather knowledge about which machines are running on which physical hardware and which VLANs a particular vSwitch is serving. The software controllers can then dynamically reconfigure the physical VLANs on the switch port trunks. This controller software can run on a separate server or be embedded in the switch’s control plane. Several vendors use this approach today. Arista VM Tracer and Force10 HyperLink are examples of such controllers embedded in the switch’s control plane, while Juniper provides an application in its Junos Space network management platform called Virtual Control, which runs on a separate server.
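The core calculation such a controller performs can be sketched in a few lines: derive the VLANs each switch port needs from the VM inventory, then diff that against what is currently trunked. This is a generic Python illustration, not how VM Tracer, HyperLink or Virtual Control are implemented; the host names, ports, VMs and VLANs are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# (vm_name, host, vlan) records, as a controller might assemble them from a
# virtualization manager's API. All names and VLANs here are hypothetical.
Inventory = Iterable[Tuple[str, str, int]]


def required_vlans(inventory: Inventory, host_to_port: Dict[str, str]) -> Dict[str, Set[int]]:
    """VLANs each switch port needs, based on the VMs its host is running."""
    needed: Dict[str, Set[int]] = {port: set() for port in host_to_port.values()}
    for _vm, host, vlan in inventory:
        needed[host_to_port[host]].add(vlan)
    return needed


def trunk_changes(current: Dict[str, Set[int]],
                  needed: Dict[str, Set[int]]) -> Dict[str, Tuple[Set[int], Set[int]]]:
    """Per port, the VLANs to add to and to remove from the trunk."""
    changes: Dict[str, Tuple[Set[int], Set[int]]] = {}
    for port, want in needed.items():
        have = current.get(port, set())
        to_add, to_remove = want - have, have - want
        if to_add or to_remove:
            changes[port] = (to_add, to_remove)
    return changes


if __name__ == "__main__":
    host_to_port = {"esx1": "ge-0/0/1", "esx2": "ge-0/0/2"}
    inventory = [("web1", "esx1", 10), ("db1", "esx2", 20), ("web2", "esx2", 10)]
    current = {"ge-0/0/1": {10, 20}, "ge-0/0/2": {10}}
    # After a VM move, re-running this on the updated inventory yields the delta.
    print(trunk_changes(current, required_vlans(inventory, host_to_port)))
```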

The emerging Edge Virtual Bridging (EVB ; 802.1Qbg) standard has a component addressing the above described dynamic VLAN assignment through the standardization of a negotiation protocol between the physical switch and the virtual switch. This protocol is called VSI Discovery and Configuration Protocol (VDP) and in my opinion the most elegant solution for small to medium sized cloud datacenters available today.

The market adoption of EVB and VDP is growing, but today it is limited to a few vendors that have expressed commitment to this standard: Juniper on the physical side, and Open vSwitch and IBM’s Distributed Virtual Switch 5000V on the virtual side. VMware has not yet expressed interest in this open standard and is proposing a solution based on a collaboration with Cisco called VXLAN. As such, VMware virtualization deployments need to replace the standard distributed vSwitch with IBM’s 5000V offering to use VDP. KVM and Xen are compatible with Open vSwitch and are compliant as such. Microsoft has not yet expressed specific interest for Hyper-V and is proposing its own solution based on GRE (see below). In the newest Windows Server 2012, however, Hyper-V provides a flexible and extensible virtual switch which allows third parties to code extensions to the switch using WFP and NDIS filter drivers (known as extensible switch extensions). It is certainly imaginable that in time a VDP extension will be available for Hyper-V.

Layer 2 encapsulation across L3 networks

A second approach to the L2 virtualization problem is to create isolated overlay L2 networks between virtual machines on top of an L3 IP-based network (MAC-over-IP). Each physical machine hosting multiple virtual machines has only one IP address from the physical network’s point of view, and the virtual switches on the different physical servers create tunnels between themselves for each L2 virtual network. The virtual switches dynamically build the tunnels for each VLAN required by the virtual machines running on the virtual switch, effectively creating overlay networks on top of the physical network. As of today, none of the implementations provide an intelligent MAC distribution mechanism built into their control planes; as such, the ARP protocol and MAC flooding mechanism from the physical world are preserved, leading to broadcasts and multicasts being encapsulated inside the overlay tunnel, which in turn are translated into L3 multicasts on the physical network. Two very similar solutions are emerging in this area: NVGRE from Microsoft and VXLAN from VMware/Cisco.
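For VXLAN, the encapsulation consists of an 8-byte header carrying a 24-bit VXLAN Network Identifier (VNI), carried over UDP. The Python sketch below builds and parses that header following the RFC 7348 layout. NVGRE's GRE-based encapsulation is not shown, and neither is the mapping of each VNI to an IP multicast group for flooded traffic.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN destination UDP port (RFC 7348)
FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to an inner Ethernet frame. The outer Ethernet,
    IP and UDP headers (UDP destination port VXLAN_UDP_PORT) are left to the
    sending tunnel endpoint's IP stack and are not built here."""
    return vxlan_header(vni) + inner_frame


def decode_vni(payload: bytes) -> int:
    """Recover the VNI from a UDP payload that starts with a VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", payload[:8])
    return vni_word >> 8


if __name__ == "__main__":
    packet = encapsulate(b"\x00" * 14, vni=5000)
    print(packet[:8].hex())    # 0800000000138800
    print(decode_vni(packet))  # 5000
```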

It is imperative that the physical network provides a good, high-performance L3 multicast implementation in order to transport the overlay L2 networks. The one flat network requirement also holds for this scenario: avoiding bubbles is imperative for any network virtualization technology. L3 multicast was taken into account during the design of QFabric; as such, QFabric excels at handling multicast and provides the flat network architecture, making it the best choice for overlay virtual networks.


Another emerging solution is provided through OpenFlow. The dynamic per-flow forwarding of OpenFlow and its central control plane approach allow for easy management and design of overlay networks between virtual machines on top of a physical infrastructure. The traditional VLAN approach is abandoned, and isolation is provided through flows on the OpenFlow Switches.

For OpenFlow to succeed as an L2 virtualization solution in the datacenter, it needs to be present at all levels of the architecture. At the virtual switch level, Open vSwitch supports OpenFlow, NEC has an extension for the Microsoft Hyper-V virtual switch which integrates with its OpenFlow controller, and VMware recently acquired Nicira, so OpenFlow might be part of their strategy as well.

At the physical switch level, if no fabric technology such as QFabric, FEX, TRILL or SPB is used, it is imperative that all the layers of interconnection support OpenFlow, not only the server access switches.

Juniper supports OpenFlow, is an active contributor to the standards process and plans to bring OpenFlow to QFabric. As QFabric provides the architecture of choice (one flat network) for an OpenFlow-based datacenter implementation, this can uniquely position Juniper in the SDN marketplace. Juniper’s Software Solutions EVP, Bob Muglia, recently talked with Jim Duffy about Juniper’s SDN strategy.

QFabric and L2 virtualization

For every L2 virtualization technology known today, the requirement to simplify the datacenter network connecting the physical servers to one flat network stands. If the network consists of multiple hops and inconsistent latency between network ports, the transparent overlay network idea will fail and careful planning and management for location awareness will be required to make this a success.

In all 3 scenarios, today, QFabric comes out as the architecture of choice to support network virtualization. Be it using EVB VDP, L2 overlay networks or OpenFlow; QFabric will provide investment protection whatever direction or technology you decide to run in the future.

IBM and Juniper QFabric

IBM is committed to QFabric and the one flat network architecture as the future of datacenter networking, as reflected in 2 recent Redbooks that IBM published in this regard.

Considerations on deploying OpenFlow in the Data Center

A flat network that is scalable, performing and provides consistent latency is the foundation for a good network virtualization strategy. If you do not want to be restricted or confined to designing and managing bubbles in your network, this is unavoidable.

If no fabric technology such as QFabric, FEX, TRILL or SPB is used, all the layers of interconnection need to support OpenFlow. Make sure all proposed devices support OpenFlow today, from the core to the access layer, including the virtual switch.

When considering storage convergence, and especially FCoE, compliance with DCB is required. Best practice is to split off the FCoE in a completely separate VLAN and use DCBx for ease of deployment, avoiding OpenFlow in the storage VLAN. For pure OpenFlow devices, the OpenFlow controller would require some level of insight into the FC world and extensions for QoS in the OpenFlow device.

When planning for OpenFlow, it is best to consider devices that provide “traditional” L2/L3 mode besides pure OpenFlow. It will allow the use of OpenFlow for parts of the datacenter where they have their best use case and at the same time mix in the “traditional” L2/L3 for critical and latency sensitive protocols such as storage (FCoE, iSCSI, NAS).

Troubleshooting OpenFlow-based networks can be a daunting task because of the distributed nature of the data and control planes. In QFabric this is solved by providing troubleshooting tools that mirror those available on traditional switches. Check the OpenFlow devices and controllers for troubleshooting tools and options.

Availability and scalability of the OpenFlow controller is another concern that cannot be taken lightly. If the controller is unreachable, the OpenFlow devices cannot function and, in the best case, fall back to their traditional L2/L3 functionality (if the device is L2/L3 and OpenFlow capable at the same time).

QFabric is fully redundant by design, from the control network up to the directors, nodes and interconnects. When deciding to run OpenFlow you should design with this same redundancy in mind, meaning fully redundant, physically separated out-of-band connectivity between the controller and the OpenFlow devices for the control plane.

Finally, one more point that needs looking into is how the OpenFlow Controller and the OpenFlow Switches handle multicast. As you know, QFabric is designed with multicast in mind and uses multicast trees in the interconnect layer. Multicast is one of the foundations of overlay networks for network virtualization.

Cloud computing, the internet of things, consumerization of IT and the Jericho forum

October 2, 2012 Leave a comment

On a daily basis we are interacting countless times with clouds and cloud services: your favorite newspaper freshly delivered to your tablet every morning, archived digital copies of your favorite magazine, your music collection stored and streamed to your iDevice or AVR, on demand archived TV programs and movies on your smart TV, your customer contacts and relation management tools available from any device and anywhere in the world, your private and business schedules consolidated and synced between your tablet, phone, laptop, PC and accessible through your living room smart TV. Digitalization projects the likes of Project Gutenberg, the human genome project and many others, antivirus, anti-spam and web filtering services in the cloud… Just a few of the most popular services that are delivered through cloud computing and hosted in cloud datacenters today.

Cloud computing can no longer be considered hype: the examples are real, putting increasing demands on cloud service providers and causing a revolution in IT technology and infrastructure, as well as in their adoption in datacenters around the globe.

The internet of things is a nice example of what’s to come and what is or will be possible in machine-to-machine interactions. The internet of things can literally change our daily lives in the not so distant future, while at the same time pushing the demands on current cloud services even further. Requirements for faster response times, continuous availability, more bandwidth, wireless connectivity, and privacy will drive cloud services to look for and adopt new technologies in the area of security and network infrastructure. Faster, more reliable, agile and secure infrastructures are key to the advancement of the cloud and its services in the coming years. Think of telemetry applications like the automated collection of consumption figures for water, gas and electricity, fridges connected to supply stores that automatically replenish your stock by submitting online orders, or a public transportation system that informs you of the exact time and seat availability of the next bus, tram or train. Healthcare is radically changing through new developments in information technology: futuristic images of remote robotic surgery are becoming a reality and, more down to earth, smart bedside terminals provide video on demand, internet, medication tracking, and electronic patient records. An infusion drip can be centrally controlled and monitored through a wireless network, with the control services running on a virtual server in the datacenter. Picture a smart alarm clock that wakes you in time for your first meeting, taking into account the location of the meeting and anticipating Monday morning traffic. We are close to what was once considered ‘the future’, and we refer to it as ‘the internet of things’. You will find a nice video covering this subject here. Another nice look into the near future can be found in the videos ‘A day made of glass…’ and ‘A day made of glass 2’ by Corning.

While watching these videos, think about the requirements they will place on the bandwidth, connectivity, availability and security of cloud services: what we face today in cloud datacenters is only the beginning. Some sources estimate that by 2020 between 22 and 50 billion devices will be connected to the internet, which at the upper estimate corresponds to more than 6 devices for every person on earth (source).

The cloud is driving IT

The evolution of the cloud is driving a revolution in information technology. IT managers and CIOs face conflicting demands: they are asked to reduce costs, consumption and resources while facing an ever-growing demand for new and faster service deployment, reduced complexity and a better user experience. Services demand ever more bandwidth, lower latency and better availability. Cloud computing drives the need for agile infrastructures that optimize the total cost of ownership while meeting the requirements for security, performance and availability.

Datacenter trends

The need for more cost-effective solutions and services has led to the broad adoption of services hosted on the internet, referred to as cloud services. Cloud services allow fast service deployment, capacity on demand and predictable cost. This migration of services from local or private datacenters to public or private clouds has led to the consolidation of many smaller datacenters into fewer mega datacenters.

From the NetworkWorld article ‘How Cloud Computing Is Forcing IT Evolution’: “Ron Vokoun, a construction executive with Mortenson Construction, a company that builds data centers, began by noting that the projects his firm is taking on are quickly shifting toward larger data centers. Mortenson is seeing small and medium-size enterprises leaving the business of owning and operating data centers, preferring to leverage collocated and cloud environments, leaving the responsibility for managing the facility to some other entity. The typical project size his firm sees has doubled or quadrupled, with 20,000 square feet the typical minimum.”

The need for efficient use of resources has been driving server virtualization for many years. Moore’s law still applies today: the number of transistors per chip roughly doubles every two years, and with it the number of cores per CPU and the total computing power per server keep growing. This fuels the server virtualization trend and increases the average number of virtual machines that can be consolidated onto each physical server. The adoption of increasingly powerful multi-core servers, higher-bandwidth interfaces and blade servers is increasing the need for faster connectivity. As more and more compute resources are packed into one rack, access switches need to support growing numbers of 1GE and 10GE ports, moving to 40GE or 100GE access ports in the near future.

New software application architectures are being adopted to improve productivity and business agility. Architectures based on Web services and Service Oriented Architectures (SOA) have reshaped datacenter traffic flows, increasing server-to-server bandwidth, while the lighter presentation layer (Web 2.0) reduces client-to-server traffic. Several sources estimate server-to-server (east/west) traffic at 75% of all datacenter traffic. The VDI (Virtual Desktop Infrastructure) trend will only reinforce this figure.

Another trend in modern datacenter networks is storage convergence. Converging data and storage communications onto one network has several obvious advantages; consider voice/data convergence (VoIP), which is a commodity today. The storage convergence trend will accelerate as higher interface speeds become financially viable for data networks. Consider 10Gbps Ethernet moving to 40GE and 100GE, while current Fibre Channel (FC) storage interfaces run at 8/16Gbps and soon 32Gbps, and you see the opportunity to converge both technologies onto the same interface and network infrastructure. Higher interface bandwidths will ultimately lead to simpler and less expensive cabling, with each rack server connected to a top-of-rack (ToR) switch through only one or a couple of interfaces, used for both data and storage.

The Jericho Forum

Cloud deployments also require a fresh approach to security. The traditional castle model of the legacy corporate security network does not provide the agility needed by today’s flexible application deployment patterns based on private, public or hybrid clouds. Since 2005, the Jericho Forum has been evangelizing a security architecture and design approach based on de-perimeterization: completely removing the DMZ pattern from the security design and hardening every server and host to such an extent that it could face the internet directly. In practice, total de-perimeterization may be a bit far-reaching, but the forum’s work contains many good ideas and has evolved into a new model that is considered more adequate for cloud security: the hotel model.

The Hotel Model

The hotel model adequately describes the security design pattern required in public or private cloud service deployments. The traditional corporate perimeter firewall, modeled by a castle, does not provide the granularity and level of security that modern hosted applications require. Cloud services need a more granular, profile-based access method for a broader range of user profiles. These new requirements are modeled on a hotel: anyone can enter the lobby, although at more prestigious 5-star hotels that care about their customers’ privacy you might have to show proof of identity to enter. Once inside, you are free to roam the public areas. After registering at reception, you receive one or more electronic keycards that give you and your family access to your own private room(s). Everyone can walk around the hotel, but only a person holding the right key can enter a private room. This closely models the security paradigm we face when deploying new cloud services.
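To make the analogy concrete, here is a minimal Python sketch of the hotel model as a simple access check. The `Hotel` class, its methods and the zone names are hypothetical illustrations, not part of any product or standard: public zones are open to everyone, and private rooms only open for a guest holding a keycard issued at registration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical zone names: public areas that anyone may enter.
PUBLIC_ZONES: Set[str] = {"lobby", "bar", "conference-hall"}


@dataclass
class Hotel:
    """Toy model of the hotel security pattern (illustrative, not a product API)."""

    # Guest identity -> rooms that guest's keycards open.
    keycards: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, guest: str, rooms: Set[str]) -> None:
        """Reception issues keycards once the guest has proven their identity."""
        self.keycards.setdefault(guest, set()).update(rooms)

    def may_enter(self, guest: Optional[str], zone: str) -> bool:
        """Public zones are open to everyone; a private room needs the matching key."""
        if zone in PUBLIC_ZONES:
            return True
        return guest is not None and zone in self.keycards.get(guest, set())


hotel = Hotel()
hotel.register("alice", {"room-101", "room-102"})  # Alice and her family
print(hotel.may_enter(None, "lobby"))        # True: anyone can enter the lobby
print(hotel.may_enter("alice", "room-102"))  # True: Alice holds that keycard
print(hotel.may_enter("bob", "room-101"))    # False: Bob has no key for room-101
```

The design choice mirrors the text: the coarse perimeter check disappears, and authorization is decided per resource from the identity established at registration.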

Consumerization of IT

Let’s not forget the main driver behind most technologies and investments: the user, or consumer. Users adopt any technology that increases their comfort and makes their daily tasks easier and more convenient. The user does not care about the leading-edge technology we might be adopting to provide a basic service. He or she couldn’t care less whether we just invested in state-of-the-art infrastructure or whether we converge data, voice and storage onto one network.

User experience can relate to many things, such as availability, performance, ease of use, look and feel, and price. A user is willing to pay more for a better experience, not for better or newer technology. If the user can find the same experience at a better price elsewhere, or a better experience at the same price, the user will move his service. Many examples in the corporate space show IT’s daily struggle to balance limited and shrinking budgets while providing the best experience for users. BYOD (Bring Your Own Device) is probably one of the most popular examples of this experience clashing with corporate regulations and policies. The user likes the ease of use and experience of his tablet or smartphone at home and wants to extend this experience to his daily work. A serious problem when transitioning and integrating these devices into corporate networks and policies is the lack of built-in corporate management and control features, because the devices are mainly consumer targeted; this is what we call the consumerization of the industry. BYOD/A (Bring Your Own Device or App) is a very hot topic, and while many see the advantages, there are a lot of hurdles and struggles ahead before we find the holy grail of BYOD: fully integrating consumer devices into our enterprise networks while assuring the user experience. Nevertheless, CIOs and IT managers see the opportunity in consumerization: leveraging user-owned devices and increasing productivity.

Balancing security against a consistent experience for the user, independent of his device (corporate laptop, internet café PC, home PC, tablet, smartphone) and independent of his location (connected to the wired LAN, the wireless WLAN, a remote branch, at home over the internet, at the airport using public internet access, or on the road using mobile connectivity), is one of the headaches modern IT managers face. This is where the need arises for dynamic, adaptive security infrastructures.

Network Security Orchestration and IF-MAP

A user connecting to the corporate network is no longer limited to one device, nor is he likely to use only one type of connectivity. Users come to the office with their smartphones, tablets and corporate-issued laptops. The laptop connects to the wired network, while the smartphone and tablet will likely use the corporate wireless network. The same user can be at his desk, join a meeting in a conference room, take a taxi to the airport, or be in his hotel room. Independent of his device, his location or his access method, the user expects one experience when accessing corporate resources, giving him access to corporate data anytime, anywhere and from any device.

This one experience can only happen if we tie network access control to the identity or role of the user. A user connecting to the wireless network could authenticate with 802.1X, and a smart network would provision this wireless access with the VLAN corresponding to the user’s role and device posture. Whenever a user crosses security zones or protected networks and moves deeper into the corporate network towards the datacenter, his access should be controlled more granularly by dynamic security policies based on his identity, moving away from static security policies that base authorization solely on the location or connection point (network segment or subnet) of a device.

User- or role-based access control that is independent of the user’s location and takes device posture into account can only be provided through dynamic, automatically provisioned security policies: the domain of orchestrated network security, or network security orchestration.
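A minimal Python sketch of role- and posture-based access, assuming a hypothetical role-to-VLAN table and a quarantine VLAN for non-compliant devices. The connection point is carried in the session but deliberately ignored, which is exactly what gives the user the same access on the wired LAN, the WLAN or a VPN. The `radius_reply` helper returns the RADIUS tunnel attributes that RFC 3580 describes for dynamic VLAN assignment with 802.1X; the session fields, role names and VLAN numbers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Session:
    """An authenticated 802.1X session (fields are illustrative assumptions)."""

    user: str
    role: str         # role looked up in the identity store after authentication
    posture_ok: bool  # outcome of a device health/posture assessment
    location: str     # e.g. "wired", "wireless", "vpn"; deliberately NOT used below


# Hypothetical role-to-VLAN policy table.
ROLE_VLANS: Dict[str, int] = {"employee": 10, "contractor": 20, "guest": 30}
QUARANTINE_VLAN = 99  # hypothetical remediation VLAN for non-compliant devices


def assign_vlan(session: Session) -> int:
    """Pick a VLAN from role and posture only, so access is the same everywhere."""
    if not session.posture_ok:
        return QUARANTINE_VLAN
    # Unknown roles fall back to the most restrictive (guest) VLAN.
    return ROLE_VLANS.get(session.role, ROLE_VLANS["guest"])


def radius_reply(vlan: int) -> Dict[str, str]:
    """RADIUS attributes for dynamic VLAN assignment as described in RFC 3580."""
    return {
        "Tunnel-Type": "13",        # 13 = VLAN
        "Tunnel-Medium-Type": "6",  # 6 = IEEE 802
        "Tunnel-Private-Group-ID": str(vlan),
    }


laptop = Session("alice", "employee", posture_ok=True, location="wired")
phone = Session("alice", "employee", posture_ok=True, location="wireless")
print(assign_vlan(laptop), assign_vlan(phone))  # 10 10: same role, same VLAN anywhere
print(radius_reply(assign_vlan(phone)))
```

In a real deployment the policy table would live in the access control server and could also drive firewall rules deeper in the network, rather than only the access VLAN.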

The Trusted Computing Group is addressing network security orchestration through its Trusted Network Connect (TNC) architecture and open standards. One of these standards describes the IF-MAP (Interface for Metadata Access Points) protocol, which allows flexible and immediate sharing of metadata between IF-MAP clients through a central database server. IF-MAP clients can publish metadata and subscribe to changes in metadata published by other clients. Consider a device profiling appliance like Great Bay Beacon, which can detect a change in the personality of the device behind a specific MAC address and publish that change to the IF-MAP server. An 802.1X access control device, such as Juniper’s UAC/MAG appliances, that is subscribed to this profiling appliance’s information receives a change notification from the IF-MAP server and reacts by revoking the dynamic VLAN and access previously granted on the port where the client was connected.
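The publish/subscribe flow between the profiler and the access controller can be sketched in a few lines of Python. This is a toy, in-process model under loudly stated assumptions: `MetadataServer`, `AccessController` and the `device-profile` metadata key are hypothetical names, and the real IF-MAP protocol exchanges SOAP messages with a MAP server over HTTPS rather than calling Python callbacks.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# A subscriber callback receives the identifier and its current metadata.
Callback = Callable[[str, Dict[str, str]], None]


class MetadataServer:
    """Toy stand-in for an IF-MAP server (hypothetical API, not the wire protocol)."""

    def __init__(self) -> None:
        self._metadata: Dict[str, Dict[str, str]] = {}
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, identifier: str, callback: Callback) -> None:
        """Register interest in metadata changes for one identifier (e.g. a MAC)."""
        self._subscribers[identifier].append(callback)

    def publish(self, identifier: str, metadata: Dict[str, str]) -> None:
        """Store new metadata and push the updated view to every subscriber."""
        current = self._metadata.setdefault(identifier, {})
        current.update(metadata)
        for callback in self._subscribers[identifier]:
            callback(identifier, dict(current))


class AccessController:
    """Hypothetical 802.1X enforcement point that trusts the profiler's view."""

    def __init__(self) -> None:
        self.granted_vlan: Dict[str, int] = {}      # MAC -> dynamic VLAN granted
        self.admitted_profile: Dict[str, str] = {}  # MAC -> device type at admission

    def admit(self, mac: str, profile: str, vlan: int) -> None:
        self.granted_vlan[mac] = vlan
        self.admitted_profile[mac] = profile

    def on_metadata_change(self, mac: str, metadata: Dict[str, str]) -> None:
        # A changed "personality" behind the same MAC suggests spoofing: revoke.
        if metadata.get("device-profile") != self.admitted_profile.get(mac):
            self.granted_vlan.pop(mac, None)
            self.admitted_profile.pop(mac, None)


server = MetadataServer()
controller = AccessController()
mac = "00:11:22:33:44:55"
controller.admit(mac, profile="printer", vlan=50)
server.subscribe(mac, controller.on_metadata_change)

server.publish(mac, {"device-profile": "printer"})     # unchanged: access kept
print(mac in controller.granted_vlan)                  # True
server.publish(mac, {"device-profile": "windows-pc"})  # profiler sees a PC: revoke
print(mac in controller.granted_vlan)                  # False
```

The point of the pattern is decoupling: the profiler does not need to know which enforcement points exist, and the access controller does not need to poll the profiler.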

Internet Federation and SAML

IT managers can choose between hosting services or applications inside the corporate datacenter (private cloud) and/or leveraging externally hosted and managed infrastructures or applications (public clouds). Combining the best of both worlds results in what we call hybrid clouds. The flexibility, predictable cost and pay-as-you-grow model of hosted applications provide a very interesting alternative, giving IT managers more agility, less risk and faster deployment of new services for their users. But once again, leveraging public and hybrid clouds will only be successful if it provides users one experience. Essential to this experience, when mixing services between clouds, are single sign-on and central identity management.

Single sign-on has always been highly appreciated in the corporate network, and it has evolved to meet the broader requirement of interconnecting and federating clouds and services hosted all over the world. Standards like SAML emerged, and through their modest requirements, powerful features and flexibility they are revolutionizing the way we use internet services today.

Federation with SAML makes it possible to centralize and host the identities, roles and authorizations of corporate users inside the corporate network and leverage this identity information through federated access control for internal and external services, providing the user a one-time authentication and seamless access to all services whilst keeping a single point of control and management. SAML virtually shares the corporate users’ identities and authorizations across the private and public clouds, in a secure and controlled manner.

Mobile explosion

The consumerization of IT reached an inflection point in 2011, when the number of mobile devices on the internet surpassed the number of traditional PCs. This has consequences for existing wireless networks, which must accommodate an ever-increasing number of devices requesting more bandwidth and requiring flexible scalability, while assuring availability and quality of experience. Quality of Service and Class of Service are becoming more widely deployed and require an end-to-end approach. Much like the dynamic security model described above, QoS needs to be provided granularly depending on the user, his device, his location and the application. The application dimension becomes more apparent as enterprises adopt social networking and start using rich media, online conferencing and unified communications as viable means of doing business.

Application Identification

The dynamic nature of orchestrated security networks that provide security and quality of service to the user needs to be extended with intelligence about the applications carried across the infrastructure. Recent trends such as next-generation firewalling provide a deeper look into the protocols and datagrams carried by the network. Where IDP was traditionally used for deeper application intelligence and security, the same technology has been reused by next-generation firewalls to provide application-level firewalling and quality of service, giving the administrator the ability to dynamically influence the network based on the application. And the trend continues, stepping one level deeper into what are called nested applications, like Farmville inside Facebook and InMail inside LinkedIn. Take the Facebook example: marketing departments are keen on leveraging social media and appreciate employees making positive references to corporate events and announcements; at the same time, IT does not want that same user playing Farmville and disrupting or slowing down more business-critical application flows. Traffic control, classification and prioritization based on applications and nested applications give the administrator and the network the tools required to take back control and provide the all-important quality of experience for the user.
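Once the firewall has identified both the application and the nested application, the policy decision itself can be as simple as a most-specific-match lookup. The Python sketch below assumes a hypothetical policy table and hypothetical action and QoS class names; the deep packet inspection that produces the `(app, nested)` pair is out of scope and assumed to have happened already.

```python
from typing import Dict, Optional, Tuple

# (action, QoS class) applied to a classified flow.
Decision = Tuple[str, str]

# Hypothetical policy keyed on (application, nested application).
# A nested value of None is the rule for the parent application as a whole.
POLICY: Dict[Tuple[str, Optional[str]], Decision] = {
    ("facebook", None): ("permit", "best-effort"),
    ("facebook", "farmville"): ("deny", "none"),
    ("linkedin", None): ("permit", "best-effort"),
    ("linkedin", "inmail"): ("permit", "business"),
}
DEFAULT_DECISION: Decision = ("permit", "best-effort")


def decide(app: str, nested: Optional[str] = None) -> Decision:
    """Most specific rule wins: nested application, then parent, then default.

    Identifying `app` and `nested` from the traffic (the DPI part) is assumed
    to have happened already, e.g. on a next-generation firewall.
    """
    if nested is not None and (app, nested) in POLICY:
        return POLICY[(app, nested)]
    return POLICY.get((app, None), DEFAULT_DECISION)


print(decide("facebook"))               # ('permit', 'best-effort')
print(decide("facebook", "farmville"))  # ('deny', 'none')
print(decide("linkedin", "inmail"))     # ('permit', 'business')
```

Falling back from the nested application to its parent keeps the policy small: marketing’s Facebook use stays permitted while only the specific nested application is singled out.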

About the blog

We can only admit that the cloud revolution is accelerating and fueling our economic growth and general wellbeing. This acceleration requires fast-paced technology evolution, disruptive approaches to existing infrastructures and the adoption of new models and patterns in the networking industry. It is happening all around us today and will not slow down in the coming years. This blog aims to cover some of these evolutions and revolutions, and will explore some of the solutions available today and their implications for networking, today and in the near future.

I’m not a fortune teller, nor do I have all the knowledge in the world, and at times I might make assumptions or cut some corners to shorten the technical explanations. While I might at times be biased, being passionate about my employer’s technology and generally having better knowledge of and insight into the accuracy of the information on the products in our portfolio, I try to be open-minded and as accurate as possible, given the available information. In all circumstances, I welcome and encourage feedback, positive and negative! I admit that I’m on a continuous learning curve, like everyone in information technology…
