
Mice and Elephants in my Data Center

September 8, 2014

Elephant flows

A long-lived flow transferring a large volume of data is referred to as an elephant flow, in contrast to the small, short-lived flows referred to as ‘mice’, of which the data center has a great many. Elephant flows, though not numerous, can monopolize network links and consume all the buffer allocated to a port. This can cause temporary starvation of mice flows and disrupt the overall performance of the data center fabric. Juniper will soon introduce Dynamic Load Balancing (DLB) based on Adaptive Flowlet Splicing as a configuration option in its Virtual Chassis Fabric (VCF). DLB provides much more effective load balancing than a traditional hash-based distribution. Together with the existing end-to-end path-weight based load balancing mechanism, VCF has a strong load distribution capability that will help network architects drive their networks harder than ever before.

Multi-path forwarding

Multi-path forwarding refers to balancing packets across multiple active links between two network devices or two network stages. Consider a spine-and-leaf network with 4 spines: all traffic from a single leaf is spread across all the links, both to use as much of the available aggregate bandwidth as possible and to provide redundancy in case of link failure.

Multi-path forwarding is typically based on a hash function. A hash function maps a virtually infinite collection of data onto a finite set of buckets or hashes, as illustrated below.

(Image source: Wikipedia)

In networking terms: the hash function takes multiple fields of the Ethernet, IP and TCP/UDP headers and uses them to map all sessions onto a limited collection of two or more links. Because the fields used for hashing are static, a single flow is mapped onto exactly one link and stays there for the lifetime of the flow. A good hash function should be balanced, meaning that it fills all hash buckets equally and thereby distributes the flows evenly across the available links.
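As a minimal sketch of this idea (the field separator and the CRC32 hash are illustrative choices, not what any particular switch implements), hashing the static 5-tuple always yields the same link for a given flow:

```python
import zlib

def select_link(src_ip, dst_ip, src_port, dst_port, proto, n_links):
    """Map a flow's static 5-tuple onto one of n_links member links."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % n_links

# The same flow always lands on the same link, which preserves packet
# order but pins a long-lived elephant flow to a single path:
a = select_link("10.0.0.1", "10.0.0.2", 40312, 443, "tcp", 4)
b = select_link("10.0.0.1", "10.0.0.2", 40312, 443, "tcp", 4)
assert a == b
```

A balanced hash spreads many small flows evenly, but it cannot help when one flow dwarfs the rest.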

One of the reasons to use static fields for the hash function is to avoid reordering of packets as they travel through a network in which paths might not all be of equal distance or latency. Even if by design all paths are equal, different buffer fill patterns on different paths will cause differences in latency. Reordering can be handled by the end-point or in the network, but it always comes at a cost, which is why a network ensuring in-order delivery is preferred.

Because of that static nature, the distribution of packets will be poorly balanced when a few flows are disproportionately larger than the others. A long-lived, high-volume flow will be mapped to a single link for its whole lifetime and will exhaust the network buffer of that link, with packet drops as a result.

TCP as in keeping the data flowing

To understand the mechanism of Adaptive Flowlet Splicing, we need to understand some of the dynamics of how data is transmitted through the network. TCP has been architected to avoid network congestion and keep a steady flow of data over a wide range of networks and links. One provision that enables this is the TCP window size, which specifies how much data can be in flight in the network before a receiver acknowledgement is expected. The TCP window essentially tells the sender it may blindly send a number of packets; for each acknowledgement (‘ack’) received from the receiver, the sender slides the window forward by one packet. The size of the window is not fixed but dynamic and self-tuning in nature. TCP uses what we call AIMD (Additive Increase, Multiplicative Decrease) congestion control: the window size is increased additively for each acknowledgement received and cut in half whenever acknowledgements are missed (indicating packet loss). The resulting traffic pattern is the typical saw-tooth:
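The saw-tooth can be reproduced with a toy AIMD simulation (the fixed-capacity loss model and the parameters are simplifications for illustration, not real TCP):

```python
def aimd(rounds, capacity, cwnd=1, incr=1):
    """Toy AIMD: grow the window additively each round trip,
    halve it when the window exceeds what the path can carry."""
    history = []
    for _ in range(rounds):
        if cwnd > capacity:           # loss detected
            cwnd = max(1, cwnd // 2)  # multiplicative decrease
        else:
            cwnd += incr              # additive increase
        history.append(cwnd)
    return history

print(aimd(30, capacity=10))  # ramps up, halves, ramps up again: a saw-tooth
```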

Adaptive Flowlet Splicing (AFS)

From the above it should be apparent that elephant flows result in repeating patterns of short bursts followed by quiet periods. This characteristic pattern divides the single long-lived flow over time into smaller pieces, which we refer to as ‘flowlets’. The picture below, courtesy of the blog article by Yafan An [*1], visually represents what flowlets look like within elephant flows when we look at them through a time microscope:

Now suppose that the quiet times between the flowlets are larger than the biggest difference in latency between different paths in the network fabric; in that case, load balancing based on flowlets will always ensure in-order arrival.

To distribute the flowlets more evenly across the member links of a multi-path, it helps to keep a relative quality measure for each link based on its history. This measure is implemented using a moving average of the link’s load and its queue depth. Using this metric, the least utilized and least congested member link is selected for each new flowlet.
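A sketch of such a metric follows; the EWMA weighting and the equal combination of load and queue depth are assumptions for illustration, not Juniper’s actual formula:

```python
class MemberLink:
    """One member of a multi-path bundle, with moving averages of its history."""
    def __init__(self, name, alpha=0.2):
        self.name = name
        self.alpha = alpha     # EWMA smoothing factor (assumed value)
        self.avg_load = 0.0    # moving average of link utilization
        self.avg_queue = 0.0   # moving average of queue depth

    def update(self, load, queue_depth):
        # Exponentially weighted moving average favors recent history.
        self.avg_load = self.alpha * load + (1 - self.alpha) * self.avg_load
        self.avg_queue = self.alpha * queue_depth + (1 - self.alpha) * self.avg_queue

    def quality(self):
        # Lower is better: combines utilization with congestion history.
        return self.avg_load + self.avg_queue

def pick_link(links):
    """Assign the next flowlet to the least utilized, least congested member."""
    return min(links, key=lambda l: l.quality())
```

New flowlets then go to whichever member currently scores best, instead of wherever a static hash happens to point.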

Is this elephant flow handling unique to VCF?

By all means, no. But the controlled environment and the imposed topology of the VCF solution allow Juniper to get the timings right without resorting to heuristics and complex analytics. In a VCF using only QFX5100 switches in a spine-and-leaf topology, each path is always 3 hops. The latency between two ports across the fabric is between 2µs and 5µs, resulting in a maximum latency skew of 3µs. Consequently, any inter-arrival time between flowlets larger than the 3µs latency skew allows flowlets to be reassigned to other member links without impacting the order of arrival of packets.

In an arbitrary network topology using a mix of switch technologies and vendors, every variable introduced makes it exponentially more complex to get the timings right and find the exact point at which to split an elephant flow into adequate flowlets for distribution across multiple links.

Another problem we did not have to address, in the case of VCF, is how to detect or differentiate an elephant from a mouse. AFS records the timestamp of the last received packet of a flow. In combination with the known latency skew of 3µs, this timestamp is enough to decide when a flow may be reassigned to another member link. It is less important for AFS to be aware of the flow’s actual nature.
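That decision rule can be sketched as follows; the bookkeeping shown is an assumption about how such a check could look, not Juniper’s implementation:

```python
LATENCY_SKEW_US = 3  # worst-case path latency difference across the fabric

class FlowState:
    """Per-flow record: only the last-packet timestamp and the member link."""
    def __init__(self):
        self.last_seen_us = None
        self.link = None

def assign_link(flow, now_us, choose):
    """Move a flow to a new member link only when the gap since its last
    packet exceeds the skew, so in-flight packets have already arrived."""
    gap_ok = (flow.last_seen_us is not None
              and now_us - flow.last_seen_us > LATENCY_SKEW_US)
    if flow.link is None or gap_ok:
        flow.link = choose()  # e.g. the least-congested member link
    flow.last_seen_us = now_us
    return flow.link
```

Note that the rule never asks whether the flow is an elephant; a large enough inter-arrival gap is all it needs.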

In arbitrary network architectures, however, as Martin Casado and Justin Pettit describe in their blog post ‘Of Mice and Elephants’ [*2], it might be helpful to differentiate the elephants from the mice and treat them differently. Whether this should be done through distinct queues, different routes for mice and elephants, turning the elephants into mice, or some other clever mechanism is a topic of debate and network design. Another point to consider is where to differentiate between them. The vswitch is certainly a good candidate, but in the end the underlay is what handles the flows according to their nature, so a standardized signaling interface between overlay and underlay must be considered.


By introducing AFS in VCF, data center workloads that run on top of the fabric will be distributed more evenly, and congestion on single paths caused by elephant flows will be avoided. For customers who have no need for a specific topology or a massively scalable solution, a practical and effective solution like VCF brings a lot of value to their data centers.


[*1] Yafan An, Flowlet Splicing – VCF’s Fine-Grained Dynamic Load Balancing Without Packet Re-ordering –

[*2] Martin Casado and Justin Pettit – Of Mice and Elephants –



Cloud computing, the internet of things, consumerization of IT and the Jericho forum

October 2, 2012

On a daily basis we interact countless times with clouds and cloud services: your favorite newspaper freshly delivered to your tablet every morning, archived digital copies of your favorite magazine, your music collection stored and streamed to your iDevice or AVR, on-demand archived TV programs and movies on your smart TV, your customer contacts and relationship management tools available from any device, anywhere in the world, your private and business schedules consolidated and synced between your tablet, phone, laptop and PC and accessible through your living-room smart TV. Digitization projects like Project Gutenberg and the human genome project, antivirus, anti-spam and web filtering services in the cloud… These are just a few of the most popular services delivered through cloud computing and hosted in cloud datacenters today.

Cloud computing can no longer be considered hype; the examples are real, putting increasing demands on cloud service providers and causing a revolution in IT technology and infrastructure as well as in their adoption in datacenters around the globe.

The internet of things is a nice example of what’s to come and of what is or will be possible in machine-to-machine interactions. The internet of things can literally change our daily lives in a not-so-distant future, while at the same time pushing the demands on current cloud services even further. Requirements for faster response times, continuous availability, more bandwidth, wireless connectivity and privacy will drive cloud services to adopt and look for new technologies in the areas of security and network infrastructure. Faster, more reliable, agile and secure infrastructures are key to the advancement of the cloud and its services in the coming years.

Think of telemetry applications like the automated acquisition of consumption numbers for water, gas and electricity; fridges connected to supply stores which automatically resupply your stock by submitting online orders; a public transportation system that informs you of the exact time and seat availability of the next bus, tram or train. Healthcare is radically changing through new developments in information technology: futuristic images of remote robotic surgery are becoming a reality, while more down to earth are smart bedside terminals providing video on demand, internet, medication tracking and electronic patient records. An infusion drip can be centrally controlled and monitored through a wireless network, with the control services running on a virtual server in the datacenter. A smart alarm clock can wake you in time for your first meeting, taking into account the location of the meeting and anticipating traffic on a Monday morning. We are close to what once was considered ‘the future’, and we refer to it as ‘the internet of things’. You will find a nice video covering this subject here. Another nice look into the near future can be found in the videos ‘A day made of glass…’ and ‘A day made of glass 2’ by Corning.

Whilst watching the above videos, think about the requirements that will be put on the bandwidth, connectivity, availability and security of cloud services: what we are facing today in cloud datacenters is only the beginning. Some sources estimate that by 2020 between 22 and 50 billion devices will be connected to the internet, corresponding to more than 6 devices for every person on earth (source).

The cloud is driving IT

The evolution of the cloud is driving a revolution in information technology. IT managers and CIOs face conflicting demands: they are asked to reduce costs, consumption and resources, while at the same time facing ever-growing demand for new and faster service deployment, reduced complexity and a better user experience. Services have ever-increasing demands for more bandwidth, lower latencies and better availability. Cloud computing drives the need for agile infrastructures that optimize total cost of ownership whilst meeting the requirements for security, performance and availability.

Datacenter trends

The need for more cost-effective solutions and services has led to the broad adoption of services hosted on the internet, referred to as cloud services. Cloud services allow fast service deployment, capacity on demand and predictable cost. This migration of services from local or private datacenters to public or private clouds has led to the consolidation of a large number of smaller datacenters into a small number of mega-datacenters.

Quote taken from the NetworkWorld article ‘How Cloud Computing Is Forcing IT Evolution’ : “Ron Vokoun, a construction executive with Mortenson Construction, a company that builds data centers, began by noting that the projects his firm is taking on are quickly shifting toward larger data centers. Mortenson is seeing small and medium-size enterprises leaving the business of owning and operating data centers, preferring to leverage collocated and cloud environments, leaving the responsibility for managing the facility to some other entity. The typical project size his firm sees has doubled or quadrupled, with 20,000 square feet the typical minimum.”

The need for efficient use of resources has been driving server virtualization for many years. Moore’s law still applies today: roughly every two years, the number of cores per CPU and the total computing power per server doubles or more. This fuels the server virtualization trend and increases the potential average number of virtual machines consolidated per physical server. The adoption of increasingly powerful multi-core servers, higher-bandwidth interfaces and blade servers is increasing the need for faster connectivity. Access switches need to support growing numbers of 1GE and 10GE ports, moving to 40GE or 100GE access ports in the near future as more and more compute resources are packed into one rack.

New software application architectures are being adopted to improve productivity and business agility. Architectures based on Web services and Service Oriented Architecture (SOA) have reshaped data center traffic flows, increasing server-to-server bandwidth as opposed to the lighter presentation layer (Web 2.0), which reduces client-to-server traffic. Several sources estimate server-to-server (east/west) traffic to be 75% of total traffic. The trend toward VDI (Virtual Desktop Infrastructure) will only reinforce this number.

Another trend in modern datacenter networks is storage convergence. Converging data and storage communications onto one network has several obvious advantages; consider voice/data convergence, also referred to as VoIP, which is a commodity today. The storage convergence trend will accelerate as increasing interface speeds become financially viable options for data networks. Consider 10Gbps Ethernet moving to 40GE and 100GE, while current Fibre Channel (FC) storage interface speeds are 8/16Gbps and soon 32Gbps, and you see the opportunity for converging both technologies onto the same interface and network infrastructure. Higher interface bandwidths will ultimately lead to simpler and less expensive cabling of each rack server into a top-of-rack (ToR) switch with only one or a few interfaces, used for both data and storage.

The Jericho Forum

Cloud deployments also require a fresh approach to security. The traditional castle model of the legacy corporate security network does not provide the agility we need in today’s flexible business application deployment patterns based on private, public or hybrid clouds. Since 2005, the Jericho Forum has been evangelizing a security architecture and design approach based on de-perimeterization: completely removing the DMZ pattern from the security design and making every server and host secured and hardened to such an extent that it could face the internet directly. Reality shows that total de-perimeterization might be a bit far-reaching; still, the forum has produced a lot of good ideas and information, evolving into a new model considered more adequate for approaching cloud security: the hotel model.

The Hotel Model

The hotel model adequately describes the security design pattern required in public or private cloud service deployments. The traditional corporate perimeter firewall, modeled by a castle, does not provide the granularity and level of security we require for modern hosted applications. Cloud services require a more granular, profile-based access method for a broader range of user profiles. These new requirements are modeled by analogy with a hotel: anyone can enter the lobby; at more prestigious 5-star hotels conscious of their customers’ privacy, entering may require you to show some proof of identity. Once inside the hotel, you are free to roam the public areas. After registering with reception you are provided with one or more electronic keycards which give you and your family access to your own private room(s). Everyone can walk and roam the hotel, but only a person holding the right key has access to a private room. This closely models the security paradigm we face when deploying new cloud services.

Consumerization of IT

Let’s not forget the main driver behind most technologies and investments: the user, or the consumer. The user will adopt any technology that increases his comfort and makes his daily tasks easier and more convenient. He or she does not care about the leading-edge technology we might adopt to provide a basic service, and couldn’t care less whether we just invested in state-of-the-art infrastructure or that we converge data, voice and storage onto one network.

User experience can relate to many things: availability, performance, ease of use, look and feel, and price. A user is ready to pay more for a better experience, not for better or newer technology. If the user can find the same experience at a better price elsewhere, or a better experience at the same price, he will move his service. Many examples in the corporate space show the daily struggle of corporate IT to balance limited and shrinking budgets whilst providing the best experience for their users. BYOD (Bring Your Own Device) is probably one of the most popular examples where this experience clashes with the regulations and policies of corporations. The user likes the ease of use and experience of his tablet or smartphone at home and wants to extend that experience to his daily business. A serious problem faced while integrating these devices into our corporate networks and policies is the lack of corporate management and control features built into them, because they mainly target consumers; this is what we call the consumerization of IT. BYOD/A (Bring Your Own Device or App) is a very hot topic, and while many see the advantages, there are a lot of hurdles and struggles ahead before finding the holy grail of BYOD: fully integrating consumer devices into our enterprise networks while assuring the user experience. Nevertheless, CIOs and IT managers see the opportunity in consumerization, leveraging user-owned devices and increasing productivity.

Providing security while also providing a consistent experience for the user, independent of his device (corporate laptop, internet café PC, home PC, tablet, smartphone) and independent of his location (connected to the wired LAN, the wireless WLAN, a remote branch, at home over the internet, at the airport using public internet access, or on the road using mobile connectivity): this is one of the headaches modern IT managers are facing, and it is where the need arises for dynamic, adaptive security infrastructures.

Network Security Orchestration and IF-MAP

A user connecting to the corporate network is no longer limited to only one device, nor is he likely to use only one type of connectivity. Users come to the office with their smartphones, tablets and corporate-issued laptops. The laptop connects to the wired network, while the smartphone and tablet will likely use the corporate wireless network. The same user can be at his desk, joining a meeting in a conference room, taking a taxi to the airport, or sitting in his hotel room. Independent of his device, his location or even his access method, the user expects one experience while accessing corporate resources: virtually, the user has access to corporate data anytime, anywhere and from any device.

This one experience can only happen if we tie network access control to the identity or role of the user. A user connecting to the wireless network could use 802.1X, and a smart network would provision this wireless access with the VLAN corresponding to the user’s role and device posture. Whenever a user crosses security zones or protected networks and penetrates deeper into the corporate network towards the data center, his access should be controlled more granularly by dynamic security policies based on his identity, moving away from static security policies that base their authorization solely on the location or connection point (network segment or subnet) of a device.
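A toy policy lookup makes the idea concrete: access is derived from who the user is and what state his device is in, not from where he plugged in. Every role, posture and VLAN name below is hypothetical:

```python
# Hypothetical (role, posture) -> (VLAN, access profile) policy table.
POLICY = {
    ("engineer", "healthy"):    ("vlan-eng", "full-access"),
    ("engineer", "quarantine"): ("vlan-remediation", "patch-portal-only"),
    ("guest",    "healthy"):    ("vlan-guest", "internet-only"),
}

def provision(role, posture):
    """Derive network access from identity and device posture;
    unknown combinations fall back to quarantine."""
    return POLICY.get((role, posture), ("vlan-quarantine", "deny"))
```

The same lookup can be re-evaluated whenever role or posture changes, which is exactly what a static, subnet-based policy cannot do.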

User- or role-based access control that is independent of user location and takes device posture into account can only be provided through a dynamic, automatically provisioned security policy, which is the topic of orchestrated network security, or network security orchestration.

The Trusted Computing Group is working on the question of network security orchestration through its Trusted Network Connect architecture and open standards. One of these standards describes the IF-MAP (Interface for Metadata Access Points) protocol, which allows flexible and immediate sharing of metadata between IF-MAP clients and a central database server. IF-MAP clients can publish metadata and subscribe to changes in metadata provided by other clients. An example would be a device profiling appliance like Great Bay Beacon, which can detect personality changes of the device behind a specific MAC address and publish that change to the IF-MAP server. An 802.1X access control device like Juniper’s UAC/MAG appliances, subscribed to the information from this profiling appliance, will receive a change notification from the IF-MAP server and react to the new information by revoking the dynamically provisioned VLAN and the access from the port the client was connected to earlier.
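The publish/subscribe pattern behind IF-MAP can be sketched with a toy in-memory server. The real protocol is an XML-based standard; this only mirrors the data flow, and all identifiers below are made up:

```python
from collections import defaultdict

class MapServer:
    """Toy in-memory metadata server modeling IF-MAP publish/subscribe."""
    def __init__(self):
        self.metadata = {}                    # identifier -> latest metadata
        self.subscribers = defaultdict(list)  # identifier -> callbacks

    def subscribe(self, identifier, callback):
        self.subscribers[identifier].append(callback)

    def publish(self, identifier, metadata):
        self.metadata[identifier] = metadata
        for cb in self.subscribers[identifier]:
            cb(identifier, metadata)          # notify interested clients

# The profiler publishes a personality change; the access controller,
# subscribed to that MAC address, reacts immediately.
events = []
server = MapServer()
server.subscribe("aa:bb:cc:dd:ee:ff",
                 lambda mac, md: events.append((mac, md["device-type"])))
server.publish("aa:bb:cc:dd:ee:ff", {"device-type": "printer-spoofed"})
```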

Internet Federation and SAML

IT managers are given the choice of hosting services or applications inside the corporate datacenter (private cloud) and/or leveraging externally hosted and managed infrastructures or applications (public clouds). Combining the best of both worlds results in what we call hybrid clouds. The flexibility, predictable cost and pay-as-you-grow model of hosted applications provide a very interesting alternative for IT managers, giving them more agility, less risk and faster deployment of new services for their users. But once again, leveraging public and hybrid clouds will only be successful if it provides users one experience. Imperative to this experience, when mixing services between clouds, are single sign-on and central identity management.

Single sign-on has always been highly appreciated in the corporate network and has evolved towards the broader requirements of interconnecting and federating dispersed clouds and services hosted all over the world. Standards like SAML have emerged, and through their basic requirements, powerful features and flexibility they are revolutionizing the way we use internet services today.

Federation with SAML makes it possible to centralize and host the identities, roles and authorizations of corporate users inside the corporate network and leverage this identity information through federated access control for internal and external services, providing the user a one-time authentication and seamless access to all services whilst keeping a single point of control and management. SAML virtually shares the corporate users’ identities and authorizations across the private and public clouds, in a secure and controlled manner.
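The trust relationship can be sketched with a toy token exchange. Real SAML uses digitally signed XML assertions exchanged between an identity provider and a service provider; the HMAC-signed JSON below is only a stand-in for that signature check, and the key and names are invented:

```python
import base64
import hashlib
import hmac
import json

# Assumed out-of-band trust anchor between IdP and SP (invented value).
SHARED_KEY = b"idp-sp-shared-secret"

def issue_assertion(user, roles):
    """Identity provider side: sign the user's identity and authorizations."""
    body = json.dumps({"user": user, "roles": roles}).encode()
    sig = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return base64.b64encode(body).decode(), sig

def accept_assertion(token, sig):
    """Service provider side: trust the IdP's signature instead of
    re-authenticating the user."""
    body = base64.b64decode(token)
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("assertion rejected")
    return json.loads(body)
```

The user authenticates once at the identity provider; every service that trusts the signature accepts the asserted identity and roles without seeing the user’s credentials.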

Mobile explosion

The consumerization of IT reached an inflection point in 2011, when the number of mobile devices on the internet surpassed the number of traditional PCs. This has consequences for existing wireless networks, which must accommodate an ever-increasing number of devices requesting more bandwidth and requiring flexible scalability while assuring availability and quality of experience. Quality of Service and Class of Service are becoming more widely deployed and require an end-to-end approach. Much like the dynamic security model described above, QoS needs to be provided granularly depending on the user, his device, his location and the application. The latter becomes more apparent as enterprises adopt social networking and start using rich media, online conferencing and unified communications as viable means of doing business.

Application Identification

The dynamic nature of orchestrated security networks providing security and quality of service to the user needs to be extended with intelligence about the applications carried across the infrastructure. Recent trends like next-generation firewalling provide a deeper look into the protocols and datagrams carried by the network. Where traditionally IDP was used for deeper application intelligence and security, this same technology has been leveraged and reused by next-generation firewall devices to provide application-level firewalling and quality of service, giving the administrator the possibility to dynamically influence the network based on the application. And the trend continues, stepping one level deeper into what are called nested applications, like Farmville inside Facebook and InMail inside LinkedIn. Take the Facebook example: marketing departments are keen on leveraging social media and appreciate employees making positive references to corporate events and announcements; at the same time, IT does not want this same user to be playing Farmville and disrupting or slowing down more business-critical application flows. Traffic control, classification and prioritization based on applications and nested applications give the administrator and the network the tools required to take back control and provide that all-important quality of experience for the user.

About the blog

We can only admit that the cloud revolution is accelerating, fueling our economic growth and general wellbeing. This acceleration requires high-paced technology evolution, disruptive approaches to existing infrastructures, and the adoption of new models and patterns in the networking industry. It is happening all around us today and will not slow down in the coming years. This blog aims to cover some of these evolutions and revolutions and will explore some of the solutions available today and their implications for networking now and in the near future.

I’m not a fortune teller, nor do I possess all the knowledge of the world, and at times I might make some assumptions or cut some corners to shorten the technical explanations. While at times I might be biased, being passionate about my employer’s technology and generally having better knowledge of and insight into the accuracy of information about the products in our portfolio, I try to be open-minded and as accurate as possible given the available information. In any circumstance, I accept and encourage feedback, positive and negative! I admit that I’m on a continuous learning curve, like everyone in information technology…
