Is Your Cloud Always Operational? The Quest for Nearly Perfect Availability

The Enterprise Cloud needs to be always operational for our customers. No vacations, no extended upgrade or maintenance windows, no single points of failure. Just always be operational.

There are a lot of services that we are lucky enough to take for granted: electricity and phone service, and maybe even cellular data and wireless Internet. Most of us think of the first two as always operational. We’re lucky to have that first-world luxury. I recall that even in the midst of the rolling electricity blackouts we lived through in California a number of years ago, we always had a dial tone when we picked up our landline phone. That’s the platinum-service availability that we seek.

Not Good Enough

Uptime that is good enough for departmental Software-as-a-Service (SaaS) is not good enough for the entire enterprise. If an HR or sales tool goes down, there’s certainly some associated pain, but generally the business can still operate. Not so with the Enterprise Cloud, which hosts the services that drive the fundamental business and operations, such as the IT infrastructure that supports every department.

Our cloud infrastructure is built using two redundant, fully active data-center regions in each of our geographies (16 regions in eight geographies). Each data center is staffed 24×7 and features the highest standards for reliable power, fire suppression and physical security. In each data center, ServiceNow has dedicated physical space, all under our control. The only people who physically touch any of our infrastructure, including the servers that hold your data and the network gear that connects our cloud to the Internet, are full-time ServiceNow employees.

Speaking of our network, we have dual top-of-rack routers in every server rack that connect to a fully redundant distribution and core network. The network between our paired data centers in each region is built using diverse carriers and redundant private fiber circuits. At the borders of our network, we connect to multiple Tier 1 Internet Service Providers and to multiple public peering points across the planet, such as the Amsterdam Internet Exchange (AMS-IX). To provide customer-facing networking and security services, we have redundant, stateful firewalls, intrusion-detection systems and server load-balancers.

Redefining Cloud Availability

Beyond our network, the servers that we use to build our cloud are designed with redundancy and availability in mind. Each server has dual power supplies, local storage and multiple network interface cards that connect to the top-of-rack routers. Our latest database servers use solid-state drives to further increase performance and availability. Your data is backed up in each data center, and because of our multi-instance architecture, your data is never commingled with another customer’s data. Because your instance is active in both data centers in each geography, your data is backed up in both data centers in a region, such as in both California and Virginia.

Our focus on availability and redundancy goes even further – we store spare equipment for many critical infrastructure components in each location, or we have contracted with our vendors to deliver replacement parts within four hours. Even while we are waiting for a part to arrive, we don’t incur an outage, because everything is redundant. If we feel that operating without redundancy for a short period of time could result in an outage for your instance, we have automation that can quickly move your instance to the paired data center in the region, a feature of our cloud that we call Advanced High Availability.
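To make the reasoning in the paragraph above concrete, here is a minimal sketch of that kind of decision: keep serving from the current data center while redundancy remains, and move the instance when it would otherwise run unprotected for too long. This is only an illustration, not how Advanced High Availability is actually implemented. The names `ComponentStatus` and `should_move_instance`, and the `max_unprotected_hours` threshold, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ComponentStatus:
    """Hypothetical health record for one redundant component group."""
    name: str
    healthy_units: int   # units currently working
    required_units: int  # minimum units needed to keep serving

    @property
    def is_serving(self) -> bool:
        return self.healthy_units >= self.required_units

    @property
    def has_redundancy(self) -> bool:
        # At least one spare unit beyond the minimum is still healthy.
        return self.healthy_units > self.required_units


def should_move_instance(
    components: Iterable[ComponentStatus],
    repair_eta_hours: float,
    max_unprotected_hours: float = 1.0,  # assumed threshold, not from the article
) -> bool:
    """Return True when the instance should move to the paired data center."""
    components = list(components)
    # Anything already below its minimum means an outage is in progress.
    if any(not c.is_serving for c in components):
        return True
    # Otherwise move only if we would run without a spare for too long.
    running_unprotected = any(not c.has_redundancy for c in components)
    return running_unprotected and repair_eta_hours > max_unprotected_hours


if __name__ == "__main__":
    rack = [
        ComponentStatus("power supply", healthy_units=1, required_units=1),
        ComponentStatus("top-of-rack router", healthy_units=2, required_units=1),
    ]
    print(should_move_instance(rack, repair_eta_hours=4.0))
```

In this sketch a server that has lost one of its two power supplies is still serving, but a four-hour wait for the part exceeds the assumed one-hour tolerance, so the function recommends the move.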

The last piece of this quest is consistency. We build and operate each of our data centers, networks and servers in the same manner. Every location has the same architecture. This consistency greatly helps our operations teams understand and resolve issues quickly. Those operations teams are watching over our cloud 24x7x365 from London, San Diego and Sydney using a follow-the-sun support model.

With all of this consistency and redundancy in place, we’re working to set the industry’s most aggressive standards for a highly available cloud infrastructure. Yet, we recognize that our goal can be nearly unachievable. Even with the absolute best infrastructure engineering there can be downtime. While we are not perfect, maintaining the availability of our customers’ instances is the absolute top priority for our teams.
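A little arithmetic shows why redundancy helps so much and still never reaches zero downtime. The sketch below converts an availability figure into minutes of downtime per year and estimates the combined availability of redundant copies. It assumes failures are independent, which real systems only approximate, and the 99.9% starting figure is a made-up example, not a ServiceNow number.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year for a given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    Assumes independent failures: the group is down only when every copy
    is down at the same time.
    """
    return 1.0 - (1.0 - component_availability) ** copies


if __name__ == "__main__":
    single = 0.999  # hypothetical availability of one component
    paired = parallel_availability(single, copies=2)
    for label, value in (("single", single), ("paired", paired)):
        minutes = downtime_minutes_per_year(value)
        print(f"{label}: {value:.6f} available, {minutes:.2f} minutes down per year")
```

Under these assumptions, a single component at 99.9% is down for about 526 minutes a year, while a redundant pair is down for well under one minute. The figure shrinks quickly with each added copy but never reaches zero, and correlated failures make real results worse than this estimate.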

We are not done, and will never be. We will continue to set new standards and provide greater assurance for our customers who rely on us to perform billions of transactions every month on the Enterprise Cloud.

Allan Leinwand
Allan Leinwand has built a reputation for managing the world’s most demanding clouds – in B2B and B2C. He is the chief technology officer at ServiceNow responsible for building and running the ServiceNow Enterprise Cloud – the second largest enterprise cloud computing environment on the planet. In this role, he is responsible for overseeing all technical aspects and guiding the long-term technology strategy for the company. Before joining ServiceNow, Leinwand was chief technology officer – Infrastructure at Zynga, Inc. where he was focused on building one of the largest consumer cloud computing environments used in the delivery of the company’s social games to more than 80 million players daily. He got his start as a cloud pioneer at Cisco before “cloud computing” was a term and the idea of accessing applications from anywhere was still very new. In addition to expertise in running large enterprise cloud computing environments, he also provides expertise in software engineering, quality engineering and product-market fit to companies including Spoke, Inc.; Bulletproof 360, Inc.; MapAnything, Inc.; Founders Circle Capital; and Kleiner Perkins Caufield & Byers. He is a Board member of Marin Software. Leinwand has served as an adjunct professor at the University of California, Berkeley where he taught computer networks, network management and network design. He holds a bachelor of science degree in computer science from the University of Colorado at Boulder.
