Challenge
Founded in 2005, the enterprise content management company allows its more than 50 million users to manage content in the cloud.
Over the past couple of years, Box has been decomposing its infrastructure into microservices, and became an early adopter of, as well as contributor to,
"Before Kubernetes," Ghods says, "our infrastructure was so antiquated it was taking us more than six months to deploy a new microservice. Today, a new microservice takes less than five days to deploy. And we're working on getting it to an hour." A platform that allows its more than 50 million users (including governments and big businesses like
Box's cloud native journey accelerated that June, when Ghods attended
At that conference, Google announced the release of its Kubernetes container management system, and Ghods was won over. "We looked at a lot of different options, but Kubernetes really stood out, especially because of the incredibly strong team of
Another plus: Ghods liked that
Box deployed Kubernetes in a cluster in a production data center just six months later. Kubernetes was then still pre-beta, on version 0.11. They started small: The very first thing Ghods's team ran on Kubernetes was a Box API checker that confirms Box is up. "That was just to write and deploy some software to get the whole pipeline functioning," he says. Next came some daemons that process jobs, which was "nice and safe because if they experienced any interruptions, we wouldn't fail synchronous incoming requests from customers." The first live service, which the team could route to and ask for information, was launched a few months later. At that point, Ghods says, "We were comfortable with the stability of the Kubernetes cluster. We started to port some services over, then we would increase the cluster size and port a few more, and that's ended up to about 100 servers in each data center that are dedicated purely to Kubernetes. And that's going to be expanding a lot over the next 12 months, probably too many hundreds if not thousands." While observing teams who began to use Kubernetes for their microservices, "we immediately saw an uptick in the number of microservices being released," Ghods notes. "There was clearly a pent-up demand for a better way of building software through microservices, and the increase in agility helped our developers be more productive and make better architectural choices." Ghods reflects that as early adopters, Box had a different journey from what companies experience now. "We were definitely lock step with waiting for certain things to stabilize or features to get released," he says. "In the early days we were doing a lot of contributions [to components such as kubectl apply] and waiting for Kubernetes to release each of them, and then we'd upgrade, contribute more, and go back and forth several times. The entire project took about 18 months from our first real deployment on Kubernetes to having general availability. If we did that exact same thing today, it would probably be no more than six." In any case, Box didn't have to make too many modifications to Kubernetes for it to work for the company. "The vast majority of the work our team has done to implement Kubernetes at Box has been making it work inside of our existing (and often legacy) infrastructure," says Ghods, "such as upgrading our base operating system from RHEL6 to RHEL7 or integrating it into
Perhaps the bigger challenge for Box was a cultural one. "Kubernetes, and cloud native in general, represents a pretty big paradigm shift, and it's not very incremental," Ghods says. "We're essentially making this pitch that Kubernetes is going to solve everything because it does things the right way and everything is just suddenly better. But it's important to keep in mind that it's not nearly as proven as many other solutions out there. You can't say how long this or that company took to do it because there just aren't that many yet. Our team had to really fight for resources because our project was a bit of a moonshot." Having learned from experience, Ghods offers these two pieces of advice for companies going through similar challenges: Service discovery was a huge problem for Box, and the team had to decide whether to build an interim solution or wait for Kubernetes to natively satisfy Box's unique requirements. After much debate, "we just started focusing on delivering something that works, and then dealing with potentially migrating to a more native solution later," Ghods says. "The above-all-else target for the team should always be to serve real production use cases on the infrastructure, no matter how trivial. This helps keep the momentum going both for the team itself and for the organizational perception of the project." Early on, the team built an abstraction on top of Docker files to help ensure that images had the right security updates. This turned out to be superfluous work, since container images are considered immutable and you can easily scan them post-build to ensure they do not contain vulnerabilities. Because managing infrastructure through containerization is such a discontinuous leap, it's better to start by interacting directly with the native tools and learning their unique advantages and caveats. An abstraction should be built only after a practical need for it arises. In the end, the impact has been powerful. "Before Kubernetes," Ghods says, "our infrastructure was so antiquated it was taking us more than six months to deploy a new microservice. Now a new microservice takes less than five days to deploy. And we're working on getting it to an hour. Granted, much of that six months was due to how broken our systems were, but bare metal is intrinsically a difficult platform to support unless you have a system like Kubernetes to help manage it." By Ghods's estimate, Box is still several years away from his goal of being a 90-plus percent Kubernetes shop. "We're very far along on having a mission-critical, stable Kubernetes deployment that provides a lot of value," he says. "Right now about five percent of all of our compute runs on Kubernetes, and I think in the next six months we'll likely be between 20 to 50 percent. We're working hard on enabling all stateless service use cases, and shift our focus to stateful services after that." In fact, that's what he envisions across the industry: Ghods predicts that Kubernetes has the opportunity to be the new cloud platform. Kubernetes provides an API consistent across different cloud platforms including bare metal, and "I don't think people have seen the full potential of what's possible when you can program against one single interface," he says. "The same way
Ghods points to projects that are already in development or recently released for Kubernetes as a cloud platform: cluster federation, the Dashboard UI, and CoreOS's etcd operator. "I honestly believe it's the most exciting thing I've seen in cloud infrastructure," he says, "because it's a never-before-seen level of automation and intelligence surrounding infrastructure that is portable and agnostic to every way you can run your infrastructure." Box, with its early decision to use bare metal, embarked on its Kubernetes journey out of necessity. But Ghods says that even if companies don't have to be agnostic about cloud providers today, Kubernetes may soon become the industry standard, as more and more tooling and extensions are built around the API. "The same way it doesn't make sense to deviate from Linux because it's such a standard," Ghods says, "I think Kubernetes is going down the same path. It is still early days—the documentation still needs work and the user experience for writing and publishing specs to the Kubernetes clusters is still rough. When you're on the cutting edge you can expect to bleed a little. But the bottom line is, this is where the industry is going. Three to five years from now it's really going to be shocking if you run your infrastructure any other way."Solution
Impact