Challenge
"We realized that we needed a more cloud-native approach to deal with these issues," says Singh. The team decided to implement a custom setup of Kubernetes based on Ansible. "Kubernetes has helped us reduce the deployment time from 15 minutes to less than a minute," says Singh. "Due to Kubernetes's self-healing nature, the operations team doesn't need to do any manual intervention in case of a node or pod failure." Plus, he says, "Dev-Prod parity has improved since developers can experiment with options in dev/staging clusters, and when it's finalized, they just commit the config changes in the respective code repositories. These changes automatically get replicated on the production cluster via CI/CD pipelines." For most content creators, only half of that movie quote may ring true. Sure, platforms like Wordpress, YouTube and Shopify have made it simple for almost anyone to start publishing new content online, but attracting an audience isn't as easy. Crowdfire "helps users publish their content to all possible places where their audience exists," says Amanpreet Singh, a Software Engineer at the company based in Mumbai, India. Crowdfire has gained more than 16 million users—from bloggers and artists to makers and small businesses—since its launch in 2010. With that kind of growth—and a high demand from users for new features and continuous improvements—the Crowdfire team struggled to keep up behind the scenes. In 2015, they moved their monolith Java application to Amazon Web Services
It was a good first step, but the team soon realized they needed to go further down the cloud-native path, which would lead them to Kubernetes. "It was okay for our use cases initially, but as the number of services and development teams increased and we scaled further, deploy times, self-healing capabilities and resource utilization started to become problematic," says Singh, who leads the infrastructure team at Crowdfire. "We realized that we needed a more cloud-native approach to deal with these issues." As he looked around for solutions, Singh had a checklist of what Crowdfire needed. "We wanted to keep some things separate so they could be shipped independent of other things; this would help remove blockers and let different teams work at their own pace," he says. "We also make a lot of data-driven decisions, so shipping a feature and its iterations quickly was a must." Kubernetes checked all the boxes and then some. "One of the best things was the built-in service discovery," he says. "When you have a bunch of microservices that need to call each other, having internal DNS readily available and service IPs and ports automatically set as environment variables help a lot." Plus, he adds, "Kubernetes's opinionated approach made it easier to get started." There was another compelling business reason for the cloud-native approach. "In today's world of ever-changing business requirements, using cloud native technology provides a variety of options to choose from—even the ability to run services in a hybrid cloud environment," says Singh. "Businesses can keep services in a region closest to the users, and thus benefit from high-availability and resiliency." So in February 2016, Singh set up a test Kubernetes cluster using the kube-up scripts provided. "I explored the features and was able to deploy an application pretty easily," he says. "However, it seemed like a black box since I didn't understand the components completely, and had no idea what the kube-up script did under the hood. So when it broke, it was hard to find the issue and fix it." To get a better understanding, Singh dove into the internals of Kubernetes, reading the docs and even some of the code. And he looked to the Kubernetes community for more insight. "I used to stay up a little late every night (a lot of users were active only when it's night here in India) and would try to answer questions on the Kubernetes community Slack from users who were getting started," he says. "I would also follow other conversations closely. I must admit I was able to avoid a lot of issues in our setup because I knew others had faced the same issues."Solution
Impact