Verkada Reduces Dev Machine Costs by over 70% with Crafting
“Crafting helps us achieve efficiency goals while improving manageability and user experiences.”
Verkada is a market leader in enterprise building security solutions, offering a comprehensive range of products including video cameras, access control systems, and environmental sensors. With their cloud-managed approach, they have become a trusted partner for more than 17,000 enterprise organizations, enabling safer and smarter operations in their buildings.
A key differentiator of Verkada's offering is that it is cloud-based and covers the whole stack end to end. This approach gives their enterprise customers a seamless user experience. However, it also presents significant challenges for the engineering team, which must develop cutting-edge cloud services that meet stringent security requirements. The team is also responsible for building sophisticated firmware as part of their IoT solution.
Developing Verkada's sophisticated cloud services requires more computing power than developers' local laptops can provide. Engineers also often need to build and test code on multiple architectures (x86 and ARM), so a single local machine is not enough for their work. To address these challenges, the engineering team adopted cloud-hosted development machines.
Specifically, engineers can allocate virtual machines in the cloud (AWS EC2 instances) for their development needs. While this approach gained popularity and widespread adoption, several disadvantages of using bare EC2 VMs surfaced.
The primary issue is resource efficiency and VM cost. Many engineers leave their VMs running for days or even weeks because they forget to shut them down after use. Engineers are also reluctant to terminate their VMs, because doing so erases the productive setup they have carefully configured. Some VMs also hold partially completed work: when engineers are blocked or pulled onto other tasks, they keep their VMs allocated to preserve that progress. As a result, dev machine costs rose steadily over time and became a significant component of engineering expenditure. "We really need to optimize the dev machine usage and control the cost as part of the company-wide push for efficiency," says Kevin Chen, Engineering Manager for the dev tools team. "It's one of our top priorities."
Managing bare EC2 VMs for development also imposes a heavy maintenance burden. EC2 instances are designed as building blocks for production systems, where individual instances are treated as stateless and higher-level orchestration handles recovery. A dev machine, by contrast, holds state that matters to its owner, so when issues arise, recovery usually requires time-consuming manual intervention.
Additionally, the difficulty of enforcing secure engineering practices raises compliance concerns. Without better tooling, engineers sometimes have to store secret credentials on these EC2 machines, which makes it harder to control who can access the machines and the secrets on them. Keeping development environments secure demands significant effort and vigilance from the infrastructure team.
To address the challenges they faced, the Verkada team adopted Crafting as their primary online development environment following an extensive trial.
With Crafting, engineers can now spin up on-demand sandboxes as their dev environment whenever they need one. The sandboxes run as containers on Amazon EKS, and Crafting provides an end-to-end managed solution that takes care of all operational tasks.
Verkada eliminated resource waste with Crafting's cost-management solution, which combines auto-scaling for the VM nodes with auto-suspension for the containers. Crafting monitors user activity within each sandbox and automatically suspends sandboxes that have been idle for 30 minutes, freeing up their CPU and memory. Suspension is seamless for the user: all local file changes are preserved, and developers can resume where they left off, typically within 1 to 2 minutes. Crafting also scales the VM node pool down when overall load is low, such as overnight, and scales it back up when load picks up, such as in the morning.
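The case study does not describe how Crafting implements this, so the following is only a minimal sketch of the general idea: suspend sandboxes after an idle threshold, then size the node pool to the CPU still requested by running sandboxes. The 30-minute threshold comes from the text; `Sandbox`, `suspend_idle`, `desired_nodes`, the node bounds, and the CPU-only sizing rule are hypothetical and are not Crafting's API. Real cluster autoscalers also account for memory and per-node bin-packing.

```python
# Hypothetical sketch of idle suspension plus node-pool sizing;
# not Crafting's implementation or API.
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the case study: idle sandboxes are suspended after 30 minutes.
IDLE_TIMEOUT = timedelta(minutes=30)


@dataclass
class Sandbox:
    """Hypothetical record of one sandbox."""
    name: str
    last_activity: datetime  # most recent user activity seen in the sandbox
    cpu_request: float       # cores reserved while the sandbox is running
    suspended: bool = False


def suspend_idle(sandboxes: list[Sandbox], now: datetime) -> list[str]:
    """Suspend sandboxes idle for at least IDLE_TIMEOUT; return their names.

    Suspension here only flips a flag; the assumption is that files persist
    and only the CPU and memory reservation is released.
    """
    newly_suspended = []
    for sb in sandboxes:
        if not sb.suspended and now - sb.last_activity >= IDLE_TIMEOUT:
            sb.suspended = True
            newly_suspended.append(sb.name)
    return newly_suspended


def desired_nodes(sandboxes: list[Sandbox], cores_per_node: float,
                  min_nodes: int = 1, max_nodes: int = 50) -> int:
    """Size the node pool for the CPU requested by running sandboxes.

    Simplification: ignores memory and per-node fragmentation.
    """
    demand = sum(sb.cpu_request for sb in sandboxes if not sb.suspended)
    needed = math.ceil(demand / cores_per_node)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 22, 0)
    fleet = [
        Sandbox("alice", now - timedelta(minutes=5), cpu_request=8),
        Sandbox("bob", now - timedelta(hours=3), cpu_request=8),
        Sandbox("carol", now - timedelta(minutes=45), cpu_request=8),
    ]
    print(desired_nodes(fleet, cores_per_node=16))  # 2 (24 cores requested)
    print(suspend_idle(fleet, now))                 # ['bob', 'carol']
    print(desired_nodes(fleet, cores_per_node=16))  # 1 (8 cores requested)
```

In this toy evening scenario, two of three sandboxes go idle, so the pool can shrink from two nodes to one; the `min_nodes` floor keeps one node warm even when everything is suspended.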
The team achieved further savings, along with a better experience for engineers, by choosing larger VM nodes for the sandbox node pool. Two main factors drive the savings. First, the development containers in the sandboxes can access and share all available resources on a VM node. Second, engineers' resource-heavy work, such as building and testing, comes in intermittent, staggered spikes; most of their time is spent editing code, which needs minimal resources. Because spikes from different engineers rarely coincide, larger nodes give each build higher peak performance and shorten the periods during which CPU and memory are busy. As a result, engineers enjoy shorter turnaround times, while the platform team benefits from higher resource efficiency through better resource sharing on larger VMs.
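The sharing argument comes down to simple arithmetic: a dedicated VM must be sized for its owner's peak, while a shared node only has to cover the peak of the combined load. The sketch below illustrates this with made-up usage traces; `USAGE`, `dedicated_capacity`, `shared_capacity`, and `burst_headroom` are hypothetical, and the numbers are not Verkada's data.

```python
# Hypothetical CPU demand (cores) per engineer over eight time slots.
# Engineers mostly edit code (~1 core) and occasionally spike for builds
# or tests; the spikes are staggered, so they rarely coincide.
USAGE = {
    "eng_a": [1, 1, 8, 1, 1, 1, 1, 1],
    "eng_b": [1, 1, 1, 1, 8, 1, 1, 1],
    "eng_c": [1, 8, 1, 1, 1, 1, 1, 1],
    "eng_d": [1, 1, 1, 1, 1, 1, 8, 1],
}


def dedicated_capacity(usage: dict[str, list[int]]) -> int:
    """Cores needed if every engineer gets a VM sized for their own peak."""
    return sum(max(trace) for trace in usage.values())


def shared_capacity(usage: dict[str, list[int]]) -> int:
    """Cores needed if everyone shares capacity sized for the combined peak."""
    return max(sum(slot) for slot in zip(*usage.values()))


def burst_headroom(usage: dict[str, list[int]], node_cores: int,
                   engineer: str, slot: int) -> int:
    """Cores one engineer could use in a slot on a shared node."""
    others = sum(trace[slot] for name, trace in usage.items()
                 if name != engineer)
    return node_cores - others


if __name__ == "__main__":
    print(dedicated_capacity(USAGE))                 # 32
    print(shared_capacity(USAGE))                    # 11
    print(burst_headroom(USAGE, 16, "eng_a", 2))     # 13
```

With these invented traces, four engineers need 32 cores on dedicated 8-core VMs but only 11 when pooled, and on a single 16-core node a build can use 13 cores instead of 8. Real workloads are less tidy; the point is only that staggered spikes pool well.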
Verkada engineers frequently run Docker as part of the Bazel builds in their monorepo, and they need to access cloud resources from their development environments. Crafting supports both requirements. It enables nested containers, so users can run "docker run" commands inside the container-based sandboxes. Moreover, Crafting's Identity Federation solution removes the need for individual users to store cloud credentials on development machines, and it gives the infrastructure team full auditability of who accessed which cloud resources and when, strengthening security and access control.
"We benefit from this partnership a lot and we are getting a ton of great feedback from the engineering teams about how they enjoyed Crafting!"
By transitioning to Crafting, Verkada resolved the resource inefficiency of their previous EC2-based approach and cut dev machine costs by more than 70%. This efficiency gain comes from several parts of Crafting's cost-management solution: resource sharing among containers, auto-suspension during idle time, and auto-scaling of the node pools.
With Crafting, they were also able to extend online development environments to more engineers across multiple sub-organizations, an expansion that cost constraints had previously held back. Notably, efficiency improves as usage grows, because resource sharing becomes more effective with more users. Verkada's leadership is pleased with the results and encourages further adoption of Crafting to keep improving efficiency.
"Our CFO is impressed with the savings and we feel really proud as the dev infra team to be able to carry out the company level efficiency goals while improving developer experience."
In addition to the cost savings achieved, Verkada experienced significant improvements in the end-user experience through their adoption of Crafting. Now, when an engineer inadvertently breaks the configuration on their sandbox, they can effortlessly create a new sandbox without concerns about additional resource costs. The launch time for the new development environment has significantly improved, and it comes pre-installed with all the necessary tools and configurations, managed by the dev tools team. “Crafting gives me control on how to use my devbox, I won’t need to worry about losing my work and I can get a new one anytime I want.”
Furthermore, maintaining online dev environments has become considerably easier. At the system level, the Crafting platform is hosted in Verkada's cloud and managed remotely by the Crafting team, who handle product updates, while the node pool is configured and scaled automatically without attention from the internal team. The administrative team only needs to manage a template with snapshots to keep configurations standardized and libraries up to date, so individual engineers no longer need to spend extra effort on updates. The administrative team can also upgrade any sandbox directly from an older version to a new one, ensuring that all necessary vulnerability patches are applied. "Security is a top priority for us, Crafting helps us make it easier to ensure a high standard there."
Kubernetes development plays a key role in Verkada's dev flow. Currently, each engineer's sandbox can easily access services running in the internal Kubernetes cluster for testing. Looking ahead, the internal dev tools team is working toward a more sophisticated and resource-efficient way to test their services in production-like Kubernetes environments, using the traffic routing features of Crafting for Kubernetes.
Verkada is a market leader in enterprise building security solutions, from video cameras and access control systems to environmental sensors. With a growing engineering team of more than 400 people, they help over 17,000 enterprise organizations operate safer, smarter buildings with their cloud-managed security solutions.
- Engineers use EC2 VMs to get the computing power their dev work requires
- Many VMs are left running idle, through neglect or to preserve work-in-progress state
- Dev machine costs grow over time and become a notable expense
- Maintenance and recovery require significant manual effort
- Hard to enforce secure practices, e.g., controlling access to secrets
- Switched to Crafting Sandbox as the primary on-cloud dev environment
- Eliminated resource waste through activity-based auto-suspension and node pool auto-scaling
- Achieved better performance and experience by sharing more powerful machines
- Used identity federation to access AWS directly from sandboxes
Reduced dev machine costs by more than 70%