Introduction

Cavisson Systems’ NetHavoc is an implementation of core and advanced chaos engineering concepts that spans the infrastructure, network and application layers. Conducting chaos experiments across these layers allows DevOps, SRE, QE and Development teams to accurately assess the resilience of their systems and applications. With extensive integrations with notification and ITSM tools, NetHavoc provides an enterprise-ready implementation of a mature chaos engineering platform.

NetHavoc vs Harness CE

NetHavoc offers a wide variety of chaos experiments across the infrastructure, network and application levels. In-built integration with Cavisson’s performance testing and observability solution makes it a matter of a few clicks to analyze the impact of chaos experiments, run under production-level load, on end-user experience. This is done by capturing actual user sessions and viewing the performance of your application across individual transactions, pages and sessions.

NetHavoc provides chaos experiments or havocs across a wider application landscape than Harness’s current support, which enables development teams to identify areas of improvement from a code level perspective in an increasingly diverse and evolving development framework ecosystem.

Detailed Comparison

The following section provides an in-depth comparison between NetHavoc and Harness CE. The differentiators are highlighted to easily identify features/components that set NetHavoc apart from Harness CE.

[wptb id=10051]

[wptb id=10052]

* Microservice design patterns like Circuit Breaker and Bulkhead can be tested with these features.

[1]: Harness does not induce delay directly; instead, it uses an intermediate proxy server. Further, the delay is induced at the node/pod level. Delay on individual business transactions/services is not supported.

In contrast, NetHavoc does not need a proxy; it injects delay directly inside the JVM. The delay can also be injected at a more granular level, i.e. on individual business transactions/services.

[2]: Same as [1] above.
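
To make the difference between node/pod-level delay and transaction-level delay concrete, here is a minimal conceptual sketch in Python. It is not NetHavoc's or Harness's implementation, and it does not use JVM bytecode instrumentation. The names `INJECTED_DELAYS`, `transaction`, `checkout` and `search` are hypothetical and exist only for illustration. The sketch assumes delay is applied in-process, keyed by a transaction name, so that only the targeted transaction is slowed while others in the same process are untouched.

```python
import time
from functools import wraps

# Hypothetical registry (an assumption for this sketch):
# transaction name -> delay in seconds to inject before the call.
INJECTED_DELAYS = {}


def transaction(name, sleep=time.sleep):
    """Mark a function as a named business transaction.

    The `sleep` parameter is injectable so the example can record
    delays instead of actually blocking.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Look up the delay at call time, so experiments can be
            # switched on and off while the process is running.
            delay = INJECTED_DELAYS.get(name, 0.0)
            if delay > 0:
                sleep(delay)
            return func(*args, **kwargs)
        return wrapper
    return decorator


slept = []  # records injected delays instead of really sleeping


@transaction("checkout", sleep=slept.append)
def checkout(cart_id):
    return "order for " + cart_id


@transaction("search", sleep=slept.append)
def search(term):
    return "results for " + term


# Inject a 2-second delay on the "checkout" transaction only.
INJECTED_DELAYS["checkout"] = 2.0
checkout("c1")   # delayed
search("shoes")  # not delayed, although it runs in the same process
```

In this sketch, a node/pod-level approach would correspond to delaying every call in the process, whereas the per-name lookup slows only `checkout`. That granularity is what lets a team observe how one slow dependency affects patterns such as Circuit Breaker or Bulkhead without degrading unrelated transactions.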

[wptb id=10053]

[wptb id=10054]

[wptb id=10055]

[wptb id=10056]

[wptb id=10057]

[wptb id=10058]

Platform Specific Chaos Experiments Coverage

Cavisson’s NetHavoc provides extensive chaos experiment capabilities spanning the application and infrastructure levels. Its support for multiple on-premise, cloud and containerized platforms clearly sets it apart from Harness. Let’s assess the chaos experiments supported on each of these platforms:

GCP

Harness has extremely limited chaos experiment capabilities for GCP with disk loss and instance stop as the only chaos experiments supported. On the other hand, NetHavoc provides extensive infrastructure and application level chaos experiments or havocs on applications running on GCP.

Azure

Harness provides infrastructure-level chaos experiments and a single application-level chaos experiment on Azure. For applications, the only chaos experiment supported is to restrict access to an application instance. NetHavoc provides a wider variety of application-level chaos experiments to accurately determine an application’s resilience in production-level scenarios.

AWS

Harness provides a wider range of infrastructure and AWS service related chaos experiments but falls short at the application level. Furthermore, Harness does not provide native observability for AWS. NetHavoc, apart from providing multiple application-level chaos experiments, gives organizations detailed, in-built monitoring of various AWS services. This facilitates a 360-degree view of the distributed application ecosystem, making it easier to understand the extent of an experiment’s impact on both application and infrastructure resiliency. Moreover, additional infrastructure/service-level AWS chaos experiments are planned for the upcoming quarters as part of the product roadmap, with the aim of broadening NetHavoc’s coverage for AWS.

Kubernetes

Harness provides a larger number of chaos experiments for Kubernetes than NetHavoc, but only at the infrastructure level. NetHavoc takes a more granular approach at the application level, where users can inject havoc(s) on individual transactions/services. As with AWS, NetHavoc provides a wider range of observability metrics for Kubernetes than Harness. This level of detailed insight is essential to understanding the impact of your resiliency testing initiatives on the application and its underlying components in a microservice-oriented application landscape. Container, node, pod and control plane level metrics are all covered under Cavisson’s native observability, giving organizations comprehensive insight into each component’s preparedness during outages.

Pivotal Cloud Foundry/Tanzu Application Service

Harness has a single chaos experiment available for PCF/TAS, whereas NetHavoc provides both system and application-level chaos experiments along with in-built monitoring capabilities for applications deployed on Cloud Foundry. The monitoring module covers numerous integral Cloud Foundry services such as Auctioneer, Nozzle, GoRouter, Controller and File Server, among others, to provide a holistic view of how your system and application respond to chaos experiments.

Linux

As observed with the other platforms, Harness provides chaos faults only at the system level on Linux, whereas NetHavoc’s chaos experiment capabilities cover both the application and infrastructure layers and also support experiments for Kafka and JMS-based MQs.

Windows

NetHavoc provides resource and application-level chaos experiments/havoc(s) for Windows in both VM and on-premise deployments. Harness, on the other hand, does not support any chaos experiments for on-premise Windows machines, and its resource-level chaos experiments are limited to Windows-based VMware VMs.

Due to this constraint, organizations with on-premise Windows servers cannot use Harness and would require additional chaos experiment tools to carry out resiliency testing of their critical Windows-based application(s)/infrastructure.

Conclusion

NetHavoc allows organizations and teams to conduct chaos experiments in conjunction with production-level traffic and extensive observability capabilities across applications, user sessions, logs and infrastructure. Traditional methodologies that calculate resiliency scores with negligible observability insights and without appropriate user load fall well short of accurately depicting your mission-critical application’s resiliency.

The above diagram illustrates how a unifying signal across various components (load, chaos experiments & observability) is fundamentally required to accurately drill down to the exact root cause behind issues being observed after conducting chaos experiments. Without this common signal, it becomes virtually impossible for traditional chaos engineering tools to gauge the extent and duration of KPI degradation without integrating multiple tools for application, user experience & log monitoring along with performance testing solutions.

Implementing application-level chaos experiments via single-click bytecode instrumentation of bundled libraries makes it easy for teams to understand how resilient their application is. This is coupled with post-experiment analysis capabilities such as:

One-click pattern matching to identify all affected metrics across your entire ecosystem (network, server, database, user experience and transaction level), which lets you instantly understand the impact of the chaos experiment.
In-built AIOps capabilities, which make identifying the exact root cause behind performance degradation or unavailability a matter of minutes, not days.

NetHavoc’s exhaustive coverage of pre- and post-experiment capabilities drastically improves your team’s and organization’s posture towards building resistance to failure and minimizing downtime.

Providing an extensive array of chaos experiment capabilities across the infrastructure and application layer becomes essential to accurately judge your IT ecosystem’s resiliency. Without this level of experiments spanning the entire spectrum, organizations cannot be prepared for outages seen in production as their resiliency preparedness remains limited.

Cavisson Systems’ NetHavoc elevates resiliency testing to resiliency engineering, helping organizations and teams refocus on staying ahead of the competition instead of spending a massive amount of time figuring out the what and why behind critical issues. Current tools are not adept at providing this level of insight and correlation, and therefore fall well short of ensuring that your mission-critical applications are resilient enough to handle unplanned outages in production.

Contact us today to view NetHavoc’s cutting edge capabilities and elevate your end user experience by building resistance to failure.