Debugging Distributed Systems: Q&A with the “Squash” Microservice Debugger Creator Idit Levine

Key Takeaways

  • The ability to monitor and debug an application is important during development and in production. Debugging a microservice-based application is more challenging than debugging a monolithic application, as it is difficult to attach a native debugger to multiple processes that communicate across a network.
  • Currently the best approach to debugging microservices relies on obtaining a trace of all transactions and dependencies using tools that, for example, implement the OpenTracing API standard. These tools capture timing, events, and tags, and collect the associated data out-of-band (asynchronously).
  • OpenTracing tools are very powerful, but they have limitations and gaps. Since logging the state of the application during runtime can be expensive and result in performance overhead, one needs to limit the amount of collected information. 
  • Squash is an open source microservice debugging tool that orchestrates run-time debuggers attached to microservices (running within containers deployed onto IaaS or CIaaS), and provides familiar features like setting breakpoints, stepping through the code, and viewing and modifying variables.
  • We should aspire to provide distributed applications the same level of observability and control that is available for monolithic applications. A service mesh may become the best point of integration for such observation, for example, logging, tracing and in-process debugging.
     

InfoQ recently sat down with Idit Levine, CEO of solo.io and creator of the new open source “Squash” microservices debugger, and discussed the challenges of observing and debugging distributed systems and applications.

InfoQ: Hi Idit, and welcome to InfoQ! Could you introduce yourself, and discuss a little about your latest venture solo.io please?

Levine: Hi Daniel, thank you for having me. I am the founder and CEO of solo.io, whose general mission is to streamline the cloud stack. I’ve been in the cloud management space for 12 years, since I joined DynamicOps (the developer of vCAC, later acquired by VMware) as one of its first employees.

Most recently I was the CTO of the cloud-management division at EMC. There I led, designed and implemented project unik, an open source platform for automating Unikernels compilation and deployment, and project layer-x, an open source framework for cross-cluster scheduling.

Solo.io is currently in stealth mode, but my commitment to the open source community is as strong as ever. That’s why we recently released Squash, an open source platform for debugging microservices applications.  We plan to enhance Squash and bring other valuable tools to the community in the near future.

InfoQ: Can you explain a little about how operational and infrastructure monitoring has evolved over the last five years? How have cloud, containers, and new architectural styles like microservices impacted monitoring and debugging?

Levine: Monitoring the state of an application is important during development and in production. With a monolithic application, this is rather straightforward, since one can attach a native debugger to the process and have the ability to get a complete picture of the state of the application and its evolution.

Monitoring a microservice-based application poses a greater challenge, particularly when the application is composed of tens or hundreds of microservices. Because any request may be processed by many microservices, each possibly invoked multiple times and potentially on different servers, it is exceptionally difficult to follow the “story” of the application and identify the causes of problems when they arise.

Currently, the main methodology relies on obtaining a trace of all transactions and dependencies using tools that, for example, implement the OpenTracing standard. These tools capture timing, events, and tags, and collect this data out-of-band (asynchronously). OpenTracing allows users to perform critical path analysis and monitor request latency, perform topological analysis and identify bottlenecks due to shared resources. Users can also log what they think could be useful data, like the values of different variables, error messages etc.
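To make the idea of capturing timing, events, and tags and collecting them out-of-band more concrete, here is a minimal sketch in plain Python. It does not use any real OpenTracing library; the names `Span`, `BackgroundReporter`, and `handle_request` are hypothetical and chosen only for illustration. Finished spans are handed to a worker thread through a queue, so the request path never waits on collection.

```python
import queue
import threading
import time


class Span:
    """Toy span (hypothetical): one timed operation with tags and events."""

    def __init__(self, name, reporter, parent=None):
        self.name = name
        self.parent = parent.name if parent else None
        self.tags = {}
        self.events = []
        self._reporter = reporter

    def set_tag(self, key, value):
        self.tags[key] = value

    def log_event(self, message):
        # Events are timestamped messages attached to the span.
        self.events.append((time.time(), message))

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.duration = time.perf_counter() - self.start
        if exc_type is not None:
            self.set_tag("error", True)
        # Hand the finished span off; the caller does not wait for collection.
        self._reporter.submit(self)
        return False


class BackgroundReporter:
    """Collects finished spans on a worker thread (out-of-band)."""

    def __init__(self):
        self._queue = queue.Queue()
        self.collected = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, span):
        self._queue.put(span)

    def _drain(self):
        while True:
            span = self._queue.get()
            if span is None:  # sentinel: stop the worker
                break
            self.collected.append(span)

    def close(self):
        self._queue.put(None)
        self._worker.join()


def handle_request(reporter):
    """Simulated request that crosses two operations."""
    with Span("checkout", reporter) as root:
        root.set_tag("http.method", "POST")
        with Span("charge-card", reporter, parent=root) as child:
            child.log_event("calling payment service")
            time.sleep(0.01)
```

A real tracer would also propagate trace identifiers across the network and ship spans to a collector service; this sketch keeps everything in one process to show only the data captured and the asynchronous hand-off.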

InfoQ: We've been keenly watching the evolution of Squash -- an open source tool that allows debugging of microservices applications running on a container orchestration platform from an IDE -- and would be keen to hear about the goals of the project. What was the rationale for creating it?

Levine: OpenTracing tools are very powerful, but they have limitations and gaps. Since logging the state of the application during runtime can be expensive and result in performance overhead, one needs to limit the amount of collected information. One way to do this is to follow only a subset of the transactions, and not all of them. Tuning the size of this sample represents a tradeoff between the amount of information collected on one hand, and the price in performance and costs on the other.
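The sampling tradeoff described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular tracing tool decides: the function names and the choice of hashing the trace ID are hypothetical. Hashing makes the decision deterministic, so every service that sees the same trace ID would agree on whether to record it.

```python
import hashlib


def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Decide deterministically whether to record a trace.

    Assumption for this sketch: the decision is derived from a hash of
    the trace ID, so the same ID always yields the same answer.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Read the first 8 bytes as an integer in [0, 2**64) and compare it
    # against the threshold implied by the sample rate.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < sample_rate * 2**64


def sampled_fraction(trace_ids, sample_rate):
    """Fraction of the given trace IDs that would be recorded."""
    kept = sum(should_sample(t, sample_rate) for t in trace_ids)
    return kept / len(trace_ids)
```

Raising `sample_rate` records more transactions at a higher runtime and storage cost; lowering it reduces overhead but makes it more likely that the one request you need was never captured.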

One consequence is that once a problem is identified, it is possible that some needed information is missing. Obtaining this information requires running the application again, and waiting for the data to be collected. Moreover, OpenTracing is not a runtime debugger and does not allow changing variables during runtime to explore potential solutions to a problem. Any attempt to fix a problem requires wrapping the code, running the application, and waiting for the data again. Solving a problem may necessitate several such iterations, which can be both daunting and expensive.

Our vision for Squash is to complement the OpenTracing tools and close these gaps. The main goal of Squash is to provide an efficient tool for debugging microservices applications. Squash orchestrates run-time debuggers attached to microservices, providing familiar features like setting breakpoints, stepping through the code, viewing and modifying variables etc. Importantly, Squash allows the developer to seamlessly follow the application and skip between microservices. Squash takes care of all the necessary piping, allowing developers to focus on their own code and solve the issues they actually care about. To make Squash accessible and easy to adopt, it integrates with existing popular IDEs.

Squash is designed to provide essential capabilities for monitoring the life cycle of an application both in the development phase, allowing development of robust code, as well as during production, allowing fast adaptation of the code when new difficulties arise.

InfoQ: What are the future plans for Squash?

Levine: We recognized that Squash can leverage a service mesh (like Istio) and proxy (like Envoy) to let users debug applications that run in the mesh without pausing the entire service. Accordingly, we’ve just officially pushed the Squash HTTP Envoy filter to Envoy upstream. Next, we will work with the Istio team to configure this project to use it.

We have received community requests to integrate Squash with more platforms, like Mesos and Docker Swarm, and we hope to also integrate it with Cloud Foundry. We have also added support for more debuggers, like Java, Node.js and Python. Lastly, we are looking forward to supporting more IDEs, including IntelliJ IDEA and Eclipse.

In addition, we are talking with the OpenTracing community leader, with the aim of integrating OpenTracing with Squash. The vision is that users would be able to identify latency between two services via OpenTracing, and zoom in to resolve the problem with Squash.
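As a rough illustration of the first half of that workflow, the following Python sketch picks out the slowest call between two services from a set of already-collected span records. The `SpanRecord` type and `slowest_hop` function are hypothetical, and the sketch makes a simplifying assumption: the duration of the callee's span stands in for the latency of the hop. The result is the kind of pointer a developer might then follow up with a debugger.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanRecord:
    """Hypothetical, simplified record of one finished span."""
    span_id: str
    parent_id: Optional[str]
    service: str
    duration_ms: float


def slowest_hop(spans):
    """Return ((caller, callee), mean_ms) for the slowest cross-service call.

    Returns None when no span has a parent in a different service.
    """
    by_id = {s.span_id: s for s in spans}
    durations = defaultdict(list)
    for span in spans:
        parent = by_id.get(span.parent_id)
        # Only count calls that cross a service boundary.
        if parent is not None and parent.service != span.service:
            durations[(parent.service, span.service)].append(span.duration_ms)
    if not durations:
        return None
    means = {hop: sum(v) / len(v) for hop, v in durations.items()}
    hop = max(means, key=means.get)
    return hop, means[hop]
```

For example, given one `frontend` span that calls `cart` once (15 ms) and `payment` twice (90 ms and 70 ms), `slowest_hop` reports the `frontend` to `payment` hop with a mean of 80 ms.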

InfoQ: We've seen you talk about Unikernels, and would be keen to get your opinion on the role this technology will play in the future? Bryan Cantrill has famously stated that Unikernels are unfit for production, and are also entirely undebuggable. What do you think about this?

Levine: I believe that Unikernels will play a significant role in the future, mainly in the IoT space. The benefits of Unikernels – their “slim” footprint, security, performance – are a great fit for IoT devices where the storage is limited and one prefers to include minimal code rather than a full-blown OS.

I believe unik is a fantastic orchestration tool to build and run a Unikernel, and it seems that the community agrees based on the traffic and clones on the GitHub repository. I am very happy that people are using unik. Next, I hope to extend unik to be more than a Unikernel tool, by supporting Kata Containers, LinuxKit, FreeRTOS and other IoT embedded device software.

Bryan is absolutely right that Unikernels can only be production ready when monitoring and debugging tools for Unikernels become available.  Currently, such tools do not exist.

When we built unik we had to debug the Unikernel, and we did that using the gdb debugger. I can therefore testify that debugging Unikernels is indeed possible, but can be extremely hard.

I think that the community, which recognizes the huge potential of Unikernels, should invest in creating new tools that will automate this process and make it easier. Squash, for example, is already leveraging debuggers like gdb, so potentially it could be expanded to help debug Unikernels.

InfoQ: "Serverless" technology is also getting increasingly popular. Would a tool like Squash also be useful for debugging applications/functions deployed here?

Levine: Definitely! Actually, we originally thought of Squash as a tool for debugging serverless applications. However, most people who run serverless apps today use the public cloud FaaS platforms -- and for good reasons, as this is currently the most mature offering. Such platforms take the complexity away from the user, but also take away the control and flexibility.

Users do not have any control over or access to the environment that the functions run on. This really limits the ability of the community to innovate in the serverless space, and forces it to come up with hacks and “creative” solutions to overcome its limitations. I am not a fan of “hacks”, and therefore when we built Squash we gave priority to platforms that provide us with the hooks to plug into.

InfoQ: What other tools do you think future developers will need to understand and debug large-scale, rapidly evolving container-based applications?

Levine: As a community, we should aspire to provide distributed applications the same level of observability and control that is available for monolithic applications. A combination of existing tools already points us in the right direction. Log collection can be done by OpenTracing tools, metrics collected by Prometheus, and debugging by Squash. All of these methods should plug into a service mesh to achieve full efficiency.

InfoQ: What role do you think QA/Testers have in relation to observability and debuggability of a system?

Levine: In one possible mode of action, I would expect the QA and testers to focus on the logs and provide context. With container-based applications, this should be done using OpenTracing. The developer will then be able to reproduce the bug and use Squash to attach a debugger, step through the code, and resolve the issue.

InfoQ: Thanks once again for taking the time to sit down with us today. Is there anything else you would like to share with the InfoQ readers?

Levine: We at solo are working hard on building more open source tools to facilitate microservices development and operation. In particular, we are focused on innovative and helpful tools to accelerate adoption of microservices in the enterprise. We are super excited about our plans for 2018 -- please stay tuned!

Additional information on solo.io can be found at the company website, and the open source Squash microservices debugger can be found on GitHub.

About the Interviewee

Idit Levine is the founder and CEO of solo.io, a Boston-based startup whose mission is to streamline the cloud stack. Idit has been in the cloud management space for 12 years, working at both enterprise and startup companies. Until recently Idit was the CTO of the cloud management division at EMC and a member of its global CTO Office, where she and her team introduced successful open-source projects for automating unikernels (UniK) and for cross-cluster scheduling (layer-x). At solo, Idit recently released Squash, an open source platform for debugging microservices applications.
