Transcript
Ruiz: My name is Ix-chel Ruiz. I'm a Java champion. I work for JFrog. I have contributed to some books; the most recent one is "DevOps Tools for Java Developers." Let's start with the DevOps part. I would like to share with you the 2021 State of DevOps Report. This is the tenth report prepared by Puppet, based on more than 2600 responses from all around the world. The results are clear. Organizations practicing DevOps consistently report more frequent deployments, shorter lead times for changes, lower change failure rates, and faster mean time to recover. They also self-define their evolution in their DevOps transformation, from high to low. For eight years, highly evolved DevOps teams have consistently demonstrated better performance across the four key software performance metrics: deploying to production on demand, reporting change lead times and mean times to recover of under one hour, and change failure rates under 5%. Yet far too many organizations reach a plateau in their DevOps evolution. This stagnation has been a consistent trend.
Luckily, plenty of improvements can be accomplished in two areas: platform and cultural initiatives. At the platform layer, increasing self-service and seamless integration between the different tools used in the software development cycle speeds up the adoption of DevOps practices. Highly evolved firms make heavier use of internal platforms for their engineers, enabling developers to access authentication, container orchestration, service-to-service authentication, tracing and observability, and logging requests. It is important that the process and the platform are well-defined, integrated, and easily available for all teams to adopt. One of the areas that still presents the most challenges is the adoption of the new culture. To create a mechanism to entangle teams, it is important to create initiatives that promote a culture of knowledge sharing. Teams who share common tooling, language, or methodologies can actively share best practices with other teams, faster and more effectively. Lastly, all teams require a clear understanding of the IT infrastructure landscape.
DevOps and Java
On the other hand, as Java developers, we have some advantages that we can leverage. For example, a very healthy ecosystem with mature libraries for testing, metrics, observability, and whatnot, and build tools with scanning capabilities. We developers are consistently focused on two main things: improving the quality of the software that we build, and trying to deliver more valuable features in each release. Even more, we know releasing a new version is a routine operation, where a consistent process can be followed. If we are already embracing the cultural change brought by the Agile development style, adding new methodologies like DevSecOps and shift left will enable the optimization of the entire software development process: build, test, release, deploy, monitor, and observe the application in production.
DevSecOps
What is DevSecOps? DevSecOps is a set of security assessments where we have a lot of tools and practices in different categories. For example, static application security testing, SAST. Tools in this category scan source code for known weaknesses and insecure coding practices, code smells. Software composition analysis, SCA, tools analyze software to detect known software components, such as open source and third party libraries, and identify any associated vulnerabilities. SCA complements SAST by finding vulnerabilities not detectable by scanning source code. Dynamic application security testing, DAST, scans applications at runtime. This enables an outside-in approach to testing these applications for exploitable conditions that were not detectable in a static state. Web application firewalls monitor traffic at the application level, and detect potential attacks and attempts to exploit vulnerabilities. Container image scanning tools can continuously and automatically scan container images within the CI/CD pipeline and in container registries. Cloud security posture management solutions identify misconfigurations in cloud infrastructure. Finally, shift left. Shift left simply brings testing and security measures into the code development process as early as possible, so moving them towards us, the developers.
Testing and Security
I'm passionate about testing. Even though I have been advocating quite vocally about the benefits of testing in all its flavors (unit, integration, contract, UI, end-to-end, REST API, acceptance, and exploratory), sometimes it's easy to disregard their importance. Security. In the last years, very dramatic vulnerabilities have brought more attention to securing dependencies early in the development cycle, and to involving us, not only the QA or security teams. There is another dimension that we haven't mentioned so far. For those of us that have decided to move towards microservices, we stand at the overlap of an architectural style, which introduces the idea of multiple services, while bringing security and testing concerns to an earlier stage of the development cycle of each one of them. Martin Fowler once described microservices as an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms. These services are built around business capabilities and are independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, and they may be written in different programming languages. The challenges of improving the quality of the software and releasing more valuable features multiply in all aspects of the process, from requirement specification, documentation, architecture, testing, security, and automation to collaboration between different tools. We have micro, mini, or small services, potentially written in different languages, evolving at different rates as wholly different products, and communicating between them, so some API contracts change.
Tooling
We really need tools that make all that overhead easier to manage. Let's talk about tools, or even better, let's first discuss, how do we manage our contracts? Because our APIs are contracts of communication between our different services, regardless of their size. If we're using one of the most difficult protocols of communication, REST, then how do we define, document, version, deprecate, or even show some examples to our different consumers? I strongly suggest you use the OpenAPI specification. It has really good tools for maintaining and publishing your documentation, and even for creating automatic mocks, testing, verifying, and generating code for the client or the server. It is really important that you start using standards.
On the security part, bringing everything closer to the developer, I want to show you some of my favorite tools. Frogbot is a Git bot; you can use it with GitLab, GitHub, or Bitbucket. It has different types of functionality. The one that I like the most is that, when you open a pull request, it automatically scans your pull request for known vulnerabilities. If any exist, it will create a report telling you in which component the vulnerability is found, and even whether there is a version that fixes the problem. Even before you merge the code into your repo, you have all this information at your fingertips. Another one is the JFrog Docker extension, which scans your Docker images and provides you with a very cool report with all the known vulnerabilities, again the version, and even advice from our security team. For our IDEs, in this case I'm showing you an IntelliJ IDEA plugin, the JFrog plugin, which does the same thing: it scans your dependencies. In this case, it's a Maven project. In your IDE, it will tell you the vulnerabilities, the version, which components, even the reports.
We are lucky in the Java world: we have adopted the microservice architecture style with gusto. There are several frameworks out there that support microservices, for example, Spring Boot, Quarkus, Micronaut, and Dropwizard, among others. They provide their own testing libraries, or leverage known libraries like JUnit, Hamcrest, Mockito, AssertJ, or REST Assured.
WireMock
WireMock is a simulator for HTTP-based APIs, also called a service virtualization tool or a mock server. It runs as a standalone process, without the HTTP server, or even in Docker. It can selectively proxy requests through to other hosts. Matching criteria can be used to decide which requests are stubbed. It has record and replay. You can simulate faults, or define stateful behaviors. In version 2.32.0, released last December, the team introduced the ability to run WireMock without needing the HTTP server, for a serverless deployment model.
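To make the idea of a mock server concrete, here is a minimal sketch of stubbing an HTTP API and calling it the way a consumer service would. It deliberately uses only JDK classes (Java 11 or later) and is not WireMock's API; the class name `StubServerSketch`, the `/orders/42` path, and the JSON body are hypothetical and chosen only for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, NOT part of WireMock: a tiny HTTP stub built on JDK classes only.
final class StubServerSketch {

    // Starts a server on a free local port that answers requests under `path`
    // with the given status and a non-empty JSON body.
    static HttpServer startStub(String path, int status, String jsonBody) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext(path, exchange -> {
            byte[] bytes = jsonBody.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    // Starts the stub, performs one GET against it, and returns "<status> <body>".
    static String callStub(String path, int status, String jsonBody) {
        HttpServer server = null;
        try {
            server = startStub(path, status, jsonBody);
            int port = server.getAddress().getPort();
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create("http://127.0.0.1:" + port + path))
                    .GET()
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() + " " + response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0); // always release the port, even when the call fails
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(callStub("/orders/42", 200, "{\"id\":42,\"status\":\"SHIPPED\"}"));
    }
}
```

A dedicated tool such as WireMock adds what this sketch leaves out: request matching, record and replay, fault simulation, and stateful behavior.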
REST Assured
REST Assured is a Java DSL for simplifying the testing of REST services. It supports specifying request data, for example, path parameters, cookies, headers, and multi-value parameters. It also makes it easy to verify response data: cookies, status, pattern matching, body, and content in different formats, as well as measuring responses. It supports authentication, OAuth1 and OAuth2, and has Spring support. In version 4.5.0, they upgraded Groovy from 3.0.8 to 3.0.9.
Testcontainers
Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium browsers, or anything that can run in a Docker container. It will create all the containers as we have defined them, and they will keep running while your tests are running. As soon as the tests have finished, it will properly dispose of your resources. In version 1.16.3, they introduced the K3s module for testing Kubernetes components.
Testing, Monitoring, and Observability
Even if we have a healthy number of unit, integration, contract, end-to-end, REST API, acceptance, and exploratory tests, we are still in a controlled and well-defined world, bound by our own imagination and assumptions of what could possibly happen in production. Out there, things may be a little bit different, or completely different. How, where, what, how long, and how fast are defined by our imagination, beliefs, technical capabilities, and assumptions. Sometimes we are not really testing, or we really don't have a clue what is going to happen out there. Sometimes we need to test in production. I know it sounds so wrong. We should rather say: observe our services closely in production, and understand the system state better using a predefined set of metrics and logs. Monitoring applications lets us detect failures. Monitoring is crucial for analyzing long-term trends, and it provides information on how the services are growing and how they are being utilized. Observability, which originated from control theory, measures how well you can understand a system's internal state from its external outputs. Observability uses instrumentation to provide insights that aid monitoring. An observable system allows us to understand and measure the internals, helping us figure out the cause from the effects.
Traces, Metrics, and Logs
There are three base pillars: traces, metrics, and logs. A trace tracks the progression of a single request as it is handled by the services that make up an application. A request may be initiated by a user or an application. Distributed tracing is a form of tracing that traverses process, network, and security boundaries. A metric is a measurement about a service, captured at runtime. Logically, the moment of capturing one of these measurements is known as a metric event, which consists not only of the measurement itself, but also of the time that it was captured, with all the associated metadata. A log is a timestamped text record, either structured (recommended) or unstructured, with metadata. While logs are an independent data source, they may also be attached to spans.
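The three pillars can be pictured as three small data types. The following is a minimal sketch under stated assumptions: the class and field names (`TelemetrySketch`, `Span`, `MetricEvent`, `LogEntry`) are hypothetical and do not come from OpenTelemetry or any other library. It only shows that a trace is a set of spans sharing a trace ID, that a metric event carries its capture time and metadata, and that a log may optionally point at a span.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical data model for illustration only; this is not the OpenTelemetry API.
final class TelemetrySketch {

    // One unit of work inside a trace. parentSpanId is null for the root span.
    static final class Span {
        final String traceId;
        final String spanId;
        final String parentSpanId;
        final String service;

        Span(String traceId, String spanId, String parentSpanId, String service) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.service = service;
        }
    }

    // A metric event: the measurement, the time it was captured, and its metadata.
    static final class MetricEvent {
        final String name;
        final double value;
        final Instant capturedAt;
        final Map<String, String> attributes;

        MetricEvent(String name, double value, Instant capturedAt, Map<String, String> attributes) {
            this.name = name;
            this.value = value;
            this.capturedAt = capturedAt;
            this.attributes = attributes;
        }
    }

    // A timestamped log record; spanId is null when the log is not attached to a span.
    static final class LogEntry {
        final Instant timestamp;
        final String message;
        final String spanId;

        LogEntry(Instant timestamp, String message, String spanId) {
            this.timestamp = timestamp;
            this.message = message;
            this.spanId = spanId;
        }
    }

    // Lists the services that handled one request, in the order their spans were recorded.
    static List<String> servicesInTrace(List<Span> spans, String traceId) {
        List<String> services = new ArrayList<>();
        for (Span span : spans) {
            if (span.traceId.equals(traceId)) {
                services.add(span.service);
            }
        }
        return services;
    }

    // Counts the log records attached to a given span.
    static long logsAttachedTo(List<LogEntry> logs, String spanId) {
        return logs.stream().filter(log -> spanId.equals(log.spanId)).count();
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(
                new Span("t1", "s1", null, "gateway"),
                new Span("t1", "s2", "s1", "orders"),
                new Span("t2", "s3", null, "gateway"));
        MetricEvent latency = new MetricEvent(
                "request_duration_ms", 12.5, Instant.now(), Map.of("service", "orders"));
        List<LogEntry> logs = List.of(
                new LogEntry(Instant.now(), "order 42 loaded", "s2"),
                new LogEntry(Instant.now(), "startup complete", null));

        System.out.println(servicesInTrace(spans, "t1"));
        System.out.println(latency.name + "=" + latency.value + " " + latency.attributes);
        System.out.println(logsAttachedTo(logs, "s2"));
    }
}
```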
The Cloud Native Computing Foundation
Now let's talk about tools for monitoring. I believe in open source. I promote standards in the industry. Most of the time, I will join the efforts that foster and sustain an ecosystem of open source projects or tools that implement standards, hence enters the CNCF, the Cloud Native Computing Foundation. The Cloud Native Computing Foundation seeks to drive adoption of technologies and techniques by fostering and sustaining an ecosystem of open source, vendor-neutral projects with the technologies necessary to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Examples are containers, service meshes, microservices, immutable infrastructure, and declarative APIs. The focus is on techniques that enable loosely coupled systems to be resilient, manageable, and observable with robust automation. The goal is to allow engineers to make high impact changes frequently and predictably with minimal toil.
Kubernetes
One of the most famous graduated projects from the CNCF is Kubernetes. Let's talk about scheduling and orchestration. In orchestration, you are probably using Kubernetes in your projects. It's an open source graduated project of the CNCF, mostly written in Go. In the CNCF, you will have all these really nice cards displaying the composition, where you can find the different projects, the license, everything. For example, in observability and analysis, we have all these tools available: OpenMetrics, Prometheus. Inside tracing, we have Zipkin, Jaeger, or OpenTelemetry. In logging, we have Grafana Loki: what Prometheus is for monitoring, Loki is for logging.
Prometheus
You have probably also encountered Prometheus in your own projects. This is an open source monitoring system developed by engineers at SoundCloud in 2012. It was the second project accepted into the CNCF after Kubernetes, and also the second one to graduate. The Prometheus monitoring system includes a rich multi-dimensional data model, a concise and powerful query language, an efficient embedded time-series database, and over 150 integrations with third party systems. My only word of advice is: cardinality is key.
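To see why cardinality matters, recall that in a multi-dimensional data model every distinct combination of label values is its own time series, so the worst case for one metric is the product of the number of distinct values per label. The sketch below only does that arithmetic; the label names and counts are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative arithmetic only; label names and value counts are made up.
final class CardinalitySketch {

    // Upper bound on the number of time series for one metric:
    // the product of the distinct value counts of all its labels.
    static long maxSeries(Map<String, Integer> distinctValuesPerLabel) {
        long series = 1;
        for (int count : distinctValuesPerLabel.values()) {
            series = Math.multiplyExact(series, (long) count);
        }
        return series;
    }

    public static void main(String[] args) {
        Map<String, Integer> bounded = new LinkedHashMap<>();
        bounded.put("method", 4);
        bounded.put("status", 5);
        bounded.put("endpoint", 20);
        System.out.println(maxSeries(bounded)); // 4 * 5 * 20 = 400

        // Adding one unbounded label, such as a user ID, multiplies everything.
        Map<String, Integer> withUserId = new LinkedHashMap<>(bounded);
        withUserId.put("user_id", 100_000);
        System.out.println(maxSeries(withUserId)); // 400 * 100000 = 40000000
    }
}
```

Labels with a small, fixed set of values keep the series count manageable, while a single label with unbounded values dominates it.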
OpenMetrics
OpenMetrics creates an open standard for transmitting cloud native metrics at scale. It is an open standard derived from Prometheus. It was created in 2017. Since then, OpenMetrics has published a stable version 1.0 of the specification, which is used in production by many large enterprises, such as GitLab, DoorDash, and Grafana Labs. OpenMetrics is primarily a wire format, independent of any particular transport for that format. The format is expected to be consumed on a regular basis and to be meaningful over successive expositions. This standard expresses all system states as numerical values: counts, current values, enumerations, and Boolean states. Singular events occur at a specific time.
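As a rough picture of what such a wire format looks like, the sketch below renders one counter in a text form modeled on my reading of the OpenMetrics text format: a `# TYPE` and `# HELP` line, a sample whose name carries the `_total` suffix, and a closing `# EOF` marker. The class `OpenMetricsSketch`, the metric name, and the labels are hypothetical, and the specification remains the authority on the exact rules.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hand-rolled illustration, not a client library; consult the OpenMetrics
// specification before relying on any detail of this output.
final class OpenMetricsSketch {

    // Escapes backslash, double quote, and newline in help text and label values.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Renders a single counter sample, with labels sorted by name for stable output.
    static String renderCounter(String name, String help, Map<String, String> labels, long value) {
        String labelText = new TreeMap<String, String>(labels).entrySet().stream()
                .map(e -> e.getKey() + "=\"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(","));

        StringBuilder out = new StringBuilder();
        out.append("# TYPE ").append(name).append(" counter\n");
        out.append("# HELP ").append(name).append(' ').append(escape(help)).append('\n');
        out.append(name).append("_total");
        if (!labelText.isEmpty()) {
            out.append('{').append(labelText).append('}');
        }
        out.append(' ').append(value).append('\n');
        out.append("# EOF\n"); // marks the end of the exposition
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(renderCounter(
                "http_requests",
                "Total HTTP requests.",
                Map.of("method", "GET", "status", "200"),
                42L));
    }
}
```

Because the exposition is plain text scraped repeatedly, a consumer can compare the counter value across successive scrapes to derive a rate.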
OpenTelemetry
OpenTelemetry is more than just a new way to visualize data across applications. This project aims to change how we use instrumentation without requiring a change in monitoring tools. It is a collection of tools and features designed to measure software performance. It is an amalgamation of two open source projects, OpenTracing and OpenCensus. The CNCF developed OpenTracing to provide a vendor-agnostic, standardized API for tracing. OpenCensus was the internal traceability platform from Google that later evolved into an open source standard. OpenTelemetry is an incubating project that combines the strengths of both of these standards to form a unifying traceability standard that is both vendor and platform agnostic. It is now available for use across different platforms and environments. It provides APIs and SDKs, among other tools, to measure and collect telemetry data for distributed and cloud native applications, and it allows exporting the data to other visualization tools. If you go right now to the CNCF page, you will see the entire ecosystem of all the projects in each one of the categories.
Questions and Answers
Losio: I'm quite an old Java developer. I don't consider myself a Java developer anymore, because for too many years I haven't written enough code to pretend to be a devoted Java developer. I definitely agree with you that the ecosystem in terms of tools and options for Java developers is definitely more mature than for other platforms. I was wondering instead if there's anything that, as a Java developer, is actually missing in the DevOps space. Things where, compared to other languages or other technologies, you feel like we're basically lagging behind.
Ruiz: There is really not a platform right now for our entire ecosystem, the ecosystem of observability and monitoring. You actually need to pick and choose. That means that we still don't have a package that says: use all these technologies, this is a sensible configuration, this will provide the most seamless integration between tools. That doesn't exist. We're still building that. It's not like we're lacking that in the Java world only, I think we're lacking that in the entire development world, but we're running towards that.
Losio: You closed your presentation with that amazing slide with the full ecosystem. In one sense it's amazing, what is out there. On the other side, as a developer, where should I start? Because if I attend this presentation, I feel amazing, I want to do more. One question that I will immediately have is, how? You mentioned many different tools. Is there actually a list or something that I can then act on?
Ruiz: It all depends. I go to an organization and see their technology stack, and there are tools that make more sense because they cover a broader range of aspects. Right now, we don't have the pre-selected menus that will work in x, y, z cases. We still don't have that. That's something that we're trying to do at the foundations level. On one side, the goal is to create the standards so the vendors implement those standards, and you can migrate from one tool to another as painlessly as possible. On the other hand, we don't have the synergy. If you have decided to use Jaeger instead of Zipkin, what is the difference? How difficult is that going to be? Envoy. I'm using Envoy for the proxying on microservices. Things like that. It is more about somebody actually trying them together and saying this has lower impedance in the communication. It's a trial and error kind of thing.
Losio: We all probably want to have a magic solution.
I'd actually like to go back to the beginning of your presentation, where you presented the results of the survey from last year. I found really interesting what you said about the stagnation, more or less, of the numbers. What do you see as the main reason for that? For many years, the message I got is that if you do DevOps, things get better. The results are there, you see the numbers are better. Your downtime is lower. Your production is faster. You go live quicker. Everything is great. Why are people not doing it more? Because they're afraid, because they're lazy, or because you actually reach a barrier where you do a bit and then you don't take the next step? What are the things that are slowing down the adoption?
Ruiz: There are several reasons. One, as I said, there is still no platform. Then people try and sometimes they fail a little bit, or there is a little bit of complexity in connecting all these applications. I have gone into different organizations and asked, who is involved in the DevOps culture more actively inside the teams? Nobody says, me. Even we as developers started on the wrong foot, because we were building the software, and suddenly, from the top, they came and said, "Now you also need to take security into consideration. Now you also need to take your build process into consideration. Instead of being external, now it's going to be a pipeline inside your code, and now you're responsible for this." Then, when we talk about security, we start using tools like OWASP, or any bot, Dependabot, or for security, Snyk, JFrog, whatever. Now we have hundreds of vulnerabilities and warnings, and you're like, do you want me to fix everything? Is that the message? Now I need to learn more tools. Now I need to have an overview of the entire process. Now you're telling me that I have to deploy it into pods, containers. Do I need to know that now? Suddenly, now I need to worry about cgroups, about different users. It's crazy. Many developers said, "I'm a developer. I'm not DevOps. I don't want to go on that trip." Having said that, the problem of stagnation is that we are overwhelming a lot of our developers. Sometimes we don't have the platform set up in our organization. It's too bumpy already. On top of that, inside the teams, DevOps requires a lot of collaboration between teams, and people are not sure about their roles and their needs when you start talking about that.
Losio: It's pretty hard to choose what you need and how to do it. As you mentioned, it's pretty hard to do the next step. Is there any blueprint, for example, for the ecosystem, like any initial blueprints?
Ruiz: I can point you to several resources about success cases, like, what's the technology that we used to migrate thousands of services? They are out there. It is not a grim future for us. There are some really good examples of good migrations, or very good DevOps stories. I cannot tell you, this is the golden recipe, not even three different menus. Because I belong to the CDF, that is something that we are working on. Because it's not only me as a developer who feels overwhelmed, it's a lot of developers. We still cannot suggest something that is sound and complete, and that is annoying.
Losio: That's the pain. I was thinking as well in that sense about vendor-neutral projects. That's the sense of the entire idea. Can I take some shortcuts? If I'm using basically one cloud provider, or if I migrate into the cloud, how bad is it if, for my DevOps journey, I accept, let's say, vendor lock-in, or use services that maybe are not vendor neutral, but that will make my cloud adoption faster, or that match the previous knowledge of the team? Are you totally against it, or do you see a point for it?
Ruiz: I see a point for it. Actually, I will tell you something that increases the complexity a little bit. For example, if you are already doing microservices and using cloud, you may want to have different vendors for some of your critical services. It's not even like you are trying to avoid vendor lock-in. Another level of complexity is being aware of the configuration between different cloud providers. Not only that, checking that all your configurations are safe, or well configured, or equally configured even between the two. Probably, people say to me, as a developer, you don't need to do that. That's totally Ops. Ops should take that into consideration. There are some things that will actually have to be changed or modified or exposed in a different way, when we are building our software. That's actually one of the benefits to the entire organization, because we need more knowledge, probably not in-depth knowledge. We are not going to be the ones that have the deployment keys or deployment roles, but our knowledge on the actual challenges that they face, will make us reconsider some of our architectural decisions or implementation strategy.
Losio: There's something that really fascinated me when you talked about shift left and all the security parts of DevOps. I noticed that in the last few years, in general, there is a tendency among the cloud providers towards machine learning and artificial intelligence services. Basically, I'm thinking about services that say, I'm going to find your code security vulnerabilities using some machine learning. I can see that maybe they have not matured yet, but that's the direction. Do you think that's going to take place? Is there going to be an overlap with machine learning as well in this area, on the DevOps side, or not?
Ruiz: I think it will. Static code analysis will totally benefit from figuring out patterns faster, and from reducing what we define as code smells. Even what some of our IDEs are doing, like, this is repeated code, do you want to extract it? Things like that, I think we will benefit from in the long run. Things where it is really clear that we may improve, either because of the complexity of the code, or easy refactoring. We are already benefiting from it, maybe not full blown, but we are. I think machine learning there will help us a lot. You were mentioning GitHub Copilot.
Losio: I was thinking that as well. First, I don't see it as you write the entire code, but I can see the appeal of start to see some code there, starting as a base. I don't know how it's going to fit for a demo.
Ruiz: Actually, I think in some places, it will be very beneficial. In others, it will start conversations but more about responsibility, authorship. I think it's going to be a good influence.