Codenvy’s Architecture, Part 2
In Part 1 of this article we discussed the Codenvy platform and basic architecture.
We also covered the differences between cloud and desktop IDEs, IDE workspace management, the SDK, plug-in architecture and lifecycle, and the Codenvy.com architecture, including user management and authentication, public vs. private projects, IDE collaboration, and multi-tenancy.
4.9 VIRTUAL FILE SYSTEM
In the Codenvy platform, a developer’s workspace is virtualized across different physical resources that are used to service different IDE functions.
Dependency management, builders, runners and code assistants can execute on different clusters of physical nodes.
In order to properly virtualize access to all of these resources, we needed to implement a VFS that underpinned the services and physical resources but also had a native understanding of IDE behavior.
4.9.1 Required Properties
A cloud IDE requires a storage system for its users’ projects. The VFS Codenvy uses has to have the following properties:
- It should have a client server architecture whose server side is accessible via HTTP (REST API) mainly using Ajax requests. This will allow for different IDE browser clients to gain access to project resources.
- Its API should not be bound to any one file or content management system. It should be flexible and abstract enough to implement any of them as backend storage relatively easily. For example, the API could be bound to a JCR system or a POSIX-based file system.
- It should be multi-root, meaning that once a user enters a domain or workspace, they access their filtered branch. This, by definition, implies that a single VFS can then have many perspectives (per domain, per workspace or per user), and the “global” VFS is the universal view. Multi-root capabilities offer the necessary foundation to apply Access Control Lists to a perspective, which is required for public/private project implementations.
- It should support standard CRUD operations for files and folders with both absolute and relative addressing by path and UID. Extending an Ajax system with CRUD operations addressed by UID adds flexibility, because it allows non-Codenvy clients and browsers to work with project spaces directly.
- It should have other functionalities, such as access permissions, search (including full-text search), lock and versioning. These other capabilities should be included in a concrete implementation.
- It should contain another first-class resource type called Project, which adds special properties and functionality to the folder. In fact, we have extended our VFS to enable different folder nodes to be classified as different types of first-class nodes that subsequently inherit specific behaviors. This includes Project, Module and Package. It’s conceivable that this could be extended for other unique project characteristics such as Source, Libraries and so on.
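The multi-root property can be pictured as a filter over one global tree. The following is a minimal sketch under stated assumptions, not Codenvy's implementation: the `GlobalVfs` and `Perspective` classes, the `/<workspace>/<project>/...` path layout, and the simple member flag standing in for an ACL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalVfs:
    """Hypothetical 'universal view': every path in every workspace."""
    paths: set = field(default_factory=set)

    def add(self, path: str) -> None:
        self.paths.add(path)


@dataclass
class Perspective:
    """Hypothetical filtered branch of the global tree, rooted at one workspace."""
    vfs: GlobalVfs
    root: str                                   # e.g. "/ws-a" (assumed layout)
    private: set = field(default_factory=set)   # project roots hidden from non-members
    member: bool = False                        # stand-in for a real ACL check

    def list_paths(self) -> list:
        prefix = self.root.rstrip("/") + "/"
        visible = []
        for path in sorted(self.vfs.paths):
            if not path.startswith(prefix):
                continue  # outside this perspective's root
            relative = "/" + path[len(prefix):]
            project = "/" + relative.split("/")[1]
            if project in self.private and not self.member:
                continue  # private project, caller is not a member
            visible.append(relative)
        return visible
```

A guest and a member who open the same workspace see different branches of the same underlying tree, which is the behavior that public and private projects rely on.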
Based on the experiences we have had with content management systems (e.g., eXo, JCR, xCMIS) and REST API (e.g., everREST) implementations, we decided against using HTTP-based transportation such as WebDAV or CMIS. These options were too complex and had some redundant data interchange, which made them a less than ideal solution. We defined our own VFS REST API to be IDE-specific.
We created different backend implementations during the development and evolution of our implementation, including:
- JCR: The file system is stored as JCR items (Nodes and Properties). It is currently used in a Codenvy IDE OEM by eXo Platform. The JCR implementation offers native versioning of files accessible by the IDE, but it can make certain file-intensive Git operations very slow.
- POSIX: This is currently our production implementation. POSIX is a plain file system and is used on top of a GlusterFS-distributed file system.
- In-Memory: We authored an in-memory implementation to use in QA unit tests.
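The idea of one API with interchangeable backends can be sketched as an abstract contract plus a small in-memory implementation, similar in spirit to the one used for tests. This is an illustrative sketch only: the `VirtualFileSystem` and `InMemoryVfs` names and their method signatures are assumptions, not the actual Codenvy interfaces.

```python
import uuid
from abc import ABC, abstractmethod


class VirtualFileSystem(ABC):
    """Hypothetical storage-neutral contract; a JCR or POSIX backend
    would implement the same methods."""

    @abstractmethod
    def create_file(self, path: str, content: str) -> str: ...

    @abstractmethod
    def read_by_path(self, path: str) -> str: ...

    @abstractmethod
    def read_by_id(self, item_id: str) -> str: ...

    @abstractmethod
    def update(self, item_id: str, content: str) -> None: ...

    @abstractmethod
    def delete(self, item_id: str) -> None: ...


class InMemoryVfs(VirtualFileSystem):
    """Dictionary-backed implementation, useful for unit tests."""

    def __init__(self) -> None:
        self._content = {}  # item id -> file content
        self._ids = {}      # path -> item id

    def create_file(self, path: str, content: str) -> str:
        if path in self._ids:
            raise FileExistsError(path)
        item_id = uuid.uuid4().hex
        self._ids[path] = item_id
        self._content[item_id] = content
        return item_id

    def read_by_path(self, path: str) -> str:
        return self._content[self._ids[path]]

    def read_by_id(self, item_id: str) -> str:
        return self._content[item_id]

    def update(self, item_id: str, content: str) -> None:
        if item_id not in self._content:
            raise KeyError(item_id)
        self._content[item_id] = content

    def delete(self, item_id: str) -> None:
        del self._content[item_id]
        for path, known_id in list(self._ids.items()):
            if known_id == item_id:
                del self._ids[path]
```

Code written against the abstract class does not need to know which backend is in use, which is what makes swapping JCR for a POSIX file system feasible.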
4.9.3 Virtual File System Entry Point (VFS Factory)
Logging into a user account linked to a workspace gives the user access to the VFS perspective associated with the workspace. After browser log-in, the browser is granted a token that can be used with the VFS API calls to gain access to the right VFS perspective. The browser can then make direct REST calls to the VFS.
An entry point to the VFS REST service has the following URL structure: http(s)://<host>/rest/ide/vfs/v2
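As a rough illustration of how a client might address this entry point, the sketch below builds the URL and prepares (but does not send) a request. The host name is a placeholder, and passing the token as a bearer `Authorization` header is an assumption; the article does not say how the token is transmitted.

```python
from urllib.request import Request


def vfs_entry_point(host: str, secure: bool = True) -> str:
    """Build the entry-point URL from the structure given in the text."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}/rest/ide/vfs/v2"


def build_vfs_request(host: str, token: str) -> Request:
    """Prepare a GET request for the VFS entry point without sending it."""
    return Request(
        vfs_entry_point(host),
        headers={
            # Assumption: the log-in token travels as a bearer token.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )
```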
Optional functionalities depend on the current VFS implementation, and may include an access control list (ACL) to control permissions, file versioning, locking, and querying (the ability to search).
4.9.4 Main Resources: Projects, Folders and Files
The VFS has a tree structure with the projects list at the top level of the workspace.
The tree structure can be expanded to reveal the folders, files and, in special cases, sub-projects (or modules) organized inside. This expansion uses a different set of API calls, and we represent a project node visually differently from a project’s substructure.
There are three resource types:
- Files, which can be categorized differently and which have bodies with useful (indexable, searchable) content.
- Folders, the standard structure unit.
- Projects, a special type of folder with a set of properties that help identify that project’s nature, appropriate actions, views, etc.
Three overarching hierarchical rules govern the VFS’s structure: 1) only Projects are allowed to be top-level resources (i.e., have the workspace’s root folder as a parent); 2) Projects may have Files, Folders or other Projects (for a multi-module Project) as child resources; and 3) Folders may have Files or Folders as child resources.
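These containment rules are compact enough to express as a lookup table. The sketch below is one possible encoding; the `Kind` enum, the `ROOT` marker, and the `can_contain` helper are hypothetical names chosen for illustration.

```python
from enum import Enum


class Kind(Enum):
    PROJECT = "project"
    FOLDER = "folder"
    FILE = "file"


ROOT = "workspace-root"  # hypothetical marker for the workspace's root folder

# Which child kinds each parent may hold, following the three rules above.
ALLOWED_CHILDREN = {
    ROOT: {Kind.PROJECT},                                   # rule 1
    Kind.PROJECT: {Kind.FILE, Kind.FOLDER, Kind.PROJECT},   # rule 2
    Kind.FOLDER: {Kind.FILE, Kind.FOLDER},                  # rule 3
    Kind.FILE: set(),                                       # files are leaves
}


def can_contain(parent, child: Kind) -> bool:
    """Return True if a resource of kind `child` may be created under `parent`."""
    return child in ALLOWED_CHILDREN[parent]
```

A backend could run a check like this before every create or move operation so that, for example, a Folder can never end up directly under the workspace root.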
4.9.5 JSON-based Virtual File System
All file discovery, loading and access take place over a custom API. This API uses JSON to pass parameters back and forth between IDE clients. Compare the file access in a cloud IDE to that of a desktop IDE. In a desktop IDE, the application has local access to the disk drive and uses native commands to manipulate the files and defers to the operating system to provide critical functions around locking, seeking and other forms of access.
However, in a cloud environment, there are many IDEs operating simultaneously distributed across a number of physical nodes. The IDEs are coordinating using a set of code assistants, builders and runners that are also on distributed nodes. A workspace may be accessed by multiple developers simultaneously, running in different IDEs, also on different nodes. The role of a VFS, then, is not only to provide access to the files, but also to provide distributed, controlled access to the files.
By using a RESTful API with JSON, we are able to standardize the techniques used by different types of clients, whether those clients are running within our infrastructure or directly accessed by a browser. We needed to take the core operating system functions relating to file manipulation and access and package them up into this format.
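To make the idea of packaging file-system functions as JSON concrete, here is a small sketch of an item description and a lock operation expressed as JSON documents. The field names (`id`, `path`, `type`, `locked`) and both helper functions are invented for illustration and are not the actual Codenvy wire format.

```python
import json


def describe_item(item_id: str, path: str, kind: str, locked: bool = False) -> str:
    """Build a hypothetical JSON description of a VFS item."""
    return json.dumps(
        {"id": item_id, "path": path, "type": kind, "locked": locked},
        sort_keys=True,
    )


def try_lock(payload: str) -> str:
    """Return the description with the lock set, or raise if it is already held.

    On a desktop the operating system would arbitrate this; here the
    service has to do it explicitly on behalf of distributed clients.
    """
    item = json.loads(payload)
    if item["locked"]:
        raise RuntimeError(f"{item['path']} is already locked")
    item["locked"] = True
    return json.dumps(item, sort_keys=True)
```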
4.9.6 Virtual File System Functionalities
Codenvy’s virtual file system API provides methods for navigating the resource tree step by step and accessing a particular resource directly by using its unique identifier (UID) or Path. You can access data via its root Folder.
The virtual file system has been structured to allow for queries, search, and sort mechanisms of projects through HTML applications that are not within the IDE. This would allow developers to create their own applications that interact directly with Codenvy workspaces living on the VFS. Creating a query statement using the POST request’s parameters is implementation-specific.
The virtual file system may also support a number of functionalities, including observation, access control, and file content versioning. Each of these functionalities is implementation-specific.
4.10 LOGGING, ANALYTICS, and DASHBOARDS
We have the following use cases for data and events collected within the system.
- Measure acquisition, engagement, and virality of users to improve experience, adoption, and customer satisfaction.
- Provide insights into user, employee, and administrator behavior for management tracking.
- Derive and generate insights that can eventually be incorporated back into the product to enable users to be more productive. For premium, enterprise and ISV customers, generate reports and insights that are contracted as part of their purchases.
- Open up the data collected by the system to external audiences for querying, investigation, and research.
Since we needed a solution that works as well on-premises as it does in the cloud, we had to rule out some compelling services like log.ly. We needed a system that could be OEM’d and operate in different cloud environments.
We have instrumented all of the major client- and server-side events. These include typical events such as login and logout, as well as first-class developer events like “refactor”, “build”, “debug”, “step into”, “preview”, and “export”. These events are logged as file-based messages on each physical node. Messages are placed in long-term storage, and archival policies rotate their location as they age. We use Pig and Hive to write programmatic queries that process the messages and generate derived metrics. Both messages and derived metrics are stored in Hadoop. We chose Hadoop because numerous analytics, reporting, and dashboarding solutions can be bolted on to offer a premium experience in browsing, analyzing, and correlating data.
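The flow from raw event messages to derived metrics can be illustrated in a few lines. This sketch uses plain Python in place of Pig/Hive and a list of JSON lines in place of per-node log files; the message fields (`user`, `event`) and the function names are assumptions.

```python
import json
from collections import Counter


def log_event(lines: list, user: str, event: str) -> None:
    """Append one event as a JSON line, standing in for a per-node log file."""
    lines.append(json.dumps({"user": user, "event": event}))


def events_per_type(lines: list) -> Counter:
    """Derived metric: how often each event type occurred."""
    return Counter(json.loads(line)["event"] for line in lines)


def active_users(lines: list) -> int:
    """Derived metric: number of distinct users that produced any event."""
    return len({json.loads(line)["user"] for line in lines})
```

Keeping the raw messages alongside the derived metrics means new metrics can be computed later without re-instrumenting the product.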
Today, we operate three services off of the analyzed data. There is an administrative dashboard, a simple Web site that shows analysis of users, adoption, engagement, and virality. There are also a number of reports for management, produced as CSV, Excel, and PDF files on a recurring or parameterized basis. Finally, there is a RESTful API that grants query access to the data and metrics. This API will eventually be packaged and exposed to developers for their direct consumption.
4.11 HOSTED APIS
Every server-side service within Codenvy, such as refactor, authenticate, build, and deploy, is built and exposed as a RESTful Web Service. These Web Services are accessible by:
- The internal Shell, which has a set of pre-defined Groovy commands for invoking Web Service calls with parameterized data, or:
- An external program, which registers a key and accesses the Web Services directly itself.
The list of Web Services is published within the product itself. You can go to “Help -> REST Services Discovery” for the complete list along with information on which parameters are accepted. Here is a sample of the functions that are accessible by URL and how their parameters are structured.
| Service Path | Method | Consumes | Produces | Short Description | Roles |
| /organization/users | POST | application/json | application/json | Create a user | cloud/admin |
| /organization/users/(id) | GET | - | application/json | Find a user | |
| /organization/users?alias=alias | GET | - | application/json | Find a user | |
| /organization/users/(id) | POST | application/json | application/json | Update an existing user | cloud/admin, |
| /organization/users/(id)/remove | POST | - | - | Remove a user | cloud/admin, |
| /organization/users/authenticate | POST | application/json | - | Authenticate a user | - |
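An external program calling the "Create a user" service from the table might prepare a request along these lines. Only the path, method, and content types come from the table; the base URL, the `X-Api-Key` header name, and the body fields are assumptions, and the request is built but not sent.

```python
import json
from urllib.request import Request


def create_user_request(base_url: str, api_key: str, user: dict) -> Request:
    """Prepare POST /organization/users with a JSON body."""
    return Request(
        f"{base_url}/organization/users",
        data=json.dumps(user).encode("utf-8"),
        headers={
            "Content-Type": "application/json",  # "Consumes" column
            "Accept": "application/json",        # "Produces" column
            "X-Api-Key": api_key,                # hypothetical header for the registered key
        },
        method="POST",
    )
```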
4.12 SHELL TECHNOLOGY
Our current shell is a browser layer that accesses internal Web Services. We do not yet provide a direct bash shell to users. In the near future, we will offer an SSH session into a dedicated VM that is mapped to a private builder / runner queue, and that session will come with bash capabilities. In many ways, the non-SSH in-IDE shell is a virtual shell that is an abstraction on top of our Web Services. We use the open source EverREST project, an implementation of JAX-RS, to make these invocations simpler.
4.13 IDE, BUILDER, and RUNNER CLUSTER MANAGEMENT
The connections between an IDE and a builder / runner are handled through Web Sockets. BuilderManager and RunnerManager services run within our server-side infrastructure, controlling and routing requests from various IDE clients to the appropriate service. The client context (i.e., paid vs. unpaid) determines the type of queue that a request is mapped into. Different queues have different processing policies, such as dedicated VMs, shared VMs, and VMs with SSH access.
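The mapping from client context to queue can be sketched as a tiny router. This is not the actual BuilderManager: the class name, the two queue names, and the rule that paid clients go to the dedicated queue are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BuildRequest:
    tenant: str
    paid: bool


@dataclass
class BuilderManagerSketch:
    """Hypothetical router that places requests into per-policy queues."""
    queues: dict = field(
        default_factory=lambda: {"dedicated": deque(), "shared": deque()}
    )

    def route(self, request: BuildRequest) -> str:
        # Assumed policy: paid clients get dedicated VMs, others share.
        name = "dedicated" if request.paid else "shared"
        self.queues[name].append(request)
        return name
```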
The number of physical nodes used to operate the IDE cluster, builder cluster and runner cluster is determined by HAProxy. We have configurable criteria that determine when an expansion or a contraction of nodes in each cluster should occur. With the IDE cluster, it’s based upon the number of active tenants. If the threshold is 300 active tenants, there is a pre-impact threshold that, when reached, triggers the creation of a new physical node. The same process happens on the decline as well. When we need to shut down a physical node, we do not migrate tenants from one node to another. The next time the tenant connects to the services, it will be re-activated on another node that is currently online, and service state will be restored. The same process happens for builders and runners; however, the configuration criteria are unique to the behaviors of those systems. In the case of builders, we optimize the number of builders to attempt to eliminate any blocking behavior. The same happens for runners.
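The expand and contract decision for the IDE cluster can be sketched as a simple threshold check. The 300-tenant limit is the example from the text; the pre-impact value of 270 tenants per node and the exact contraction rule are assumptions, since the article does not give them.

```python
NODE_LIMIT = 300   # example tenant threshold per node, from the text
PRE_IMPACT = 270   # assumed early-warning level, below NODE_LIMIT


def scaling_action(current_nodes: int, active_tenants: int,
                   pre_impact: int = PRE_IMPACT) -> str:
    """Decide whether the cluster should grow, shrink, or stay as it is."""
    if current_nodes < 1 or active_tenants < 0:
        raise ValueError("need at least one node and a non-negative tenant count")
    if active_tenants >= current_nodes * pre_impact:
        return "expand"    # pre-impact threshold reached, add a node early
    if current_nodes > 1 and active_tenants < (current_nodes - 1) * pre_impact:
        return "contract"  # one node fewer would still stay under the threshold
    return "hold"
```

Triggering on the pre-impact level rather than the hard limit gives a new node time to start before existing nodes are saturated.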
4.14 CLOUD CONNECTIVITY SERVICES
For any IDE function that connects to an external cloud service such as GitHub, Google App Engine, or a continuous integration server, we always use the direct API provided by the third-party provider. We control all of the communications with a proxy that lives within our cloud infrastructure. The proxy can handle both outbound and inbound API calls for all third-party systems, and it also lets us configure their performance and operation.
4.15 PULLING IT ALL TOGETHER
Logically, this is how the system operates at full scale, when operating on multiple nodes simultaneously.
- The client's browser loads the Codenvy Web site and makes a URL-based request. There are two types of requests: regular (business logic) and meta.
- If the request is regular, the HAProxy load balancer decides where to route it.
- If the request is meta, the cloud administration node performs special actions, such as tenant creation and removal.
- While performing a meta request, the cloud administration node may update the configuration of HAProxy giving it future instructions on how this particular tenant should be handled.
- A business request is routed to an IDE that is dynamically deployed on an application server.
- While performing a meta request, the cloud admin uses internal REST requests to a cloud agent located on an application server executing a pool of IDEs. The cloud admin node can also instruct the system to add or remove application server nodes for IDEs, builders, or runners according to scalability rules.
- Certain meta requests (e.g., authentication) call the organization DB stored within an LDAP server. The LDAP server contains account, organization, workspace, and user information and profiles.
- The IDE calls the organization DB to gain access to account information, premium / free status, and other gamification metrics.
- The IDE calls internal services such as Builders, Runners and Code Assistants.
- The IDE uses the virtual file system interface to gain access to projects, code, and repository information.
- The IDE calls external PaaS, storage, continuous integration, and VCS through a RESTful API.
- Statistics storage retrieves logs from the applications for further analysis. The IDE accesses stored statistics to provide statistics back to users and make productivity recommendations.
- A request to the cloud admin console also triggers event logging within our analytics system, first stored as a file to the file system, and then eventually into Hadoop.
- There are CLI, programmatic APIs, and a GUI client for administering the cloud admin node.
- A manager gains access to dashboards, reports, and statistical data through a dedicated GUI client that accesses Hadoop and other analytics systems.
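The split between meta and regular requests in the steps above can be sketched as follows. The `CloudAdminSketch` class, its routing table (standing in for the HAProxy configuration), and the least-loaded placement rule are all assumptions made for illustration.

```python
class CloudAdminSketch:
    """Hypothetical cloud administration node with a tenant routing table."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.routes = {}  # tenant -> application server (stand-in for HAProxy config)

    def handle_meta(self, action: str, tenant: str) -> None:
        """Meta requests create or remove tenants and update routing."""
        if action == "create":
            load = {server: 0 for server in self.servers}
            for server in self.routes.values():
                load[server] += 1
            # Assumed placement rule: pick the least-loaded server.
            self.routes[tenant] = min(self.servers, key=lambda s: load[s])
        elif action == "remove":
            self.routes.pop(tenant, None)
        else:
            raise ValueError(f"unknown meta action: {action}")

    def route_regular(self, tenant: str) -> str:
        """Regular requests are sent to the server recorded for the tenant."""
        return self.routes[tenant]
```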
5.0 RELEASE MODEL
5.1 ENVIRONMENT STRUCTURE
We have a number of different environments that are used to service different requirements that are part of our process. These environments include:
Production environment with full SLAs, support, and built-in elasticity.
Runs a complete Codenvy environment with a selection of finished features / issues that are targeted for production deployment. These features may be implemented across a group of sprints. This environment is for validation with customers, documentation to create their examples, support to get trained on upcoming features, and for marketing to create assets off of the new version.
A simulated production environment that can be updated with issues from a specific sprint. This environment is used for scalability testing, hot fix acceptance, and for configuring connections to 3rd party services within a public cloud environment.
Acceptance environments execute specific features that have been completed by development and are waiting for acceptance from a product owner. We operate up to 5 different acceptance environments. Acceptance environments can be automatically created through our continuous integration system. Acceptance environments are single-server and have no elasticity.
5.2 SCRUM PROCESS
As a company that creates tools and solutions for developers, we spend a lot of time talking with our customers about how they manage their development projects. We make it a priority to share Codenvy’s internal processes and technologies with our customers. This helps us facilitate discussions around best practices, helps improve our internal processes, and gives us further insight into the complex problems developers face.
For about five years now, we’ve used Scrum as our main process for driving development. Scrum is an iterative and incremental development (IID) methodology that helps guide and shape our daily work. It’s a flexible approach that helps our developers concentrate on developing, while still providing the discipline that keeps everything running on time.
Using Scrum gives us the iterative planning, specification, implementation, test and delivery capabilities we need to drive our rapid product evolution.
Our Product Manager acts as the Product Owner (PO) of the Sprint. Having a non-developer as PO lets our developers focus on programming instead of the nitty-gritty of project management. That way, the Product Release process stays separate from the Development process.
We use JIRA with the GreenHopper plugin as our project and issue tracking tool.
Development work is done in 2-4 week “Sprints” to break releases into manageable segments. During that time period, we focus on turning specific features of the latest roadmap into final tested code.
The Sprint’s fixed duration is based both on the amount of work and the team’s current capacity to get it done. The duration can be adjusted, but only after review and discussion of how the change will affect the Sprint’s goals.
Before a Sprint begins, our Development team examines the features presented by the PO from every angle and creates detailed “Sprint Briefs” outlining exact specifications. They show these briefs to the PO to make sure everyone is in agreement over what’s expected.
Once our team and PO have agreed on the project specifications, the PO then builds the Sprint Backlog. As part of this, the team agrees to a “Feature Definition of Done” – that is, how we demonstrate the feature and what criteria it must meet in order to be “completed.”
During the Sprint, our development team holds daily Stand-up meetings. Team members discuss the state of current work and any issues or obstacles that occur. We keep the meetings short (~15 minutes), so they don’t take away too much programming time, and they help everyone stay on the same page.
At the end of the Sprint, our developers demonstrate the Features to the PO. During the demonstration, our PO looks for the specific features outlined in the “sprint briefs.” This acceptance work happens on specialized instances of Codenvy running in our development environments where each instance allows for validation of a single issue.
Once everything’s been approved, our team and PO analyze the results in a Retrospective and propose areas of possible improvement for the next iteration. After a sprint completes, all of the issues that were catalogued in a sprint are moved into a pre-production and staging environment. There, marketing, support, and documentation can perform additional functions before releasing into production.
The combination of regulation and freedom we get from Scrum is really the best of both worlds. Scrum gives our developers the flexibility they need to create high-quality solutions without sacrificing the discipline of deadlines.
5.3 AUTOMATED TESTING
We are close to 95% automated test coverage of the system. We use Selenium for any IDE and site functionality. We use JUnit for server-side testing along with testing functional methods in GWT-based plug-ins. We do daily manual testing on the use cases and features we have not been able to automate, such as integration with PaaS vendors where the connection speed can alter the result received. All of our tests are launched automatically as part of our continuous integration procedure and run at least once per day after a check-in.
5.4 DEVOPS TECHNOLOGY
We use Amazon and Eucalyptus-made images for configuration and deployment if deploying into a cloud environment. If deploying onto a PC, we use self-made RPM packages and bash scripts. The configuration environment is further managed through Puppet orchestrations.
We use a very simple monit, logwatch and cacti implementation to monitor all systems for availability.
About the Author
Tyler Jewell is CEO of Codenvy and a venture partner with Toba Capital, where he focuses on developer investments. He sits on the boards of Sauce Labs, WSO2, eXo Platform and Codenvy, and has made investments in Cloudant, ZeroTurnaround, InfoQ, and AppHarbor.