Key Takeaways
- Generative AI tools such as Amazon Bedrock, code review assistants, and agentic code generation can enhance the developer experience on AWS by streamlining development processes, improving code quality, and increasing productivity.
- By integrating tools like Amazon Bedrock and using webhooks, teams can build systems that query code repositories for explanations, reducing the time spent explaining code manually.
- To gain acceptance for generative AI tools, they must demonstrate tangible benefits like efficiency gains, showcase successful use cases, and involve team members in the process for a smoother transition.
- AI-driven code review tools can identify potential errors, suggest improvements, and provide valuable insights, leading to higher-quality code.
- Organizations should focus on security, cost, and human oversight to maximize the benefits of generative AI while regularly checking and improving how it’s used.
This is a summary of a talk I gave at InfoQ Dev Summit Munich 2024. I discussed the transformative potential of generative AI in enhancing developer experiences, particularly through Amazon Web Services (AWS).
In this article, I’ll introduce Amazon Bedrock and key use cases such as code review assistance, agentic code generation, and code summarization.
Introducing Amazon Bedrock
Let’s start with Amazon Bedrock. What is it? Bedrock is a fully managed service that provides access to foundation models from leading companies like AI21 and Anthropic. It comes with tools like Bedrock Studio and Knowledge Bases, making it easier to integrate AI capabilities into your applications.
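To give a feel for what this looks like in practice, here is a minimal sketch of calling a Bedrock foundation model with boto3’s Converse API. The region, model ID, and prompt are placeholder assumptions, not a prescribed setup:

```python
import boto3

# Bedrock runtime client; region and model ID are assumptions.
# Use a region and model your account actually has access to.
bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model choice
    messages=[{
        "role": "user",
        "content": [{"text": "Explain what this function does:\n\ndef add(a, b):\n    return a + b"}],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```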
While Bedrock is a fantastic platform, what truly excites me is how these capabilities impact our day-to-day work as developers.
Code Review Assistant with Amazon Bedrock
Let’s talk about code reviews. Be honest: who actually enjoys them? If you do, you’re probably from Mars. Code reviews can be a frustrating bottleneck for the rest of us Earthlings.
Here’s the typical workflow:
- Coding: On average, developers spend 19 hours writing code.
- Pickup Time: It takes about nine hours before someone picks up the code for review.
- Review Duration: The review process often drags on for five days because, let’s face it, no one picks it up immediately.
- Deployment Time: Once the review is done, deploying takes an additional 80 minutes on average.
The result? A lot of valuable work gets stuck in the review phase. This is disheartening because, as engineers, our ultimate satisfaction comes from seeing the solutions we build get into customers’ hands.
One day, I was fed up with the inefficiency of code reviews. My team was equally frustrated, so I decided to take action. I asked myself, "How can I make this process better?" I came up with a simple yet effective architecture to address the challenges of code reviews. By leveraging Amazon Bedrock, we integrated AI-driven tools into our workflow.
Here’s how it works (a simplified sketch follows the list):
- Third-Party Git Integration: We started with a third-party Git repository like GitHub.
- AI-Powered Reviews: Using Bedrock’s foundation models, we automated parts of the code review process, such as identifying errors, suggesting improvements, and summarizing changes.
- Streamlined Feedback Loops: This drastically reduced the time engineers spent waiting for reviews, enabling faster deployment of solutions.
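To make this concrete, here is a minimal sketch of the AI-powered review step. It assumes the pull request diff has already been fetched by a webhook handler from the Git provider; the model ID, prompt wording, and overall flow are illustrative assumptions, not the exact setup from the talk:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")

REVIEW_PROMPT = (
    "You are a senior engineer reviewing a pull request. "
    "Identify potential bugs, suggest improvements, and summarize the change.\n\n"
    "Diff:\n{diff}"
)

def review_pull_request(diff: str) -> str:
    """Send a PR diff to a Bedrock foundation model and return review feedback."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model choice
        messages=[{"role": "user", "content": [{"text": REVIEW_PROMPT.format(diff=diff)}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

# In the full flow, a webhook handler (e.g. API Gateway + Lambda) would fetch
# the diff from the Git provider on a pull_request event and post the returned
# feedback back as a review comment.
```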
For me, the key takeaway is that developer experience matters. Tools like Amazon Bedrock aren’t just about flashy features; they’re about making our work more meaningful and efficient. Whether it’s automating code reviews or building applications on the fly, generative AI has the potential to revolutionize how we work.
Interestingly, after building this, I discovered that similar solutions are already available in the GitHub Marketplace. Nonetheless, the concept is simple and adaptable. Depending on your needs, you can extend it within your organization, substituting Bedrock with alternatives such as Mistral or Claude-based code assistants.
Agentic Code Generation
Have you ever had to migrate from Java 8 to a newer version like Java 17 or 21? It’s a daunting task many engineers dread. Tools like Amazon Q Developer simplify such migrations and have completed over 1,000 of them successfully. However, provisioning Amazon Q Developer was cumbersome for my team due to its AWS SSO dependency.
Instead, we turned to GitHub Copilot. Here’s what I observed:
- Engineers using Copilot submitted pull requests faster, showing a 15% to 30% efficiency gain.
- Copilot didn’t just help write code - it transformed engineers into better problem solvers, enabling them to focus on logic rather than syntax.
The narrative that AI will replace engineers doesn’t resonate with me. Instead, AI empowers us to evolve as problem solvers and craftspersons.
Code Base Summarization with Amazon Bedrock
Onboarding new engineers can be tedious, especially when explaining legacy code. I faced this challenge firsthand when a former team asked me to explain a data platform I built years ago. Instead of diving into old diagrams and notes, I leveraged code summarization tools.
Here’s the architecture I recommend:
- Code Repository Storage: Store repositories securely in an S3 bucket with no public access.
- Bedrock Knowledge Bases: Use Amazon OpenSearch (serverless or instance-based) as the vector store to index the code.
- Foundation Models: Integrate Bedrock or GitHub Copilot to allow new engineers to query and understand the codebase.
By automating code explanation and documentation updates, onboarding becomes smoother. For instance, you can configure a webhook to trigger updates to documentation whenever code is pushed to the repository. This way, documentation remains aligned with the latest code, reducing manual effort and improving accuracy.
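As a rough sketch of how this could be wired up with Bedrock Knowledge Bases: a push webhook triggers re-ingestion of the repository, and new engineers query the indexed codebase with retrieve-and-generate. The knowledge base ID, data source ID, and model ARN below are placeholders for whatever is configured in your account:

```python
import boto3

# Placeholder identifiers; substitute the knowledge base, data source,
# and model configured in your own account.
KNOWLEDGE_BASE_ID = "KB123EXAMPLE"
DATA_SOURCE_ID = "DS123EXAMPLE"
MODEL_ARN = "arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"

agent = boto3.client("bedrock-agent")
agent_runtime = boto3.client("bedrock-agent-runtime")

def on_push(event, context):
    """Webhook-triggered Lambda: re-index the repository after a push so the
    knowledge base stays aligned with the latest code."""
    agent.start_ingestion_job(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        dataSourceId=DATA_SOURCE_ID,
    )
    return {"statusCode": 202}

def ask_codebase(question: str) -> str:
    """Let a new engineer query the indexed codebase in natural language."""
    response = agent_runtime.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KNOWLEDGE_BASE_ID,
                "modelArn": MODEL_ARN,
            },
        },
    )
    return response["output"]["text"]
```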
From automating code reviews to enabling quick project setups and simplifying onboarding, generative AI tools like Bedrock and Copilot are revolutionizing the way we work. These technologies aren’t about replacing engineers; they’re about making us more efficient, creative, and effective problem solvers.
If you haven’t yet explored these tools, now is the time. Start small, integrate them into your workflows, and watch how they elevate your team’s productivity and innovation.
Support Case Investigation
In a B2B environment, resolving production issues efficiently is critical. Often, infrastructure monitoring tools don’t immediately identify customer-reported issues, leading to frustration across teams.
A streamlined solution leverages webhooks, APIs, and LLMs for rapid data synthesis (a minimal sketch follows the list):
- Trigger a Webhook: A webhook is triggered upon case creation in the case management system.
- Invoke a Lambda Function: The webhook activates a Lambda function, which queries log systems (CloudWatch, Prometheus with OpenTelemetry, or databases with read-only access).
- Synthesize Data: LLMs analyze unstructured data, summarizing findings.
- Return Insights: The processed insights are returned to the case management system, reducing resolution times from days to minutes.
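Here is a hedged sketch of steps 2 and 3. It assumes a CloudWatch log group naming convention and a webhook payload field that identify the affected service; both are illustrative assumptions:

```python
import time
import boto3

logs = boto3.client("logs")
bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    """Triggered by the case-management webhook with a service hint in the payload."""
    service = event.get("service", "checkout-service")  # assumed payload field
    log_group = f"/aws/lambda/{service}"                # assumed naming convention

    # Pull the last hour of error-level log events for the affected service.
    now = int(time.time() * 1000)
    events = logs.filter_log_events(
        logGroupName=log_group,
        startTime=now - 3_600_000,
        endTime=now,
        filterPattern="ERROR",
        limit=200,
    )["events"]
    raw_logs = "\n".join(e["message"] for e in events)

    # Let the model turn unstructured log lines into a short incident summary.
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model
        messages=[{"role": "user", "content": [{
            "text": f"Summarize likely root causes from these error logs:\n{raw_logs}"
        }]}],
    )
    summary = response["output"]["message"]["content"][0]["text"]

    # In the real flow, this summary would be posted back to the case system.
    return {"summary": summary}
```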
Example Use Case:
An HR team collected 240 survey responses in Excel, requiring hours of manual review. By using an LLM to process and map responses to onboarding processes, analysis time dropped from 7 hours to 5 minutes. This underscores the potential of LLMs to enhance developer workflows and cross-departmental tasks.
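A minimal sketch of how such an analysis could look is below. The file name, column name, category list, and model are illustrative assumptions, not the HR team’s actual setup:

```python
import boto3
import pandas as pd  # reading .xlsx also requires openpyxl

bedrock = boto3.client("bedrock-runtime")

# File name and column are assumptions about the HR export.
responses = pd.read_excel("onboarding_survey.xlsx")["response"].dropna().tolist()

prompt = (
    "Map each survey response to one of these onboarding process steps: "
    "account setup, tooling, mentoring, documentation, other. "
    "Return one line per response in the form '<step>: <short summary>'.\n\n"
    + "\n".join(f"- {r}" for r in responses)
)

result = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
print(result["output"]["message"]["content"][0]["text"])
```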
Refactoring with Generative AI
While GenAI can offer a head start in tasks like upgrading protocols or refactoring services, it’s not flawless. For example, an attempt to introduce a REST interface in place of a gRPC protocol using GenAI led to significant errors. This highlights the importance of human oversight in critical codebase changes.
Strategies for Scaling Refactoring:
- Manual Initial Refactor: Make the initial changes manually.
- Leverage AI for Repetition: Instruct the AI to replicate the changes across repositories once the approach is validated.
- Model Training: Tools like Amazon Q Developer allow model customization with proprietary code, though security compliance can be a limiting factor.
AI Code Review
AI-powered code reviews offer targeted, actionable feedback on specific lines of code, aiming to optimize runtime and simplify logic. Advanced models with large context windows (e.g., 200,000 tokens) can analyze entire repositories, giving them deeper context for human-readable, structured suggestions and context-aware insights into the application. However, without sufficient context these reviews may struggle with the complexities of interconnected codebases, so supplemental human oversight is still needed to ensure comprehensive understanding and quality.
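Before sending a whole repository to such a model, a quick back-of-the-envelope check helps decide whether it fits the context window. This sketch assumes a rough four-characters-per-token ratio and an arbitrary exclusion list; it is not a substitute for the model’s real tokenizer:

```python
from pathlib import Path

# Rough heuristic: ~4 characters per token. Good enough for a go/no-go check.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 200_000
EXCLUDED = {".git", "node_modules", "target", "dist"}

def estimate_repo_tokens(repo_root: str) -> int:
    """Estimate how many tokens the readable files in a repository would consume."""
    total_chars = 0
    for path in Path(repo_root).rglob("*"):
        if path.is_file() and not EXCLUDED.intersection(path.parts):
            try:
                total_chars += len(path.read_text(errors="ignore"))
            except OSError:
                continue  # skip unreadable or binary files
    return total_chars // CHARS_PER_TOKEN

tokens = estimate_repo_tokens(".")
print(f"~{tokens} tokens; fits in context window: {tokens < CONTEXT_WINDOW}")
```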
A/B Testing for Integration Efficiency
The benefits of automation become apparent when you compare teams that use AI-driven integrations with teams that don’t. Teams using GenAI for pull requests saw measurable efficiency gains, comfortably handling four to five pull requests daily. These real-world examples serve as proof points to drive broader adoption.
Cost Considerations for Token Usage
Managing token costs is critical when working with LLMs and embeddings. Small-scale experiments may not show up in your bill immediately, but larger-scale usage can impact budgets. To avoid surprises (a sketch of a budget alert follows the list):
- Set Cost Alerts: Tools like AWS budgets can notify teams if token usage exceeds thresholds.
- Monitor Usage Trends: Regularly review logs for spikes in usage or inefficiencies.
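As an example of the first point, here is a sketch of creating an AWS Budgets alert with boto3. The account ID, limit, e-mail address, and the "Amazon Bedrock" cost filter value are assumptions to adapt to your own account:

```python
import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "bedrock-token-spend",
        "BudgetLimit": {"Amount": "200", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
        # Assumed service name for scoping the budget to Bedrock usage.
        "CostFilters": {"Service": ["Amazon Bedrock"]},
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "ACTUAL",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 80.0,            # alert at 80% of the monthly limit
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "team@example.com"}],
    }],
)
```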
Do’s and Don’ts When Starting with AWS Bedrock
Do’s
- Security and Compliance First: Always validate usage policies with security teams before handling proprietary data.
- Start Simple: Utilize Bedrock’s UI for initial experiments with public or non-sensitive data.
- Quick Validation: Get a functional prototype running as soon as possible, even if it’s just a chatbot or basic query system.
- Engage Cross-Functional Teams: Demonstrate ease of use to non-technical team members to widen adoption.
Don’ts
- Avoid High-Stakes Data Early: Don’t expose sensitive information until secure practices are confirmed.
- Don’t Overcomplicate: Focus on solving specific problems rather than building overly complex systems upfront.
Change Management for GenAI Adoption
Convincing experienced engineers to adopt new tools can be challenging, particularly with long-standing projects and legacy codebases. Common concerns include the fear that large-scale, automatic changes will disrupt stability and doubts about the reliability of machine-generated code. To gain buy-in, demonstrate value by showcasing real-world examples of efficiency gains and improved value delivery to customers. Leading by example also helps: show successful use within your own team and share metrics from teams already leveraging GenAI effectively. Finally, positioning GenAI adoption as an opportunity for evolution rather than replacement reassures engineers that it enhances their problem-solving abilities without threatening job security.
GenAI for Product Owners and Refinement Sessions
Refinement meetings often consume significant time without clear outcomes. GenAI can streamline this process by generating user stories, providing a structured starting point for discussions, and drafting acceptance criteria, which helps product owners prepare better for sessions and reduces ambiguity. The result is enhanced preparedness, more focused sessions with fewer redundant discussions, and accelerated refinement, as teams can quickly converge on actionable user stories.
Test Cases Generation with GenAI
Generative AI can streamline testing by producing test cases based on clear expectations or policies. The approach is to write clear prompts based on system requirements and expected outcomes, generate test cases through models like those in Bedrock, and then evaluate and refine the output. The challenges are the need for human oversight, since the quality of generated test cases can vary significantly, and the AI’s reliance on well-defined inputs: ambiguous prompts can lead to irrelevant or flawed outputs. While GenAI accelerates test case creation and reduces QA workload, it cannot entirely replace human judgment, making collaboration essential.
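A minimal sketch of the prompting step is below, with an invented example policy and an assumed model choice:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# The requirement text is an illustrative placeholder.
requirement = (
    "Policy: orders above 100 EUR require manager approval; "
    "orders at or below 100 EUR are auto-approved."
)

prompt = (
    "Generate test cases for the following requirement. "
    "For each case give: name, input, expected outcome. "
    "Include boundary values and at least one negative case.\n\n" + requirement
)

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model
    messages=[{"role": "user", "content": [{"text": prompt}]}],
    inferenceConfig={"temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```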
Handling Deeply Nested Repositories with GenAI
Repositories often have complex structures, which can be challenging for AI tools processing code or documentation. To address this, the repository contents are uploaded to S3, with irrelevant files or folders excluded during preprocessing. Retrieval-Augmented Generation (RAG) is then used to convert documents into vector representations stored in OpenSearch. Hierarchical structuring, such as graph-based RAG, can be added to establish relationships between nodes for more nuanced searches. By initially bypassing hierarchical complexities, the setup and experimentation process is accelerated, with refinements made based on specific needs.
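The preprocessing step might look like the following sketch, which flattens a repository into S3 while skipping folders and file types that only add noise. The bucket name and exclusion lists are assumptions:

```python
from pathlib import Path
import boto3

s3 = boto3.client("s3")

# Bucket name and exclusion lists are assumptions; adjust to your repository.
BUCKET = "my-code-knowledge-base"
EXCLUDED_DIRS = {".git", "node_modules", "build", ".venv"}
EXCLUDED_SUFFIXES = {".png", ".jar", ".zip", ".lock"}

def upload_repo(repo_root: str, prefix: str) -> None:
    """Flatten a deeply nested repository into S3, skipping irrelevant files,
    so the knowledge base ingests only meaningful code and docs."""
    for path in Path(repo_root).rglob("*"):
        if not path.is_file():
            continue
        if EXCLUDED_DIRS.intersection(path.parts) or path.suffix in EXCLUDED_SUFFIXES:
            continue
        key = f"{prefix}/{path.relative_to(repo_root).as_posix()}"
        s3.upload_file(str(path), BUCKET, key)

upload_repo(".", "repos/data-platform")
```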
Measuring LLM Accuracy
Evaluating the performance of Large Language Models (LLMs) is more complex than classical machine learning models, requiring qualitative and quantitative measures. Qualitative feedback can be gathered using a thumbs-up/thumbs-down system and tracking relevance scores for tasks like code review or query accuracy. Quantitatively, for example, in a pull request use case, only 60% of responses were relevant due to incomplete context, such as missing associated classes in code snippets. Providing full context for tasks and collecting user feedback to refine the system are essential strategies to improve performance.
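Capturing that qualitative signal can be as simple as counting thumbs-up/thumbs-down votes per use case. This in-memory sketch is illustrative only; in practice you would persist the votes somewhere durable:

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackTracker:
    """Collect thumbs-up/down signals per use case and report a relevance rate."""
    votes: dict = field(default_factory=dict)

    def record(self, use_case: str, helpful: bool) -> None:
        up, total = self.votes.get(use_case, (0, 0))
        self.votes[use_case] = (up + int(helpful), total + 1)

    def relevance_rate(self, use_case: str) -> float:
        up, total = self.votes.get(use_case, (0, 0))
        return up / total if total else 0.0

tracker = FeedbackTracker()
tracker.record("pull-request-review", helpful=True)
tracker.record("pull-request-review", helpful=False)
print(f"{tracker.relevance_rate('pull-request-review'):.0%} relevant")  # 50% on this sample
```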
Unit Test Generation with GenAI
Generative AI tools like GitHub Copilot are increasingly used to automate the creation of unit tests by scripting them based on existing code, allowing developers to refine them to meet specific requirements. This automation helps new team members quickly contribute by bridging the gap between unfamiliar codebases and meaningful work, and it also boosts efficiency, with reported improvements of 15% to 30% by reducing manual test-writing effort. However, the generated tests often require human review for quality and relevance, and without comprehensive context, such as detailed specifications or behavioral expectations, they may lack completeness.
Unit Test Coverage with GenAI
Unit test coverage measures the percentage of code tested, and GenAI has proven useful in improving this metric. For example, a team using GitHub Copilot increased their coverage from 0% to around 50%, with a target of 70%, which remains challenging but shows significant progress through automation. To optimize coverage, teams focus on high-impact areas like critical code paths or API endpoints, and gradual coverage improvements can enhance the code’s reliability and maintainability over time.
Conclusions
The integration of generative AI capabilities, particularly through tools like Amazon Bedrock and Amazon Q Developer, has the potential to significantly enhance the developer experience. By automating time-consuming tasks such as code reviews, support case investigations, and test generation, engineers can focus more on problem-solving and delivering value to customers. Using large language models not only streamlines processes but also aids in onboarding new team members through code explanation and summarization. However, it is essential to maintain human oversight to ensure the accuracy and relevance of AI-generated outputs. As organizations continue to embrace these technologies, the journey involves change management and demonstrating the tangible benefits of generative AI, ultimately leading to a more efficient and effective engineering workforce.