The Engineering Manager's First 90 Days with AI Coding Tools
A week-by-week guide — tool selection, pilot group setup, measurement framework, and full rollout in 90 days.
You have been asked to roll out AI coding tools to your engineering team. Maybe the directive came from your VP. Maybe it came from the developers themselves. Either way, you own the outcome. And the outcome depends almost entirely on what happens in the first 90 days.
Most rollouts fail not because the tools are bad, but because the process is bad. A team gets licenses, a Slack message goes out that says “AI tools are available,” and then nothing structured happens. Three months later, adoption is uneven, measurement is nonexistent, and leadership is asking whether the investment was worth it. You cannot answer because you never defined what “worth it” means.
This guide gives you a week-by-week plan. It is opinionated. It is tactical. It assumes you are managing a team of 10-50 engineers and that you have budget approval but not infinite patience from stakeholders. For the broader strategic context, see the team rollout guide.
Weeks 1-2: Assess and Prepare
The first two weeks are about understanding where you are starting from and defining where you want to end up. You do not touch tooling yet. You do not give anyone access. You prepare.
Week 1: Current State Assessment
Start by answering three questions honestly:
Where does your team spend time today? Look at your project management data. What percentage of engineering time goes to new feature development versus bug fixes, maintenance, testing, and documentation? You need this baseline. Without it, you cannot measure whether AI tools changed anything.
What does your existing workflow look like? Map the path from “developer picks up a ticket” to “code is in production.” Every step. IDE, version control, CI/CD, code review, deployment. AI tools insert themselves into this workflow. If you do not understand the workflow, you cannot predict where the insertion points are.
What is your team’s appetite? Talk to your engineers. Not in a group meeting — one-on-one. Some are already using AI tools on personal projects and are eager. Some are skeptical. Some are anxious about job security. You need to know the distribution. It shapes how you structure the pilot.
Week 2: Define Success and Select Tools
Define success metrics before you select tools. This order matters. If you select tools first, your metrics will be shaped by whatever the vendor dashboard reports. Define what success means for your team, then find tools that let you measure it.
Your success metrics should cover four dimensions, which align with the DORA metrics framework for measuring software delivery performance:
- Adoption. What percentage of licensed developers are actively using the tools weekly?
- Efficiency. Are AI-assisted tasks completing faster than equivalent tasks without AI?
- Quality. Is the defect rate for AI-assisted code comparable to or better than manually written code?
- Satisfaction. Do developers find the tools useful? Would they choose to keep using them?
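The four dimensions above can be captured as a single weekly snapshot per team. A minimal sketch — every field name, formula, and number here is illustrative, not tied to any vendor's dashboard or API:

```python
from dataclasses import dataclass

# Hypothetical weekly snapshot covering the four pilot dimensions.
# All field names and figures are illustrative placeholders.
@dataclass
class WeeklySnapshot:
    licensed_devs: int
    weekly_active_devs: int           # adoption
    median_task_hours_ai: float       # efficiency: AI-assisted tasks
    median_task_hours_baseline: float # efficiency: comparable non-AI tasks
    defects_per_kloc_ai: float        # quality
    defects_per_kloc_baseline: float
    satisfaction_score: float         # e.g. average of a 1-5 survey

    def adoption_rate(self) -> float:
        return self.weekly_active_devs / self.licensed_devs

    def efficiency_gain(self) -> float:
        # Positive means AI-assisted tasks finish faster than baseline.
        return 1 - self.median_task_hours_ai / self.median_task_hours_baseline

snap = WeeklySnapshot(
    licensed_devs=40, weekly_active_devs=28,
    median_task_hours_ai=6.0, median_task_hours_baseline=8.0,
    defects_per_kloc_ai=1.1, defects_per_kloc_baseline=1.2,
    satisfaction_score=4.1,
)
print(f"adoption {snap.adoption_rate():.0%}, efficiency gain {snap.efficiency_gain():.0%}")
```

The point of writing the metrics down this concretely, before tool selection, is that each field forces a question: can the candidate tool actually tell you weekly active developers, or acceptance data, or per-task timing?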
With metrics defined, evaluate tools against your requirements. Consider: language support for your stack, IDE integration with your toolchain, data handling policies that satisfy your security team, pricing model that aligns with your budget, and the ability to measure the metrics you defined. The best toolsets guide covers evaluation criteria in detail.
Do not select more than two tools for the pilot. Comparing twelve options in production is not an evaluation — it is chaos.
Weeks 3-4: Set Up the Pilot
The pilot is where most rollouts succeed or fail. A well-structured pilot gives you data. A poorly structured pilot gives you anecdotes.
Week 3: Select the Pilot Group
Choose 3-5 developers for your pilot. Selection criteria matter:
- Mix of experience levels. Include at least one senior developer and one mid-level developer. Their experiences will differ, and you need both data points.
- Mix of enthusiasm levels. Do not stack the pilot with AI advocates. Include at least one skeptic. If the tools win over the skeptic, you have a powerful signal. If they do not, you learn something important about the tools’ limitations.
- Representative work. Pilot participants should be working on typical tasks for your team — not a greenfield project, not a legacy migration, not a special initiative. You want data that generalizes.
- Willingness to give feedback. Pilot participants are your data source. They need to be willing to share what works, what does not, and what surprised them. Select people who communicate.
Week 4: Configure and Onboard
Set up the tools for your pilot group. This means:
Technical setup. Install IDE extensions, configure API access, set up any required security controls. Test that everything works end-to-end before handing it to developers. Nothing kills momentum like a broken setup experience.
Baseline capture. Record each pilot participant’s current metrics. Tickets completed per sprint. Average time from ticket start to PR submission. Defect rate in their recent PRs. You will compare against these baselines in eight weeks.
Onboarding session. Run a 60-minute hands-on session. Not a slide deck. Not a vendor demo. A working session where developers use the tools on real code from your codebase. Cover: basic usage patterns, effective prompting techniques, when to use AI versus when not to, and your team’s guidelines for AI-generated code review.
Feedback mechanism. Set up a lightweight way for pilot participants to share observations. A shared document, a dedicated Slack channel, or a weekly 15-minute standup works. The mechanism matters less than the consistency. You need data every week.
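The baseline capture above can be as lightweight as a CSV with one row per pilot participant, written before anyone gets access. A sketch with assumed column names and made-up numbers:

```python
import csv
import io

# Illustrative baseline file: one row per pilot participant, recorded
# before tool access. Column names and values are assumptions.
FIELDS = ["developer", "captured_on", "tickets_per_sprint",
          "hours_ticket_to_pr", "defects_per_10_prs"]

baselines = [
    {"developer": "dev_a", "captured_on": "2025-01-06",
     "tickets_per_sprint": 7, "hours_ticket_to_pr": 14.5, "defects_per_10_prs": 2},
    {"developer": "dev_b", "captured_on": "2025-01-06",
     "tickets_per_sprint": 5, "hours_ticket_to_pr": 22.0, "defects_per_10_prs": 1},
]

buf = io.StringIO()  # swap for open("baselines.csv", "w") in practice
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(baselines)
print(buf.getvalue())
```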
If you want a deeper playbook for structuring pilot programs, see the pilot program guide.
Weeks 5-8: Run the Pilot
Four weeks of active use with structured data collection. This is the core of your 90-day plan.
Weekly Rhythm
Every week during the pilot, you do three things:
Collect quantitative data. Track: sessions per developer per day, tokens consumed, acceptance rate, time spent in AI-assisted sessions, and any quality metrics your tools provide. Build a simple spreadsheet or dashboard. Do not over-engineer this. You need trends, not perfection.
Collect qualitative data. In your weekly check-in, ask each pilot participant: What worked well this week? What was frustrating? Did anything surprise you? What task type benefited most? What task type did not benefit at all? Record their answers verbatim.
Observe patterns. By week 6, you will see patterns emerge. Some developers will be using the tools heavily for specific task types. Others will have found workflows where AI slows them down. One developer will have discovered a use case nobody anticipated. These patterns are your most valuable data.
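The quantitative log really can be a flat list of rows plus one aggregation. A hedged sketch — the metric names mirror the list above, and all numbers are invented:

```python
from statistics import mean

# Hypothetical weekly usage log: one row per developer per week.
log = [
    {"week": 5, "dev": "dev_a", "sessions": 18, "tokens": 120_000, "acceptance_rate": 0.22},
    {"week": 5, "dev": "dev_b", "sessions": 9,  "tokens": 40_000,  "acceptance_rate": 0.31},
    {"week": 6, "dev": "dev_a", "sessions": 24, "tokens": 150_000, "acceptance_rate": 0.34},
    {"week": 6, "dev": "dev_b", "sessions": 12, "tokens": 55_000,  "acceptance_rate": 0.36},
]

def weekly_trend(log, metric):
    """Average one metric per week across all developers."""
    weeks = sorted({row["week"] for row in log})
    return {w: round(mean(r[metric] for r in log if r["week"] == w), 3)
            for w in weeks}

print(weekly_trend(log, "acceptance_rate"))
print(weekly_trend(log, "sessions"))
```

A rising acceptance-rate trend week over week is exactly the stabilization signal described below; you are looking for direction, not precision.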
Week 5: The Learning Curve
Expect a dip. Developers will be slower in the first week as they learn to prompt effectively, understand the tool’s strengths and weaknesses, and integrate AI into their existing workflow. This is normal. If a pilot participant reports that the tool “does not work,” dig deeper. The issue is almost always prompt quality or mismatched expectations, not tool capability.
Week 6: Stabilization
By the second week, most developers have found their rhythm. Prompting improves. Acceptance rates stabilize. Developers start skipping the tool for tasks where it does not help and leaning on it for tasks where it does. This self-selection is healthy. It means the tool is being integrated, not forced.
Week 7: Optimization
The third week is where power users emerge. Someone will figure out a workflow that saves significant time — generating test suites, scaffolding boilerplate, writing documentation. Share these wins across the pilot group. Peer learning is more effective than formal training, a finding consistent with research on situated learning and communities of practice in knowledge work.
Week 8: Assessment Preparation
In the final pilot week, shift focus from usage to analysis. Ask pilot participants to rate the tools on a simple scale. Compile your quantitative data into trends. Identify the three strongest use cases and the three weakest. You need this for the expansion decision.
For the full measurement framework, see how to measure AI adoption.
Weeks 9-10: Analyze and Decide
Two weeks to turn data into decisions. This is the phase where most rollouts lose momentum. Do not let it happen.
Week 9: Data Analysis
Compile your findings across three categories:
Productivity impact. Compare pilot participants’ output metrics against their baselines and against non-pilot team members during the same period. Be honest. If the impact is marginal, say so. If it is significant for some task types and negligible for others, report that nuance.
Quality impact. Compare defect rates, code review feedback, and any automated quality metrics. Did AI-generated code introduce more issues? Fewer? Different kinds? This data shapes your review process for the broader rollout.
Developer experience impact. Aggregate your qualitative data. What themes emerge? If three out of five developers independently say the same thing — positive or negative — that is a signal. If one developer had a dramatically different experience, understand why.
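The baseline comparison behind the productivity analysis reduces to a percentage-change calculation per participant and metric. An illustrative sketch with made-up numbers (the metric names match the week-4 baseline capture):

```python
# Sketch of the baseline comparison. All names and figures are
# illustrative, not real pilot data.
def pct_change(before: float, after: float) -> float:
    """Percentage change from baseline; positive means the metric rose."""
    return (after - before) / before * 100

baseline  = {"dev_a": {"tickets_per_sprint": 7, "hours_ticket_to_pr": 14.5}}
pilot_end = {"dev_a": {"tickets_per_sprint": 9, "hours_ticket_to_pr": 11.0}}

for dev, metrics in baseline.items():
    for metric, before in metrics.items():
        after = pilot_end[dev][metric]
        print(f"{dev} {metric}: {pct_change(before, after):+.1f}%")
```

Note the sign convention: more tickets per sprint is good when positive, while hours from ticket to PR is good when negative. Report both directions explicitly so the nuance survives into the expansion decision.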
Week 10: Expansion Planning
Based on your analysis, decide one of three things:
- Expand. The pilot showed clear positive signal across enough dimensions. Proceed to broader rollout.
- Extend. The data is promising but inconclusive. Run the pilot for another four weeks with adjusted parameters.
- Pause. The data does not support expansion. Document what you learned and revisit in three months.
If you decide to expand, plan the rollout now. Define which teams get access first. Identify who will lead onboarding for each team. Determine whether you need additional tooling for measurement at scale. Set a target adoption rate for week 13.
Red Flags to Watch For
Some signals during analysis should give you pause:
- High adoption, low quality. Developers are using the tools extensively but defect rates have increased. This means code review processes are not catching AI-generated issues.
- Polarized experience. Two developers love it, three hate it. This is not “mixed results.” It is a signal that the tools work for certain developer profiles or task types. Understand the segmentation before expanding.
- No measurable impact. Developers report liking the tools, but no quantitative metric moved. This could mean the tools are helpful but your metrics do not capture how. Or it could mean the tools provide a perception of productivity without the substance.
- Security concerns. If the pilot surfaced security issues in AI-generated code — hardcoded secrets, missing validation, overly permissive configurations — your expansion plan must include updated review processes.
Weeks 11-13: Broader Rollout
Three weeks to go from pilot to team-wide adoption. Move fast enough to maintain momentum. Slow enough to maintain quality.
Week 11: Phased Expansion
Do not give the entire team access on Monday morning. Expand in waves. If you have 40 engineers, add 10-15 in week 11. Use the same onboarding process you refined during the pilot: hands-on session, baseline capture, feedback mechanism.
Deploy your pilot participants as ambassadors. Each one takes informal ownership of 3-4 new users. They answer questions, share prompting tips, and report back on issues. This peer support model scales better than centralized training.
Week 12: Monitor and Adjust
The second wave will surface different issues than the pilot. Different codebases, different languages, different task types. Expect it. Your weekly data collection should now cover the full expanded group.
Watch for:
- Adoption clusters. Some sub-teams will adopt faster than others. Understand why. Is it the team lead’s enthusiasm? The codebase’s suitability? The task type distribution?
- New use cases. A larger group discovers more ways to use the tools. Capture and share these.
- Infrastructure strain. More users means more API calls, more tokens, more cost. Verify that your budget projections hold.
Week 13: Establish Ongoing Measurement
By the end of week 13, you should have:
- Adoption baseline. Team-wide weekly active usage rate. This is your ongoing benchmark.
- Efficiency metrics. Task completion time and throughput data that you update monthly.
- Quality metrics. Defect rate tracking for AI-assisted versus non-assisted code.
- Cost metrics. Total monthly cost, cost per developer, and cost per session.
- Review cadence. A monthly check-in where you review metrics, share learnings, and adjust guidelines.
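The cost roll-up in that list is a one-liner once you know seat count, token volume, and session count. A sketch — every price and volume below is an invented placeholder, not vendor pricing:

```python
# Illustrative cost roll-up for the week-13 scorecard.
# License price, token rate, and counts are assumed numbers.
licensed_devs = 40
license_cost_per_dev = 19.00    # assumed monthly seat price
tokens_used = 38_000_000        # monthly total across the team
token_cost_per_million = 3.00   # assumed blended rate
sessions = 2_400                # monthly AI-assisted sessions

total = (licensed_devs * license_cost_per_dev
         + tokens_used / 1_000_000 * token_cost_per_million)
print(f"total ${total:,.2f}  "
      f"per-dev ${total / licensed_devs:,.2f}  "
      f"per-session ${total / sessions:.2f}")
```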
Document everything. Your successor, your VP, and your audit team will all ask how you rolled out AI tools. The answer should be: systematically, with data, over 90 days. For benchmarking your numbers against industry standards, see the 2026 adoption benchmarks.
The Common Mistakes
After watching teams go through this process, a few mistakes recur:
Skipping the baseline. If you do not measure where you started, you cannot prove where you ended. Collect baselines before you give anyone access.
Selecting only enthusiasts for the pilot. A pilot of true believers produces great testimonials and useless data. Include skeptics. Their feedback is more valuable.
Measuring only adoption. As the Thoughtworks Technology Radar has noted, adoption without quality and efficiency data is a vanity metric. “80% of the team is using AI tools” is a meaningless claim if you cannot say whether it is helping.
Expanding too fast. A successful pilot with five developers does not mean the tools work for fifty. Each wave of expansion surfaces new issues. Leave time to address them.
No ongoing measurement. The 90-day plan is a beginning, not an end. Teams that stop measuring after rollout lose visibility into whether the tools continue delivering value as novelty fades and work changes.
The Takeaway
Ninety days is enough time to go from “we should try AI tools” to “we have data on what works, what does not, and why.” It is not enough time to optimize everything. It is enough time to make an informed decision about scaling, to establish measurement infrastructure, and to build the organizational muscle for ongoing adoption management.
The engineering managers who succeed with AI tool rollouts are not the ones who pick the best tools. They are the ones who run the best process. Define metrics before selecting tools. Structure the pilot for data, not enthusiasm. Analyze honestly. Expand deliberately. Measure continuously. The tools will improve over time. Your process determines whether your team benefits from those improvements or misses them entirely.

Pierre Sauvignon
Founder of LobsterOne. Building tools that make AI-assisted development visible, measurable, and fun.