
Lead Software Engineer II, AI Operations
Who we are
Best Egg, now part of Barclays, is a market-leading, tech-enabled financial platform helping people build financial confidence through innovative lending solutions and financial health tools. As a Barclays company, we combine the agility and customer focus of a fintech with the global reach, stability, and purpose of a leading financial institution—working together to create a better financial future for our customers and communities.
At Best Egg, you’ll find a culture grounded in our core values (putting people first, creating clarity, and delivering with excellence), enhanced by Barclays’ commitment to integrity, inclusion, and long-term impact. Together, we empower our colleagues to challenge, innovate, and take ownership while making a meaningful difference in people’s financial lives.
With the strength of Barclays behind us, we offer expanded opportunities for growth, development, and career mobility across a global organization—while continuing to build the products and experiences that make Best Egg unique.
We’re looking for collaborative, curious problem-solvers who are excited to make an impact and grow with us.
We’re proud to be an equal opportunity employer committed to building a diverse and inclusive team.
The Role
Best Egg is hiring a Lead Software Engineer II for AI Operations to design, ship, and operate production-grade LLM applications, agents, and automations across the business. You’ll own the end‑to‑end path from prototype to stable deployment: building RAG pipelines, instituting evals and guardrails, and driving cost/performance optimization. Our stack includes Python, Metaflow on Outerbounds, AWS (including Bedrock), OpenAI/ChatGPT, and Cursor; Databricks is being evaluated and is available where it makes sense. Your work will accelerate delivery, reduce LLM unit costs, and improve output quality for use cases like agent assist, compliance automation, process automation, and QA, treating AI Ops as a force multiplier for the enterprise.
Key Responsibilities
Build and ship LLM apps & agents: Deliver internal copilots and customer/agent-facing automations with clear SLAs, rollbacks, and observability from day one
Own RAG pipelines: Design ingestion, chunking, embeddings, indexing, hybrid search/rerank, and retrieval evaluation; track retriever quality via offline golden sets and online metrics
AWS infrastructure & orchestration: Design and implement scalable AWS architectures, including AWS AI services such as Bedrock and its knowledge bases, along with IAM, secure secrets management, policy enforcement, automated provisioning, and resource-usage governance as core platform capabilities
Observability & SRE for AI: Add tracing, prompt/agent version lineage, eval dashboards, and regression alerts; establish golden datasets and canary tests
Guardrails & governance: Enforce PII redaction, safety filters, role-based access, audit logs, and human‑in‑the‑loop review paths to control quality and risk
CI/CD for AI artifacts: Version and deploy prompts, tools, agents, and retrieval pipelines; support blue/green and shadow deploys with automatic rollback triggers
Cost & performance: Cut run‑rate spend through caching, truncation, batching, autoscaling, and model routing; establish clear unit economics per workflow
Developer enablement: Provide templates, SDKs, and high‑quality abstractions that let product teams ship safely without bespoke plumbing; improve developer experience
Platform integration: Build primarily in Python and Metaflow (Outerbounds); deploy on AWS (Bedrock + core services) and OpenAI; use Cursor in daily workflows; help evaluate and, when appropriate, run on Databricks
Production posture: Participate in on‑call, author runbooks, and remove single‑thread risk for AI services; drive reliability and resilience akin to ML Ops
What You’ll Need to Succeed
Experience: 5–10 years of professional software engineering (or equivalent) with 2+ years building AI/LLM applications; portfolio of shipped AI projects (links to code, demos, or case studies)
Exploration: Demonstrated passion for exploring the latest AI models, frameworks, and tooling, and for bringing state-of-the-art innovations into day-to-day workflows
LLM product engineering: Hands‑on experience with some or all of OpenAI, Bedrock, and Hugging Face/Ollama/vLLM; MCP servers and function/tool calling, multi‑turn orchestration, streaming, and prompt/version management
RAG expertise: Practical experience designing and tuning retrieval systems (chunking, embeddings, hybrid search, reranking), integrating with vector databases, and measuring retrieval quality
Full‑stack or equivalent backend depth: Comfortable building APIs/services and simple UIs where needed; strong fundamentals in Python and modern packaging/testing
DevOps & deployment: CI/CD, containers, cloud fundamentals (AWS), and runtime performance tuning; experience operating services in production
Platform & orchestration: Metaflow (Outerbounds) preferred; Databricks familiarity is a plus; ability to integrate data/feature pipelines and schedule/operate flows
Observability & testing for AI: Experience with tracing and logging; expertise in tools such as Datadog, Dynatrace, or Grafana for AI monitoring is essential
Cost, quality, and risk mindset: Comfortable optimizing latency/throughput/cost, and implementing guardrails for PII/safety/compliance
Collaboration & mentorship: Partner effectively with data scientists, analysts, and engineers; promote best practices and high‑leverage abstractions
Bonus points: Fine‑tuning or distillation experience; Kubernetes or FastAPI exposure; familiarity with Snowflake or similar warehousing for retrieval sources
Compensation: $150,000 - $170,000 a year
This role sits in AI Operations and focuses on making AI safe, fast, and economical to scale—unlocking multiple use cases through one high‑leverage engineering hire.
Please include links to your portfolio (GitHub, write‑ups, or demos) with your application.
Employee Benefits
Best Egg offers many additional benefits for our employees, including (but not limited to):
Pre-tax and post-tax retirement savings plans with a competitive company matching program
Generous paid time-off plans including vacation, personal/sick time, paid short-term and long-term disability leaves, paid parental leave, and paid company holidays
Multiple health care plans to choose from, including dental and vision options
Flexible Spending Plans for Health Care, Dependent Care, and Health Reimbursement Accounts
Company-paid benefits such as life insurance, wellness platforms, employee assistance programs, and Health Advocate programs
Other great discounted benefits include identity theft protection, pet insurance, fitness center reimbursements, and many more!
Compliance
In compliance with the CCPA, Best Egg is fully committed to handling the personal information and data of employees and job applicants responsibly, with respect and due care. Review our CCPA Employee Policy here.
