
Machine Learning Engineer [IC4]
Who we are
Our mission at Sourcegraph is to make it so that everyone can code, not just ~0.1% of the population. Our code graph powers Cody, the most powerful and accurate AI coding assistant, as well as our Code Search product, which helps devs explore their entire codebase and make large-scale migrations and security fixes. We’re building software that builds software, and in doing so we’re making devs more productive and preparing for a world where a lot more code gets written.
It’s an exciting time to join Sourcegraph. AI has taken over the world, and we’ve spent the last 10 years building infrastructure that’s integral to making AI generated code more powerful and accurate. Our customers include 4/5 FAANG companies, 4 of the top 10 banks, government organizations, Uber, Plaid, and many other companies building the software that pushes the world forward. We’ve raised $225M at a $2.625B valuation from Andreessen Horowitz, Sequoia, Redpoint, Craft and others. We’re making ambitious bets on our future and we’re looking to hire exceptional people to join our team as we make Sourcegraph one of the biggest and most influential companies in the world.
Working hours
Given that we are an all-remote company and hire almost anywhere in the world, we don’t have a particular time-zone preference for this role. However, you may need to be available for non-recurring urgent meetings outside of working hours.
Why this job is exciting
We are creating a machine learning team at Sourcegraph, aimed at creating the most powerful coding assistant in the world. Many companies are trying, but Sourcegraph is uniquely differentiated by our rich code intelligence data and powerful code search platform. In the world of prompting LLMs, context is everything, and Sourcegraph’s context is simply the best you can get: IDE-quality, global-scale, and served lightning fast. Our code intelligence, married with modern AI, is already providing a remarkable alpha experience, and you can help us unlock its full potential.
We are looking for an experienced full stack ML engineer with demonstrated industry experience in productionizing large scale ML models in industrial settings. And if you happen to have an entrepreneurial streak, you’re in luck: We have an enterprise distribution pipeline, so whatever you build can be deployed straight to enterprise customers with some of the largest code bases in the world, without all the go-to-market hassle you’d encounter in a startup.
You will be a scientist at Sourcegraph Labs doing R&D, and pushing the boundaries of what AI can do, as an IC on our new ML team. You will have the full power of Sourcegraph’s Code Intelligence Platform at your disposal, and you’ll be working on a coding assistant that is already awesome even after just a few weeks of work, so this is a greenfield opportunity to multiply dev productivity to unprecedented levels.
Within one month, you will…
Start building a trusting relationship with your peers, and learning the company structure.
Be set up to do local development, and be actively prototyping.
Dive deep into how AI and ML are already used at Sourcegraph and identify ways to improve them moving forward.
Develop simulated datasets using Gym-style frameworks across a number of Cody use cases.
Experiment with changes to Cody prompts and context sources, and evaluate the changes with offline experimentation datasets.
Ship a substantial new feature to end users.
Within three months, you will…
Be building out feature computation, storage, monitoring, analysis, and serving systems for the features required across our Cody LLM stack.
Be contributing actively to the world’s best coding assistant.
Be developing distributed training and experiment infrastructure over Code AI datasets, and scaling distributed backend services to reliably support high-QPS, low-latency use cases.
Be following all the relevant research, and conducting research of your own.
Within six months, you will…
Be fully ramped up and owning key pieces of the assistant.
Be ramped up on other relevant parts of the Sourcegraph product.
Be helping design and build what might become the biggest dev accelerator in 20 years.
Be owning a number of ML systems, and building core data and model metadata systems powering the end-to-end ML lifecycle.
Be developing a highly scalable, high-QPS inference service providing low latency performance using a mix of CPU and GPU hardware to most efficiently utilize resources.
Be driving the technical vision and owning a couple of major ML components, including their modeling and ML infra roadmap.
About you
You are an experienced full stack ML engineer with demonstrated industry experience in formulating ML solutions, developing end to end data orchestration pipelines, deploying large scale ML models and experimenting offline and online to drive business impact for Cody users. You want to be part of a world-class team to push the boundaries of AI, with a particular focus on leveraging Sourcegraph’s code intelligence to leapfrog competitors.
First, your AI background could look like a few different things:
You’ve worked on AI systems and built ML at large tech companies, with specific experience developing and productionizing machine learning models.
Hands-on experience using data processing tools like Beam, Spark or Flink in a cloud environment like GCP or AWS and first-hand knowledge about data management concepts.
You have a deep ML background and have demonstrated an ability to be customer and company focused. You are hands-on and can build machine learning
Hands-on experience training and serving large-scale (10GB+) models using frameworks such as TensorFlow or PyTorch
Experience with Docker, Kubernetes, Kubeflow or Flink, knowledge of CI/CD in the context of ML pipelines
You have some hands-on experience working with large foundational models and their toolkits. Familiarity with LLMs such as Llama, StarCoder, etc., model fine-tuning techniques (LoRA, QLoRA), prompting techniques (Chain of Thought, ReAct, etc.) and model evaluation.
You’ve worked in NLP or language models at a top-tier research lab
If you’ve been anywhere near the field lately, you can probably pick up enough about LLM capabilities to be able to drive this space, as it’s all greenfield.
Second, you have some understanding of programming languages, and tools that manipulate code. This could have taken any number of forms; e.g.:
You’ve worked with grammars and parser generators, or Tree-sitter
You’ve worked with compilers and semantic analysis, e.g. type systems
You’ve written an interpreter, or worked on a virtual machine
You’ve done static analysis involving scanning source code for semantic information
It doesn’t really matter how you know it, but it’s important that you’re familiar with the basic concepts of semantic representations of source code, and how they’re produced and consumed by tooling.
Level
This job is an IC4. You can read more about our job leveling philosophy in our Handbook.
Compensation
We pay you an above-average salary because we want to hire the best people who are fully focused on helping Sourcegraph succeed, not worried about paying bills. You will have the flexibility to work and live anywhere in the world (unless specified otherwise in the job description), and we’ll never take your location or current/past salary information into account when determining your compensation. As an open and transparent company that values equitable and competitive compensation for everyone, our compensation ranges are visible to every single Sourcegraph Teammate. To determine your salary, we use a number of market and data-driven salary sources and target the high-end of the range, ensuring that we’re always paying above market regardless of where you live in the world.
The target compensation for this role is $204,000 USD base.
In addition to our cash compensation, we offer equity (because when we succeed as a company, we want you to succeed, too) and generous perks & benefits.
Interview process [~5.5 hour total interview]
Below is the interview process you can expect for this role (you can read more about the types of interviews in our Handbook). It may look like a lot of steps, but rest assured that we move quickly and the steps are designed to help you get the information needed to determine if we’re the right fit for you… Interviewing is a two-way street, after all!
Introduction Stage - we have initial conversations to get to know you better…
[30m] Recruiter Screen with Devon Coords
[60m] Hiring Manager Screen / ML Depth with Rishabh Mehrotra
Team Interview Stage - we then delve into your experience in more depth and introduce you to members of the team…
[45m] Technical Deep Dive
[60m] Architecture Interview
[Async] Pairing Exercise with Beyang Liu
Final Interview Stage - we move you to our final round, where you will gain a better understanding of our business holistically…
[30m] Values Interview
[30m] Leadership Interview with Quinn Slack
We check references and conduct your background check
Please note - you are welcome to request additional conversations with anyone you would like to meet, but didn’t get to meet during the interview process.
Not sure if this is you?
We want a diverse, global team, with a broad range of experience and perspectives. If this job sounds great, but you’re not sure if you qualify, apply anyway! We carefully consider every application, and will either move forward with you, find another team that might be a better fit, keep in touch for future opportunities, or thank you for your time.
Learn more about us
To create a product that serves the needs of all developers, we are building a diverse all-remote team that is distributed across the world. Sourcegraph is an equal opportunity workplace; we welcome people from all backgrounds and communities.
We provide competitive compensation and practical benefits to keep you happy and healthy so that you can do your best work.
Learn more about what it is like to work at Sourcegraph by reading our handbook.
We want to ensure Sourcegraph is an environment that suits your working style and empowers you to do your best work, so we are eager to answer any questions that you have about us at any point in the interview process.
Sourcegraph participates in E-Verify for U.S. Employees