
Senior Reliability Engineer
About Flinks
Flinks is where financial data moves—with purpose, trust, and impact.
We’re on a mission to simplify access to financial data and help businesses build better, faster, and more secure financial products and experiences. Since 2016, we’ve been bridging the gap between fintechs, financial institutions, and consumers by enabling seamless, secure data connectivity.
From instant account funding to smarter lending, our solutions help power some of the most innovative financial products in North America. We partner with lenders, banks, and fintechs to streamline onboarding, prevent fraud, and fuel real-time decision-making with enriched, reliable data.
As pioneers in Canada’s open banking movement, we're not waiting for the future—we're building it. If you're bold, curious, and ready to help shape the future of finance, we’d love to meet you.
About the Reliability Team 🚒
As a Senior Reliability Engineer, you will play a pivotal role in ensuring the stability, performance, and reliability of Flinks’ fintech product platforms and monitoring and alerting systems. You will serve as an expert in both software development and system support, working closely with engineering, operations, and product teams to troubleshoot complex issues, resolve incidents, and continuously improve the technical foundation of our products. This role demands a combination of advanced coding skills, incident management experience, and an understanding of the fintech industry.
What You’ll Do
Develop and maintain code to quickly resolve product issues, ensuring fast recovery and long-term system stability.
Provide live operational support across multiple client applications, monitoring services and alerts to detect and resolve critical failures with minimal downtime.
Own and troubleshoot complex incidents, conduct root cause analyses, and implement long-term solutions—adhering to SLAs and internal SLOs.
Build monitoring dashboards and alerting systems to proactively detect and address issues, supporting system scalability and stability.
Analyze operational metrics and KPIs to identify trends, surface client pain points, and drive improvements.
Automate tooling and processes to improve efficiency and reduce manual work across LiveOps.
Collaborate with cross-functional teams to deliver lasting fixes for production issues and contribute to technical analyses of product gaps.
Lead and mentor reliability engineers, providing guidance and ensuring consistent delivery of high-quality work.
Participate in post-incident reviews, documenting outcomes and driving preventative action items.
Support after-hours on-call coverage as part of the LiveOps rotation.
Who You Are 💪
5+ years of experience with .NET Framework (C#), ensuring production system stability
Strong coding, debugging, and troubleshooting skills, particularly in performance optimization of large-scale applications
Operationally focused with expertise in incident management and resolving live production issues
Proven experience in building and maintaining reliable monitoring and alerting systems in high-demand environments, with a focus on production support
Strong knowledge of Kubernetes, Docker, and cloud platforms (GCP preferred)
Proficiency with monitoring tools like Prometheus, Grafana, and Kibana
Experience with incident ticketing/documentation tools like FreshDesk and Confluence
Critical thinker who can identify system weaknesses and find innovative solutions
Strong project management skills with a focus on scalability and system stability
Nice to Haves
ITIL Service Management certification (such as ITIL v3 or ITIL v4) or an equivalent certification is highly desired
Experience with PowerBI, web scraping, or Golang
The Interview Process 🏗
Head of People Ops
Case Assignment & Presentation
Team Lead Interview
Director Interview