Senior Site Reliability Engineer to design and implement Dynatrace and roll out adoption of Observability practices, tools, and frameworks. -0091997

Vancouver, BC
  • Number of positions available : 1

  • To be discussed
  • Contract job

  • Starting date : 1 position to fill as soon as possible

Our Vancouver client is seeking a Senior Site Reliability Engineer to design and implement Dynatrace and roll out adoption of Observability practices, tools, and frameworks. -0091997


12-month contract, Vancouver. In office 1-2 days per month, or on an as-needed basis for meetings.


Must Have:

  • Extensive and recent experience as a Site Reliability Engineer (SRE), Azure engineer, or DevOps engineer, with a focus on Dynatrace and Observability practices in the cloud (Azure, AWS).
  • Strong proficiency in Dynatrace monitoring solutions, including configuration, customization, and optimization.
  • Hands-on experience with Observability tools and practices such as distributed tracing, logging, metrics collection, and anomaly detection.
  • Experience with automation tools (Ansible, Terraform), Infrastructure as Code (IaC) principles, and containerization technologies (Docker, Kubernetes).
  • Solid understanding of cloud platforms (AWS, Azure, GCP).
  • Excellent problem-solving skills, analytical thinking, and the ability to troubleshoot complex technical issues.
  • Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams and drive initiatives to completion.
  • Bachelor's degree in Computer Science, Engineering, or a related field.


Nice to Have:

  • Relevant certifications (Dynatrace, AWS, Azure, Kubernetes, etc.).
  • Master's degree.



Responsibilities:

  • Serve as the subject matter expert (SME) for Dynatrace, responsible for configuring, optimizing, and managing Dynatrace monitoring solutions.
  • Design and implement monitoring strategies using Dynatrace to ensure comprehensive visibility into system performance, availability, and reliability.
  • Collaborate with our Engineering & Platform teams to ensure our services, platforms, and infrastructure emit the right metrics.
  • Lead the rollout and adoption of Observability practices, tools, and frameworks across teams and projects.
  • Collaborate with Incident Management teams to resolve critical incidents, conduct post-incident reviews, and implement preventive measures.
  • Communicate complex business and technical information clearly and concisely.
  • Proactively identify and mitigate potential issues, bottlenecks, and performance degradation to ensure system reliability and uptime.
  • Drive automation initiatives using tools like Ansible, Terraform, or Kubernetes to streamline deployment, configuration, and management of infrastructure.
  • Conduct capacity planning assessments, analyze resource utilization trends, and forecast capacity requirements to support business growth and scalability.
