Senior PySpark Developer

Toronto, ON
  • Number of positions to fill: 1

  • To be discussed
  • Contract position

  • Start date: 1 position to fill as soon as possible

Our valued public sector client is seeking a Senior PySpark Developer to support the design, development, and maintenance of modernized data pipelines for a large-scale data modernization initiative!

Initial 5-month contract (until March 31, 2026) with a strong possibility of extension. Remote work arrangement within Canada, full-time, Monday to Friday.

The successful candidate will be responsible for developing, testing, and supporting data ingestion and transformation pipelines using PySpark, Python, and AWS-based technologies, following Agile development practices and CI/CD principles. The developer will work closely with technical and business teams to deliver scalable, high-performance data solutions that support enterprise analytics and reporting.
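As a rough illustration of the kind of pipeline this role involves, here is a minimal PySpark sketch of an ingest-cleanse-publish flow; the bucket names, paths, and column names are hypothetical assumptions, not taken from the client's environment.

```python
from pyspark.sql import SparkSession, functions as F

# Minimal illustrative pipeline; all paths and columns below are assumptions.
spark = SparkSession.builder.appName("ingest-claims").getOrCreate()

# Ingest raw CSV from S3 (hypothetical bucket and prefix).
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3://example-raw-bucket/claims/2025/")
)

# Basic cleansing and enrichment with the DataFrame API.
cleaned = (
    raw.dropDuplicates(["claim_id"])
       .filter(F.col("amount").isNotNull())
       .withColumn("ingested_at", F.current_timestamp())
)

# Publish curated Parquet, partitioned for downstream analytics and reporting.
(cleaned.write
    .mode("overwrite")
    .partitionBy("province")
    .parquet("s3://example-curated-bucket/claims/"))

spark.stop()
```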

Responsibilities

  • Design, develop, and maintain large-scale data processing pipelines using PySpark, Spark SQL, and Python.
  • Collaborate with business and technical stakeholders to translate business requirements into technical solutions.
  • Develop modular, reusable, and maintainable code following software development best practices.
  • Implement automated testing frameworks to ensure data quality and reliability.
  • Participate in peer code reviews and apply CI/CD practices using Git-based workflows.
  • Work with Airflow or equivalent orchestration tools for pipeline scheduling and automation.
  • Develop and maintain ETL mappings, documentation, and data flow diagrams.
  • Deploy and monitor data workflows in a cloud-based environment (AWS EMR, Redshift, S3, Lambda).
  • Troubleshoot performance issues and optimize Spark jobs for scalability and efficiency (see the tuning sketch after this list).
  • Ensure compliance with quality assurance and change management procedures.
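To give a concrete sense of the tuning work mentioned above, the sketch below shows two common Spark optimizations: enabling adaptive query execution and broadcasting a small dimension table to avoid a shuffle-heavy join. Paths, table shapes, and configuration values are illustrative assumptions only.

```python
from pyspark.sql import SparkSession, functions as F

# Illustrative tuning sketch; paths and config values are assumptions.
spark = (
    SparkSession.builder
    .appName("optimize-example")
    # Let adaptive query execution coalesce small shuffle partitions
    # and mitigate skew at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    # Right-size shuffle parallelism for the cluster (value is a guess).
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

facts = spark.read.parquet("s3://example-curated-bucket/claims/")    # large
dims = spark.read.parquet("s3://example-curated-bucket/providers/")  # small

# Broadcast the small table so the join avoids a full sort-merge shuffle.
joined = facts.join(F.broadcast(dims), on="provider_id", how="left")

# Cache only when the result feeds several downstream actions.
joined.cache()
print(joined.count())
```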

Must-Have

  • 5+ years of hands-on programming experience in Python and SQL, writing modular, maintainable code.
  • 3+ years of strong experience developing PySpark data pipelines for large-scale data processing.
  • Solid understanding of Spark DataFrames, Spark SQL, and distributed data processing concepts.
  • Practical experience working in AWS Cloud environments (e.g., EMR, Redshift, Lambda).
  • Strong knowledge of MySQL or equivalent relational databases.
  • Proficiency with Git, unit testing, and release automation.
  • Familiarity with Apache Iceberg or similar open table formats.
  • Experience with Airflow or equivalent orchestration frameworks (see the DAG sketch after this list).
  • Excellent problem-solving and troubleshooting skills, with a proactive, collaborative attitude.
  • Strong oral and written communication skills in English.
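For the orchestration requirement, a minimal Airflow DAG might look like the sketch below, scheduling a daily spark-submit run; the DAG id, schedule, and script name are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal illustrative DAG; id, schedule, and command are assumptions.
with DAG(
    dag_id="claims_ingest_daily",
    start_date=datetime(2025, 11, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    BashOperator(
        task_id="run_pyspark_ingest",
        bash_command="spark-submit ingest_claims.py",  # hypothetical script
    )
```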

Nice to Have

  • Experience with AWS Cloud Development Kit (CDK) for Python.
  • Familiarity with serverless architecture (AWS Lambda, event-driven design).
  • Exposure to DevOps automation and continuous integration pipelines.
  • Previous experience working on healthcare or public sector data platforms.


Requirements

Education level

not specified

Years of experience

not specified

Written languages

not specified

Spoken languages

not specified