Hybrid - Databricks Developer to implement robust data pipelines using Apache Spark on Databricks, perform data modelling using Medallion architecture, and manage Delta Lake performance
S.i. Systèmes
Vancouver, BC
- Number of position(s) to fill: 1
- Salary: To be discussed
- Permanent position
- Posted on September 10, 2025
- Start date: 1 position to fill as soon as possible
Description
An S.i. Systems global client with an office in Vancouver is seeking a Hybrid - Intermediate Databricks Developer to implement robust data pipelines using Apache Spark on Databricks, perform data modelling using the Medallion architecture, and manage Delta Lake performance. This is a hands-on development role focused on engineering scalable, maintainable, and optimized data flows in a modern cloud-based environment.
Full-time permanent role based in the Vancouver office (hybrid, 1-4 days/week onsite, negotiable)
Salary range from $100,000 to $200,000 CAD per annum
MUST HAVE SKILLS:
- 5+ years of experience in data engineering or big data development.
- Strong hands-on experience with Databricks and Apache Spark (PySpark/SQL).
- Proven experience with Azure Data Factory, Azure Data Lake, and related Azure services.
- Experience integrating with APIs using libraries such as requests and http.
- Deep understanding of Delta Lake architecture, including performance tuning and advanced features.
- Proficiency in SQL and Python for data processing, transformation, and validation.
- Familiarity with data lakehouse architecture and both real-time and batch processing design patterns.
- Comfortable working with Git, DevOps pipelines, and Agile delivery methodologies.
NICE TO HAVE SKILLS:
- Experience with dbt, Azure Synapse, or Microsoft Fabric.
- Familiarity with Unity Catalog features in Databricks.
- Relevant certifications such as Azure Data Engineer, Databricks, or similar.
- Understanding of predictive modeling, anomaly detection, or machine learning, particularly with IoT datasets.
JOB DUTIES:
- Design, build, and maintain scalable data pipelines and workflows using Databricks (SQL, PySpark, Delta Lake).
- Develop efficient ETL/ELT pipelines for structured and semi-structured data using Azure Data Factory (ADF) and Databricks notebooks/jobs.
- Integrate and transform large-scale datasets from multiple sources into unified, analytics-ready outputs.
- Optimize Spark jobs and manage Delta Lake performance using techniques such as partitioning, Z-ordering, broadcast joins, and caching (see the first sketch after this list).
- Design and implement data ingestion pipelines for RESTful APIs, transforming JSON responses into Spark tables (see the second sketch after this list).
- Apply best practices in data modeling and data warehousing concepts.
- Perform data validation and quality checks.
- Work with various data formats, including JSON, Parquet, and Avro.
- Build and manage data orchestration pipelines, including linked services and datasets for ADLS, Databricks, and SQL Server.
- Create parameterized and dynamic ADF pipelines, and trigger Databricks notebooks from ADF.
- Collaborate closely with Data Scientists, Data Analysts, Business Analysts, and Data Architects to deliver trusted, high-quality datasets.
- Contribute to data governance, metadata documentation, and ensure adherence to data quality standards.
- Use version control tools (e.g., Git) and CI/CD pipelines to manage code deployment and workflow changes.
- Develop real-time and batch processing pipelines for streaming data sources such as MQTT, Kafka, and Event Hub (see the third sketch after this list).
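
The Spark/Delta tuning techniques named in the duties above can be illustrated with a minimal PySpark sketch. Table, column, and schema names (raw.events, silver.events, silver.devices, event_date, device_id) are assumptions for illustration only, not part of this posting.

```python
# Minimal PySpark sketch of partitioning, Z-ordering, broadcast joins, and caching.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # on Databricks a session already exists

# Partition a Delta table by a low-cardinality column at write time.
(spark.table("raw.events")
    .write.format("delta")
    .partitionBy("event_date")
    .mode("overwrite")
    .saveAsTable("silver.events"))

# Z-order a frequently filtered column to improve data skipping.
spark.sql("OPTIMIZE silver.events ZORDER BY (device_id)")

# Broadcast a small dimension table to avoid a shuffle join.
dim = spark.table("silver.devices")
events = spark.table("silver.events")
joined = events.join(F.broadcast(dim), "device_id")

# Cache an intermediate result that is reused by several downstream steps.
joined.cache()
joined.count()  # materialize the cache
```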
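For the RESTful API ingestion duty, a short sketch using the requests library, assuming a hypothetical paginated endpoint, pagination contract, and target table name:

```python
# Hedged sketch: pull JSON from a paginated REST API and land it as a Delta table.
# The URL, "items"/"next" fields, and table name are assumptions for illustration.
import json
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def fetch_pages(base_url: str, token: str):
    """Yield JSON records from a paginated endpoint until no next page remains."""
    url = base_url
    headers = {"Authorization": f"Bearer {token}"}
    while url:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        yield from payload.get("items", [])
        url = payload.get("next")  # assumed pagination contract

records = list(fetch_pages("https://api.example.com/v1/readings", token="..."))

# Let Spark infer a schema from the JSON strings, then persist as Delta.
df = spark.read.json(spark.sparkContext.parallelize([json.dumps(r) for r in records]))
df.write.format("delta").mode("append").saveAsTable("bronze.api_readings")
```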
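For the streaming duty, a hedged Structured Streaming sketch reading from a Kafka-compatible source (Event Hubs also exposes a Kafka endpoint) into a Delta bronze table; the broker address, topic, schema, and checkpoint path are assumptions:

```python
# Illustrative streaming ingestion: Kafka source -> parsed JSON -> Delta table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Assumed telemetry schema for the JSON message payload.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("temperature", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "iot-telemetry")
    .option("startingOffsets", "latest")
    .load())

# Kafka delivers the payload as bytes; parse the JSON value into typed columns.
parsed = (raw
    .select(F.from_json(F.col("value").cast("string"), schema).alias("body"))
    .select("body.*"))

(parsed.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/iot_telemetry")
    .outputMode("append")
    .toTable("bronze.iot_telemetry"))
```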
Requirements
Not specified