Data Engineer - Databricks Specialist

Posted 2026-05-06
Remote (USA) | Full-time | Immediate start

We are seeking an experienced Data Engineer with deep expertise in Databricks to design, build, and maintain scalable data pipelines and analytics solutions. The role requires 5+ years of hands-on data engineering experience with a strong focus on the Databricks platform.

Key Responsibilities:

Data Pipeline Development & Management:

  • Design and implement robust, scalable ETL/ELT pipelines using Databricks and Apache Spark
  • Process large volumes of structured and unstructured data
  • Develop and maintain data workflows using Databricks Workflows, Apache Airflow, or similar orchestration tools
  • Optimize data processing jobs for performance, cost efficiency, and reliability
  • Implement incremental data processing patterns and change data capture (CDC) mechanisms
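The incremental and CDC patterns above can be illustrated with a plain-Python sketch of the upsert logic that a Delta Lake MERGE INTO statement expresses declaratively (record shapes and operation names here are illustrative assumptions, not Databricks APIs):

```python
# Minimal sketch of CDC apply logic: merge a batch of change records
# (insert/update/delete) into a target keyed by primary key, in order.
# This mirrors what Delta Lake's MERGE INTO does declaratively; the
# record shape ({"op": ..., "id": ...}) is an illustrative assumption.

def apply_cdc(target: dict, changes: list) -> dict:
    """Apply change records to a target keyed by 'id', in arrival order."""
    for change in changes:
        key = change["id"]
        op = change["op"]  # one of: "insert", "update", "delete"
        if op == "delete":
            target.pop(key, None)  # WHEN MATCHED ... THEN DELETE
        else:
            # insert and update both become an upsert:
            # WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
            target[key] = {k: v for k, v in change.items() if k != "op"}
    return target

# Example: one update, one insert, one delete
target = {1: {"id": 1, "name": "old"}}
changes = [
    {"op": "update", "id": 1, "name": "new"},
    {"op": "insert", "id": 2, "name": "added"},
    {"op": "delete", "id": 1},
]
print(apply_cdc(target, changes))  # {2: {'id': 2, 'name': 'added'}}
```

In a real pipeline the change feed would come from a CDC source (e.g. Delta change data feed or a log-based connector) and the merge would run as a Spark job rather than in-memory Python.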

Databricks Platform Engineering:

  • Build and maintain Delta Lake tables and implement medallion architecture (bronze, silver, gold layers)
  • Develop streaming data pipelines using Structured Streaming and Delta Live Tables
  • Manage and optimize Databricks clusters for various workloads
  • Implement Unity Catalog for data governance, security, and metadata management
  • Configure and maintain Databricks workspace environments across development, staging, and production
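The bronze/silver/gold flow mentioned above can be sketched platform-agnostically; in Databricks each layer would be a Delta table, but the same shape holds with plain Python (field names and cleaning rules here are assumptions for illustration):

```python
# Platform-agnostic sketch of a medallion flow: bronze keeps raw records
# as ingested, silver cleans and types them, gold aggregates for analytics.
# On Databricks each layer would be a Delta table; here, lists and dicts.

def to_silver(bronze: list) -> list:
    """Silver layer: drop malformed rows, normalize region, cast amount."""
    silver = []
    for row in bronze:
        try:
            silver.append({"region": row["region"].strip().lower(),
                           "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # skip (or quarantine) malformed records
    return silver

def to_gold(silver: list) -> dict:
    """Gold layer: total amount per region, ready for analytics."""
    totals = {}
    for row in silver:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

bronze = [{"region": " EU ", "amount": "10.5"},
          {"region": "EU", "amount": "4.5"},
          {"region": "US", "amount": "bad"}]  # malformed, dropped at silver
print(to_gold(to_silver(bronze)))  # {'eu': 15.0}
```

The design point is that each layer is a separate, materialized stage: bronze preserves raw history for replay, silver enforces schema and quality, and gold serves query-optimized aggregates.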

Data Architecture & Modeling:

  • Design and implement data models optimized for analytical workloads
  • Create and maintain data warehouses and data lakes on cloud platforms (Azure, AWS, or GCP)
  • Implement data partitioning, indexing, and caching strategies for optimal query performance
  • Collaborate with data architects to establish best practices for data storage and retrieval patterns

Performance Optimization & Monitoring:

  • Monitor and troubleshoot data pipeline performance issues
  • Optimize Spark jobs through proper partitioning, caching, and broadcast strategies
  • Implement data quality checks and automated testing frameworks
  • Manage cost optimization through efficient resource utilization and cluster management
  • Establish monitoring and alerting systems for data pipeline health and performance
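The data-quality checks above can be sketched as named rule functions run against each batch before promotion, with failures surfaced to monitoring (the rule names and fields are illustrative, not a specific framework):

```python
# Minimal data-quality gate: run named checks over a batch of rows and
# return only the failures; a pipeline would alert or halt promotion on
# any failure. Rules and fields are illustrative assumptions.

def check_not_null(rows, field):
    """Every row must have a non-null value for `field`."""
    return all(row.get(field) is not None for row in rows)

def check_unique(rows, field):
    """No two rows may share the same value for `field`."""
    values = [row.get(field) for row in rows]
    return len(values) == len(set(values))

def run_checks(rows):
    """Return a dict of failed check names -> False (empty means clean)."""
    checks = {
        "id_not_null": check_not_null(rows, "id"),
        "id_unique": check_unique(rows, "id"),
    }
    return {name: ok for name, ok in checks.items() if not ok}

rows = [{"id": 1}, {"id": 2}, {"id": 2}]  # duplicate id triggers a failure
print(run_checks(rows))  # {'id_unique': False}
```

On Databricks the same idea is typically expressed as Delta Live Tables expectations or a testing framework run in CI, with failed batches quarantined rather than silently dropped.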

Collaboration & Best Practices:

  • Work closely with data scientists, analysts, and business stakeholders to understand data requirements
  • Implement version control using Git and follow CI/CD best practices for code deployment
  • Document data pipelines, data flows, and technical specifications
  • Mentor junior engineers on Databricks and data engineering best practices
  • Participate in code reviews and contribute to establishing team standards

Required Qualifications

Experience & Skills:

  • 5+ years of experience in data engineering with hands-on Databricks experience
  • Strong proficiency in Python and/or Scala for Spark application development
  • Expert-level knowledge of Apache Spark, including Spark SQL, DataFrames, and RDDs
  • Deep understanding of Delta Lake and Lakehouse architecture concepts
  • Experience with SQL and database optimization techniques
  • Solid understanding of distributed computing concepts and data processing frameworks
  • Proficiency with cloud platforms (Azure, AWS, or GCP) and their data services
  • Experience with data orchestration tools (Databricks Workflows, Apache Airflow, Azure Data Factory)
  • Knowledge of data modeling concepts for both OLTP and OLAP systems
  • Familiarity with data governance principles and tools like Unity Catalog
  • Understanding of streaming data processing and real-time analytics
  • Experience with version control systems (Git) and CI/CD pipelines

Preferred Qualifications

  • Databricks Certified Data Engineer certification (Associate or Professional)
  • Experience with machine learning pipelines and MLOps on Databricks
  • Knowledge of data visualization tools (Power BI, Tableau, Looker)
  • Experience with infrastructure as code (Terraform, CloudFormation)
  • Familiarity with containerization technologies (Docker, Kubernetes)
