Data Engineer - Databricks Specialist

Posted 2026-05-06
Remote (USA) | Full-time | Immediate start

We are seeking an experienced Data Engineer with deep expertise in Databricks to design, build, and maintain scalable data pipelines and analytics solutions. The role requires 5+ years of hands-on data engineering experience with a strong focus on the Databricks platform.

Key Responsibilities:

Data Pipeline Development & Management:

  • Design and implement robust, scalable ETL/ELT pipelines using Databricks and Apache Spark
  • Process large volumes of structured and unstructured data
  • Develop and maintain data workflows using Databricks Workflows, Apache Airflow, or similar orchestration tools
  • Optimize data processing jobs for performance, cost efficiency, and reliability
  • Implement incremental data processing patterns and change data capture (CDC) mechanisms
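The incremental and CDC patterns above can be illustrated with a plain-Python sketch of the upsert logic that a Delta Lake MERGE INTO statement expresses declaratively (record shapes and operation names here are illustrative assumptions, not Databricks APIs):

```python
# Minimal sketch of CDC apply logic: merge a batch of change records
# (insert/update/delete) into a target keyed by primary key, in order.
# This mirrors what Delta Lake's MERGE INTO does declaratively; the
# record shape ({"op": ..., "id": ...}) is an illustrative assumption.

def apply_cdc(target: dict, changes: list) -> dict:
    """Apply change records to a target keyed by 'id', in arrival order."""
    for change in changes:
        key = change["id"]
        op = change["op"]  # one of: "insert", "update", "delete"
        if op == "delete":
            target.pop(key, None)  # WHEN MATCHED ... THEN DELETE
        else:
            # insert and update both become an upsert:
            # WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
            target[key] = {k: v for k, v in change.items() if k != "op"}
    return target

# Example: one update, one insert, one delete
target = {1: {"id": 1, "name": "old"}}
changes = [
    {"op": "update", "id": 1, "name": "new"},
    {"op": "insert", "id": 2, "name": "added"},
    {"op": "delete", "id": 1},
]
print(apply_cdc(target, changes))  # {2: {'id': 2, 'name': 'added'}}
```

In a real pipeline the change feed would come from a CDC source (e.g. Delta change data feed or a log-based connector) and the merge would run as a Spark job rather than in-memory Python.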

Databricks Platform Engineering:

  • Build and maintain Delta Lake tables and implement medallion architecture (bronze, silver, gold layers)
  • Develop streaming data pipelines using Structured Streaming and Delta Live Tables
  • Manage and optimize Databricks clusters for various workloads
  • Implement Unity Catalog for data governance, security, and metadata management
  • Configure and maintain Databricks workspace environments across development, staging, and production
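The bronze/silver/gold flow mentioned above can be sketched platform-agnostically; in Databricks each layer would be a Delta table, but the same shape holds with plain Python (field names and cleaning rules here are assumptions for illustration):

```python
# Platform-agnostic sketch of a medallion flow: bronze keeps raw records
# as ingested, silver cleans and types them, gold aggregates for analytics.
# On Databricks each layer would be a Delta table; here, lists and dicts.

def to_silver(bronze: list) -> list:
    """Silver layer: drop malformed rows, normalize region, cast amount."""
    silver = []
    for row in bronze:
        try:
            silver.append({"region": row["region"].strip().lower(),
                           "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # skip (or quarantine) malformed records
    return silver

def to_gold(silver: list) -> dict:
    """Gold layer: total amount per region, ready for analytics."""
    totals = {}
    for row in silver:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

bronze = [{"region": " EU ", "amount": "10.5"},
          {"region": "EU", "amount": "4.5"},
          {"region": "US", "amount": "bad"}]  # malformed, dropped at silver
print(to_gold(to_silver(bronze)))  # {'eu': 15.0}
```

The design point is that each layer is a separate, materialized stage: bronze preserves raw history for replay, silver enforces schema and quality, and gold serves query-optimized aggregates.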

Data Architecture & Modeling:

  • Design and implement data models optimized for analytical workloads
  • Create and maintain data warehouses and data lakes on cloud platforms (Azure, AWS, or GCP)
  • Implement data partitioning, indexing, and caching strategies for optimal query performance
  • Collaborate with data architects to establish best practices for data storage and retrieval patterns

Performance Optimization & Monitoring:

  • Monitor and troubleshoot data pipeline performance issues
  • Optimize Spark jobs through proper partitioning, caching, and broadcast strategies
  • Implement data quality checks and automated testing frameworks
  • Manage cost optimization through efficient resource utilization and cluster management
  • Establish monitoring and alerting systems for data pipeline health and performance
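The data-quality checks above can be sketched as named rule functions run against each batch before promotion, with failures surfaced to monitoring (the rule names and fields are illustrative, not a specific framework):

```python
# Minimal data-quality gate: run named checks over a batch of rows and
# return only the failures; a pipeline would alert or halt promotion on
# any failure. Rules and fields are illustrative assumptions.

def check_not_null(rows, field):
    """Every row must have a non-null value for `field`."""
    return all(row.get(field) is not None for row in rows)

def check_unique(rows, field):
    """No two rows may share the same value for `field`."""
    values = [row.get(field) for row in rows]
    return len(values) == len(set(values))

def run_checks(rows):
    """Return a dict of failed check names -> False (empty means clean)."""
    checks = {
        "id_not_null": check_not_null(rows, "id"),
        "id_unique": check_unique(rows, "id"),
    }
    return {name: ok for name, ok in checks.items() if not ok}

rows = [{"id": 1}, {"id": 2}, {"id": 2}]  # duplicate id triggers a failure
print(run_checks(rows))  # {'id_unique': False}
```

On Databricks the same idea is typically expressed as Delta Live Tables expectations or a testing framework run in CI, with failed batches quarantined rather than silently dropped.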

Collaboration & Best Practices:

  • Work closely with data scientists, analysts, and business stakeholders to understand data requirements
  • Implement version control using Git and follow CI/CD best practices for code deployment
  • Document data pipelines, data flows, and technical specifications
  • Mentor junior engineers on Databricks and data engineering best practices
  • Participate in code reviews and contribute to establishing team standards

Required Qualifications

Experience & Skills:

  • 5+ years of experience in data engineering with hands-on Databricks experience
  • Strong proficiency in Python and/or Scala for Spark application development
  • Expert-level knowledge of Apache Spark, including Spark SQL, DataFrames, and RDDs
  • Deep understanding of Delta Lake and Lakehouse architecture concepts
  • Experience with SQL and database optimization techniques
  • Solid understanding of distributed computing concepts and data processing frameworks
  • Proficiency with cloud platforms (Azure, AWS, or GCP) and their data services
  • Experience with data orchestration tools (Databricks Workflows, Apache Airflow, Azure Data Factory)
  • Knowledge of data modeling concepts for both OLTP and OLAP systems
  • Familiarity with data governance principles and tools like Unity Catalog
  • Understanding of streaming data processing and real-time analytics
  • Experience with version control systems (Git) and CI/CD pipelines

Preferred Qualifications

  • Databricks Certified Data Engineer certification (Associate or Professional)
  • Experience with machine learning pipelines and MLOps on Databricks
  • Knowledge of data visualization tools (Power BI, Tableau, Looker)
  • Experience with infrastructure as code (Terraform, CloudFormation)
  • Familiarity with containerization technologies (Docker, Kubernetes)
