Senior Data Engineer

Psybergate (Pty) LTD, Johannesburg

What you will be doing:

  • Design, build, and maintain scalable data pipelines and lakehouse structures
  • Deliver data solutions supporting analytics, BI, machine learning, and Generative AI applications
  • Apply enterprise data lake and lakehouse principles to ensure solutions are reliable, secure, governed, and fit for downstream consumption
  • Translate business and analytical requirements into production-ready data solutions
  • Build and operate solutions using Databricks, including Delta Lake, Databricks Jobs & Workflows, Unity Catalog, Databricks Bundles, notebooks, and shared libraries
  • Enable data consumption for GenAI use cases such as RAG, AI services, and agent workflows
  • Support analytics platforms, reporting tools, and downstream operational systems
  • Build data pipelines for Generative AI applications, including curated knowledge datasets, structured and semi-structured data, metadata, and lineage management
  • Enable GenAI data patterns including Retrieval-Augmented Generation (RAG), prompt/context preparation, and AI model input/output flows
  • Work closely with AI Engineers and Product Owners to align engineering deliverables to AI and GenAI use cases
  • Develop production-grade pipelines using Python, PySpark, SQL, and Apache Spark
  • Implement automated testing and CI/CD practices for data engineering workloads
  • Ensure data solutions are observable, resilient, performant, and cost-efficient
  • Support operational stability, incident resolution, and root cause analysis
  • Collaborate within Agile, cross-functional product squads alongside AI/ML engineers, analytics teams, platform teams, and security stakeholders
  • Contribute to engineering reviews, standards, and design discussions
  • Maintain documentation, operational runbooks, and governance compliance

What we are looking for:

  • Relevant degree or diploma in Computer Science, Information Technology, Data Engineering, or a related field
  • 6+ years' experience as a Senior / Lead Data Engineer
  • 2+ years' hands-on experience working in Databricks environments
  • Strong understanding of enterprise data lake and lakehouse architecture
  • Strong proficiency in Python, SQL, and Apache Spark
  • Experience building and operating production-grade data platforms
  • Experience working in enterprise or regulated environments
  • Strong understanding of data governance, security, and operational best practices
  • Experience working in Agile, product-aligned squads
  • Strong analytical and problem-solving skills
  • Excellent collaboration and communication skills

Advantageous:

  • Experience supporting AI, ML, or Generative AI workloads from a data engineering perspective
  • Familiarity with RAG data patterns and AI-serving datasets
  • Exposure to vector or embedding-ready data workflows
  • Cloud-native data platform experience (AWS or Azure)
  • Experience supporting analytics and AI operational workloads at scale

Please note: if you do not hear from us within 3 weeks, consider your application unsuccessful.
