Big Data/PySpark Engineering Lead - Vice President at Citi
Citi
Posted a day ago

Big Data/PySpark Engineering Lead - Vice President

Requirements

  • 12+ years of software building and platform engineering experience
  • Expert-level Python programming
  • Extensive experience with Big Data ecosystems (Hadoop, Spark, Hive)
  • Proficiency in SQL and Unix shell scripting
  • Experience with query engines such as Trino, Presto, or Starburst
  • Knowledge of data formats such as Avro, Parquet, and Iceberg
  • Experience with CI/CD and DevOps practices

Skills

PySpark, Python, Hadoop, Spark, SQL, Kafka, Trino

About the role

Responsibilities

  • Design and implement scalable, fault-tolerant batch and real-time data processing pipelines.
  • Lead the strategic migration of data and logic from legacy platforms to modern Data Lakehouse environments.
  • Re-engineer existing stored procedures and complex legacy ETL jobs into distributed processing frameworks using Spark (Python) and Starburst/Trino.
  • Design automated frameworks for Data Parity Testing to ensure accuracy between legacy and big data outputs.
  • Write clean, high-performance Python code and optimize complex SQL queries to reduce latency and costs.
  • Build and maintain CI/CD pipelines for automated testing and deployment of data jobs.
  • Provide technical mentorship and conduct code reviews for junior and mid-level engineers.
  • Collaborate with Product Managers to ensure data availability for downstream analytics and business models.

Requirements

  • 12+ years of experience in software building and platform engineering.
  • Expert-level Python programming skills and extensive experience with Big Data ecosystems (Hadoop, Spark, Hive, Kafka).
  • Proficiency in SQL, Unix-based operating systems, and shell scripting.
  • Hands-on experience with query engines such as Trino, Presto, or Starburst.
  • Deep knowledge of data formats including Avro, Parquet, Iceberg, CSV, and JSON.
  • Experience with source code management tools like Bitbucket or Git.
  • Strong computer science fundamentals in data structures, algorithms, and databases.
  • Ability to reverse engineer legacy "spaghetti" SQL or old scripts to document business logic.

Preferred Qualifications

  • Experience with data lineage tools such as Collibra or Informatica.
  • Experience leading technical change management during transitions from legacy BI tools to modern engines.
  • An automation-first mindset and familiarity with AI tools that speed up delivery.
  • Strong communication skills with the ability to explain technical decisions to non-technical stakeholders.

About the Company

Citi is a global leader in financial services, providing a wide range of financial products and services to retail, corporate, institutional, and government clients in more than 160 countries and jurisdictions.

Citi · Pune
