
Posted a day ago
Big Data/PySpark Engineering Lead - Vice President
Citi
Requirements
- 12+ years of software building and platform engineering experience.
- Expert-level Python programming.
- Extensive experience with Big Data ecosystems (Hadoop, Spark, Hive).
- Proficiency in SQL and Unix shell scripting.
- Experience with query engines like Trino, Presto, or Starburst.
- Knowledge of data formats like Avro, Parquet, and Iceberg.
- Experience with CI/CD and DevOps practices.
Skills
PySpark, Python, Hadoop, Spark, SQL, Kafka, Trino
About the role
Responsibilities
- Design and implement scalable, fault-tolerant batch and real-time data processing pipelines.
- Lead the strategic migration of data and logic from legacy platforms to modern Data Lakehouse environments.
- Re-engineer existing stored procedures and complex legacy ETL jobs into distributed processing frameworks using Spark (Python) and Starburst/Trino.
- Design automated frameworks for Data Parity Testing to ensure accuracy between legacy and big data outputs.
- Write clean, high-performance Python code and optimize complex SQL queries to reduce latency and costs.
- Build and maintain CI/CD pipelines for automated testing and deployment of data jobs.
- Provide technical mentorship and conduct code reviews for junior and mid-level engineers.
- Collaborate with Product Managers to ensure data availability for downstream analytics and business models.
Requirements
- 12+ years of experience in software building and platform engineering.
- Expert-level Python programming skills and extensive experience with Big Data ecosystems (Hadoop, Spark, Hive, Kafka).
- Proficiency in SQL, Unix-based operating systems, and shell scripting.
- Hands-on experience with query engines such as Trino, Presto, or Starburst.
- Deep knowledge of data formats including Avro, Parquet, Iceberg, CSV, and JSON.
- Experience with source code management tools like Bitbucket or Git.
- Strong computer science fundamentals in data structures, algorithms, and databases.
- Ability to reverse engineer legacy "spaghetti" SQL or old scripts to document business logic.
Preferred Qualifications
- Experience with data lineage tools such as Collibra or Informatica.
- Experience with technical change management during transitions from legacy BI tools to modern engines.
- An automation-first mindset and familiarity with AI tools to expedite deliveries.
- Strong communication skills with the ability to explain technical decisions to non-technical stakeholders.
About the Company
Citi is a global leader in financial services, providing a wide range of financial products and services to retail, corporate, institutional, and government clients in more than 160 countries and jurisdictions.
Citi · Pune
