Data Engineer / Python Developer

Position Overview:

  • Develop data pipelines to ingest, load, and transform data from multiple sources.
  • Leverage the Data Platform, running on Google Cloud, to design, optimize, deploy, and deliver data solutions in support of scientific discovery
  • Use programming languages such as Java, Scala, and Python, along with open-source RDBMS and NoSQL databases and cloud-based data store services such as MongoDB, DynamoDB, ElastiCache, and Snowflake
  • Continuously deliver technology solutions from product roadmaps, adopting Agile and DevOps principles
  • Collaborate with digital product managers and deliver robust cloud-based solutions that drive powerful experiences
  • Design and develop data pipelines, including Extract, Transform, Load (ETL) programs that extract data from various sources and transform it to fit the target model
  • Test and deploy data pipelines to ensure compliance with data governance and security policies
  • Move from implementation to ownership of real-time and batch processing, data governance, and policies
  • Maintain and enforce the business contracts on how data should be represented and stored
  • Ensure that technical delivery is fully compliant with Security, Quality, and Regulatory standards
  • Keep relevant technical documentation up to date to support the lifecycle plan for audits/reviews.
  • Proactively engage in experimentation and innovation to drive relentless improvement (e.g., new data engineering tools/frameworks)
  • Implement ETL processes that move data between systems including S3, Snowflake, Kafka, and Spark
  • Work closely with our Data Scientists, SREs, and Product Managers to ensure the software is high quality and meets user requirements

Required Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
  • 5+ years of experience as a data engineer building ETL/ELT data pipelines.
  • Experience with data engineering best practices for the full software development life cycle, including coding standards, code reviews, source control management (Git), continuous integration, testing, and operations
  • Experience with the programming languages Python and SQL; Java, C#, C++, Go, Ruby, or Rust is good to have
  • Experience with Agile, DevOps, and automation (of testing, build, deployment, CI/CD, etc.), and Airflow
  • Experience with Docker, Kubernetes, and shell scripting
  • 2+ years of experience with a public cloud (AWS, Microsoft Azure, Google Cloud)
  • 3+ years of experience with distributed data/computing tools (MapReduce, Hadoop, Hive, EMR, Kafka, Spark, Gurobi, or MySQL)
  • 2+ years of experience working on real-time data and streaming applications
  • 2+ years of experience with NoSQL implementations (DynamoDB, MongoDB, Redis, ElastiCache)
  • 2+ years of data warehousing experience (Redshift, Snowflake, Databricks, etc.)
  • 2+ years of experience with UNIX/Linux including basic commands and shell scripting
  • Experience with visualization tools such as SSRS, Excel, Power BI, Tableau, Google Looker, and Azure Synapse
