- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
- 5+ years of experience as a data engineer building ETL/ELT data pipelines.
- Experience with data engineering best practices across the full software development life cycle, including coding standards, code reviews, source control management (Git), continuous integration, testing, and operations
- Experience with the programming languages Python and SQL; Java, C#, C++, Go, Ruby, and Rust are good to have
- Experience with Agile, DevOps, and automation (of testing, builds, deployment, CI/CD, etc.), as well as workflow orchestration with Airflow (a minimal illustrative DAG sketch follows this list)
- Experience with Docker, Kubernetes, Shell Scripting
- 2+ years of experience with a public cloud (AWS, Microsoft Azure, Google Cloud)
- 3+ years of experience with distributed data/computing tools (MapReduce, Hadoop, Hive, EMR, Kafka, Spark, Gurobi, or MySQL)
- 2+ years of experience working on real-time data and streaming applications
- 2+ years of experience with NoSQL implementations (DynamoDB, MongoDB, Redis, ElastiCache)
- 2+ years of data warehousing experience (Redshift, Snowflake, Databricks, etc.)
- 2+ years of experience with UNIX/Linux including basic commands and shell scripting
- Experience with visualization tools such as SSRS, Excel, Power BI, Tableau, Google Looker, and Azure Synapse
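For context on the pipeline automation and Airflow experience referenced above, here is a minimal sketch of an Airflow DAG, assuming Airflow 2.x; the DAG ID, task names, and placeholder extract/transform/load logic are hypothetical and only illustrate the pattern, not this team's actual pipelines.

```python
# Minimal illustrative Airflow 2.x DAG (hypothetical IDs and placeholder logic).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull raw records from a source system (placeholder).
    return [{"id": 1, "value": 42}]


def transform(**context):
    # Reshape records to fit the target model (placeholder).
    rows = context["ti"].xcom_pull(task_ids="extract")
    return [{"id": r["id"], "value_doubled": r["value"] * 2} for r in rows]


def load(**context):
    # Write transformed rows to the warehouse (placeholder).
    rows = context["ti"].xcom_pull(task_ids="transform")
    print(f"Loading {len(rows)} rows")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3
```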
Data Engineer / Python Developer
Position Overview:
- Develop data pipelines to ingest, load, and transform data from multiple sources.
- Leverage the Data Platform, running on Google Cloud, to design, optimize, deploy, and deliver data solutions in support of scientific discovery
- Use programming languages such as Java, Scala, and Python, along with open-source RDBMS and NoSQL databases and cloud-based data store services such as MongoDB, DynamoDB, ElastiCache, and Snowflake
- Continuously deliver technology solutions from product roadmaps, adopting Agile and DevOps principles
- Collaborate with digital product managers and deliver robust cloud-based solutions that drive powerful experiences
- Design and develop data pipelines, including Extract, Transform, Load (ETL) programs that extract data from various sources and transform it to fit the target model (see the sketch after this list)
- Test and deploy data pipelines to ensure compliance with data governance and security policies
- Move from implementation to ownership of real-time and batch processing, as well as data governance and policies
- Maintain and enforce the business contracts on how data should be represented and stored
- Ensure that technical delivery is fully compliant with Security, Quality, and Regulatory standards
- Keep relevant technical documentation up to date to support the lifecycle plan for audits and reviews
- Proactively engage in experimentation and innovation to drive continuous improvement, e.g., evaluating new data engineering tools and frameworks
- Implement ETL processes that move data between systems such as S3, Snowflake, Kafka, and Spark
- Work closely with our Data Scientists, SREs, and Product Managers to ensure the software is high quality and meets user requirements
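As a rough illustration of the ETL responsibilities described above, the sketch below shows the extract/transform/load pattern in plain Python, with sqlite3 standing in for the warehouse; the file names, table name, and columns are hypothetical, and a production pipeline would instead target the systems named in this posting (e.g., S3, Snowflake, Kafka).

```python
# Minimal ETL sketch in plain Python; sqlite3 stands in for the real warehouse.
import csv
import sqlite3
from pathlib import Path


def extract(csv_path: Path) -> list[dict]:
    # Read raw rows from the source file.
    with csv_path.open(newline="") as f:
        return list(csv.DictReader(f))


def transform(rows: list[dict]) -> list[tuple]:
    # Fit raw rows to the target model: cast types, drop malformed records.
    out = []
    for r in rows:
        try:
            out.append((int(r["order_id"]), r["customer"], float(r["amount"])))
        except (KeyError, ValueError):
            continue  # skip rows that do not match the expected schema
    return out


def load(rows: list[tuple], db_path: Path) -> None:
    # Idempotently create the target table and upsert the transformed rows.
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
        )
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)


if __name__ == "__main__":
    load(transform(extract(Path("orders.csv"))), Path("warehouse.db"))
```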