Data Engineer

Shiru


$120k - $160k
Full-Time
United States (Alameda, CA)
Hybrid
At Shiru, we believe that food should be delicious and nourishing without negatively affecting our planet. Acknowledging our growing global population as well as the imminent effects of climate change, Shiru’s mission is to create better protein ingredients that will catapult us into a sustainable food future.
 
With our mission in mind, Shiru makes high-quality, functional food proteins while making better use of our precious environmental resources. To do this, we employ technologies originally created to solve problems in adjacent industries, including computational biology, machine learning, and industrial fermentation and bioprocessing.
 
We apply computational intelligence to find the most functional natural food proteins in the world, harnessing the inherent ability of microflora to produce them. We then partner with food and beverage companies to incorporate these unique protein ingredients into everyday products. Shiru is now expanding our team of dedicated professionals across multiple disciplines to make enhanced protein ingredients for a better world.
 
About the role
Shiru seeks an experienced programmer and engineer to develop the architecture and pipelines for ingesting scientific data. At Shiru, our business, R&D, and data science teams rely on robust access to lab-generated, multi-omics, and protein structure/function data to drive our core processes. You will create the data warehouse and ETL pipelines to ingest, store, and serve data to all stakeholders. This role is highly cross-functional and requires strong collaboration with wet lab scientists, bioinformaticians, engineers, researchers, and machine learning experts. As a key member of our engineering team, you will get involved in diverse activities and have a strong opportunity for growth.
 
About you
You are a detail-oriented engineer with deep expertise in data warehouses and broad experience in programming, DevOps, and cloud computing. You have great communication skills and can gather requirements from different functional teams. Ideally you have worked in the biotech industry and have experience with LIMS or processing data from laboratory equipment. You constantly strive for quality and rigorous engineering processes.
 
 

Responsibilities

    • Build and maintain ETL pipelines to ingest data from a wide variety of public and proprietary sources.
    • Create data pipelines to capture, process, and store experimental design and data from the lab.
    • Manage DevOps and cloud infrastructure for the engineering team.
    • Design schemas that allow for efficient storage and retrieval of data.
    • Create tools that enable the company to turn data into actionable knowledge. 
    • Collaborate with laboratory and data scientists to enable analytics and reporting of scientific data.
    • Collaborate with software and machine learning engineers to enable quick and easy consumption of data.

Attributes

    • You write clean, modular, and maintainable code.
    • You are a continual learner and drive innovation by understanding new frameworks and technologies.
    • You are a self-starter, comfortable taking initiative without direct supervision.
    • You have excellent communication and stakeholder management skills and can relay technical information to non-technical audiences.
    • You expect your work to be meaningful and strive to be part of a business dedicated to having a positive impact on the planet.

Requirements

    • BS/MS in Computer Science or equivalent experience/training
    • 3+ years of experience building production data pipelines
    • Extensive expertise in Python
    • Experience working with distributed datasets (Spark, Dask)
    • Expertise in containerization: Docker and scalable Kubernetes clusters
    • Expertise in SQL 
    • Experience in AWS ecosystem with particular focus on Batch, ECS, and EKS
    • Experience with workflow managers (e.g., Prefect, Airflow, Luigi, Snakemake, Mage)
    • Experience with testing and CI/CD frameworks
    • Proficiency with Unix, Git, and other command-line tools

Bonus skills

    • Familiarity with genomics and/or proteomics
    • Experience processing large protein databases (e.g., UniProt)
    • Experience with Terraform, Prefect, Spot.io, or Snowflake
$120,000 - $160,000 a year
Actual compensation within the above ranges will be dependent upon the individual's skills, experience, qualifications, and applicable laws. The total rewards package will also include a generous equity component plus other benefits.
At Shiru, we're looking for people with passion, grit, and integrity. You're encouraged to apply even if your experience doesn't precisely match the job description. We’re expecting your skills and passion to stand out—and set you apart—especially if your career has taken some extraordinary twists and turns. Please join us in this singular opportunity to create the future of food!  
 
Shiru is an equal opportunity employer who values diversity. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Shiru offers competitive compensation and employee benefits along with an attractive equity package commensurate with candidate qualifications.