Senior Staff Data Engineer

Karius


$180k - $240k
Full-Time
United States (Redwood City, CA)
Remote
About Karius
Karius is a venture-backed life science startup that is transforming the way pathogens
and other microbes are observed throughout the body. By unlocking the information
present in microbial cell-free DNA, Karius helps doctors quickly solve their most
challenging cases, provides industry partners with access to thousands of biomarkers to
accelerate clinical trials, discovers new microbes, and reduces patient suffering worldwide.
Karius aims to conquer infectious diseases through innovations in genomic
sequencing and machine learning. The company’s platform is already delivering
unprecedented insights into the microbial landscape, providing clinicians with a
comprehensive test capable of identifying more than a thousand pathogens directly
from blood, and helping the industry accelerate the development of therapeutic
solutions. The Karius test we provide today is one of the most advanced solutions
available to physicians who aim to deliver better care to many otherwise ineffectively
treated patients.
 
Position Summary
Karius is building AI-driven data analytics pipelines to deliver life-saving results in the
highly complex infectious disease landscape. We are seeking a seasoned Senior Staff
Data Engineer in Redwood City, CA to lead the design and development of a scalable
data platform to meet our rapid business growth. The Senior Staff Data Engineer will be
responsible for defining the technology roadmap and for developing and optimizing the
data platform, enabling us to extract value from large amounts of genomic, clinical,
and operational data to provide actionable insights that serve patients and support the
development of innovative products. In this regard, the Senior Staff Data Engineer will work
with key stakeholders within the company to understand our data landscape and the
core needs for data governance and usage.
 
Primary Responsibilities
•   Design, develop, and operate a scalable data platform that ingests, stores, and
aggregates various datasets to meet the defined requirements;
•   As the primary subject matter expert in the data engineering domain, evaluate
technology trends in the data industry, identify those technologies relevant to
the company’s business objectives, and develop a roadmap to update the
company’s data platform;
•   Provide Machine Learning (“ML”) data platform capabilities for R&D and
Analytics teams to perform data preparation, model training and management,
and run experiments against clinical and genomic datasets;
•   Train the R&D and Analytics teams on using Karius data toolsets and mentor
and support them throughout their research and development efforts;
•   Build and maintain data ETL/ELT pipelines to source and aggregate the
required internal data to calculate operational and commercial Key Performance
Indicators (“KPIs”) and various data analysis and reporting needs;
•   Develop integrations with Karius and third-party systems to source, qualify, and
ingest various datasets; work closely with cross-functional groups and
stakeholders, such as the product, engineering, medical, and scientific teams,
on data modeling and general life cycle management;
•   Provide data analytics and visualization tools to extract valuable insights from
the data and enable data-driven decisions; and
•   Work closely with the Security and Compliance teams, and deploy the necessary
data governance to meet regulatory and legal requirements.
 
Position Minimum Requirements
•   At least a Bachelor’s degree in Computer Science, Data Science, Software
Engineering, Electrical Engineering, or Bio-Engineering (or its foreign equivalent);
plus
•   At least 10 years of experience as a Software or Data Engineer or in a similar
position, including at least 5 years in a senior or higher-level position;
 
AND experience must include:
 
•   4+ years of hands-on design, development, and operation of data solutions using
the following data technologies: Spark and Spark Streaming, Presto, Parquet,
MLflow, Kafka, and ETL tools such as Stitch or Fivetran;
•   4+ years of hands-on experience with the design, development, and maintenance of
structured, semi-structured, and unstructured (NoSQL) data stores, such as MySQL,
PostgreSQL, AWS Redshift, Teradata, graph databases like Neo4j, and
Databricks Lakehouse;
•   4+ years of hands-on development and operation of workflows and jobs using
task orchestration engines such as Airflow, Argo, NextFlow, Dollar U and Tidal;
•   4+ years of hands-on experience building and operating data solutions on
operating systems such as Linux and Unix hosted in Amazon Web Services
(AWS) cloud;
•   5+ years of hands-on building and operation of scalable infrastructure to support
batch, micro-batch, and stream data processing for large volumes of data;
•   5+ years of hands-on experience designing and implementing enterprise data
warehouse/Lakehouse solutions to house business and technical datasets and
derive KPI dimensions for consumption;
•   Demonstrated experience with enterprise data modeling in healthcare and/or life
science sectors;
•   Demonstrated experience with the development and operation of visualization
and dashboards for business KPI reporting using tools such as Tableau or
Looker;
•   Proficiency in Python and PySpark;
•   Experience automating data testing using scripting;
•   Experience developing and managing technical and administrative controls for
data governance and regulatory compliance in the healthcare and/or life sciences
sectors;
•   Experience mentoring and coaching junior data engineers; and
•   Cross-functional project management experience.
 
Travel: No travel is required.
 
Reports to: VP, Engineering
$180,000 - $240,000 a year