Data Architect/Scientist 630364 (NCI) at Leidos

Full-time Position in Columbia

CISB is currently seeking a Data Architect/Scientist to drive some of the group's research initiatives.

The Data Scientist will:

  • Develop applications that integrate and mine complex biomedical data
  • Develop and implement new data science technologies to inform and impact cancer research at FNLCR
  • Lead data architecture and modeling efforts, and build databases and integration services using a variety of tools and technologies (such as RDBMS, NoSQL, graph models and distributed architectures) in accordance with the functional and nonfunctional requirements of other teams and scientists
  • Develop advanced analytic models and machine learning algorithms that uncover clinical insights
  • Develop custom data querying/mining pipelines that mine and integrate information from clinical and genomic data, multi-level biological annotations and other knowledge mining applications
  • Coordinate with other technical and nontechnical teams to build public-facing applications and services (APIs) for a variety of users in the cancer research field
  • Work closely with other FNL and NCI teams to coordinate activities and develop collaborative projects

Basic Qualifications

  • Possession of a Master's degree in a related discipline from a college/university accredited according to the Council for Higher Education Accreditation (CHEA), or six (6) years of related experience in lieu of the degree
  • Foreign degrees must be evaluated for U.S. equivalency
  • A minimum of ten (10) years of progressively responsible scientific and complex system/database management experience that includes working with biomedical data produced by high-throughput instruments
  • Deep knowledge of data structures, and of data modeling and architecture for high throughput and scalability
  • Significant expertise in RDBMSs such as Oracle, MySQL and PostgreSQL
  • Significant expertise in NoSQL data architectures such as key-value, wide-column, document and graph models
  • Strong knowledge of various query languages and of query performance tuning, with a demonstrated ability to write complex queries
  • Experience in building robust data pipelines and services
  • Knowledge of scripting languages and Unix/Linux shell scripting
  • Strong knowledge of algorithms and of one or more programming languages such as Python, Java or C++
  • Strong drive and initiative to explore new territories in data and knowledge mining
  • Able to work on initiatives independently and in a highly collaborative environment
  • Able to prioritize and allocate resources effectively

Preferred Qualifications

  • Master’s degree in Computer Science, Statistics or Bioinformatics with expertise in data science/mining/machine learning is highly desired
  • Doctoral degree in data mining is a plus
  • Knowledge of current bioinformatics experiments and analysis methods, including microarray and NGS analysis, sufficient to understand data provenance and any associated caveats
  • Experience in Agile web development using REST/SOAP APIs
  • Experience in developing data visualization tools is a plus
  • Knowledge of NLP applications in the biological field is a plus
  • Experience in supervising both highly skilled and junior level technical staff
  • Experience in requirements gathering, documentation and reporting
  • Previous publications on data mining/integration applications are a plus
  • Strong verbal and written communication skills
  • Strong work ethic; organized, detail-oriented and focused on execution