Speaker
Description
Serial synchrotron crystallography (SSX) experiments at microfocus beamlines collect diffraction data from many microcrystals, held on one or more experimental supports, until a complete dataset is obtained (Diederichs & Wang, 2017). A typical SSX sample consists of 10 to 10,000 crystals, each around 5 × 5 × 5 µm³ in size. A widely adopted strategy for locating these crystals is to scan an area of interest with the X-ray beam and measure the amount of diffraction at each point, which reveals the crystal positions within that area (Coquelle et al. 2015). The drawback of this approach is that the scan itself causes radiation damage, which is particularly severe for microcrystals (De la Mora et al. 2020).
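The beam-scanning strategy described above can be illustrated with a minimal sketch. It assumes the scan yields one diffraction score per grid point (for example, a spot count) and that a fixed threshold separates crystal hits from background; the function name `locate_crystals`, the example grid, and the threshold value are all hypothetical and not taken from the cited works.

```python
def locate_crystals(diffraction_map, threshold):
    """Return (row, col) grid positions whose diffraction score exceeds threshold.

    diffraction_map is a list of rows, one score per scanned beam position.
    """
    hits = []
    for row_index, row in enumerate(diffraction_map):
        for col_index, score in enumerate(row):
            if score > threshold:
                hits.append((row_index, col_index))
    return hits


# Hypothetical 4 x 5 grid scan; each value stands for the amount of
# diffraction measured at that beam position (assumed units: spot count).
scan = [
    [0, 1, 0, 0, 2],
    [0, 14, 1, 0, 0],
    [1, 0, 0, 22, 0],
    [0, 0, 1, 0, 0],
]

hits = locate_crystals(scan, threshold=10)
print(hits)
```

Every grid point in such a scan receives X-ray dose, including the points that turn out to hold a crystal, which is why the approach trades localization for radiation damage.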
Microcrystal segmentation, performed before the final data collection, is presented here as a fast, non-invasive alternative to locating crystals by X-ray beam scanning. Some deep learning-based models for crystal segmentation already exist, but they often target very specific scenarios (Kardoost et al. 2023; Tran et al. 2020) or segment a small number of crystals much larger than those typical of an SSX experiment (Bischoff et al. 2022). The main challenge in crystal segmentation for SSX is annotation, which is particularly complex given the wide range of scenarios these experiments cover. This challenge is exacerbated by the lack of open, organized, and annotated crystal image data; the one exception, the MARCO dataset (https://marco.ccr.buffalo.edu), is primarily focused on image classification. Bischoff et al. addressed the annotation problem for suspended microcrystals synthetically, by generating a simulated training dataset with fully defined bounding boxes.
Our goal is to extend the work of Bischoff et al. to generate crystal images that cover various lighting patterns and different types of crystals, so that more general models for SSX can be trained. These general models can then be fine-tuned with a small number of real images for more specific tasks. We also anticipate that such models will assist in annotating real datasets, with the annotations later refined by human intervention, thereby contributing to the creation of open, organized, and annotated crystal datasets.
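As a rough illustration of what synthetic generation with automatic annotation can look like, the sketch below draws rectangular "crystals" on a background with a randomly oriented brightness gradient and returns their bounding boxes alongside the image. It is a toy example under stated assumptions, not the method of Bischoff et al. or of this contribution: the function `synth_crystal_image`, the rectangular crystal shape, the linear lighting gradient, and all numeric ranges are hypothetical choices made for illustration.

```python
import math
import random


def synth_crystal_image(height=64, width=64, n_crystals=5, seed=0):
    """Return (image, boxes) for one simulated sample image.

    image: nested lists of grayscale intensities in [0, 1].
    boxes: one (x_min, y_min, x_max, y_max) tuple per crystal,
           with x_max and y_max exclusive.
    """
    rng = random.Random(seed)

    # Lighting pattern (assumed): linear brightness gradient, random direction.
    angle = rng.uniform(0.0, 2.0 * math.pi)
    gx, gy = math.cos(angle), math.sin(angle)
    image = [
        [0.5 + 0.2 * (gx * x / width + gy * y / height) for x in range(width)]
        for y in range(height)
    ]

    boxes = []
    for _ in range(n_crystals):
        # Crystal type (assumed): axis-aligned rectangle of random size.
        w, h = rng.randint(4, 10), rng.randint(4, 10)
        x0, y0 = rng.randint(0, width - w), rng.randint(0, height - h)
        contrast = rng.uniform(0.1, 0.3)
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                image[y][x] += contrast
        # The annotation comes for free: the generator knows where it drew.
        boxes.append((x0, y0, x0 + w, y0 + h))

    # Add Gaussian noise, then clip to the valid intensity range.
    image = [
        [min(1.0, max(0.0, value + rng.gauss(0.0, 0.02))) for value in row]
        for row in image
    ]
    return image, boxes


image, boxes = synth_crystal_image(seed=42)
print(len(boxes), "annotated crystals")
```

Because the generator places every crystal itself, each image is produced with complete labels, and varying the lighting or shape functions widens the range of scenarios covered without any manual annotation.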
Please note that this talk will be recorded.
What topics do you think we should discuss in the working sessions?
How to pool (and annotate) all the auxiliary data that is lying around (e.g. sample images).
Which point of view is your contribution addressing? | My research would benefit from more and better curated open data |
---|---|
What best describes your position? | other |
If "other", please specify: | Scientific data management |