Leveraging open data from PaN facilities for machine learning

Name: Leveraging open data from PaN facilities for machine learning
Start: 2023-10-17T12:00:00+02:00
End: 2023-10-18T14:00:00+02:00
17–18 Oct 2023

Europe/Paris timezone

Visual Diagnostics for Macromolecular X-Ray Diffraction: AUSPEX

17 Oct 2023, 16:00

20m

Principal/0-0 - Salle Amphitheatre (Batiment Principal)

Talk Open data for machine learning Community talks

Andrea Thorn (Universität Hamburg)

Structures of biological macromolecules are the key to understanding the processes of life and form the basis for developing new drugs, e.g. against COVID-19. Traditionally, the initial quality of an X-ray data set was evaluated by looking at detector images as they were recorded. An expert user was able to recognize problems at this stage; after collection, the data would be integrated, scaled and merged with software that required considerable manual intervention and expertise. Data quality indicators were mostly designed so that they could be calculated rapidly with the limited computing power available, and were developed to provide information about the overall data consistency, completeness and resolution, often in the form of mean derivatives and R-values.
Today, data collection is many orders of magnitude faster, in particular due to the brightness of the X-rays obtained from modern sources. The high X-ray flux, coupled with fast-readout pixel detectors, means that manual inspection of the raw data is no longer practical. Unfortunately, there is a severe mismatch between the robustness of our current diagnostics and our reliance on automatic processing: many of the quality indicators used by the automatic algorithms are not reliable enough for correct decision-making. The lack of visual inspection of detector images by expert users has created a gap in the quality control of experiments.
New algorithms that play to the strengths of modern computing power and robust statistical analyses need to be developed and implemented. In addition, much may be gained from taking the whole statistical distribution of the data into account, or even visualising the entirety of the data set instead of mean values. To address this need, we have started a software package for exploratory analyses of crystallographic data. AUSPEX [1] provides a visual and intuitive way of revealing problems in diffraction data that require a specific processing approach. AUSPEX is available as part of CCP4 and as a web service at auspex.de. The software was developed using open data; however, the lack of unprocessed raw images from beamlines is often a roadblock for new method development. Since 2021, we have been using convolutional neural networks [2] when statistical indicators fail us, and we have made first steps towards "explainable AI" for our developments...

[1] Thorn et al. (2017). Acta Cryst. D73, 729.
[2] Nolte et al. (2022). Acta Cryst. D78, 187.

What topics do you think we should discuss in the working sessions?

How to make detector images in protein crystallography available as open access, and how we will use explainable AI to make AI more amenable to science.

Which point of view is your contribution addressing?	My research would benefit from more and better curated open data
What best describes your position?	domain scientist

Soleil_Data_Visual_Diagnostics_for_Macromolecular_X_Ray_Diffraction.pdf
