Enhancing open data for Neural Network based reflectometry analysis – A future perspective for valuable test data sets

17 Oct 2023, 15:40
20m
Principal/0-0 - Salle Amphitheatre (Batiment Principal)

Talk Open data for machine learning Community talks

Speaker

Lukas Petersdorf (CAU Kiel - DAPHNE4NFDI)

Description

Researchers from Kiel and Tübingen University, as part of the DAPHNE4NFDI initiative, are collaborating to improve machine learning models for analyzing X-ray and neutron reflectivity data sets using the Python package "mlreflect" [1], developed at Tübingen University in the group of Frank Schreiber. The collaboration was applied successfully during beamtime at ID10 at the European Synchrotron Radiation Facility (ESRF), employing a closed-loop system for autonomous experiments guided by machine learning data analysis [2]. The University of Kiel's expertise in X-ray reflectivity analysis of liquid samples [3] complements the mlreflect package through expanded and improved training data. Based on the mlreflect implementations at VISA, the successful machine learning data analysis and prediction at ID10 could be continued on lipid bilayer systems. Machine learning-driven algorithms planned for beamline P08 at DESY underscore the growing significance of machine learning in reflectivity measurements.
One of the biggest challenges for machine learning remains the lack of sufficient experimental data, which forces reliance on simulated data for training. Moreover, there is often too little test data to validate the simulation-based reflectivity predictions. To ensure that open reflectivity data can be reused reproducibly, it is also imperative to formulate the metadata accurately.
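To illustrate what training on simulated data can look like, the following minimal sketch generates noisy reflectivity curves with known labels. It assumes the simplest possible model, a single rough interface between vacuum and a substrate described by the Fresnel formula with a Névot-Croce roughness factor. This is not the mlreflect API; the function names, parameter ranges, and the 5 % multiplicative noise level are illustrative assumptions only.

```python
import numpy as np


def fresnel_reflectivity(q, sld, roughness=0.0):
    """Reflectivity of one rough vacuum/substrate interface (illustrative model).

    q         : momentum transfer in 1/Angstrom
    sld       : scattering length density of the substrate in 1/Angstrom^2
    roughness : rms interface roughness in Angstrom (Nevot-Croce factor)
    """
    q = np.asarray(q, dtype=float)
    qc_sq = 16.0 * np.pi * sld                       # square of the critical q
    q_sub = np.sqrt((q**2 - qc_sq).astype(complex))  # q inside the substrate
    r = (q - q_sub) / (q + q_sub)                    # Fresnel amplitude
    r = r * np.exp(-0.5 * q * q_sub * roughness**2)  # roughness damping
    return np.abs(r) ** 2


def simulate_training_set(n_curves, q, rng):
    """Draw random (sld, roughness) labels and return noisy curves plus labels.

    The parameter ranges and the noise level are assumptions for this sketch.
    """
    slds = rng.uniform(5e-6, 25e-6, size=n_curves)
    roughs = rng.uniform(0.0, 10.0, size=n_curves)
    curves = np.stack(
        [fresnel_reflectivity(q, s, r) for s, r in zip(slds, roughs)]
    )
    noisy = curves * rng.normal(1.0, 0.05, size=curves.shape)
    labels = np.column_stack([slds, roughs])
    return np.clip(noisy, 1e-12, None), labels


q_grid = np.linspace(0.01, 0.3, 64)
curves, labels = simulate_training_set(8, q_grid, np.random.default_rng(0))
```

Each row of `curves` is one simulated measurement and the matching row of `labels` holds the parameters a model would be trained to predict. Realistic training sets rely on multilayer models and instrument-specific noise, which is exactly where well-described experimental test data become necessary for validation.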
DAPHNE4NFDI is developing a minimalist reflectivity metadata schema based on measurements at the DESY beamline P08. The Open Reflectometry Standards Organization (ORSO) is also actively engaged in developing a file format for reduced reflectivity data. Ideally, a reflectivity metadata schema will encompass all metadata for subsequent basic. Open data with a comprehensive metadata schema promise benefits for the training and validation of machine learning models.
The presentation of open reflectivity data samples, as demonstrated by Linus Pithan on Zenodo, simplifies the evaluation of mlreflect prediction algorithms [4]. In this process, SciCat emerges as a promising platform for aggregating reduced data for machine learning, offering cross-referencing with other repositories to improve the accessibility of current and future open reflectivity data sets.
We acknowledge financial support from the BMBF through ErUM-Pro, and from DAPHNE4NFDI through the NFDI.
[1] A. Greco et al., J. Appl. Cryst., 55, 362 (2022)
[2] L. Pithan et al., J. Synchrotron Rad., in print (2023)
[3] B.M. Murphy, M. Greve, B. Runge, C.T. Koops, A. Elsen, J. Stettner, O.H. Seeck, and O.M. Magnussen, J. Synchrotron Rad., 21, 45 (2014)
[4] L. Pithan, A. Greco, A. Hinderhofer, A. Gerlach, S. Kowarik, N. Rußegger, I. Dax, and F. Schreiber, Zenodo (2022), https://doi.org/10.5281/zenodo.6497438

Which point of view is your contribution addressing? My research would benefit from more and better curated open data
What best describes your position? domain scientist

Primary author

Lukas Petersdorf (CAU Kiel - DAPHNE4NFDI)

Co-authors

Alexander Hinderhofer, Bridget Murphy, Frank Schreiber, Linus Pithan (DESY), Svenja Hövelmann, Vladimir Starostin
