The goals of the AqQua project are to:
- Build an AI foundation model of plankton image data,
- Release it as an open-source tool that supports plankton-related research across the global community,
- Leverage it to develop global plankton and particle distribution models and estimate related process rates.
To achieve this, the AqQua project is currently collecting plankton image datasets from a variety of imaging devices deployed across diverse aquatic habitats worldwide. To assemble the most diverse and extensive dataset possible, we encourage scientists around the world to share their plankton and particle image data. We are very happy to have already received overwhelming support, with more than 40 academic labs and non-academic stakeholders across the globe pledging to share data and contribute expertise.
Everyone sharing data will be included as an author of a planned data paper. Furthermore, everyone sharing data will be invited to actively contribute to the foundation model paper, as well as to the global distribution and process rate papers. We are therefore reaching out to you even if your data is already in the public domain. The AqQua foundation model will most likely perform best on the kinds of data it has been trained on, so your research might benefit from sharing your data in the long run. The AqQua project will not analyze any provided dataset in isolation, nor perform any local analyses on it.
Due to our full commitment to Open Science, all data shared with the AqQua project has to come with permission to be made publicly available on July 15, 2027 under a CC BY-NC 4.0 license as part of our planned data paper. We are therefore seeking only data that is either already publicly available or can be made publicly available no later than July 15, 2027.
For the purpose of training a foundation model, AqQua requires image data (including scale information), as well as at least the latitude, longitude, depth, date and time of observation. Classification labels (e.g. species or particle type) and trait annotations (e.g. egg-carrying) are very welcome, as these can help fine-tune and benchmark the foundation model, but they are not required. We would also appreciate it if you would share the sample unit definition and the sampled volume for your samples, to enable us to develop global distribution models and to estimate process rates (e.g. as in Laget et al. 2024, Clements et al. 2022 & 2023). Image data of (mono)cultures are also welcome; in this case, the metadata should indicate the original sampling location, date and time.
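As a rough illustration of the minimum metadata listed above, the following Python sketch checks a single image record before submission. The field names (`image_path`, `pixel_size_um`, `latitude`, `longitude`, `depth_m`, `datetime`) and the ISO 8601 timestamp format are assumptions made for this example only; AqQua does not prescribe them here, and the online form remains the authoritative reference.

```python
from datetime import datetime

# Hypothetical field names chosen for this sketch, not an AqQua specification.
REQUIRED_FIELDS = (
    "image_path",     # the image itself
    "pixel_size_um",  # scale information (assumed here to be micrometres per pixel)
    "latitude",
    "longitude",
    "depth_m",
    "datetime",       # date and time of observation (assumed ISO 8601 string)
)


def check_record(record):
    """Return a list of problems found in one metadata record (empty if none)."""
    problems = []

    # Every required field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or value == "":
            problems.append(f"missing {name}")

    # Basic plausibility checks on the coordinates.
    lat = record.get("latitude")
    lon = record.get("longitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude out of range")

    # The timestamp must parse as an ISO 8601 date and time.
    stamp = record.get("datetime")
    if isinstance(stamp, str) and stamp:
        try:
            datetime.fromisoformat(stamp)
        except ValueError:
            problems.append("datetime is not ISO 8601")

    return problems


example = {
    "image_path": "images/sample_0001.png",
    "pixel_size_um": 5.3,
    "latitude": 54.33,
    "longitude": 10.15,
    "depth_m": 12.0,
    "datetime": "2023-06-14T10:30:00",
}
print(check_record(example))
```

Optional items such as classification labels, trait annotations, sample unit definition and sampled volume are deliberately not checked, since they are welcome but not required.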
As production-run foundation model training will commence this fall, the deadline for sharing data with AqQua is July 31, 2025.
To participate, please carefully read and fill out the online form below. Note that instructions and example answers are available by hovering over the question marks. After clicking the "Submit" button at the end of the form, you will receive an email with a copy of your completed form for your records.
Should you have any questions or suggestions, please do not hesitate to contact us at aqqua@geomar.de. We would be stoked to have you on board!