AqQua Data Sharing Form

PURPOSE OF THE DATA SHARING

The goal of the AqQua project is to:

  • Build an AI foundation model of plankton image data,
  • Release it as an open-source tool for the global community for the purpose of facilitating everyone’s plankton-related research,
  • Leverage it to develop global plankton and particle distribution models and estimate related process rates.

To achieve this, the AqQua project is currently collecting plankton image datasets from a variety of imaging devices deployed across diverse aquatic habitats worldwide. To assemble the most diverse and extensive dataset possible, we encourage scientists around the world to share their plankton and particle image data. We are very happy to have already received overwhelming signals of support, with more than 40 academic labs and non-academic stakeholders across the globe pledging to share data and contribute expertise.

Everyone sharing data will be included as an author of a planned data paper. Furthermore, everyone sharing data will be invited to actively contribute to the corresponding foundation model paper, as well as to papers on global distributions and process rates. We therefore reach out to you, even if your data is already in the public domain. The AqQua foundation model will, most likely, perform favorably on the kinds of data it has been trained on; thus, your research might benefit from sharing your data in the long run. The AqQua project will not analyze any provided dataset in isolation, nor perform any local analyses on it.

Due to our full commitment to Open Science, all data shared with the AqQua project has to come with permission to be made publicly available on July 15, 2027 under a CC BY-NC 4.0 license as part of our planned data paper. Thus, we are exclusively seeking data that is either already publicly available or can be made publicly available no later than July 15, 2027.

For the purpose of training a foundation model, AqQua requires image data (including scale information), as well as at least latitude, longitude, depth, date and time of observation. Classification labels (e.g. species or particle type) and trait annotations (e.g. egg-carrying) are very welcome, as these can help fine-tune and benchmark the foundation model, but are not required. We would also appreciate it if you would share the sample unit definition and the sampled volume information for your samples, to enable us to develop global distribution models and to estimate process rates (e.g. as in Laget et al. 2024, Clements et al. 2022 & 2023). Image data of (mono)cultures is also welcome; in this case, the metadata should indicate the original sampling location, date and time.

As production-run foundation model training will commence this fall, the deadline for sharing data with AqQua is July 31, 2025.

To participate, please carefully read and fill in the online form below. Note that some filling instructions and example answers are provided via mouse-over question marks. After clicking the "Submit" button at the end of the form, you will receive an email with your completed form for your records.

Should you have any questions or suggestions, please do not hesitate to contact us at aqqua@geomar.de. We would be stoked to have you on board! 

DATA OWNER CONTACT INFORMATION
(This question is mandatory)
Name ("Given name Surname")
(This question is mandatory)
Email
(This question is mandatory)
Institute / Company
We will create a data provider mailing list for general information about the data collection process and updates on the project. Do you want to be added to this list?
DATA DESCRIPTION
(This question is mandatory)

Do you want to share multiple datasets?

The data owner agrees to share the following dataset:

To share data on EcoTaxa, the data owner agrees to add the “AqQua” user to the project(s) they would like to share (role: "viewer") and authorizes the AqQua project to download the data from EcoTaxa. 

Here’s how you can do that:

  • In the menu of the annotation screen, select “Project / Edit project settings.”
  • Go to the “Privileges” tab and click “New privilege” (bottom right).
  • Enter “AqQua” in the name field and select the “View” role.
  • Click “Save.”

Note that this form allows you to share multiple datasets, albeit under a single selection of data sharing permissions (see the DATA SHARING PERMISSIONS section below).

If you would like to share parts of your data under different conditions, you can do so by filling in the form multiple times, once per choice of data sharing permissions.

(This question is mandatory)

The data owner agrees to share the following datasets:

Please download the template spreadsheet provided below, insert your information row by row and upload it below.

Please carefully read the filling instructions in rows 1-11 of the template spreadsheet.

Note that the second tab of the template spreadsheet provides an example of a completed sheet.

For data located on EcoTaxa, you can use the Python scripts in this repository to generate a table including all the columns required to fill in our template.

Template Spreadsheet Download

DATA SHARING PERMISSIONS

The data owner agrees to share the following types of data with the AqQua project: 

✅ Plankton / particle imaging data: Each image should capture (approximately) one object of interest; i.e., we are not seeking full-frame images.

✅ Metadata: At a minimum, the following metadata has to be provided (e.g., a table that assigns the metadata to each image):

  •     Imaging device and image resolution
  •     Date and time of acquisition
  •     Location (latitude, longitude, and depth).
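For data owners who want to check a per-image metadata table before sharing it, the minimum requirements above could be sketched roughly as follows. This is only an illustration and not part of the AqQua tooling: the column names (`device`, `pixel_size_um`, `datetime_utc`, `latitude`, `longitude`, `depth_m`) and the ISO 8601 timestamp format are assumptions made for this example, not a schema prescribed by the form.

```python
from datetime import datetime

# Hypothetical column names chosen for this sketch; the form does not prescribe them.
REQUIRED_FIELDS = (
    "device", "pixel_size_um", "datetime_utc",
    "latitude", "longitude", "depth_m",
)


def check_row(row):
    """Return a list of problems found in one per-image metadata row (a dict)."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = row.get(name)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {name}")
    if problems:
        return problems  # no point validating values that are absent

    # Date and time of acquisition (assumed here to be an ISO 8601 string).
    try:
        datetime.fromisoformat(str(row["datetime_utc"]))
    except ValueError:
        problems.append("datetime_utc is not ISO 8601")

    # Location: latitude, longitude and depth must be numeric.
    try:
        lat = float(row["latitude"])
        lon = float(row["longitude"])
        float(row["depth_m"])
    except ValueError:
        problems.append("latitude, longitude and depth_m must be numeric")
        return problems

    if not -90.0 <= lat <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= lon <= 180.0:
        problems.append("longitude out of range")
    return problems


example = {
    "device": "UVP5",
    "pixel_size_um": "88",
    "datetime_utc": "2024-05-17T08:30:00",
    "latitude": "54.33",
    "longitude": "10.15",
    "depth_m": "12.5",
}
print(check_row(example))
```

Rows read with Python's `csv.DictReader` can be passed to `check_row` one at a time; an empty list means the row carries all of the minimum metadata under the assumptions stated above.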

The data owner authorizes the AqQua project to use the data identified above for (mandatory; if any of these permissions is not viable for your data, we cannot take it in)

✅ Deep Learning-based Model Development: The image data and all accompanying metadata can be used to train and analyze a foundational plankton image model, including fine-tuning for species classification, trait extraction, and related tasks.

✅ Data release by July 2027: The data owner authorizes the AqQua project to make the data publicly available under CC BY-NC 4.0 according to the FAIR (Findable, Accessible, Interoperable, and Reusable) principles from July 15, 2027 onwards as part of the datasets the AqQua project gathers. 

✅ Model release from November 2025 on: The data owner authorizes the AqQua project to publicly share trained AI models after November 1, 2025.

The data owner further authorizes the AqQua project to use the data described above for (please select all that apply; if you would like to choose different options for different parts of your imaging data, please fill in the form multiple times accordingly):

Each data owner will be:

Co-author of the resulting AqQua dataset publication as part of the data provider consortium.

Invited by the AqQua project to actively contribute as co-author to a resulting foundation model publication.

Invited by the AqQua project to actively contribute as co-author to publications on derived global distribution patterns and process rates.

(This question is mandatory)
Until July 15, 2027 (planned date of publication of the data paper), the data owner is providing the data under the following legal terms:
(If you would like to choose different options for different parts of your data, please fill in the form multiple times accordingly. If you choose "Other license", please name the license, which has to permit usage at least for the purposes specified above. Note that, as outlined above, all data shared with AqQua will have to be made publicly available under CC BY-NC 4.0 from July 15, 2027 onwards; this question concerns the time before, i.e., the period from now until July 15, 2027.)

For the planned data paper, to ensure proper acknowledgments and attribution regarding your data, we will need the relevant information from you (e.g., naming people and funding to acknowledge, papers to cite, and/or copyright information as can be generated here).

You can optionally do this now by providing acknowledgments and attribution information below. You can of course still change it later if needed; just contact us.

If you would rather provide acknowledgments and attribution information later, just leave this field empty and we’ll follow up with you at a later stage.

CONFIDENTIALITY

Until July 15, 2027 (planned date of publication of the data paper), data will be stored on internal Helmholtz servers. Access to the data will be restricted to AqQua members and secured through login credentials associated with the AqQua Project. The AqQua Project will take reasonable measures to protect the confidentiality of the contributed data until the agreed-upon release date. The AqQua project will protect the provided personal data (contact information) as laid out in the AqQua privacy policy.