AqQua Data Sharing Form

There are 16 questions in this survey.
PURPOSE OF THE DATA SHARING

The goal of the AqQua project is to:

  • build an AI foundation model of plankton image data,
  • roll out the model as an open-source tool for the global community for the purpose of facilitating everyone’s plankton-related research,
  • leverage this model to develop global plankton and particle distribution models and estimate global plankton- and particle-mediated process rates.

To achieve this, the AqQua project aims to collect a diverse plankton image dataset from a variety of imaging devices (e.g. Underwater Vision Profilers, Zooscan, PlanktoScope, IFCB, FlowCam, …) deployed across different aquatic habitats worldwide. To assemble the most diverse and extensive dataset possible, the AqQua project encourages scientists around the world to share their plankton and particle image data. In return, everyone sharing data will be included as an author of a planned data paper. Furthermore, everyone sharing data will be invited to actively contribute to a corresponding foundation model paper, as well as to papers on global distributions and process rates. We are therefore reaching out to you even if your data is already in the public domain. The AqQua foundation model will most likely perform better on the kinds of data it has been trained on, and is thus likely to be particularly useful for data providers' own research. The AqQua project will not analyze any provided dataset in isolation, nor will it perform any local analyses of individual datasets.

Due to our full commitment to Open Science, all data shared with the AqQua project must come with permission to be made publicly available, on July 15, 2027 at the earliest, under the CC BY-NC 4.0 license as part of our planned data paper. We are therefore exclusively seeking data with a moratorium that ends on July 15, 2027 at the latest. The AqQua project aims to publicly release trained foundation models after November 1, 2025.

For the purpose of training a foundation model, the AqQua project requires the image data (including scale information), as well as at least latitude, longitude, depth, date and time of observation. Classification labels (e.g. species or particle type) and trait annotations (e.g. egg-carrying) are very welcome, as these can help fine-tune and benchmark the foundation model, but they are not required. We would also appreciate it if you would share the sample unit definition and the sampled volume information for your samples, to enable us to develop global distribution models and to estimate process rates (e.g. as in Laget et al. 2024, Clements et al. 2022 & 2023). Image data of (mono)cultures is also welcome; in this case, the metadata should indicate the original sampling location, date and time.

AqQua is a moonshot project that relies on comprehensive collaboration with academic labs and non-academic stakeholders across the globe. We sincerely hope that you will join AqQua's mission, which will not just yield maximally powerful AI for the benefit of all plankton-related research, but will also pave the way towards operational global mapping and monitoring of biodiversity, ecosystem health and carbon flux at unprecedented accuracy and granularity, thereby serving to aid decision making in times of global change. We are very happy that we have already received overwhelming signals of support from the global community, with more than 40 academic labs and non-academic stakeholders across the globe pledging to share data and contribute expertise. AqQua's mission can only be achieved by global collaboration. We would be stoked to have you on board!

To participate, please carefully read and fill in the online form below. Note that after clicking the "Submit" button at the end of the form, you will be directed to a "print your answers" button, which allows you to download a PDF of your completed form for your records.

If you have any questions or suggestions, please contact us at aqqua@geomar.de.

DATA OWNER CONTACT INFORMATION
(This question is mandatory)
Name
(This question is mandatory)
Email
(This question is mandatory)
Institute / Company
We will create a data provider mailing list for general information about the data collection process and updates of the project. Do you want to be added to this list?
DATA DESCRIPTION
(This question is mandatory)
Do you want to share multiple datasets under the same conditions?
(This question is mandatory)
The data owner agrees to share the following dataset:
(This question is mandatory)

The data owner agrees to share the following datasets:

Please download the following template, insert your information line by line and upload it below.

Template Download

Note that if your data is on EcoTaxa, you can use the Python scripts in this repository to generate a table including all the columns required to fill in our template.

DATA SHARING PERMISSIONS

The data owner agrees to share the following types of data with the AqQua project: 

✅ Plankton / particle imaging data: Each image should capture (approximately) one object of interest; i.e., we are not seeking full-frame images.

✅ Metadata: At a minimum, the following metadata has to be provided (e.g., as a table that assigns the metadata to each image):

  • Imaging device and image resolution
  • Date and time of acquisition
  • Location (latitude, longitude, and depth)
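
For illustration only, a per-image metadata table covering the minimum fields above could be checked before upload with a short Python script along the following lines. This is a minimal sketch, not part of the AqQua template: the column names (image_file, imaging_device, pixel_size_um, datetime_utc, latitude, longitude, depth_m), the plausibility ranges and the example rows are all assumptions made up for this example, so please adapt them to the columns of the actual template.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; the actual AqQua template may use different ones.
REQUIRED_COLUMNS = [
    "image_file", "imaging_device", "pixel_size_um",
    "datetime_utc", "latitude", "longitude", "depth_m",
]

# Assumed plausibility ranges (inclusive) for the numeric columns.
NUMERIC_RANGES = [
    ("latitude", -90, 90),
    ("longitude", -180, 180),
    ("depth_m", 0, 11000),
    ("pixel_size_um", 0, float("inf")),
]


def check_row(row):
    """Return a list of problems found in one metadata row (empty if none)."""
    missing = {c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()}
    problems = [f"missing {c}" for c in REQUIRED_COLUMNS if c in missing]
    if "datetime_utc" not in missing:
        try:
            datetime.fromisoformat(row["datetime_utc"].strip())
        except ValueError:
            problems.append("datetime_utc is not ISO 8601")
    for col, low, high in NUMERIC_RANGES:
        if col in missing:
            continue  # already reported as missing
        try:
            value = float(row[col])
        except ValueError:
            problems.append(f"{col} is not a number")
            continue
        if not low <= value <= high:
            problems.append(f"{col} outside [{low}, {high}]")
    return problems


def check_table(csv_text):
    """Map each row number (1 = first data row) to its list of problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    report = {}
    for number, row in enumerate(reader, start=1):
        problems = check_row(row)
        if problems:
            report[number] = problems
    return report


# Made-up example rows: the second one lacks a depth and has an invalid latitude.
example = """image_file,imaging_device,pixel_size_um,datetime_utc,latitude,longitude,depth_m
obj_0001.png,UVP5,88.0,2023-05-14T10:32:00,54.33,10.15,12.5
obj_0002.png,UVP5,88.0,2023-05-14T10:32:05,154.33,10.15,
"""

print(check_table(example))
```

Rows without problems are left out of the report, so an empty result means every image has the minimum metadata under the assumptions above.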

The data owner authorizes the AqQua project to use the data identified above for the following purposes (mandatory; if any of these permissions is not possible for your data, we cannot accept it):

Deep Learning-based Model Development: The image data and all accompanying metadata can be used to train and analyze a foundational plankton image model, including fine-tuning for species classification, trait extraction, and related tasks.

✅ Data release by July 2027: The data owner authorizes the AqQua project to make the data publicly available under CC BY-NC 4.0, according to the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, on July 15, 2027 at the earliest, as part of the datasets the AqQua project gathers.

✅ Model release from November 2025 on: The data owner authorizes the AqQua project to publicly share trained AI models after November 1, 2025.

The data owner further authorizes the AqQua project to use the data described above for (please mark what applies; if you would like to choose different options for different parts of your imaging data, please fill in the form multiple times accordingly):

The AqQua project guarantees that each data owner will be:

Co-author of the resulting AqQua dataset publication as part of the data provider consortium.

Invited by the AqQua project to actively contribute as co-author to a resulting foundation model publication.

Invited by the AqQua project to actively contribute as co-author to publications on derived global distribution patterns and process rates.

(This question is mandatory)
For the planned data paper, to ensure proper acknowledgments regarding your data, we will need the corresponding text from you (e.g. naming people and funding to acknowledge). Would you like to provide such text for the "Acknowledgments" section of the planned data paper now? (You can of course still change it later if needed; just contact us.) Or would you rather have us contact you about this at a later time?
Please provide text that can go into the "Acknowledgments" section of the planned data paper. You can still contact us later to change this (but we won't actively contact you about this).
(This question is mandatory)
The data owner authorizes the AqQua project to use the data specified above, for the purposes specified above, with the guarantees specified above, from the time of sharing, by providing it under the following license or dedicated data sharing agreement:
(If you would like to choose different options for different parts of your data, please fill in the form multiple times accordingly. If you choose "Other license", please name the license, which has to permit usage at least for the purposes specified above. Note that, as outlined above, all data shared with AqQua will have to be made publicly available under CC BY-NC 4.0 on July 15, 2027 at the earliest. While AqQua guarantees not to publish or redistribute your data beforehand, this guarantee can only be made legally binding by means of an individual Data Sharing Agreement. Thus, choosing a license from the options below makes sense if your data is either already publicly available under this license, or if you take our word, although it is not legally binding in this form, that we won't share it with others before July 15, 2027. If you are seeking a legally binding guarantee that your data won't be publicly available before July 15, 2027, select Data Sharing Agreement.)
(This question is mandatory)

Are you herewith newly sharing the data under the license specified above? 

(This question is mandatory)
Thank you for newly sharing your data under the license specified above. As described here, to ensure proper attribution, please provide the following information: 
METHOD OF DATA TRANSFER/ACCESS
(This question is mandatory)
Location of specified data

To share data on EcoTaxa, the data owner agrees to add the “AqQua” user to the project(s) they would like to share (role: "viewer") and authorizes the AqQua project to download the data from EcoTaxa. 

Here’s how you can do that:

  • In the menu of the annotation screen, select “Project / Edit project settings.”
  • Go to the “Privileges” tab and click “New privilege” (bottom right).
  • Enter “AqQua” in the name field and select the “View” role.
  • Click “Save.”

If you are sharing more than just a few projects, you can use the Python scripts in this repository to add the AqQua user to all projects.

(This question is mandatory)
To share data that is not located on EcoTaxa, please select your preferred method of data transfer. Images should be provided in a format routinely used by the data owner for downstream analyses.
If your data is already accessible via URL (+Token), please insert the information below. Otherwise we will contact you to finalize the details for the data transfer.
CONFIDENTIALITY

Until the publication of the AqQua dataset, data will be stored on internal Helmholtz servers. Access to the data will be restricted to AqQua project members and secured through login credentials associated with the AqQua project.
The AqQua project will take reasonable measures to protect the confidentiality of the contributed data until the agreed-upon release date.