Dataset Report

Unique identifier:	BGDVR196.0.0
Title:	B-GOOD Virus Sequences
Long title:	Dataset from the B-GOOD project, containing GenBank accession numbers for virus sequences.
Status:	Quality Validated
Current Version:	v. 1.0
Published:	2025-03-17
Reviewed by:	Rubinigg Michael as Data scientist
Citation proposal:	B-GOOD Bee Health Data Portal 2025 Report of dataset B-GOOD Virus Sequences, v. 1.0 [BGDVR196.0.0]. EU Pollinator Hub. [2026-05-16] app.pollinatorhub.eu

Compliance with FAIR* principles
Findable	Accessible	Interoperable	Reusable
See https://www.go-fair.org/fair-principles for more information about FAIR principles

Data Quality

Good

This document is intended for use by collaborators of the EU Pollinator Hub and may be passed on with the express permission of the leader of the consortium and for the purpose determined by the leader of the consortium.

Table of contents

Document history
1. Release
2. Revision
Abbreviations
Executive summary
Introduction
Material and methods
Data description
1. Dataset
2. Tables
  1. ABPV
  2. CBPV
  3. DWV
References
Annex 1: Table column reports

Document history

Release

Version v. 1.0 released on 2025-03-17. Reviewed by Rubinigg Michael.

Revision

Table 1. List of revisions made to the document. Identifier of revision (No); date of revision (Date); description of revision (Description); reason for revision (Reason).

No	Date	Description	Reason
1	2025-03-17 00:03:00	Initial release.	n/a

Abbreviations

ABPV

Acute Bee Paralysis Virus

CBPV

Chronic Bee Paralysis Virus

CSV

Comma-Separated Values

DWV

Deformed Wing Virus

EU

European Union

EUPH

EU Pollinator Hub

INRAE

Institut national de recherche pour l'agriculture, l'alimentation et l'environnement (National Research Institute for Agriculture, Food and Environment)

Executive summary

Data overview:

The data was published by Bonjour-Dalmon A (INRAE) on the B-GOOD Bee Health Data Portal as part of the B-GOOD project (grant agreement 817622), funded under the EU Horizon 2020 Research and Innovation Programme. It contains the GenBank accession numbers for virus sequences used to quantify various viruses (ABPV, CBPV, DWV) in honey bee samples.

Data value:

The objectives of the B-GOOD project were: (1) Facilitate decision making for beekeepers and other stakeholders by establishing ready-to-use tools for operationalising the HSI; (2) Test, standardise and validate methods for measuring and reporting selected indicators affecting bee health; (3) Explore the various socio-economic and ecological factors beyond bee health; (4) Foster an EU community to collect and share knowledge related to honey bees and their environment; (5) Engender a lasting learning and innovation system (LIS); (6) Minimise the impact of biotic and abiotic stressors.

Data description:

n/a

Data application:

Currently, the data integrated from the B-GOOD Bee Health Data Portal contains major issues and does not fully comply with the FAIR Guiding Principles for scientific data management and stewardship applied on the EU Pollinator Hub. More descriptive information about the context, quality and condition, or characteristics of the data (e.g. protocols, measurement devices used, units of the captured data, or any other details about the study) must be provided. More metadata in the form of accurate explanations of all variable names must be provided.

Unresolved issues:

n/a

Introduction

n/a

Material and methods

Data acquisition

All raw data files were downloaded from the B-GOOD Bee Health Data Portal on 2024-09-26 18:16:30.

List of raw data obtained from the data provider.

File Accession_Numbers-GENBANK.xlsx, accessed on 2024-09-26 18:16:30, provided by B-GOOD Bee Health Data Portal

Metadata was obtained from the dataset's web page.

Table 2. List of raw data and metadata files included in the dataset. Identifier of table row (No); name of the file (File); the type of the file (Type); file contains data (D); file contains metadata (M); date of upload of the file to the EU Pollinator Hub (Arrival); number of data points contained within the file, if applicable (Data points); uploaded file size (File size).

No	File	Type	D	M	Arrival	Data points	File size
1	ABPV_PREP_MR_241102.csv	CSV - Comma-separated values	Yes	No	2024-11-02 11:11:50	121	1.37 KiB
2	CBPV_PREP_MR_241102.csv	CSV - Comma-separated values	Yes	No	2024-11-02 11:11:15	352	3.66 KiB
3	DWV_PREP_MR_241102.csv	CSV - Comma-separated values	Yes	No	2024-11-02 11:11:38	1,353	13.25 KiB
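
The data point counts in Table 2 appear to equal the number of records multiplied by the number of columns of each table: 11 columns per table according to the structural analyses, and 11, 32 and 123 records according to the content analyses. This reading is inferred from the reported figures; the report does not define how data points are counted. A minimal Python check under that assumption:

```python
# Column and record counts as reported in the structural and content
# analyses of this report (Tables 7, 8, 12 and 16).
COLUMNS_PER_TABLE = 11
RECORDS = {"ABPV": 11, "CBPV": 32, "DWV": 123}


def data_points(records: int, columns: int = COLUMNS_PER_TABLE) -> int:
    """Number of cells in a table of the given shape (assumed definition)."""
    return records * columns


points = {name: data_points(n) for name, n in RECORDS.items()}
print(points)
```

Under this assumption the results match the values listed for the three preparatory files (121, 352 and 1,353).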

Data preparation

The file in the zip-archives was extracted using File Explorer (Microsoft Corporation, version 22H2).

The file Accession_Numbers-GENBANK.xlsx was opened with MS Excel (Microsoft Corporation, version 2409). The worksheets were exported to data files in CSV format (UTF-8 encoding) and imported into Notepad++ (version 8.7), where missing values were substituted by {NULL} using regular expressions. Dates were parsed to the required YYYY-MM-DD format using the Python script ParseDates.py.

Data was then exported to the respective preparatory files and uploaded to the EU Pollinator Hub according to SOP-017 (Dataset integration).
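
The two preparation steps described above (substituting missing values by {NULL} and normalising dates to YYYY-MM-DD) can be sketched in Python as follows. This is not the ParseDates.py script, which is not reproduced in this report. The function names, the list of accepted input date formats and the sample rows are assumptions made for illustration only.

```python
import csv
import io
from datetime import datetime

NULL_TOKEN = "{NULL}"
# Assumed input formats; the formats actually present in the source
# workbook are not documented in this report.
ASSUMED_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def fill_missing(value: str) -> str:
    """Return the {NULL} token for empty or whitespace-only fields."""
    return value if value.strip() else NULL_TOKEN


def to_iso_date(value: str) -> str:
    """Convert a recognised date to YYYY-MM-DD; leave other values unchanged."""
    for fmt in ASSUMED_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value


def prepare(csv_text: str, date_columns: set) -> list:
    """Apply both steps to every row of a CSV document given as text."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = {}
        for name, value in row.items():
            value = fill_missing(value or "")
            if name in date_columns and value != NULL_TOKEN:
                value = to_iso_date(value)
            cleaned[name] = value
        rows.append(cleaned)
    return rows


# Illustrative input only; the second row has no accession number.
sample = (
    "GenBank accession numbers,Submission date\n"
    "OR540712,09/12/2023\n"
    ",2023-12-09\n"
)
prepared = prepare(sample, {"Submission date"})
```

Values that match none of the assumed formats, such as a year-only entry like 2016, are passed through unchanged rather than guessed at.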

Data validation

No data validation was performed.

Data analysis

No data analysis was performed.

Data description

Dataset

Table 3. Summary of tables belonging to the dataset. Table row identifier (No); name of the table (Table); description of the table (Description).

No	Table	Description
1	ABPV	GenBank accession numbers for Acute Bee Paralysis Virus (ABPV)
2	CBPV	GenBank accession numbers for Chronic Bee Paralysis Virus (CBPV)
3	DWV	GenBank accession numbers for Deformed Wing Virus (DWV). Two regions of the genome were targeted to assess DWV diversity and…

Table 4. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Unique identifier	BGDVR196.0.0
Title	B-GOOD Virus Sequences
Long title	Dataset from the B-GOOD project, containing GenBank accession numbers for virus sequences.
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/BGDVR196.0.0
Licence	CC0 1.0 Universal
DOI	n/a
Created	2024-11-02
Published	2025-03-17
Contact	n/a
Keywords	ABPV, BQCV, CBPV, DWV, SBV
Data collection years	n/a
Regions, the data was collected in	Belgique/België, Deutschland, France, Nederland, Portugal, România, Schweiz/Suisse/Svizzera, United Kingdom
Abstract	Dataset containing the GenBank accession for virus sequences used to quantify various viruses in honey bees (ABPV, CBPV, DWV) in honey bee samples. It was published by Bonjour-Dalmon A (INRAE) on the B-GOOD Bee Health Data Portal as part of the B-GOOD project (grant agreement 817622), funded under the EU Horizon 2020 Research and Innovation Programme.

Table 5. Standardised metadata of the data provider B-GOOD Bee Health Data Portal. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Name	B-GOOD Bee Health Data Portal
Url
Acronym	B-GOOD
IRI	https://app.pollinatorhub.eu/data-providers/b-good-bee-health-data-portal
Address	https://b-good-project.eu
Country	Belgium
Contact	b-good-project.eu
Description	Project funded by the EU Horizon 2020 Research and Innovation Programme under grant agreement No 817622. Project website: https://b-good-project.eu

Tables

ABPV

Table 6. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Unique identifier	BGDVR196.ABPVA480.0
Name	ABPV
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/parts/BGDVR196.ABPVA480.0
Table Type	File
Licence	CC0 1.0 Universal
Description	GenBank accession numbers for Acute Bee Paralysis Virus (ABPV)

GenBank accession numbers for Acute Bee Paralysis Virus (ABPV)

Metadata

Column GenBank accession numbers links to https://www.ncbi.nlm.nih.gov/nuccore/{GenBank accession numbers}
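
The link template above can be expanded per record as in the following Python sketch. The function name and the handling of {NULL} values are assumptions for illustration; only the URL template is taken from this report.

```python
from typing import Optional
from urllib.parse import quote

NUCCORE_TEMPLATE = "https://www.ncbi.nlm.nih.gov/nuccore/{accession}"


def nuccore_url(accession: str) -> Optional[str]:
    """Return the NCBI Nucleotide URL for an accession, or None if it is missing."""
    accession = accession.strip()
    if not accession or accession == "{NULL}":
        return None
    # Percent-encode the value so unexpected characters cannot alter the path.
    return NUCCORE_TEMPLATE.format(accession=quote(accession, safe=""))


print(nuccore_url("OR540712"))
# prints https://www.ncbi.nlm.nih.gov/nuccore/OR540712
```

Returning None for missing values avoids generating links for records without an accession number, which matters for the DWV table, where part of the column is empty.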

Table 7. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Column Name	Column Description	Datatype	Descriptor	Unit
Genbank submission ID	Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).	Integer number	Integer [0.0.NTGER313]	n/a
GenBank accession numbers	GenBank Accession	String	pms:genBankAccession [0.0.GNBNK515]	n/a
Submission date	Date of submission to GenBank.	Date	iso-8601:calendarDate [0.0.DATEA317]	n/a
Publication date	Not specified by the data provider. Presumably the date at which the GenBank submission was published.	Date	iso-8601:calendarDate [0.0.DATEA317]	n/a
Sequence_ID	Not specified by the data provider.	String	Text [0.0.TEXTA315]	n/a
Sequenced region	Not specified by the data provider.	String	Text [0.0.TEXTA315]	n/a
Size	Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.	Integer number	Integer [0.0.NTGER313]	bp
Isolate	Not specified by the data provider.	String	Text [0.0.TEXTA315]	n/a
Host	Not specified by the data provider. Presumably the host species from which the virus has been isolated.	String	dwc:scientificName [0.0.SCNTF503]	n/a
Country	Not specified by the data provider. Presumably the name of the region in which the virus has been isolated.	String	Text [0.0.TEXTA315]	n/a
Collection_date	Not specified by the data provider. Presumably the approximate date at which the sample has been collected.	String	Text [0.0.TEXTA315]	n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 8. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Genbank submission ID	7 - 7	2,743,798.7	2,743,639	2,743,639	2,743,890	2,743,890	2,743,890	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 18.2% )
GenBank accession numbers	8 - 8	n/a	OR540712	n/a	n/a	n/a	OR540722	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	11 ( 100.0% )
Submission date	10 - 10	n/a	2023-12-09	n/a	n/a	n/a	2023-12-09	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1 ( 9.1% )
Publication date	10 - 10	n/a	2024-03-18	n/a	n/a	n/a	2024-03-18	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1 ( 9.1% )
Sequence_ID	20 - 24	n/a	2247_UK_Aut2…	n/a	n/a	n/a	BG125-20_RO_…	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	11 ( 100.0% )
Sequenced region	5 - 5	n/a	3'ORF	n/a	n/a	n/a	5'ORF	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 18.2% )
Size	3 - 3	796.9	777	789	796	808	815	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	10 ( 90.9% )
Isolate	4 - 8	n/a	2247	n/a	n/a	n/a	BG125-20	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	7 ( 63.6% )
Host	14 - 14	n/a	Apis mellife…	n/a	n/a	n/a	Apis mellife…	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1 ( 9.1% )
Country	6 - 14	n/a	France	n/a	n/a	n/a	United Kingd…	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	3 ( 27.3% )
Collection_date	10 - 10	n/a	2020-05-22	n/a	n/a	n/a	2021-02-03	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	5 ( 45.5% )

Quality measures

Table 9. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Genbank submission ID	100.00%	18.18%	2743890	2743639
GenBank accession numbers	100.00%	100.00%	OR540712	OR540712
Submission date	100.00%	9.09%	2023-12-09	2023-12-09
Publication date	100.00%	9.09%	2024-03-18	2024-03-18
Sequence_ID	100.00%	100.00%	2247_UK_Aut2020_3ORF	2247_UK_Aut2020_3ORF
Sequenced region	100.00%	18.18%	5'ORF	3'ORF
Size	100.00%	90.91%	789	798
Isolate	100.00%	63.64%	2247	2253
Host	100.00%	9.09%	Apis mellifera	Apis mellifera
Country	100.00%	27.27%	United Kingdom	Romania
Collection_date	100.00%	45.45%	2020-05-22	2020-12-09
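
The completeness and uniqueness percentages appear to follow directly from the counts in the content analysis: completeness as the share of non-missing values, and uniqueness as the number of distinct values divided by the number of records. These formulas are inferred from the reported figures and are not stated explicitly in this report. A short Python sketch under that assumption, using counts from Table 8:

```python
def completeness(total: int, missing: int) -> float:
    """Percentage of records whose value is not missing (assumed formula)."""
    return round(100 * (total - missing) / total, 2)


def uniqueness(total: int, distinct: int) -> float:
    """Distinct values as a percentage of all records (assumed formula)."""
    return round(100 * distinct / total, 2)


# Counts reported for the ABPV table: 11 records and no missing values.
print(completeness(total=11, missing=0))   # Genbank submission ID
print(uniqueness(total=11, distinct=2))    # Genbank submission ID
print(uniqueness(total=11, distinct=3))    # Country
```

With these inputs the functions return 100.0, 18.18 and 27.27, which agree with the corresponding entries in Table 9.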

Changes made to preparatory file

None

Changes made to data

None

Unresolved issues

For column Genbank submission ID it may be guessed, but it is not explicitly stated what it describes. In particular, the pathway through which the sequence was submitted should be explicitly specified. The data provider is requested to make this information available.
For column Publication date it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Sequence_ID it is unclear what it describes. The data provider is requested to make this information available.
For column Sequenced region it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Size it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Isolate it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Host it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Country it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Collection_date it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.

CBPV

Table 10. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Unique identifier	BGDVR196.CBPVA481.0
Name	CBPV
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/parts/BGDVR196.CBPVA481.0
Table Type	File
Licence	CC0 1.0 Universal
Description	GenBank accession numbers for Chronic Bee Paralysis Virus (CBPV)

GenBank accession numbers for Chronic Bee Paralysis Virus (CBPV)

Metadata

Column GenBank accession numbers links to https://www.ncbi.nlm.nih.gov/nuccore/{GenBank accession numbers}

Table 11. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Column Name	Column Description	Datatype	Descriptor	Unit
Genbank submission ID	Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).	Integer number	Integer [0.0.NTGER313]	n/a
GenBank accession numbers	GenBank Accession	String	pms:genBankAccession [0.0.GNBNK515]	n/a
Submission date	Date of submission to GenBank.	Date	iso-8601:calendarDate [0.0.DATEA317]	n/a
Publication date	Not specified by the data provider. Presumably the date at which the GenBank submission was published.	Date	iso-8601:calendarDate [0.0.DATEA317]	n/a
Sequence_ID	Not specified by the data provider.	String	Text [0.0.TEXTA315]	n/a
Sequenced region	Not specified by the data provider.	String	Text [0.0.TEXTA315]	n/a
Size	Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.	Integer number	Integer [0.0.NTGER313]	bp
Isolate	Not specified by the data provider.	String	Text [0.0.TEXTA315]	n/a
Host	Not specified by the data provider. Presumably the host species from which the virus has been isolated.	String	dwc:scientificName [0.0.SCNTF503]	n/a
Country	Not specified by the data provider. Presumably the name of the region in which the virus has been isolated.	String	Text [0.0.TEXTA315]	n/a
Collection_date	Not specified by the data provider. Presumably the approximate date at which the sample has been collected.	String	Text [0.0.TEXTA315]	n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 12. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Genbank submission ID	7 - 7	2,744,926.5	2,743,909	2,743,909	2,743,909	2,746,869	2,746,869	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 6.3% )
GenBank accession numbers	8 - 8	n/a	OR582944	n/a	n/a	n/a	OR584328	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	32 ( 100.0% )
Submission date	10 - 10	n/a	2023-09-21	n/a	n/a	n/a	2023-09-22	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 6.3% )
Publication date	10 - 10	n/a	2023-03-29	n/a	n/a	n/a	2024-03-28	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 6.3% )
Sequence_ID	9 - 18	n/a	134F3_FR_201…	n/a	n/a	n/a	Ruche24_FR_2…	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	32 ( 100.0% )
Sequenced region	4 - 4	n/a	RNA1	n/a	n/a	n/a	RdRp	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 6.3% )
Size	3 - 4	2,233.5	700	722	2,566	3,541.5	3,588	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	24 ( 75.0% )
Isolate	5 - 20	n/a	134.0f4	n/a	n/a	n/a	FR20HM439	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	23 ( 71.9% )
Host	14 - 14	n/a	Apis mellife…	n/a	n/a	n/a	Apis mellife…	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1 ( 3.1% )
Country	12 - 28	n/a	France: Cent…	n/a	n/a	n/a	France: PACA	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	4 ( 12.5% )
Collection_date	4 - 10	n/a	2014-07-18	n/a	n/a	n/a	2021-06-15	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	14 ( 43.8% )

Quality measures

Table 13. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Genbank submission ID	100.00%	6.25%	2743909	2746869
GenBank accession numbers	100.00%	100.00%	OR582944	OR582944
Submission date	100.00%	6.25%	2023-09-21	2023-09-22
Publication date	100.00%	6.25%	2024-03-28	2023-03-29
Sequence_ID	100.00%	100.00%	1_FR_2019	1_FR_2019
Sequenced region	100.00%	6.25%	RNA1	RdRp
Size	100.00%	75.00%	2566	3586
Isolate	100.00%	71.88%	134.0f4	CBPV 1.0 f4
Host	100.00%	3.13%	Apis mellifera	Apis mellifera
Country	100.00%	12.50%	France: PACA	France: Grand-Est
Collection_date	100.00%	43.75%	2016	2020-10-08

Changes made to preparatory file

None

Changes made to data

Missing values (66 occurrences) were replaced by {NULL}.

Unresolved issues

For column Genbank submission ID it may be guessed, but it is not explicitly stated what it describes. In particular, the pathway through which the sequence was submitted should be explicitly specified. The data provider is requested to make this information available.
For column Publication date it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Sequence_ID it is unclear what it describes. The data provider is requested to make this information available.
For column Sequenced region it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Size it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Isolate it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Host it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Country it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Collection_date it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.

DWV

Table 14. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Unique identifier	BGDVR196.DWVAB482.0
Name	DWV
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/parts/BGDVR196.DWVAB482.0
Table Type	File
Licence	CC0 1.0 Universal
Description	GenBank accession numbers for Deformed Wing Virus (DWV). Two regions of the genome were targeted to assess DWV diversity and to look for putative recombination events: the 5’ end and the region encompassing genes coding for structural proteins, as well as the Helicase-coding region. All DWV-A and/or DWV-B positive samples (n=116) were sequenced. A long sequencing of about all 5’ half of the virus genome was performed on some samples when possible (n=10).

GenBank accession numbers for Deformed Wing Virus (DWV)

Two regions of the genome were targeted to assess DWV diversity and to look for putative recombination events: the 5’ end and the region encompassing genes coding for structural proteins, as well as the Helicase-coding region. All DWV-A and/or DWV-B positive samples (n=116) were sequenced. A long sequencing of about all 5’ half of the virus genome was performed on some samples when possible (n=10).

Metadata

Column GenBank accession numbers links to https://www.ncbi.nlm.nih.gov/nuccore/{GenBank accession numbers}

Table 15. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Column Name	Column Description	Datatype	Descriptor	Unit
Genbank submission ID	Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).	Integer number	Integer [0.0.NTGER313]	n/a
GenBank accession numbers	GenBank Accession	String	pms:genBankAccession [0.0.GNBNK515]	n/a
Submission date	Date of submission to GenBank.	Date	iso-8601:calendarDate [0.0.DATEA317]	n/a
Publication date	Not specified by the data provider. Presumably the date at which the GenBank submission was published.	Date	iso-8601:calendarDate [0.0.DATEA317]	n/a
Sequence_ID	Not specified by the data provider.	String	Text [0.0.TEXTA315]	n/a
Sequenced region	Not specified by the data provider.	String	Text [0.0.TEXTA315]	n/a
Size	Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.	Integer number	Integer [0.0.NTGER313]	bp
Isolate	Not specified by the data provider.	String	Text [0.0.TEXTA315]	n/a
Host	Not specified by the data provider. Presumably the host species from which the virus has been isolated.	String	dwc:scientificName [0.0.SCNTF503]	n/a
Country	Not specified by the data provider. Presumably the name of the country in which the virus has been isolated.	String	Text [0.0.TEXTA315]	n/a
Collection_date	Not specified by the data provider. Presumably the approximate date at which the sample has been collected.	String	Text [0.0.TEXTA315]	n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 16. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Genbank submission ID	7 - 7	2,734,179.0	2,733,328	2,733,349	2,733,349	2,733,374	2,743,492	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	4 ( 3.3% )
GenBank accession numbers	0 - 8	n/a	OR437235	n/a	n/a	n/a	OR437298	123	59 ( 48.0% )	0 ( 0.0% )	0 ( 0.0% )	65 ( 52.8% )
Submission date	10 - 10	n/a	2023-11-08	n/a	n/a	n/a	2023-11-09	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 1.6% )
Publication date	10 - 10	n/a	2024-03-18	n/a	n/a	n/a	2024-12-01	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 1.6% )
Sequence_ID	15 - 22	n/a	2141B_UK_Spr…	n/a	n/a	n/a	BG5_ITR1_NL_…	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	121 ( 98.4% )
Sequenced region	3 - 11	n/a	5' complete	n/a	n/a	n/a	ITR	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	4 ( 3.3% )
Size	3 - 4	1,496.6	668	1,124	1,186	1,341	6,450	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	81 ( 65.9% )
Isolate	3 - 5	n/a	2141	n/a	n/a	n/a	BG5	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	66 ( 53.7% )
Host	14 - 14	n/a	Apis mellife…	n/a	n/a	n/a	Apis mellife…	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1 ( 0.8% )
Country	6 - 14	n/a	Belgium	n/a	n/a	n/a	United Kingd…	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	8 ( 6.5% )
Collection_date	4 - 10	n/a	2020	n/a	n/a	n/a	2021-11-23	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	33 ( 26.8% )

Quality measures

Table 17. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Genbank submission ID	100.00%	3.25%	2733374	2743492
GenBank accession numbers	52.03%	52.85%	n/a	OR437235
Submission date	100.00%	1.63%	2023-11-08	2023-11-09
Publication date	100.00%	1.63%	2024-12-01	2024-03-18
Sequence_ID	100.00%	98.37%	2261_FR_Aut2020	2141B_UK_Spr2020
Sequenced region	100.00%	3.25%	A/B/R	5' complete
Size	100.00%	65.85%	1179	1255
Isolate	100.00%	53.66%	2689	2141
Host	100.00%	0.81%	Apis mellifera	Apis mellifera
Country	100.00%	6.50%	France	Romania
Collection_date	100.00%	26.83%	2020-10-16	2020-05-02

Changes made to preparatory file

None

Changes made to data

Missing values (307 occurrences) were replaced by {NULL}.

Unresolved issues

For column Genbank submission ID it may be guessed, but it is not explicitly stated what it describes. In particular, the pathway through which the sequence was submitted should be explicitly specified. The data provider is requested to make this information available.
For column Publication date it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Sequence_ID it is unclear what it describes. The data provider is requested to make this information available.
For column Sequenced region it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Size it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Isolate it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Host it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Country it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.
For column Collection_date it may be guessed, but it is not explicitly stated what it describes. The data provider is requested to make this information available.

References

Bonjour-Dalmon A. 2023 GenBank accession numbers for virus sequences. B-GOOD Bee Health Data Portal. [2024-11-02] beehealthdata.org

Annex 1: Table column reports

Table: ABPV

Column: Genbank submission ID

Table 18. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Genbank submission ID
Description	Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	n/a

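Table 19. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).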

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Genbank submission ID	7 - 7	2,743,798.7	2,743,639	2,743,639	2,743,890	2,743,890	2,743,890	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 18.2% )

Table 20. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Genbank submission ID	100.00%	18.18%	2743890	2743639

Data Distribution Top 20

Figure 1. Distribution of 20 most common values, from highest to lowest.

Continuous Data Distribution

Figure 2. Distribution of values in the column.

Outliers

Figure 3. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 4. Visualization of completeness of the data in the column.

Uniqueness

Figure 5. Visualization of uniqueness of the data in the column.

Column: GenBank accession numbers

Table 21. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	GenBank accession numbers
Description	GenBank Accession
Data type	String
Descriptor	pms:genBankAccession [UID:0.0.GNBNK515]
Descriptor description	The unique identifier for a sequence record. An accession number applies to the complete record and is usually a combination of a letter(s) and numbers, such as a single letter followed by five digits (e.g., U12345) or two letters followed by six digits (e.g., AF123456). Some accessions might be longer, depending on the type of sequence record.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.GNBNK515
Unit	n/a
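
The two accession patterns named in the descriptor description can be checked with a regular expression, as in the Python sketch below. It covers only those two patterns (one letter plus five digits, two letters plus six digits). The descriptor notes that longer accessions exist, so a failed match flags a value for manual review rather than proving it invalid. The function name is an assumption for illustration.

```python
import re

# Covers only the two patterns named in the descriptor description:
# one letter followed by five digits, or two letters followed by six digits.
ACCESSION_PATTERN = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})")


def looks_like_accession(value: str) -> bool:
    """True if the whole value matches one of the two covered patterns."""
    return ACCESSION_PATTERN.fullmatch(value) is not None


for candidate in ("U12345", "AF123456", "OR540712", "{NULL}"):
    print(candidate, looks_like_accession(candidate))
```

The first three candidates match and the {NULL} placeholder does not, so the same check can also be used to separate filled cells from substituted missing values.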

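Table 22. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).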

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
GenBank accession numbers	8 - 8	n/a	OR540712	n/a	n/a	n/a	OR540722	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	11 ( 100.0% )

Table 23. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
GenBank accession numbers	100.00%	100.00%	OR540712	OR540712

Completeness

Figure 6. Visualization of completeness of the data in the column.

Uniqueness

Figure 7. Visualization of uniqueness of the data in the column.

Column: Submission date

Table 24. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Submission date
Description	Date of submission to GenBank.
Data type	Date
Descriptor	iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description	particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit	n/a

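Table 25. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).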

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Submission date	10 - 10	n/a	2023-12-09	n/a	n/a	n/a	2023-12-09	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1 ( 9.1% )

Table 26. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Submission date	100.00%	9.09%	2023-12-09	2023-12-09
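
The percentages in Table 26 are consistent with completeness being the share of non-missing values and uniqueness being the share of distinct values among all values (1 distinct value out of 11 gives 9.09%). The report does not state these formulas, so the Python sketch below rests on that assumption; the function names are hypothetical.

```python
def completeness(values):
    """Percentage of values that are not missing (None)."""
    present = sum(v is not None for v in values)
    return 100.0 * present / len(values)


def uniqueness(values):
    """Percentage of distinct non-missing values relative to all values."""
    distinct = {v for v in values if v is not None}
    return 100.0 * len(distinct) / len(values)


# All 11 rows carry the same submission date, as reported in Table 25.
submission_dates = ["2023-12-09"] * 11
print(f"{completeness(submission_dates):.2f}%")  # 100.00%
print(f"{uniqueness(submission_dates):.2f}%")    # 9.09%
```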

Data Distribution Top 20

Figure 8. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 9. Visualization of completeness of the data in the column.

Uniqueness

Figure 10. Visualization of uniqueness of the data in the column.

Column: Publication date

Table 27. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Publication date
Description	Not specified by the data provider. Presumably the date on which the GenBank submission was published.
Data type	Date
Descriptor	iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description	particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit	n/a

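Table 28. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).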

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Publication date	10 - 10	n/a	2024-03-18	n/a	n/a	n/a	2024-03-18	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1 ( 9.1% )

Table 29. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Publication date	100.00%	9.09%	2024-03-18	2024-03-18

Data Distribution Top 20

Figure 11. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 12. Visualization of completeness of the data in the column.

Uniqueness

Figure 13. Visualization of uniqueness of the data in the column.

Column: Sequence_ID

Table 30. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Sequence_ID
Description	Not specified by the data provider.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

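Table 31. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).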

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Sequence_ID	20 - 24	n/a	2247_UK_Aut2…	n/a	n/a	n/a	BG125-20_RO_…	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	11 ( 100.0% )

Table 32. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Sequence_ID	100.00%	100.00%	2247_UK_Aut2020_3ORF	2247_UK_Aut2020_3ORF

Completeness

Figure 14. Visualization of completeness of the data in the column.

Uniqueness

Figure 15. Visualization of uniqueness of the data in the column.

Column: Sequenced region

Table 33. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Sequenced region
Description	Not specified by the data provider.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

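Table 34. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).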

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Sequenced region	5 - 5	n/a	3'ORF	n/a	n/a	n/a	5'ORF	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 18.2% )

Table 35. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Sequenced region	100.00%	18.18%	5'ORF	3'ORF

Data Distribution Top 20

Figure 16. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 17. Visualization of completeness of the data in the column.

Uniqueness

Figure 18. Visualization of uniqueness of the data in the column.

Column: Size

Table 36. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Size
Description	Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	bp

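Table 37. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).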

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Size	3 - 3	796.9	777	789	796	808	815	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	10 ( 90.9% )

Table 38. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Size	100.00%	90.91%	789	798

Continuous Data Distribution

Figure 19. Distribution of values in the column.

Outliers

Figure 20. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 21. Visualization of completeness of the data in the column.

Uniqueness

Figure 22. Visualization of uniqueness of the data in the column.

Column: Isolate

Table 39. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Isolate
Description	Not specified by the data provider.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

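Table 40. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).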

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Isolate	4 - 8	n/a	2247	n/a	n/a	n/a	BG125-20	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	7 ( 63.6% )

Table 41. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Isolate	100.00%	63.64%	2247	2253

Completeness

Figure 23. Visualization of completeness of the data in the column.

Uniqueness

Figure 24. Visualization of uniqueness of the data in the column.

Column: Host

Table 42. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Host
Description	Not specified by the data provider. Presumably the host species from which the virus was isolated.
Data type	String
Descriptor	dwc:scientificName [UID:0.0.SCNTF503]
Descriptor description	The full scientific name, with authorship and date information if known.
Descriptor target IRI	http://rs.tdwg.org/dwc/terms/scientificName
Unit	n/a

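Table 43. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).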

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Host	14 - 14	n/a	Apis mellife…	n/a	n/a	n/a	Apis mellife…	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1 ( 9.1% )

Table 44. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Host	100.00%	9.09%	Apis mellifera	Apis mellifera

Data Distribution Top 20

Figure 25. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 26. Visualization of completeness of the data in the column.

Uniqueness

Figure 27. Visualization of uniqueness of the data in the column.

Column: Country

Table 45. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Country
Description	Not specified by the data provider. Presumably the name of the country in which the virus was isolated.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

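Table 46. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).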

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Country	6 - 14	n/a	France	n/a	n/a	n/a	United Kingd…	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	3 ( 27.3% )

Table 47. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Country	100.00%	27.27%	United Kingdom	Romania

Data Distribution Top 20

Figure 28. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 29. Visualization of completeness of the data in the column.

Uniqueness

Figure 30. Visualization of uniqueness of the data in the column.

Column: Collection_date

Table 48. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Collection_date
Description	Not specified by the data provider. Presumably the approximate date on which the sample was collected.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

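Table 49. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).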

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Collection_date	10 - 10	n/a	2020-05-22	n/a	n/a	n/a	2021-02-03	11	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	5 ( 45.5% )

Table 50. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Collection_date	100.00%	45.45%	2020-05-22	2020-12-09
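
Collection_date is stored as a String, although the values shown in Tables 49 and 50 follow the YYYY-MM-DD pattern. A reader who wants to treat them as calendar dates could convert them as in the Python sketch below. The function name `parse_collection_date` is hypothetical, and returning `None` for values that do not parse is an assumption about how non-conforming entries might be handled, not something the report prescribes.

```python
from datetime import date
from typing import Optional


def parse_collection_date(text: str) -> Optional[date]:
    """Convert a YYYY-MM-DD string to a date; return None if it does not parse."""
    try:
        return date.fromisoformat(text.strip())
    except ValueError:
        return None


print(parse_collection_date("2020-05-22"))  # 2020-05-22
```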

Completeness

Figure 31. Visualization of completeness of the data in the column.

Uniqueness

Figure 32. Visualization of uniqueness of the data in the column.

Table: CBPV

Column: Genbank submission ID

Table 51. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Genbank submission ID
Description	Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	n/a

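Table 52. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).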

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Genbank submission ID	7 - 7	2,744,926.5	2,743,909	2,743,909	2,743,909	2,746,869	2,746,869	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 6.3% )

Table 53. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Genbank submission ID	100.00%	6.25%	2743909	2746869
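
Because the column holds only two distinct values across 32 rows, the reported mean determines how the rows are split between them. The Python sketch below works through that arithmetic, assuming the mean in Table 52 is exact at the precision shown; the function name `rows_with_larger_value` is hypothetical.

```python
def rows_with_larger_value(mean, low, high, total):
    """Number of rows holding `high` when a column contains only `low` and `high`.

    Derived from: mean * total = low * (total - k) + high * k, solved for k.
    """
    return (mean - low) * total / (high - low)


k = rows_with_larger_value(2_744_926.5, 2_743_909, 2_746_869, 32)
print(k)  # 11.0
```

Under this assumption, 21 rows carry 2743909 and 11 rows carry 2746869, which agrees with the reported median (2,743,909), Q₃ (2,746,869) and most common value.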

Data Distribution Top 20

Figure 33. Distribution of 20 most common values, from highest to lowest.

Continuous Data Distribution

Figure 34. Distribution of values in the column.

Outliers

Figure 35. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 36. Visualization of completeness of the data in the column.

Uniqueness

Figure 37. Visualization of uniqueness of the data in the column.

Column: GenBank accession numbers

Table 54. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	GenBank accession numbers
Description	GenBank Accession
Data type	String
Descriptor	pms:genBankAccession [UID:0.0.GNBNK515]
Descriptor description	The unique identifier for a sequence record. An accession number applies to the complete record and is usually a combination of a letter(s) and numbers, such as a single letter followed by five digits (e.g., U12345) or two letters followed by six digits (e.g., AF123456). Some accessions might be longer, depending on the type of sequence record.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.GNBNK515
Unit	n/a

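Table 55. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).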

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
GenBank accession numbers	8 - 8	n/a	OR582944	n/a	n/a	n/a	OR584328	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	32 ( 100.0% )

Table 56. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
GenBank accession numbers	100.00%	100.00%	OR582944	OR582944

Completeness

Figure 38. Visualization of completeness of the data in the column.

Uniqueness

Figure 39. Visualization of uniqueness of the data in the column.

Column: Submission date

Table 57. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Submission date
Description	Date of submission to GenBank.
Data type	Date
Descriptor	iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description	particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit	n/a

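Table 58. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).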

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Submission date	10 - 10	n/a	2023-09-21	n/a	n/a	n/a	2023-09-22	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 6.3% )

Table 59. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Submission date	100.00%	6.25%	2023-09-21	2023-09-22

Data Distribution Top 20

Figure 40. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 41. Visualization of completeness of the data in the column.

Uniqueness

Figure 42. Visualization of uniqueness of the data in the column.

Column: Publication date

Table 60. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Publication date
Description	Not specified by the data provider. Presumably the date on which the GenBank submission was published.
Data type	Date
Descriptor	iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description	particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit	n/a

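Table 61. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).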

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Publication date	10 - 10	n/a	2023-03-29	n/a	n/a	n/a	2024-03-28	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 6.3% )

Table 62. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Publication date	100.00%	6.25%	2024-03-28	2023-03-29
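
The minimum publication date reported for this table (2023-03-29) is earlier than the earliest submission date reported for the same table (2023-09-21). Whether this reflects a data-entry issue cannot be decided from the summary statistics alone. A row-level check along the lines of the Python sketch below could help locate affected records. The field names `accession`, `submitted` and `published`, the placeholder accessions, and the pairing of dates in the example rows are hypothetical.

```python
from datetime import date


def published_before_submission(rows):
    """Return the rows whose publication date precedes their submission date."""
    return [
        row
        for row in rows
        if date.fromisoformat(row["published"]) < date.fromisoformat(row["submitted"])
    ]


# Hypothetical rows; only the date values are taken from the summary tables.
rows = [
    {"accession": "EXAMPLE-1", "submitted": "2023-09-21", "published": "2024-03-28"},
    {"accession": "EXAMPLE-2", "submitted": "2023-09-22", "published": "2023-03-29"},
]
for row in published_before_submission(rows):
    print(row["accession"])  # EXAMPLE-2
```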

Data Distribution Top 20

Figure 43. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 44. Visualization of completeness of the data in the column.

Uniqueness

Figure 45. Visualization of uniqueness of the data in the column.

Column: Sequence_ID

Table 63. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Sequence_ID
Description	Not specified by the data provider.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

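Table 64. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).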

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Sequence_ID	9 - 18	n/a	134F3_FR_201…	n/a	n/a	n/a	Ruche24_FR_2…	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	32 ( 100.0% )

Table 65. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Sequence_ID	100.00%	100.00%	1_FR_2019	1_FR_2019

Completeness

Figure 46. Visualization of completeness of the data in the column.

Uniqueness

Figure 47. Visualization of uniqueness of the data in the column.

Column: Sequenced region

Table 66. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Sequenced region
Description	Not specified by the data provider.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

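Table 67. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).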

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Sequenced region	4 - 4	n/a	RNA1	n/a	n/a	n/a	RdRp	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 6.3% )

Table 68. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Sequenced region	100.00%	6.25%	RNA1	RdRp

Data Distribution Top 20

Figure 48. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 49. Visualization of completeness of the data in the column.

Uniqueness

Figure 50. Visualization of uniqueness of the data in the column.

Column: Size

Table 69. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Size
Description	Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	bp

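Table 70. Statistical analysis of the column. Column name (Column Name); minimum and maximum value length in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); maximum (Maximum); number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).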

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Size	3 - 4	2,233.5	700	722	2,566	3,541.5	3,588	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	24 ( 75.0% )

Table 71. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Size	100.00%	75.00%	2566	3586

Continuous Data Distribution

Figure 51. Distribution of values in the column.

Outliers

Figure 52. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 53. Visualization of completeness of the data in the column.

Uniqueness

Figure 54. Visualization of uniqueness of the data in the column.

Column: Isolate

Table 72. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Isolate
Description	Not specified by the data provider.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

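Table 73. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).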

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Isolate	5 - 20	n/a	134.0f4	n/a	n/a	n/a	FR20HM439	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	23 ( 71.9% )

Table 74. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Isolate	100.00%	71.88%	134.0f4	CBPV 1.0 f4

Completeness

Figure 55. Visualization of completeness of the data in the column.

Uniqueness

Figure 56. Visualization of uniqueness of the data in the column.

Column: Host

Table 75. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Host
Description	Not specified by the data provider. Presumably the host species from which the virus has been isolated.
Data type	String
Descriptor	dwc:scientificName [UID:0.0.SCNTF503]
Descriptor description	The full scientific name, with authorship and date information if known.
Descriptor target IRI	http://rs.tdwg.org/dwc/terms/scientificName
Unit	n/a

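Table 76. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).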

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Host	14 - 14	n/a	Apis mellife…	n/a	n/a	n/a	Apis mellife…	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1 ( 3.1% )

Table 77. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Host	100.00%	3.13%	Apis mellifera	Apis mellifera

Data Distribution Top 20

Figure 57. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 58. Visualization of completeness of the data in the column.

Uniqueness

Figure 59. Visualization of uniqueness of the data in the column.

Column: Country

Table 78. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Country
Description	Not specified by the data provider. Presumably the name of the region in which the virus has been isolated.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

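Table 79. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).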

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Country	12 - 28	n/a	France: Cent…	n/a	n/a	n/a	France: PACA	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	4 ( 12.5% )

Table 80. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Country	100.00%	12.50%	France: PACA	France: Grand-Est
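
The values reported for this column (for example France: PACA and France: Grand-Est) appear to follow a "country: region" pattern, whereas the Country column of the DWV table holds plain country names such as Belgium. The following is a minimal sketch of how such values could be split before the two tables are compared. The function name split_country_region is hypothetical and is not part of the dataset or of any EUPH tooling.

```python
from typing import Optional, Tuple


def split_country_region(value: str) -> Tuple[str, Optional[str]]:
    """Split a 'Country: Region' string into (country, region).

    Assumption: a single colon separates country and region, as in the
    values shown in the report. Values without a colon are treated as a
    country name with no region.
    """
    country, sep, region = value.partition(":")
    country = country.strip()
    region = region.strip()
    return country, (region if sep and region else None)


for raw in ["France: PACA", "France: Grand-Est", "Belgium"]:
    print(raw, "->", split_country_region(raw))
```

Returning None for the region keeps plain country names usable without special-casing them downstream.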

Data Distribution Top 20

Figure 60. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 61. Visualization of completeness of the data in the column.

Uniqueness

Figure 62. Visualization of uniqueness of the data in the column.

Column: Collection_date

Table 81. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Collection_date
Description	Not specified by the data provider. Presumably the approximate date at which the sample has been collected.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

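Table 82. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).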

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Collection_date	4 - 10	n/a	2014-07-18	n/a	n/a	n/a	2021-06-15	32	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	14 ( 43.8% )

Table 83. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Collection_date	100.00%	43.75%	2016	2020-10-08
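
Collection_date is stored as a string, and the reported values mix year-only entries (2016) with full calendar dates (2020-10-08), which matches the length range of 4 to 10 characters. The sketch below shows one way to parse such values while keeping track of their precision. The function name parse_partial_date is hypothetical, and the handling of year-month values (YYYY-MM) is an assumption, since the report does not show whether such values occur.

```python
from datetime import date
from typing import Optional, Tuple


def parse_partial_date(text: str) -> Tuple[Optional[date], str]:
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into (date, precision).

    Missing month or day components are filled with 1; the precision
    label tells the caller which components were actually provided.
    Unparseable input yields (None, "unknown").
    """
    parts = text.strip().split("-")
    try:
        nums = [int(p) for p in parts]
        if len(nums) == 1 and len(parts[0]) == 4:
            return date(nums[0], 1, 1), "year"
        if len(nums) == 2:
            return date(nums[0], nums[1], 1), "month"
        if len(nums) == 3:
            return date(nums[0], nums[1], nums[2]), "day"
    except ValueError:
        pass  # non-numeric parts or impossible calendar dates
    return None, "unknown"


for raw in ["2016", "2020-10-08", "2014-07-18"]:
    print(raw, "->", parse_partial_date(raw))
```

Keeping the precision label alongside the parsed date avoids presenting a year-only value as if the sample had been collected on 1 January.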

Completeness

Figure 63. Visualization of completeness of the data in the column.

Uniqueness

Figure 64. Visualization of uniqueness of the data in the column.

Table: DWV

Column: Genbank submission ID

Table 84. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Genbank submission ID
Description	Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	n/a

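Table 85. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).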

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Genbank submission ID	7 - 7	2,734,179.0	2,733,328	2,733,349	2,733,349	2,733,374	2,743,492	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	4 ( 3.3% )

Table 86. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Genbank submission ID	100.00%	3.25%	2733374	2743492

Data Distribution Top 20

Figure 65. Distribution of 20 most common values, from highest to lowest.

Continuous Data Distribution

Figure 66. Distribution of values in the column.

Outliers

Figure 67. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 68. Visualization of completeness of the data in the column.

Uniqueness

Figure 69. Visualization of uniqueness of the data in the column.

Column: GenBank accession numbers

Table 87. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	GenBank accession numbers
Description	GenBank Accession
Data type	String
Descriptor	pms:genBankAccession [UID:0.0.GNBNK515]
Descriptor description	The unique identifier for a sequence record. An accession number applies to the complete record and is usually a combination of a letter(s) and numbers, such as a single letter followed by five digits (e.g., U12345) or two letters followed by six digits (e.g., AF123456). Some accessions might be longer, depending on the type of sequence record.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.GNBNK515
Unit	n/a
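
The descriptor description above names two common accession formats: one letter followed by five digits, and two letters followed by six digits. The sketch below checks a value against those two formats only. The name looks_like_accession is hypothetical, and because the descriptor notes that some accessions are longer, a negative result should be read as "not one of the two named patterns" and not as "invalid".

```python
import re

# Only the two patterns named in the descriptor description:
# one letter + five digits (e.g. U12345) or two letters + six digits
# (e.g. AF123456). Longer accession formats are deliberately not covered.
ACCESSION_RE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")


def looks_like_accession(value: str) -> bool:
    """Return True if value matches one of the two named formats."""
    return bool(ACCESSION_RE.match(value.strip()))


for candidate in ["U12345", "AF123456", "OR437235", "", "n/a"]:
    print(repr(candidate), looks_like_accession(candidate))
```

A check of this kind could be used to separate the missing entries in this column from the populated ones before looking records up in GenBank.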

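Table 88. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).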

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
GenBank accession numbers	0 - 8	n/a	OR437235	n/a	n/a	n/a	OR437298	123	59 ( 48.0% )	0 ( 0.0% )	0 ( 0.0% )	65 ( 52.8% )

Table 89. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
GenBank accession numbers	52.03%	52.85%	n/a	OR437235
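
The completeness and uniqueness percentages for this column can be reproduced from the counts reported above (123 records, 59 missing, 65 distinct). The report does not state the formulas, so the sketch below assumes that completeness is the share of non-missing records and uniqueness is the share of distinct values, both relative to the total number of records.

```python
# Counts taken from the statistics reported for this column.
total = 123
missing = 59
distinct = 65

# Assumed definitions (not stated in the report):
completeness = (total - missing) / total * 100  # share of non-missing records
uniqueness = distinct / total * 100             # share of distinct values

print(f"Completeness: {completeness:.2f}%")  # Completeness: 52.03%
print(f"Uniqueness:   {uniqueness:.2f}%")    # Uniqueness:   52.85%
```

Under these assumptions the distinct count of 65 exceeds the 64 non-missing records by one, which would be consistent with the missing marker being counted as a distinct value in its own right.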

Completeness

Figure 70. Visualization of completeness of the data in the column.

Uniqueness

Figure 71. Visualization of uniqueness of the data in the column.

Column: Submission date

Table 90. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Submission date
Description	Date of submission to GenBank.
Data type	Date
Descriptor	iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description	particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit	n/a

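Table 91. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).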

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Submission date	10 - 10	n/a	2023-11-08	n/a	n/a	n/a	2023-11-09	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 1.6% )

Table 92. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Submission date	100.00%	1.63%	2023-11-08	2023-11-09

Data Distribution Top 20

Figure 72. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 73. Visualization of completeness of the data in the column.

Uniqueness

Figure 74. Visualization of uniqueness of the data in the column.

Column: Publication date

Table 93. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Publication date
Description	Not specified by the data provider. Presumably the date at which the GenBank submission was published.
Data type	Date
Descriptor	iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description	particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit	n/a

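Table 94. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).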

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Publication date	10 - 10	n/a	2024-03-18	n/a	n/a	n/a	2024-12-01	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2 ( 1.6% )

Table 95. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Publication date	100.00%	1.63%	2024-12-01	2024-03-18

Data Distribution Top 20

Figure 75. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 76. Visualization of completeness of the data in the column.

Uniqueness

Figure 77. Visualization of uniqueness of the data in the column.

Column: Sequence_ID

Table 96. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Sequence_ID
Description	Not specified by the data provider.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

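Table 97. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).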

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Sequence_ID	15 - 22	n/a	2141B_UK_Spr…	n/a	n/a	n/a	BG5_ITR1_NL_…	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	121 ( 98.4% )

Table 98. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Sequence_ID	100.00%	98.37%	2261_FR_Aut2020	2141B_UK_Spr2020

Completeness

Figure 78. Visualization of completeness of the data in the column.

Uniqueness

Figure 79. Visualization of uniqueness of the data in the column.

Column: Sequenced region

Table 99. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Sequenced region
Description	Not specified by the data provider.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

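Table 100. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).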

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Sequenced region	3 - 11	n/a	5' complete	n/a	n/a	n/a	ITR	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	4 ( 3.3% )

Table 101. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Sequenced region	100.00%	3.25%	A/B/R	5' complete

Data Distribution Top 20

Figure 80. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 81. Visualization of completeness of the data in the column.

Uniqueness

Figure 82. Visualization of uniqueness of the data in the column.

Column: Size

Table 102. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Size
Description	Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	bp

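Table 103. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).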

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Size	3 - 4	1,496.6	668	1,124	1,186	1,341	6,450	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	81 ( 65.9% )

Table 104. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Size	100.00%	65.85%	1179	1255

Continuous Data Distribution

Figure 83. Distribution of values in the column.

Outliers

Figure 84. Visualization of median, min, max, and outliers in the column.
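
The report does not state which rule is used to flag outliers. As an illustration only, the sketch below applies Tukey's 1.5 × IQR fences, a common box-plot convention, to the quartiles reported for this column (Q₁ = 1,124 bp, Q₃ = 1,341 bp). The function name tukey_fences is hypothetical.

```python
from typing import Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes reported for the Size column of the DWV table.
q1, q3 = 1124, 1341
minimum, maximum = 668, 6450

lower, upper = tukey_fences(q1, q3)
print(f"IQR = {q3 - q1} bp, fences = [{lower}, {upper}] bp")
print("Minimum below lower fence:", minimum < lower)
print("Maximum above upper fence:", maximum > upper)
```

Under this assumed rule the fences are 798.5 bp and 1,666.5 bp, so both the reported minimum (668 bp) and the reported maximum (6,450 bp) would fall outside them.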

Completeness

Figure 85. Visualization of completeness of the data in the column.

Uniqueness

Figure 86. Visualization of uniqueness of the data in the column.

Column: Isolate

Table 105. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Isolate
Description	Not specified by the data provider.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

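Table 106. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).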

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Isolate	3 - 5	n/a	2141	n/a	n/a	n/a	BG5	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	66 ( 53.7% )

Table 107. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Isolate	100.00%	53.66%	2689	2141

Completeness

Figure 87. Visualization of completeness of the data in the column.

Uniqueness

Figure 88. Visualization of uniqueness of the data in the column.

Column: Host

Table 108. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Host
Description	Not specified by the data provider. Presumably the host species from which the virus has been isolated.
Data type	String
Descriptor	dwc:scientificName [UID:0.0.SCNTF503]
Descriptor description	The full scientific name, with authorship and date information if known.
Descriptor target IRI	http://rs.tdwg.org/dwc/terms/scientificName
Unit	n/a

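Table 109. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).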

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Host	14 - 14	n/a	Apis mellife…	n/a	n/a	n/a	Apis mellife…	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1 ( 0.8% )

Table 110. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Host	100.00%	0.81%	Apis mellifera	Apis mellifera

Data Distribution Top 20

Figure 89. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 90. Visualization of completeness of the data in the column.

Uniqueness

Figure 91. Visualization of uniqueness of the data in the column.

Column: Country

Table 111. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Country
Description	Not specified by the data provider. Presumably the name of the country in which the virus has been isolated.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

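Table 112. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).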

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Country	6 - 14	n/a	Belgium	n/a	n/a	n/a	United Kingd…	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	8 ( 6.5% )

Table 113. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Country	100.00%	6.50%	France	Romania

Data Distribution Top 20

Figure 92. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 93. Visualization of completeness of the data in the column.

Uniqueness

Figure 94. Visualization of uniqueness of the data in the column.

Column: Collection_date

Table 114. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	Collection_date
Description	Not specified by the data provider. Presumably the approximate date at which the sample has been collected.
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

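Table 115. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); total number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).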

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
Collection_date	4 - 10	n/a	2020	n/a	n/a	n/a	2021-11-23	123	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	33 ( 26.8% )

Table 116. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
Collection_date	100.00%	26.83%	2020-10-16	2020-05-02

Completeness

Figure 95. Visualization of completeness of the data in the column.

Uniqueness

Figure 96. Visualization of uniqueness of the data in the column.